Benford's Law Exploration

July 1, 2018 | Author: Caterina Rende Dominis | Category: Logarithm, Experiment, Science, Physics & Mathematics, Mathematics



Mathematics SL Internal Assessment Math Exploration

Benford’s Law

Student: Caterina Rende Dominis | Class: 4N | Teacher: Jelena Gusić | Examination session: May 2014 | Candidate number: fdt217 (000618-0027)

If we assume that leading (first) digits are spread evenly across the nine possible values, so that, for example, there are as many numbers starting with 5 as with 9, then the next few pages will show otherwise by introducing a mathematical law that is still not fully explained. If we believe the frequency, or better yet the probability, of each leading digit from 1 to 9 to be 1/9 on average, we are far from the truth. It has been observed over the years that in many sets of naturally occurring numbers (that is, numbers that are not tampered with or made up by people), the leading digit is most often 1 and least often 9. By the end of this exploration we will see how this law can be used for fraud detection in accounting and similar disciplines. Figure 1 below shows how the leading digits are distributed and may be used as a reference for the rest of this exploration.

 

Figure 1: Relative probability of d (bar chart), where P is the probability and d is the leading digit in question.

This rule has been quite controversial and surprising in the world of mathematics, as it cannot be fully explained. Some mathematicians argue that it should not be used to detect fraud because they believe it is unreliable, and that no one should be convicted merely because accounting figures or election results do not match Benford’s Law. Experiments are therefore needed to test which of the two views the results support.

The first recorded encounter with this law was made by the astronomer Simon Newcomb in 1881, who described the pattern but did not explain it, treating it as a curiosity. The law was rediscovered in the 1930s by Frank Benford, a research physicist at General Electric after whom it is named. While working, he needed to consult a book of logarithmic tables and noticed something rather odd: the first pages of the book were more worn than the last ones. From this he concluded that numbers beginning with the digit 1 were looked up more often than numbers beginning with any other digit. After this discovery Benford collected further data to show how widespread the pattern actually was, and he published his results in 1938. His published work included more than 20,000 values obtained from sources such as data on rivers, numbers in magazine articles and sports statistics.

Explanation

Figure 2: Logarithmic scale. Source: http://www.thisisthegreenroom.com/wordpress/wpcontent/uploads/2009/04/logs2.png

On a logarithmic scale, the powers of 10 (1, 10, 100, and so on) are equally spaced. To find the position of a number between 1 and 10, or between 10 and 100, and so on, we take its logarithm. For example, log 2 ≈ 0.301, so 2 lies 0.301 of the way from 1 to 10; in other words, the distance from 1 to 2 is 30.1% of the distance from 1 to 10. This is exactly the probability that a number starts with 1 according to Benford’s Law. Because the scale repeats itself for every power of 10, the interval from 10 to 20 likewise takes up 30.1% of the interval from 10 to 100, just as the interval from 100 to 200 takes up 30.1% of the interval from 100 to 1000, and so on.
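The same idea can be checked numerically. The Python sketch below assumes base-10 logarithms and uses a hypothetical helper, `share_of_decade`, that measures what fraction of one power-of-10 interval a sub-interval occupies on the logarithmic scale. It shows the interval from 1 to 2 and its scaled copies all taking up about 0.301 of their decade.

```python
import math

def share_of_decade(start: float, end: float) -> float:
    """Fraction of one power-of-10 interval occupied by [start, end) on a
    base-10 logarithmic scale; assumes both ends lie in the same decade."""
    # A whole decade has width log10(10 * a) - log10(a) = 1 on this scale,
    # so the difference of logarithms is already the fraction.
    return math.log10(end) - math.log10(start)

# The interval from 1 to 2 and its copies scaled by 10, 100 and 1000.
for k in range(4):
    low = 10 ** k
    print(f"[{low}, {2 * low}): {share_of_decade(low, 2 * low):.3f}")
```

Each of the four lines reports the same share, which is why the probability of a leading 1 does not depend on the units or scale of the data.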

This pattern shows that the difference between the logarithm of 2 and the logarithm of 1 gives exactly the probability of a leading 1 under Benford’s Law. In turn, the difference between the logarithm of 3 and the logarithm of 2 gives the probability of a leading 2, and so forth, as the first few examples below show.

$$\log_{10} 2 - \log_{10} 1 \approx 0.301$$
$$\log_{10} 3 - \log_{10} 2 \approx 0.176$$

and so on. With that in mind we can derive a formula for the probability of each leading digit that follows Benford’s Law exactly.

If d (d ∈ {1, …, 9}) is the leading digit in question, then its probability is given by the following formula:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$

This probability is the same as the difference of logarithms above, because $\log_{10}\left(1 + \frac{1}{d}\right) = \log_{10}\frac{d+1}{d} = \log_{10}(d+1) - \log_{10} d$.
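A short Python check of the formula, with `benford_probability` as a hypothetical name: it computes P(d) for each digit, confirms that it matches the difference log10(d + 1) − log10(d), and adds up the nine values. The total is exactly 1 because the differences telescope: (log 2 − log 1) + (log 3 − log 2) + … + (log 10 − log 9) = log 10 − log 1 = 1.

```python
import math

def benford_probability(d: int) -> float:
    """Benford probability P(d) = log10(1 + 1/d) of a leading digit d."""
    if not 1 <= d <= 9:
        raise ValueError("the leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)

for d in range(1, 10):
    # The formula agrees with the difference of logarithms (up to rounding).
    assert math.isclose(benford_probability(d), math.log10(d + 1) - math.log10(d))
    print(f"P({d}) = {benford_probability(d):.3f}")

total = sum(benford_probability(d) for d in range(1, 10))
print(f"sum of all nine probabilities = {total:.3f}")
```

Rounded to three decimals, the printed values are 0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051 and 0.046 for d = 1 to 9.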

Applying Benford’s Law to Real-life Examples

With these results in mind, one might wonder which kinds of data the rule actually applies to. My first experiment used the textbook we use daily in class, the Mathematics SL Coursebook. I counted the leading digits of the numbers on exactly 10 pages (pages 18 to 27), and the results were very close to the values predicted by Benford’s Law. I carried out the count without the help of technology (as the scanned notes below show), which may have introduced some human error; the results shown are the average of three counts.
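Counting leading digits by hand is slow and error-prone. A minimal Python sketch of the same count follows, assuming the values are plain Python ints or floats; zeros are skipped because they have no non-zero leading digit. The function names and the sample list are hypothetical and are not the coursebook data.

```python
from collections import Counter
from collections.abc import Iterable

def leading_digit(x: float) -> int:
    """First non-zero digit of a non-zero number, e.g. 0.052 -> 5, 1730 -> 1."""
    # repr() gives the shortest decimal form of the number; stripping leading
    # zeros and the decimal point leaves the first significant digit in front.
    return int(repr(abs(x)).lstrip("0.")[0])

def count_leading_digits(values: Iterable[float]) -> Counter:
    """Tally how often each leading digit 1-9 occurs, skipping zeros."""
    return Counter(leading_digit(v) for v in values if v != 0)

# Hypothetical numbers, standing in for values copied from a few textbook pages.
sample = [18, 27, 1.5, 0.032, 140, 9, 112, 2.25]
print(sorted(count_leading_digits(sample).items()))  # [(1, 4), (2, 2), (3, 1), (9, 1)]
```

Reading the digit from the decimal text rather than from a logarithm avoids rounding trouble for values just below a power of 10, such as 999.9999999999999.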

Figure 3: Benford’s Law in the mathematics coursebook (handwritten notes).

Thankfully, with the help of the following source I found a more accurate and less time-consuming way of applying Benford’s Law to data sets using Microsoft Excel: http://www.theiia.org/intAuditor/media/files/Step-bystep_Instructions_for_Using_Benford's_Law[1].pdf

In another experiment, I looked at the lengths of rivers around the world. Using Microsoft Excel and the method described in the source above, I tested Benford’s Law further on river lengths. Although this experiment has been done before with largely successful results, in my case the results were not what I expected. The following graph compares the curve predicted by Benford’s Law with the results from the sample data:
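The comparison plotted in the graphs, the "Sample Rate" against the "Benford Rate" for each leading digit, can be sketched in Python. This is not the spreadsheet procedure from the linked instructions, only an equivalent calculation assuming the data are a list of non-zero numbers; `benford_comparison` is a hypothetical name, and the powers of 2 stand in for real data because they are known to follow Benford’s Law closely.

```python
import math
from collections import Counter

def leading_digit(x: float) -> int:
    # First significant digit, read from the shortest decimal form of |x|.
    return int(repr(abs(x)).lstrip("0.")[0])

def benford_comparison(values: list[float]) -> list[tuple[int, float, float]]:
    """Return (digit, sample rate, Benford rate) for each digit 1-9."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    n = len(digits)
    return [(d, counts[d] / n, math.log10(1 + 1 / d)) for d in range(1, 10)]

# Stand-in data (not river lengths): the powers of 2 from 2**1 to 2**1000.
data = [2 ** k for k in range(1, 1001)]
for digit, sample_rate, benford_rate in benford_comparison(data):
    print(f"{digit}: sample {sample_rate:6.2%}  Benford {benford_rate:6.2%}")
```

Replacing `data` with a list of river lengths or follower counts would produce the two series plotted in the graphs.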

Figure 4: Graph comparing the leading-digit rates of the world’s rivers (Sample Rate) with Benford’s Law (Benford Rate); percentage of values (0% to 35%) against leading digit.

Since the data set was limited to the 1000 longest rivers on the planet, I suspected it might be unsuitable for this kind of experiment. I therefore chose a data set restricted not by length but by territory, and repeated the same process with the lengths of rivers in Croatia. The following graph shows a result very similar to the one above.

Figure 5: Graph comparing the leading-digit rates of Croatian rivers (Sample Rate) with Benford’s Law; percentage of values (0% to 35%) against leading digit.

Despite these anomalies, in the third trial, which used a data set on social media followers, the distribution follows Benford’s Law much more closely. Unlike the previous data sets, this one is quite new and has rarely been used to study Benford’s Law.

Figure 6: Graph comparing the leading-digit rates of Twitter followers (Sample Rate) with Benford’s Law (Benford Rate); percentage of values (0% to 35%) against leading digit.

The Sample curve follows the Benford Rate curve almost perfectly, which supports Benford’s Law. Unfortunately, my first two trials were not as successful, even though the same kinds of data sets have followed the law in the past. My guess is that there were some limitations in the data, or that my method of processing the data in Microsoft Excel was unreliable, even though it worked well in the last trial.

In contrast to the Twitter Census data set, there are cases in which the law does not apply, such as human heights. Their range is too narrow: people are rarely taller than 2 m, so heights measured in metres almost always begin with 1 or 2. Another case that does not follow the law is that of assigned numbers such as postal codes or ID numbers. These are made up and assigned by people (for example, by the government) rather than occurring naturally.
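To illustrate why heights fail while wide-ranging data can succeed, here is a small simulation under stated assumptions. Adult heights are modelled as uniform between 1.50 m and 2.00 m, and a comparison set is drawn log-uniformly between 1 and 1,000,000 (six orders of magnitude). Both distributions are made up for illustration and are not measured data; `digit_shares` is a hypothetical helper.

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    # First significant digit, read from the shortest decimal form of |x|.
    return int(repr(abs(x)).lstrip("0.")[0])

def digit_shares(values: list[float]) -> dict[int, float]:
    """Share of the (non-zero) values whose leading digit is d, for d = 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(42)  # fixed seed so the run is repeatable
heights = [rng.uniform(1.50, 2.00) for _ in range(10_000)]  # assumed narrow range
wide = [10 ** rng.uniform(0, 6) for _ in range(10_000)]     # spans 1 to 1,000,000

print("heights:", {d: round(s, 3) for d, s in digit_shares(heights).items()})
print("wide:   ", {d: round(s, 3) for d, s in digit_shares(wide).items()})
print("Benford:", {d: round(math.log10(1 + 1 / d), 3) for d in range(1, 10)})
```

Almost every simulated height starts with 1, whereas the wide-ranging values land close to the Benford proportions, which matches the explanation above that the law needs data spread over several orders of magnitude.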

Conclusion

The limited number of people who know of Benford’s Law commonly know it as the fraud-detection law, and it is precisely this relative obscurity that allows it to work in that field. Because people who falsify results usually do not know about Benford’s Law, they make up the numbers they consider most plausible, typically avoiding using the same digits too often and trying to spread them evenly. They do so to avoid arousing suspicion, while unknowingly giving themselves away.

One of the best-known applications of Benford’s Law today is its use to argue that the 2009 Iranian presidential election, which Mahmoud Ahmadinejad won with 62.63% of the vote, was tampered with. The distribution of initial digits in the results was not consistent with the law, so many mathematicians believed the results had been rigged. In the end these claims were not taken into account, since politics is not governed by rules and logic the way mathematics is, and Iranian mathematicians ultimately declared Benford’s Law to be inaccurate.

Overall, this exploration has found evidence both for and against the accuracy of Benford’s Law: some data sets followed it closely, while others did not. What leads me to believe that Benford’s Law is right are the earlier experiments in which mathematicians used the same data sets as I did and found that they followed the law. All in all, there is more evidence for its accuracy, but we still cannot be 100% certain.
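The fraud-detection idea above, that invented figures spread their leading digits too evenly, can be quantified with the mean absolute deviation (MAD) between observed and Benford proportions, a conformity measure used in forensic accounting. This is a swapped-in technique, not the method used in the analyses of the Iranian election. The sketch compares a hypothetical set of invented figures, in which every leading digit appears equally often, with proportions that match the law exactly; the names are illustrative.

```python
import math

# Expected Benford proportion of each leading digit.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def mean_absolute_deviation(observed: dict[int, float]) -> float:
    """Average of |observed share - Benford share| over the digits 1-9;
    digits missing from `observed` are treated as a share of 0."""
    return sum(abs(observed.get(d, 0.0) - BENFORD[d]) for d in range(1, 10)) / 9

# Hypothetical invented figures: every leading digit used equally often.
invented = {d: 1 / 9 for d in range(1, 10)}
# For comparison, shares that match Benford's Law exactly.
natural = dict(BENFORD)

print(f"invented: MAD = {mean_absolute_deviation(invented):.4f}")
print(f"natural:  MAD = {mean_absolute_deviation(natural):.4f}")
```

The evenly spread digits give a MAD of about 0.06, while the Benford proportions give 0. A real data set would fall somewhere in between, and where to draw the line between "conforming" and "suspicious" is a judgment call rather than proof of fraud.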

References:
http://www.kirix.com/blog/2008/07/22/fun-and-fraud-detection-withbenfords-law/
http://ibmathsresources.com/2013/05/22/benfords-law-using-maths-to-catchfraudsters/
http://t1.physik.tudortmund.de/kierfeld/teaching/CompPhys_09/benford_iran_0906.2789v1.pdf
http://en.wikipedia.org/wiki/Benford's_law
Mark J. Nigrini, “Digital Analysis Using Benford’s Law”
http://www.benfords-law.com/
http://www.thisisthegreenroom.com/wordpress/wpcontent/uploads/2009/04/logs2.png
http://www.khanacademy.org/math/trigonometry/exponential_and_logarithmic_func/logarithmic-scale-patterns/v/logarithmic-scale
