The Law of Anomalous Numbers Author(s): Frank Benford Source: Proceedings of the American Philosophical Society, Vol. 78, No. 4 (Mar. 31, 1938), pp. 551-572 Published by: American Philosophical Society Stable URL: http://www.jstor.org/stable/984802 . Accessed: 08/02/2014 11:22 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp
. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact
[email protected].
.
American Philosophical Society is collaborating with JSTOR to digitize, preserve and extend access to Proceedings of the American Philosophical Society.
http://www.jstor.org
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
THE LAW OF ANOMALOUS NUMBERS FRANK BENFORD GeneralElectricCompany, Physicist,ResearchLaboratory, Schenectady, New York (Introducedby IrvingLangmuir) (Read April22, 1937) ABSTRACT
It has been observedthat the firstpages of a table of commonlogarithms thatmoreusednumbersbegin showmorewearthando thelast pages,indicating withthedigit1 thanwiththe digit9. A compilation ofsome20,000firstdigits sourcesshowsthatthereis a logarithmic takenfromwidelydivergent distribution offirstdigitswhenthenumbersare composedoffourormoredigits. An analysis sourcesshowsthatthe numberstakenfromunreofthe numbersfromdifferent latedsubjects,suchas a groupofnewspaper items,showa muchbetteragreement witha logarithmic distribution thando numbersfrommathematical tabulations or otherformaldata. There is herethe peculiarfactthat numbersthat individuallyare withoutrelationship are, whenconsideredin largegroups,in good witha distribution law-hence the name " AnomalousNumbers." agreement A further analysisofthedata showsa strongtendency forbodiesofnumerical data to fallintogeometric series. If theseriesis madeup ofnumberscontaining threeormoredigitsthefirstdigitsforma logarithmic series. If thenumbersconrelationstillholdsbutthesimplelogarithmic tainonlysingledigitsthegeometric relationno longerapplies. An equationis givenshowingthe frequencies of firstdigitsin the different ordersofnumbers1 to 10, 10 to 100,etc. The equationalso givesthefrequency ofdigitsin thesecond,third -place ofa multi-digit and it is shownthatthesamelaw appliesto reciprocals. number, Thereare manyinstancesshowingthatthegeometric series,or thelogarithmiclaw,has longbeenrecognized as a commonphenomenon in factualliterature and intheordinary affairs oflife. The wiregaugeand drillgaugeofthemechanic, the magnitudescale of the astronomer and the sensoryresponsecurvesof the are all particularexamplesofa relationship psychologist thatseemsto extendto all humanaffairs. The Law ofAnomalousNumbersis thusa generalprobability law ofwidespreadapplication. PART I: STATISTICAL DERIVATION
OF THE LAW
has been observedthat the pages of a much used table of common logarithmsshow evidences of a selective use of the natural numbers. The pages containingthe logarithms of the low numbers1 and 2 are apt to be more stained and frayedby use than those of the highernumbers8 and 9. Of IT
PROCEEDINGS VOL. 78, NO.
OF THE
AMERICAN
4, MARCH 1938
PHILOSOPHICAL
SOCIETY,
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
551
552
FRANK BENFORD
course,no one could be expected to be greatlyinterestedin the conditionof a table of logarithms,but the mattermay be consideredmoreworthyofstudywhenwe recallthat the table and is used in the building up of our scientific,engineering, general factual literature. There may be, in the relative cleanlinessof the pages of a logarithmtable, data on how we thinkand how we react when dealing withthingsthat can be describedby means of numbers. Methodsand Terms Beforepresentingthe data collectedwhileinvestigatingthe possibleexistenceof a distributionlaw that applies to numerand to randomdata in particular,it may ical data in general-, be wellto definea fewtermsand outlinethe methodofattack. First, a distinctionis made betweena digit,whichis one of the nine natural numbers 1, 2, 3, ... 9, and a number, whichis composedof one or moredigits,and whichmay contain a 0 as a digitin any positionafterthe first. The method ofstudyconsistsofselectingany tabulationofdata that is not too restrictedin numericalrange,or conditionedin some way too sharply,and makinga count of the numberof times the natural numbers 1, 2, 3, ... 9 occur as firstdigits. If a decimalpoint or zero occursbeforethe firstnaturalnumberit is ignored,forno attentionis to be paid to magnitudeother than that indicatedby the firstdigit. The Law of Large Numbers An effortwas made to collect data fromas many fieldsas types. possible and to include a variety of widely different no numbers that have random from purely The types range relationotherthan appearing withinthe covers of the same magazine, to formalmathematicaltabulations that admit of no variationfromfixedlaws. Between these limitsone will recognizevarious degrees of randomness,and in general the title of each line of data in Table I will suggestthe nature of the source. In every group the count was continuousfrom the beginningto the end, or in the case oflong tabulations,to numberof observationsto insure a fair average. a sufficient
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
553
THE LAW OF ANOMALOUS NUMBERS
The numberscountedin each groupis givenin the last column of Table I. TABLE I PERCENTAGE OF TIMES THE DIGITS IN NUMBERS,
NATURAL NUMBERS AS DETERMINED BY
1 TO 9 ARE USED AS 20,229OBSERVATIONS
FIRST
First Digit T itle
_ _
1
2
_ _
3
_ _
4
_
__ - _ _ _ - _
5
6
7.2
8.6
E Spec.Heat
24.0 18.4 16.2 14.6 10.6
4.1
H Mol. Wgt.
26.7 25.2 15.4 10.8
5.1
A Rivers,Area 31.0 16.4 10.7 11.3 20.4 14.4 18.0
14.2 4.8 12.0
8.1 8.6 10.0
7.2 10.6 8.0
29.6 30.0
18.3 18.4
12.8 11.9
9.8 10.8
8.3 8.1
I Drainage 27.1 J AtomicWgt. 47.2 K n-, i/n,*.**25.7 L Design 26.8
23.9 18.7 20.3 14.8
13.8 5.5 9.7 14.3
12.6 4.4 6.8 7.5
8.2 6.6 8.3
5.0 4.4 6.8 8.4
32.4 27.9 32.7 31.0 28.9
18.8 17.5 17.6 17.3 19.2
10.1 14.4 12.6 14.1 12.6
10.1 9.0 9.8 8.7 8.8
9.8 8.1 7.4 6.6 8.5
5.5 7.4 6.4 7.0 6.4
27.0
18.6
15.7
9.4
B C D
Population 33.9 41.3 Constants Newspapers 30.0
F Pressure G H.P. Lost
M Digest
N Cost Data 0 X-RayVolts P Am. League Q Black Body R Addresses S n',n2... n! T Death Rate Average.
33.4 18.5 12.4
7.5
25.3 16.0 12.0 10.0
. . . . . .30.6
Probable Error 4-0.8
18.5 +0.4
6.7 6.6 7.1
8.5
6.7
_
_ - _ _ _
7
5.5
9
4.2
5.1
6.2 5.8 6.0
4.1 1.0 6.0
3.7 2.9 5.0
6.4 7.0
5.7 5.1
4.4 5.1
6.5
8.8 6.5
3.2
4.1 5.0 3.3 7.2 7.0
5.5 4.7 5.1 4.9 5.2 5.6
6.8 7.2
-~ount
8
4.8 2.8 2.5 4.4
8.0 7.3
4.9 5.5 5.8 5.6 4.7 5.0
7.1
4.8
C
335
2.2 3259 10.6 104 5.0 100
4.1 1389 4.7 3.6
703 690
1.9 5.5
159 91
3.2 1800 8.9 5000 5.6
4.2 3.1 4.8 3.0
560
308
5.0
741 707 1458 1165 342
4.1
418
54
5.5
12.4 9.4 8.0 6.4 5.1 4.9 4.7 -0.4 +0.3 +-0.2 +-0.2 +-0.2 +-0.2 +0.3
900
1011
At the foot of each column of Table I the average percentage is given for each firstdigit, and also the probable errorof the average. These averages can be better studied if the decimal point is moved two places to the left,making the sum of all the averages unity. The frequencyof firstl's is then seen to be 0.306, whichis about equal to the common logarithmof 2. The frequencyof first2's is 0.185, which is slightlygreater than the logarithmof 3/2. The difference here,log 3 - log 2, is called the logarithmicintegral. These resemblancespersistthroughout,and finallythereis 0.047 to be comparedwithlog 10/9,or 0.046.
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
554
FRANK BENFORD
The frequency of first digits thus follows closely the logarithmicrelation Fa
= log (a +
(1)
),
whereFa is the frequencyof the digit a in the firstplace of used numbers. TABLE II
OBSERVED AND COMPUTED FREQUENCIES Natural Number
1
2 3 4 5 6 7 8 9
Number Interval
1 to 2
2 to 3 3 to 4 4 to 5 5 to 6 6 to 7 7 to 8 8 to 9 9 to 10
Observed Frequency
0.306
0.185 0.124 0.094 0.080 0.064 0.051 0.049 0.047
Logarithm Interval
0.301
0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046
Observed - Computed
+0.005
+0.009 -0.001 -0.003 +0.001 -0.003 -0.007 -0.002 +0.001
Prob. Error of Mean
?t0.008
?40.004 -0.004 ?40.003 ?40.002 ?t0.002 4t0.002 ?t0.002 ?t0.003
There is a qualification to be noted immediately,for Table I was compiledfromnumberscomposed in general of four,five-and six digits. It will be shown later that Eq. (1) is a distributionlaw for largenumbers,and there is a more general equation that applies when consideringnumbersof one, two significantdigits. If we may assume the accuracyof Eq. (1), we thenhave a probabilitylaw ofthe mostgeneralnature,forit is a probability derived from"events" throughthe medium of theirdescriptivenumbers;it is not a law of numbersin themselves. The range of subjects studied and tabulated was as wide as timeand energypermitted;and as no definiteexceptionshave ever been observed among true variables, the logarithmic law forlarge numbersevidentlygoes deeper among the roots ofprimalcauses than our numbersystemunaided can explain. FrequencyofDigits in theqthPosition The second-place digits are ten in number,for here we musttake 0 into account. Also, in consideringthe frequency
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
THE LAW OF ANOMALOUS NUMBERS
555
of a second-place digit b we must take into account the digit a that preceded it. The logarithmicinterval between to two digitsis now to be dividedinto ten parts corresponding ... of 9. Let a be the firstdigit a numthe ten digits0, 1, 2, ber and b be the second digit;thenusingthe customarymeaning of position and order in our decimal system a two-digit numberis writtenab, and the next greaternumberis written ab + 1. The logarithmic interval between ab and ab + 1 is log (ab + 1) - log ab, while the interval covered by the ten possible second-place digits is log (a + 1) - log a. Therefore the frequencyFb of a second-place digit b followinga first-placedigit a is Fb
= log F Fb= )/l1 Og ( ab?+ ab ,1og
+ a
(2)
As an example,the probabilityFb of a 0 followinga first-place 5 in a randomnumberis the quotient Fb = log0/
log .
It followsthat the probabilityfora digitin the qthpositionis ... p (q +1) 1 lgabc abc ... pq Fb = -3abc o (p+1)) log abc... oIp .. abc
p
Here the frequencyof q depends upon all the digits that precede it, but when all possible combinationsof these digits are takeninto accountFq approachesequalityforall the digits 0, 1, 2, *.. 9, or Fq
0.1.
(4)
As a resultof this approach to uniformity in the qth place the distributionof digitsin all places in an extensivetabulation of multi-digitnumberswill be also nearlyuniform.
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
FRANK BENFORD
556
TABLE III FREQUENCY
OF DIGITS
IN FIRST
AND SECOND
PLACES
Digit
First Place
Second Place
0.
0.000 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046
0.120 0.114 0.108 0.104 0.100 0.097 0.093 0.090 0.088 0.085
1. 2. 3. 4. 5. 6. 7... 8.... 9.
Reciprocals Some tabulations of engineeringand scientificdata are given in reciprocalform,such as candles per watt, and watts per candle. If one formof tabulation followsa logarithmic distribution,then the reciprocaltabulation will also have the same distribution. A little considerationwill show that this must followfordividingunityby a given set of numbersby means of logarithmsleads to identicallogarithmswithmerely a negativesign prefixed. The Law ofAnomalousNumbers A study of the itemsof Table I shows a distincttendency forthose of a randomnatureto agreebetterwiththe logarithmic law than those of a formalor mathematicalnature. The best agreementwas foundin the arabic numbers(not spelled out) of consecutivefrontpage news items of a newspaper. Dates were barred as not being variable, and the omissionof spelled-outnumbersrestrictedthe counted digitsto numbers 10 and over. The first342 streetaddressesgiven in the currentAmericanMen ofScience (Item R, Table IV) gave excellent agreement,and a complete count (except for dates and page numbers)of an issue of the Readers' Digest was also in agreement. On the other hand, the greatest variations from the logarithmicrelation were found in the firstdigits of mathe-
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
THE LAW OF ANOMALOUS NUMBERS
557
maticaltables fromengineering handbooks,and in tabulations ofsuch closelyknitdata as Molecular Weights,SpecificHeats, Physical Constantsand AtomicWeights. TABLE IV SUMMATION
OF DIFFERENCES
BETWEEN FREQUENCIES
OBSERVED
Nature
1 2 3 4 5 6 7 8 9 10
D F G R P Q 0 M A T
NewspaperItems 2.8 PressureLost,AirFlow 3.2 H.P. Lost in AirFlow 4.8 StreetAddresses, A.M.S. 5.4 Am. League,1936 6.6 Black Body Radiation 7.2 X-Ray Voltage 7.4 Readers'Digest 8.4 AreaRivers 9.8 Death Rates 11.2
AND THEORETICAL
Nature
11 N Cost Data, Concrete 12.4 12 S n.... n8,n! 13.8 13 L DesignData Generators 16.6 14 B Population,U. S. A. 16.6 15 I DrainageRate ofRivers 21.6 16 K n-1,-Vfn .*22.8 17 H MolecularWgts. 23.2 18 E SpecificHeats 24.2 19 C PhysicalConstants 34.9 20 J AtomicWeights 35.4
These factslead to the conclusionthat the logarithmiclaw applies particularlyto those outlaw numbersthat are without known relationshiprather than to those that individually followan orderlycourse;and therefore the logarithmicrelation is essentiallya Law of Anomalous Numbers. PART
II: GEOMETRIC BASIS
OF THE LAW
The data so farconsideredhave been composedentirelyof used numbers;that is, numbersas they are used in everyday affairs. There must be some underlyingcauses that distort what we call the "natural" numbersysteminto a logarithmic distribution,and perhaps we can best get at these causes by firstexaminingbrieflythe frequencyof the natural numbers themselves when arranged in the infinitearithmeticseries 1, 2, 3, ... n, wheren is as large as any numberencountered in use. Let us assume that each individualnumberin the natural numbersystemup to n is used exactlyas oftenas everyother individual number. Starting with 1, and counting up to
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
558
FRANK BENFORD
10,000,forexample, 1 would have been used 1,112 times,or 11.12 per cent of all uses. If the count is extendedto 19,999 thereare 9,999 l's added, and firstl's occur in 55.55 per cent of the 19,999 numbers. When number 20,000 is reached there is a temporarystopping of the addition of first l's and 90,000 of the other digits are added to the series before FRfEQfJENVCY Or P/,Sr PLACC D/G/r3 0 OBSERVED
0.30
/ 2
3 4
a
5
6
7 8 3
formulti-digit FIG. 1. Comparisonofobservedand computedfrequencies numbers.
l's are again broughtintothe series,at100,000. At thispoint the percentageof l's is again reduced to 11.112 per cent as illustratedin curve A of Fig. 2. This curve is Fn and log n scale. If the equations forA plotted to a semi-logarithmic are writtenforthe threediscontinuousbut connectedsections 10,000-20,000, 20,000-99,999 and 99,999-100,000 the area underthe curve will be veryclosely0.30103, wherethe entire area of the frameof coordinateshas an area 1. But an integrationby the methodsof the calculus is merelya quick way of adding up an infinitenumberof equallyspaced ordinatesto the curve and fromthis additionfindingthe average heightof
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
THE LAW OF ANOMALOUS NUMBERS
559
the ordinatesand hence the area underthe curve. But if we are satisfiedwitha resultsomewhatshortof the perfectionof the integralcalculus we may take a finitenumberof equally spaced ordinatesand by plain arithmeticcome to practically the same answer. By definitioneach point of A represents
LINEAR FREqU/NC/ES
FROM /Q000TO/0,000 1 2 3 4 5
67
89
/0
ArFOR/ 8FOR 9
0.4 0O3
0
NA7VRAA NUMBER FIG. 2. Linear frequencies of the naturalnumbersystembetween10,000and
100,000.
the frequencyof firstl's from1 up to that point,and an integration (by calculus or arithmetic)under curve A gives the averagefrequencyoffirstl's up to 100,000. The finitenumber correspondingto equally spaced ordinatesnow representsa geometricseries of numbersfrom10,000 to 100,000,and it is substantiallythisseriesofnumbers,in thisand otherordersof
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
560
FRANK BENFORD
the natural numberscale that lead to the,numericalfrequencies already presented. Curve B of Fig. 2 is for9 as a firstdigit. The frequency of9's decreasesin the numberrangefrom10,000to 89,999 and then increasesas 9's are added from90,000 to 99,999, and an integrationundercurve B leads to a good numericalapproximationto the logarithmicintervallog 10 - log 9, as called for by the previousstatisticalstudy. and LogarithmicSeries Geometric The close relationshipof a geometricseriesand a logarithmicseriesis easilyseenand hardlyneedsformaldemonstration. The uniformlyspaced ordinates of Fig. 2 forma geometric series of numbersfor these numbershave a constant factor betweenadjacent terms,and thisconstantfactoris determined in size by the constantlogarithmicincrement. Semi-LogCurves A geometricseriesofnumbersplottedto a semi-logarithmic scale gives a straightline. In the originaltabulation of observed numbersthe line of data marked "R" is designated simplyas "street addresses." These are the streetaddresses ofthe first342 people mentionedin the currentAmericanMen ofScience. The randomnessof such a list is hardlyto be disputed, and it should thereforebe usefulforillustrativepurposes. In Fig. 3 these addresses are firstindicatedby the height of the lines at the base of the diagram. The heightof a line, measured on the scale at the left,indicates the numberof addresses at, or near, that streetnumber. Thus therewere fiveaddressesat No. 29 on various streets. In orderto make the trendclearer,the heightsof these lines were summed,beginningat the leftand proceedingacross to the right. It was found that four straightlines could be drawn among these summationpoints with fair fidelityof trend,and these four lines representfour geometricseries, each with a different factorbetweenterms. Each line will give the observedfre-
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
~~D/UTRl8ur/ON AND 3UMMA7YON OF F/Rsr 34 2
-
5IMER/CAN MEN OF SCIENCE" -
I
WIll
-
~
-
/9-34 -
~
~
~
-
~
~
/0
/0
___t_Q
_08_C
FIG. 3. Distribution and summation offirst342streetaddresses,American M
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
562
FRANK BENFORD
quency over the numericalrange it covers,and hence satisfies the logarithmicrelationship. The Natural Numbersand Nature'sNumbers In natural events and in events of which man considers himselfan originatorthereare plentyofexamplesofgeometric orlogarithmicprogressions. We are so accustomedto labeling things1, 2, 3, 4, *** and thensayingtheyare in naturalorder that the idea of 1, 2, 4, 8, * being a more natural arrangement is not easily accepted. Yet it is in this latter manner large numberofphenomenaoccur,and the that a surprisingly evidenceforthis is available to everyone. First, let us considerthe physiologicaland psychological reactionto externalstimuli. The growthof the sensationof brightnesswith increasing illumination is a logarithmic function, as illustrated by Fechner's Law. The growth of sensation is slow at first whilethe rodsofthe retinaare alone responsive,and a straight line on semi-logarithmicpaper (the stimulus being on the funclogarithmicscale) can representthe intensity-brightness tion in this region. When the cones come into action there is a sharp change in the rate of growth,and anotherstraight line representsour workingrangeof vision. When over-excitationand fatigueset in, a thirdline is needed; and thus three geometricseries could be used to state the relationbetween illuminationand the sensation of brightness. If the literathe brightness ture contained sufficientnumericalreferences, functionshould give an extremelyclose approximationof the logarithmiclaw of distribution. The sense of loudness followsthe same rules, as does the sense of weight;and perhaps the same laws operate to make at ages ten and fifty. the senseofelapsed timeseemso different Our music scales are irregulargeometricseriesthat repeat rigidlyeveryoctave. In the fieldof medicine,the responseof the body to medicine or radiationis oftenlogarithmic,as are the killingcurves undertoxinsand radiation. .
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
THE LAW OF ANOMALOUS NUAIBERS
563
In the mechanical arts, where standard sizes have arisen fromyears of practical experience,the finalresultsare often geometricseries,as witness our standards of wire diameters and drillsizes, and the issued lists of "preferrednumbers." The astronomerlists stars on a geometricbrightnessscale that multipliesby 100 every five steps and the illuminating engineeradopts the same typeofseriesin choosingthewattage of incandescentlamps. In the field of experimentalatomic physics, where the results representwhat occurs among groups of the building units of nature, and where the unit itselfis known only by mass action,the test data are statisticalaverages. The action of a single atom or electronis a random and unpredictable event; and a statistical average of a group of such events would show a statisticalrelationshipto the resultsand laws here presented. That this is so is evidencedby the frequent use made of semi-logpaper in plottingthe test data, and the test points often fall on one or more straightlines. The analogy is complete,and one is temptedto thinkthat the 1, 2, 3, *.. scale is not the natural scale; but that, invokingthe base e of the natural logarithms,Nature counts e0,
ex,
e2x
e3x
...
and builds and functionsaccordingly. PART
III.
DIGITAL
ORDERS OF NUMBERS
The natural number system is an array of numbers in simple arithmeticseries,but on top of this we have imposed an idea taken froma geometricseries. Numbers composed of many digits are ordinarilyseparated into groups of three digitsby interposingcommas,and here we unknowinglygive evidence of the use of these numberson a geometricscale. For convenienceofdescriptionthe naturalnumbers1 to 10 are called the firstdigitalordernumbers,thosefrom10 to 100 the second digitalorder,etc. It will be noted that 10 is both the last numberof the firstorderand the firstnumberof the second order,and when an integrationis carriedout, as will
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
FRANK BENFORD
564
be done later, 10 appears as both an upper and a lowerlimit, and it is thus used in this case as a boundaryline ratherthan a unit zone in the natural numbersystem. In Fig. 4 the curves show the frequencywith which the naturalnumbersoccurin the Natural NumberSystem,beginningat the leftedge,where1 is the onlynumber,its frequency is 1; that is, until a second numberis added 1 is the entire numbersystem. When2 is reachedthefrequencyis 0.50 for1 /INEAIR FREQUEAW/E5
am
44
A feL
r DwX w _ _ r
X I1 I
4
OF rHE
~~~~~JECOND
/Gr17L
V0
1.2% -Vtl=SW XM
42C~~~~~~~
NATURAL
ORDER-O_
0
mum8ERS
/ ro
/,0O
7-I/RD / OeRDCR DTrAL
41
rr/2
11;1
;SS
FIG. 4. Linear frequenciesof the natural numbers in the firstthree orders.
foreachofthe andO.S0for2. AtS,forexample,thefrequency until9 is firstS digitsis 0.20,and theequal divisioncontinues reached. At 10, the digit1 has'appearedtwiceand has a frequencyof 0.20 against0.10 foreach of the othereight digitsthathave appearedbut once. It willbe observedthatthecurverisingfrom9 on thescale of abscissoeis foronlythedigit1, whilethecurvecontinuing from9 is forthedigits2 to 9 inclusive. At 19 the downward for2 risesto join thecurvefor1 at 29 and 1 curve frequency
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
THE LAW OF ANOMALOUS NUMBERS
565
and 2 have a commoncurve until 99 is reached and a third first1 is about to be added to the series. At any ordinatethe curves thereforetell the frequencyof the total number of natural numbersup to that point.
j
II I
I,
9
/0
FIG. 5. Continuous and discontinuousfunctionsin the neighborhoodof the digit 9.
The curvesare drawnas ifwe weredealingwithcontinuous functionsin place of a discontinuousnumbersystem. The is that the thingswe justificationforusing a continuousformuse the numbersystem to representare nearly always perfectlycontinuousfunctions,and the number,say 9, given to any phenomenonwillbe used in some degreeforall the infinite sizes of phenomenabetween 8 and 10 when we confineourselves to singledigitnumbers.
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
FRANK BENFORD
566
An enlargedsketch of the linear frequencycurves at the junctionofthe firstand second ordersis givenin Fig. 5. The lines h-b and b-j are the computedratios of 1 in this region, whilethe lines 8-b forthe ratio of 9 beginsat 8, foras soon as size 8 is passed thereis a possibilityof our usinga 9, whilefor rFIW /EUNCY OF S/NGLE D/G/I3 / T09 + rHEORE7/CAL O OgSERvVED FREQUENCY Or FOOTNOFfS /N /o BOOKS EACH H/AVING Ar L.EAsr ONE PAGE WIrH TEN FoorWores O&RV&ED) (2,968 0.50
l
__
_
0.40
0. /O
-
/
2
3
4
5
6
7 8a9
ofsingledigits. FIG. 6. Theoreticaland observedfrequencies
size 812 the chancesare about equal forcallingit either8 or 9. The summationof area underthe curve 8-b-c is taken as the probabilityofusinga 9 forphenomenain this region. This is about equivalentto knowingaccuratelythe size ofall phenomena in this regionand decidingto call everythingbetween8.5
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
THE LAW OF ANOMALOUS NUMBERS
567
and 9.5 by the number9. Once 9 is passed the curve for 1, b-j, beginsto risein anticipationof the phenomenabetween9 and 10 that will be called 10. It has been notedthat forhighordersofnumbersthe areas underthe curvesof Fig. 2 are proportionalto the frequencyof use of the firstdigit. The same demonstrationwill now be made withthe aid ofthe calculus in regionsthat are markedly discontinuous. Selectingthe thirddigitalorder,Fig. 4, the area underthe 1-curvecan be written *199
A1"'
-
00
lOQO
999
yddx +j
Y2d +
19
99
3 dx,
(5)
wherethe ordinatesof the firstrisingsectionof the curve are
a
Yi
-
a
88
(6)
The descendingsectionof the curve has ordinates 111
Y2 =
(7)
and the last risingsectionbetween999 and 1,000has ordinates a
-
a-
888
(8)
The curvesare plottedto semi-logarithmic coordinatesaDd x = log a,
(9)
dx = dala.
(10)
The integralsaftermakingthese substitutionsgive the value 1990 A1"'=
loe
99
8 1000'
A similar operation yields for the 1-curve in the second digital order 8 A1= og. A1" g190 + 100 9
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
FRANK BENFORD
568
and in the firstorder 8
10
Al' = loge-jj+ -?jj-
From the symmetryrunningthroughthese solutionsand fromthe solutionsforthe eightotherfirstdigits,we can write the generalequation forthe Law of AnomalousNumbers
F where
r =
1) +or 8 =[log 10 (2.10r1 r _1 loge
Fa
1
1) 10-
oge [lge (a
tN11
lori
)10r -1
~~a+
1N
-
_
whereN = log, 10 is thefactorto convertthe expressionsfrom the naturallogarithmsystem,base e, to the commonlogarithm system,base 10. done If highordersof r are considered,as was unwittingly in the originalstatisticalwork,these expressionssimplifyby droppingthe terms - 1 in both numeratorand denominator, and the numericaltermshaving lor in the denominatorbecome negligible. Hence the generalequations become 2
Fr = =log0lo1, Far
a$l
= log
a
(12) ,
(13)
in form, but these two expressionsno longerhave a difference and -theymay be mergedinto Far =
+ log1o a
(14)
whichwas the relationshiporiginallyobservedformulti-digit numbers. In Table V numericalvalues are given forthe theoretical frequenciesof used numbersfor the first,second, third and limitingdigitalorders.
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
THE LAW OF ANOMALOUS NUMBERS
569
TABLE V THEORETICAL
FREQUENCIES
IN VARIOUS
DIGITAL
ORDERS
First Digit
First Order
Second Order
Third Order
1 2 3 4 5 6 7 8 9
0.39319 0.25760 0.13266 0.08152 0.05348 0.03575 0.02352 0.01456 0.00772
1 tO 10
10 tO 100
100 tO 1000
0.31786 0.17930 0.12432 0.09479 0.07631 0.06366 0.05444 0.04742 0.04190
0.30276 0.17638 0.12487 0.09669 0.07889 0.06662 0.05764 0.05078 0.04537
Limiting Order
0.30103 0.17609 0.12494 0.09691 0.07918 0.06695 0.05799 0.05115 0.04576
The frequenciesofthe singledigits1 to 9 varyenoughfrom the frequenciesof the limitingorderto allow a statisticaltest if a source of digitsused singlycan be found. The footnotes so commonlyused in technical literatureare an excellent source, consistingof units that are indicated by numbers, lettersor symbols. The procedureofcollectingdata forthe first-order numbers was to make a cursoryexaminationof a volume to see if it contained as many as 10 footnotesto a page, forobviously no test of the range1 to 9 could be made if the maximum number fell short of the full range. The numbershere recorded in Table VI are the numberof footnotesobservedon consecutivepages, beginningon page 1 and continuingto the end of the book, or until it seemed that a fairsample of the book had been obtained. The books used were the Standard Handbook for Electrical Engineers, Smithsonian Physical Tables, Handbuchder Physik and Glazebrook's Dictionaryof Applied Physics. In Table VI the observedpercentagesof singledigits1 to 9 are givenalongwiththenumberofpages used in each volume and the numberof footnotesobserved. The frequencyfor1 is seen to be 43.2 per cent as against the theoreticalfrequency of39.3 per cent,and forthe digit9 the observationsagreewith theorywith Fg' = 0.8 per cent.
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
570
FRANK BENFORD
In general the agreementwith theoryis as good as the computedprobable errorsof the observation. TABLE VI COUNT OF FOOTNOTES 3
2
1 ~~Pages __ Used
Volume Volume
_
_
_
_
5
4 _
_
_
_
6 _
_
8
7 _
_
_
_
9 _
_
_
Frequencies, in Per Cent
Total Count
1. S. H. E. E. 2. Sm. Phy. Ta .......
All All
55.1 56.3
22.7 22.1
12.3 6.6
5.0 6.1
2.4 5.0
1.7 2.2
0.3 1.1
0.3 0.6
0.3 0.0
586 181
H. der Phy.. H. der Phy.. H. der Phy...
365
37.2 29.7 19.5
25.7 26.6 17.4
12.1 14.6 17.7
9.5 11.0 11.9
4.8 8.0 11.3
5.2 5.9 9.2
2.6 1.8 6.1
2.2 1.0 5.8
0.9 1.4 1.1
230 287 293
3. 4. 5. 6. 7. 8. 9. 10.
II. derPhy..
.
360 360
H. derPhy... H. derPhy... GlazebrookI ...... GlazebrookV .....
361 360 360 All All
ObservedAve.43.2 PredictedAve.39.3 Difference. Probable Error.
8.5
52.8 23.6 33.0 56.8 49.5 41.7
5.5
27.5 11.8 10.7 23.2 6.7 7.6 22.3 13.7 6.9 25.2 13.4 9.1
23.6 25.7 +3.9 -2.1 3.0 40.6
4.0 4.3 2.4 2.3 4.7
3.2
0.8
5.9 2.8 1.4- 0.5 1.5 1.5 3.2 1.7
0.0 2.4 1.4 1.5 0.5
1.6
1.6 0.0 0.8 0.5
11.8 8.3 4.9 3.9 1.9 1.6 0.8 5.3 3.6 2.4 13.3 8.1 1.5 0.8 -1.5 +0.2 -0.4 +0.3 -0.5 +0.1 0.0 40.7 ?0.5 ?0.6 ?0.5 ?0.4 ?0.4 ?0.4
127
254 211 394 405
2968
SummationofFrequencies One ofthe conditionsthatmustbe metby theseexpressions forthe frequenciesof the integersis that, in any one order, the sum of the frequenciesmust equal unity;that is, the sum of theirprobabilitiesmust equal certainty. Selectingthe first-order digits,Eq. 11, and remembering the logarithmicrule that the sum of the logarithmsof a group of numbersis equal to the logarithmof theircombinedproducts, we have the probabilityP' P
1023456789
= logio 9102-345678
rs
1-010
1
1
1
10 1010
1
1
10
1
10
1
10
1 1
10 N'
whichreducesto Pt
=
log1010 + 0
=1.
In a similarmanner fromthe complete set of equations
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
THE LAW OF ANOMALOUS NUMBERS
571
indicatedby Eq. 11 we have lo 190 29-3949-59-6979 89-99 1 g10 99 19 29 39-4959-69-79-89 1 1 1
P=
+I 10
=
=1
100
10 10100
-
1
1
100
100
1
-
100
1 1 100 N
log1010 + 0
and similarproofcan be workedout forthe otherorders. SummaryofPart III Single digits, regardlessof their relation to the decimal point and also regardlessof precedingor followingzeros,have a specific natural frequencythat varies sharply from the logarithmicratios. The second digital order,which is composed of two adjacent significantdigits, has a specificfrequency approximatingthe logarithmicfrequency; and for three or more associated digitsthe variation fromthe latter frequencywould be extremelydifficult to findstatistically. The basic operation F=f F
fda a
or F_
_a a
in convertingfromthe linearfrequencyofthe naturalnumbers to the logarithmicfrequencyofnaturalphenomenaand human events can be interpretedas meaningthat, on the average, these things proceed on a logarithmicor geometricscale. Anotherway of interpreting this relationis to say that small thingsare more numerousthan large things,and there is a tendencyfor the step between sizes to be equal to a fixed fractionofthe last precedingphenomenonor event. There is
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions
572
FRANK BENFORD
no necessityor implicationof limits at eitherthe upper or the lower regionsof the series. If the view is accepted that phenomenafallinto geometric series,then it followsthat the observedlogarithmicrelationship is not a result of the particularnumericalsystem,with its base, 10, that we have elected to use. Any other base, such as 8, or 12, or 20, to selectsome ofthe numbersthat have been suggestedat various times, would lead to similarrelationships; for the logarithmicscales of the new numerical systemwouldbe coveredby equally spaced stepsby the march ofnaturalevents. As has been pointedout before,the theory of anomalous numbersis reallythe theoryof phenomenaand events, and the numbersbut play the poor part of lifeless symbolsforlivingthings.
This content downloaded from 184.174.224.243 on Sat, 8 Feb 2014 11:22:35 AM All use subject to JSTOR Terms and Conditions