First Things First
Log Tables,
Random Numbers,
and Tax Fraud
Ahbel, copyright 2001
The First Digit Phenomenon
“Everyday numbers obey a law so unexpected it is hard to believe it's true.”
New Scientist - 10 July 1999
Simon Newcomb
Astronomer and mathematician
Log tables
1881 American Journal of Mathematics
“The law of probability of the occurrence of numbers is such that all mantissae of their logarithms are equally likely.”
Newcomb’s Conjecture
$P = \log_{10}(1 + 1/d)$
http://cepa.newschool.edu/het/profiles/newcomb.htm
Graph of the Distribution
http://pass.maths.org.uk/issue9/features/benford/
$\log_{10}(1 + 1/1) \approx 30.1\%$
$\log_{10}(1 + 1/2) \approx 17.6\%$
$\log_{10}(1 + 1/9) \approx 4.6\%$
Frank Benford
57 years later (1938) G.E. Physicist
$p = \log_{10}(1 + 1/d)$
Proceedings of the American Philosophical Society
20,229 observations
Areas of 335 rivers, American League baseball statistics, specific heats of 1389 chemical compounds, numbers appearing in Reader’s Digest articles and newspapers.
Newspapers – Benford in 1938
1990 Census – Mark Nigrini
Dow Jones (1990-93) – Eduardo Ley
http://www.rexswain.com/benford.html
An Intuitive Explanation
Plant 1 tree per year
Every tree grows 1 cm per year
What happens to the distribution of first digits of heights over time?
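The tree thought experiment can be tried numerically. The sketch below is an illustration added here, not part of the original slides. It assumes each tree is 1 cm tall at the end of its first year, so that after N years the heights are 1, 2, ..., N cm; the function name `first_digit_shares` is hypothetical.

```python
from collections import Counter

def first_digit_shares(years):
    """Share of trees whose height in cm starts with each digit 1-9.

    Assumption: one tree is planted per year and every tree grows 1 cm
    per year, so after `years` years the heights are 1, 2, ..., years.
    """
    counts = Counter(int(str(height)[0]) for height in range(1, years + 1))
    return {d: counts[d] / years for d in range(1, 10)}

# Print the first-digit shares for forests of different ages.
for years in (9, 19, 99, 199, 999):
    shares = first_digit_shares(years)
    print(years, " ".join(f"{d}:{shares[d]:.2f}" for d in range(1, 10)))
```

Under these assumptions the share of heights starting with 1 does not settle at one ninth. It swings between about 11% and more than 50% as the forest ages.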
Roger Pinkham
Rutgers 1961
Base invariance
Scale invariance
Benford’s law is the only way!
http://www.seanet.com/~ksbrown/kmath302.htm
Scale Invariant
Numbers seem to conform to a first digit phenomenon described by Benford’s Law
Benford’s Law seems to be Scale Invariant
If there is a first digit phenomenon that is scale invariant,
then multiplying numbers by a constant will not change the distribution of the first digits.
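A quick numerical check of this claim, as a minimal sketch rather than a proof. It assumes a sample whose base-10 logarithms are spread evenly over [0, 1), which is one way to build data that follows Benford’s Law. The scale factors (2.54, 0.3048 and 1000/7) and the helper names are arbitrary choices made for illustration.

```python
import math
from collections import Counter

def first_digit(x):
    """Leading significant digit of a positive number."""
    return int(f"{x:.15e}"[0])

def digit_shares(values):
    """List of shares for first digits 1..9."""
    values = list(values)
    counts = Counter(first_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]

# Assumption: mantissas evenly spaced on a log scale (Benford-like data).
n = 100_000
sample = [10 ** ((i + 0.5) / n) for i in range(n)]

original = digit_shares(sample)
rescaled = {a: digit_shares(a * x for x in sample) for a in (2.54, 0.3048, 1000 / 7)}

# One row per digit: original share, then the share after each rescaling.
for d in range(1, 10):
    row = " ".join(f"{rescaled[a][d - 1]:.4f}" for a in rescaled)
    print(d, f"{original[d - 1]:.4f}", row)
```

Each column of first-digit shares should agree with the others to within a few counts, whatever constant the sample is multiplied by.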
We can express any number in scientific notation: $x \times 10^n$, $1 \le x < 10$
If the distribution of x is scale-invariant, then the distribution of $y = \log_{10} x$ must be “translation invariant”:
$\log_{10}(ax) = \log_{10} a + \log_{10} x = \log_{10} a + y$, where $\log_{10} a$ is a constant
If multiplying x by a constant leaves the distribution of first digits unchanged, then adding a constant to y will leave the distribution of first digits unchanged
x is on $1 \le x < 10$
y is on $\log_{10} 1 \le y < \log_{10} 10$
y is on $0 \le y < 1$
Since the only distribution that is “translation invariant” is the uniform distribution, y must be distributed uniformly over its interval, $0 \le y < 1$. Its probability density function is $f(y) = 1$.
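$P(d_1 = 1) = P(1 \le x < 2)$
$= P(\log_{10} 1 \le y < \log_{10} 2)$
$= P(0 \le y < \log_{10} 2)$
$= \int_0^{\log_{10} 2} 1 \, dy = \log_{10} 2 - 0$
$= \log_{10} 2 \approx 30.1\%$
[Figure: graph of $y = \log_{10} x$, and the uniform density P on $0 \le y < 1$ with the region for $d_1 = 1$, from 0 to $\log_{10} 2$, shaded]
$P(d_1 = 2) = \log_{10} 3 - \log_{10} 2 \approx 17.6\%$
[Figure: the uniform density P on $0 \le y < 1$ with the region for $d_1 = 2$, from $\log_{10} 2$ to $\log_{10} 3$, shaded]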
In general,
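$P(d_1 = d) = P(d \le x < d + 1)$
$= P(\log_{10} d \le y < \log_{10}(d + 1))$
$= \int_{\log_{10} d}^{\log_{10}(d + 1)} 1 \, dy$
$= \log_{10}(d + 1) - \log_{10} d$
$= \log_{10}\frac{d + 1}{d}$
$= \log_{10}\left(1 + \frac{1}{d}\right)$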
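The general formula is easy to tabulate. The following is a small sketch added for illustration; the function name `benford_probability` is hypothetical.

```python
import math

def benford_probability(d):
    """P(d1 = d) = log10(1 + 1/d) for a first digit d in 1..9."""
    if d not in range(1, 10):
        raise ValueError("first digit must be an integer from 1 to 9")
    return math.log10(1 + 1 / d)

table = {d: benford_probability(d) for d in range(1, 10)}
for d, p in table.items():
    print(f"{d}: {p:.1%}")

# The nine probabilities should add up to 1 (up to rounding error).
print("total:", round(sum(table.values()), 12))
```

The rows for d = 1 and d = 2 give 30.1% and 17.6%, matching the two special cases worked out by integration.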
Theodore Hill
Georgia Institute of Technology 1996
All is outcome of process
normal, logarithmic, oscillating, etc. distributions
“Now imagine grabbing random handfuls of data from a hotchpotch of such distributions. Hill proved that as you grab ever more of such numbers, the digits of these numbers will conform ever closer to a single, very specific law. This law is a kind of ultimate distribution, the “Distribution of Distributions.” And he showed that its mathematical form is…Benford’s Law.”
New Scientist 10 July 1999
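The idea of grabbing numbers from a hotchpotch of distributions can be imitated with a toy simulation. This is only a sketch under stated assumptions, not Hill’s construction or proof. The three families (uniform, exponential, normal), the log-uniform choice of scale over six decades, the seed and the helper names are all assumptions made for illustration.

```python
import math
import random
from collections import Counter

def first_digit(x):
    """Leading significant digit of a positive number."""
    return int(f"{x:.15e}"[0])

def draw(rng):
    """One number from a randomly chosen distribution with a random scale.

    Assumption: three families and a scale that is log-uniform over six
    decades stand in for the 'hotchpotch' of distributions.
    """
    scale = 10 ** rng.uniform(0, 6)
    family = rng.choice(("uniform", "exponential", "normal"))
    if family == "uniform":
        return rng.uniform(0, scale)
    if family == "exponential":
        return rng.expovariate(1 / scale)
    return abs(rng.gauss(0, scale))

rng = random.Random(2001)  # fixed seed so the run is repeatable
values = [x for x in (draw(rng) for _ in range(50_000)) if x > 0]
counts = Counter(first_digit(x) for x in values)

for d in range(1, 10):
    observed = counts[d] / len(values)
    print(d, f"observed {observed:.3f}", f"Benford {math.log10(1 + 1 / d):.3f}")
```

In this toy setup the close match comes from choosing the scales without favouring any particular size. A mixture that favours one scale need not match Benford’s Law as well.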
http://www.math.gatech.edu/~hill/publications/cv.dir/stat-der.pdf
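Probability Distribution?
$P(D_1 = d) = \log_{10}\left(1 + \frac{1}{d}\right)$, $d = 1, 2, 3, \ldots, 9$ is a probability distribution
We must be able to show that $\sum_{d=1}^{9} \log_{10}\left(1 + \frac{1}{d}\right) = 1$
$\log_{10}\left(1 + \frac{1}{1}\right) + \log_{10}\left(1 + \frac{1}{2}\right) + \log_{10}\left(1 + \frac{1}{3}\right) + \ldots + \log_{10}\left(1 + \frac{1}{9}\right)$
$= \log_{10}\frac{2}{1} + \log_{10}\frac{3}{2} + \log_{10}\frac{4}{3} + \ldots + \log_{10}\frac{10}{9}$
$= \log_{10}\left(\frac{2}{1} \cdot \frac{3}{2} \cdot \frac{4}{3} \cdot \frac{5}{4} \cdot \frac{6}{5} \cdot \frac{7}{6} \cdot \frac{8}{7} \cdot \frac{9}{8} \cdot \frac{10}{9}\right)$
$= \log_{10}\frac{10}{1} = 1$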
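First Digit Law
$P(D_1 = d) = \log_{10}\left(1 + \frac{1}{d}\right)$, $d = 1, 2, 3, \ldots, 9$
Second Digit Law
$P(D_2 = d) = \sum_{k=1}^{9} \log_{10}\left(1 + \frac{1}{10k + d}\right)$, $d = 0, 1, 2, \ldots, 9$
General Significant-Digit Law
For all positive integers k, all $d_1 \in \{1, 2, \ldots, 9\}$ and all $d_j \in \{0, 1, \ldots, 9\}$, $j = 2, \ldots, k$:
$P(D_1 = d_1, \ldots, D_k = d_k) = \log_{10}\left[1 + \left(\sum_{i=1}^{k} d_i \cdot 10^{k-i}\right)^{-1}\right]$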
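The three laws can be evaluated with a few lines of code. This is a minimal sketch added for illustration; the function names `joint_probability` and `second_digit_probability` are hypothetical.

```python
import math

def joint_probability(digits):
    """General law: P(D1 = d1, ..., Dk = dk) = log10(1 + 1/m),
    where m is the integer whose decimal digits are d1 d2 ... dk."""
    if not digits or digits[0] not in range(1, 10):
        raise ValueError("first digit must be 1..9")
    if any(d not in range(10) for d in digits[1:]):
        raise ValueError("later digits must be 0..9")
    k = len(digits)
    m = sum(d * 10 ** (k - i) for i, d in enumerate(digits, start=1))
    return math.log10(1 + 1 / m)

def second_digit_probability(d):
    """Second digit law: sum the joint law over every first digit k."""
    return sum(joint_probability((k, d)) for k in range(1, 10))

print("P(D1=3, D2=1, D3=4) =", round(joint_probability((3, 1, 4)), 5))
for d in range(10):
    print(d, f"{second_digit_probability(d):.4f}")
```

With a single digit the general law reduces to the First Digit Law, and the ten second-digit probabilities add up to 1. They are much flatter than the first-digit ones, ranging from about 12% for 0 down to about 8.5% for 9.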
Tax Fraud Detection
True Tax Data - Mark Nigrini’s 169,662 IRS model files
Fraudulent Data - 1995 King’s County, New York, Cash disbursement and payroll
743 freshmen’s responses to a request to write down a six-digit number at random
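One common way to compare a set of amounts with Benford’s Law is a chi-square goodness-of-fit statistic on the first digits. The sketch below shows that approach under stated assumptions. It is not necessarily the procedure used in the studies listed above, and the two data sets are hypothetical and built purely for illustration.

```python
import math
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
CRITICAL_5_PERCENT = 15.507  # chi-square critical value, 8 degrees of freedom

def first_digit(x):
    """Leading significant digit of a nonzero number."""
    return int(f"{abs(x):.15e}"[0])

def benford_chi_square(amounts):
    """Chi-square statistic of observed first digits against Benford's Law."""
    digits = [first_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p)
               for d, p in zip(range(1, 10), BENFORD))

# Hypothetical data, for illustration only (not the files cited on the slide).
plausible = [10 ** (3 * (i + 0.5) / 1000) for i in range(1000)]  # log-spread amounts
invented = list(range(100, 1000))  # every first digit equally often

for name, data in (("plausible", plausible), ("invented", invented)):
    stat = benford_chi_square(data)
    verdict = "flag for review" if stat > CRITICAL_5_PERCENT else "consistent"
    print(name, round(stat, 1), verdict)
```

A large statistic is a reason to look more closely at the data, not proof of fraud.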
Testing Mathematical Models
Suppose past census data follows Benford’s Law closely
A model that predicts the same census data should also follow Benford’s Law
http://www.math.gatech.edu/~hill/publications/cv.dir/1st-dig.pdf
Computer Design
Donald Knuth, Stanford University
If the numbers computers process are not uniformly distributed, then computers can be re-designed to:
Minimize storage space
Maximize calculation rate
http://www.math.gatech.edu/~hill/publications/cv.dir/1st-dig.pdf
[Slide: collage of assorted everyday numbers (for example 1973, 43.5, 15,584, 476,140,375, .0360) surrounding the New Scientist quotation of 10 July 1999]