
First Things First Log Tables, Random Numbers, and Tax Fraud Ahbel, copyright 2001


The First Digit Phenomenon

“Everyday numbers obey a law so unexpected it is hard to believe it's true.”

New Scientist - 10 July 1999


Simon Newcomb

Astronomer and mathematician

Log tables

1881, American Journal of Mathematics

“The law of probability of the occurrence of numbers is such that all mantissae of their logarithms are equally likely.”

Newcomb’s Conjecture: $P = \log_{10}(1 + 1/d)$

http://cepa.newschool.edu/het/profiles/newcomb.htm


Graph of the Distribution

http://pass.maths.org.uk/issue9/features/benford/

$\log_{10}(1 + 1/1) \approx 30.1\%$

$\log_{10}(1 + 1/2) \approx 17.6\%$

$\log_{10}(1 + 1/9) \approx 4.6\%$
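The percentages on the graph can be reproduced directly from the formula. A minimal sketch; the function name `benford_first_digit` is illustrative and not from the slides.

```python
import math

def benford_first_digit(d: int) -> float:
    """Probability that the first significant digit equals d (d = 1..9)."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be in 1..9")
    return math.log10(1 + 1 / d)

# Print the full first-digit table as percentages.
for d in range(1, 10):
    print(f"{d}: {benford_first_digit(d):.1%}")
```

The digit 1 comes out near 30.1%, the digit 2 near 17.6%, and the digit 9 near 4.6%, matching the annotations above.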


Frank Benford

57 years later (1938) G.E. Physicist

$P = \log_{10}(1 + 1/d)$

Proceedings of the American Philosophical Society

20,229 observations

Areas of 335 rivers, American League baseball statistics, specific heats of 1389 chemical compounds, numbers appearing in Reader’s Digest articles and newspapers.


Newspapers – Benford in 1938

1990 Census – Mark Nigrini

Dow Jones (1990-93) – Eduardo Ley

http://www.rexswain.com/benford.html


An Intuitive Explanation

Plant 1 tree per year

Every tree grows 1 cm per year

What happens to the distribution of first digits of heights over time?
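The tree question can be explored numerically. A rough sketch, assuming that after N years the heights are exactly 1, 2, ..., N cm (the newest tree is 1 cm tall); the helper names are illustrative, not from the slides.

```python
from collections import Counter

def first_digit(n: int) -> int:
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

def first_digit_shares(years: int) -> dict:
    """Share of trees whose height in cm starts with each digit 1..9.

    Assumption: after `years` years the heights are 1, 2, ..., years cm.
    """
    counts = Counter(first_digit(h) for h in range(1, years + 1))
    return {d: counts[d] / years for d in range(1, 10)}

# Track how often a height starts with 1 as the forest ages.
for years in (9, 19, 99, 199, 999, 1999):
    share_of_ones = first_digit_shares(years)[1]
    print(f"after {years:>4} years: {share_of_ones:.1%} of heights start with 1")
```

Under this assumption the share of heights starting with 1 never settles down: it swings between about 11% and more than 55%, so small first digits are favoured on average over time.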


Roger Pinkham

Rutgers 1961

Base invariance

Scale invariance

Benford’s law is the only way!

http://www.seanet.com/~ksbrown/kmath302.htm


Scale Invariant

Numbers seem to conform to a first digit phenomenon described by Benford’s Law

Benford’s Law seems to be Scale Invariant

If there is a first digit phenomenon that is scale invariant,

then multiplying numbers by a constant will not change the distribution of the first digits.


We can express any number in scientific notation: $x \times 10^n$, $1 \le x < 10$

If the distribution of $x$ is scale-invariant, then the distribution of $y = \log_{10} x$ must be “translation invariant”:

$\log_{10}(ax) = \log_{10} a + \log_{10} x = \log_{10} a + y$, where $\log_{10} a$ is a constant

If multiplying $x$ by a constant leaves the distribution of first digits unchanged, then adding a constant to $y$ will leave the distribution of first digits unchanged.


$x$ is on $1 \le x < 10$

$y$ is on $\log_{10} 1 \le y < \log_{10} 10$

$y$ is on $0 \le y < 1$

Since the only distribution that is “translation invariant” is the uniform distribution, $y$ must be distributed uniformly over its interval, $0 \le y < 1$. Its probability density is constant and equal to 1 on that interval.
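The link between a uniform $y$ and scale-invariant first digits can be checked by simulation. A minimal sketch: the seed, the sample size, and the scale factor 2.54 (inches to centimetres) are arbitrary choices for illustration, not values from the slides.

```python
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a positive float, read from its scientific notation."""
    return int(f"{x:.15e}"[0])

def digit_frequencies(values: list) -> list:
    """Observed frequency of each first digit 1..9 (index 0 holds digit 1)."""
    counts = Counter(leading_digit(v) for v in values)
    return [counts[d] / len(values) for d in range(1, 10)]

rng = random.Random(2001)                    # fixed seed so the run is repeatable
ys = [rng.random() for _ in range(100_000)]  # y uniform on 0 <= y < 1
xs = [10 ** y for y in ys]                   # x = 10^y lies on 1 <= x < 10
scaled = [2.54 * x for x in xs]              # multiply every number by a constant

base = digit_frequencies(xs)
after = digit_frequencies(scaled)
for d in range(1, 10):
    print(f"{d}: {base[d - 1]:.3f} before scaling, {after[d - 1]:.3f} after scaling")
```

Both columns should sit close to $\log_{10}(1 + 1/d)$, up to sampling noise, which is the scale invariance the slides describe.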


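$P(d_1 = 1) = P(1 \le x < 2)$

$= P(\log_{10} 1 \le y < \log_{10} 2)$

$= P(0 \le y < \log_{10} 2)$

$= \int_{0}^{\log_{10} 2} 1 \, dy = \log_{10} 2 - 0$

$= \log_{10} 2 \approx 30.1\%$

[Figure: uniform density $P = 1$ over $0 \le y < 1$, where $y = \log_{10} x$; the region for $d_1 = 1$ runs from $0$ to $\log_{10} 2$]

[Figure: the same density; the region for $d_1 = 2$ runs from $\log_{10} 2$ to $\log_{10} 3$]

$P(d_1 = 2) = \log_{10} 3 - \log_{10} 2 \approx 17.6\%$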


In general,

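$P(d_1 = d) = P(d \le x < d + 1)$

$= P(\log_{10} d \le y < \log_{10}(d + 1))$

$= \int_{\log_{10} d}^{\log_{10}(d + 1)} 1 \, dy$

$= \log_{10}(d + 1) - \log_{10} d$

$= \log_{10} \frac{d + 1}{d}$

$= \log_{10}\left(1 + \frac{1}{d}\right)$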


Theodore Hill

Georgia Institute of Technology, 1996

All is outcome of process

normal, logarithmic, oscillating, etc. distributions

“Now imagine grabbing random handfuls of data from a hotchpotch of such distributions. Hill proved that as you grab ever more of such numbers, the digits of these numbers will conform ever closer to a single, very specific law. This law is a kind of ultimate distribution, the “Distribution of Distributions.” And he showed that its mathematical form is…Benford’s Law.”

New Scientist, 10 July 1999


http://www.math.gatech.edu/~hill/publications/cv.dir/stat-der.pdf


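Probability Distribution?

We must be able to show that $P(D_1 = d) = \log_{10}\left(1 + \frac{1}{d}\right)$, $d = 1, 2, 3, \ldots, 9$ is a probability distribution:

$\sum_{d=1}^{9} \log_{10}\left(1 + \frac{1}{d}\right) = 1$

$\log_{10}\left(1 + \frac{1}{1}\right) + \log_{10}\left(1 + \frac{1}{2}\right) + \log_{10}\left(1 + \frac{1}{3}\right) + \ldots + \log_{10}\left(1 + \frac{1}{9}\right)$

$= \log_{10}\frac{2}{1} + \log_{10}\frac{3}{2} + \log_{10}\frac{4}{3} + \ldots + \log_{10}\frac{10}{9}$

$= \log_{10}\left(\frac{2}{1} \cdot \frac{3}{2} \cdot \frac{4}{3} \cdot \frac{5}{4} \cdot \frac{6}{5} \cdot \frac{7}{6} \cdot \frac{8}{7} \cdot \frac{9}{8} \cdot \frac{10}{9}\right)$

$= \log_{10} 10 = 1$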
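The telescoping product on this slide can be confirmed with exact rational arithmetic, and the sum of logarithms checked in floating point. A small sketch; the variable names are illustrative.

```python
import math
from fractions import Fraction

# Multiply (1 + 1/d) = (d + 1)/d for d = 1..9 using exact fractions.
product = Fraction(1)
for d in range(1, 10):
    product *= Fraction(d + 1, d)

# The same statement as a floating-point sum of base-10 logarithms.
total = sum(math.log10(1 + 1 / d) for d in range(1, 10))

print(product)  # the exact product of the nine fractions
print(total)    # should be 1 up to rounding error
```

Every numerator cancels against the next denominator, so the product is exactly 10 and the nine probabilities sum to $\log_{10} 10 = 1$.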


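First Digit Law

$P(D_1 = d) = \log_{10}\left(1 + \frac{1}{d}\right)$, $d = 1, 2, 3, \ldots, 9$

Second Digit Law

$P(D_2 = d) = \sum_{k=1}^{9} \log_{10}\left(1 + (10k + d)^{-1}\right)$, $d = 0, 1, 2, \ldots, 9$

General Significant-Digit Law

For all positive integers $k$, all $d_1 \in \{1, 2, \ldots, 9\}$ and all $d_j \in \{0, 1, \ldots, 9\}$, $j = 2, \ldots, k$:

$P(D_1 = d_1, \ldots, D_k = d_k) = \log_{10}\left[1 + \left(\sum_{i=1}^{k} d_i \cdot 10^{k-i}\right)^{-1}\right]$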
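The second-digit and general laws can be evaluated numerically. A minimal sketch; the function names `significant_digits_prob` and `second_digit_prob` are hypothetical, chosen for illustration.

```python
import math

def significant_digits_prob(digits) -> float:
    """P(D1 = d1, ..., Dk = dk) under the general significant-digit law."""
    digits = list(digits)
    if not digits or not 1 <= digits[0] <= 9:
        raise ValueError("d1 must be in 1..9")
    if any(not 0 <= d <= 9 for d in digits[1:]):
        raise ValueError("later digits must be in 0..9")
    n = 0
    for d in digits:          # n = d1*10^(k-1) + ... + dk
        n = 10 * n + d
    return math.log10(1 + 1 / n)

def second_digit_prob(d: int) -> float:
    """P(D2 = d): sum the two-digit law over every possible first digit k."""
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

for d in range(10):
    print(f"second digit {d}: {second_digit_prob(d):.1%}")
```

The second digit is much closer to uniform than the first: the probabilities fall only from roughly 12% for 0 to roughly 8.5% for 9.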


Tax Fraud Detection

True Tax Data - Mark Nigrini’s 169,662 IRS model files

Fraudulent Data - 1995 King’s County, New York, Cash disbursement and payroll

743 freshmen’s responses to a request to write down a six-digit number at random
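One way such a comparison might be automated is a chi-square goodness-of-fit test of observed first digits against Benford's Law. The slides do not say which test was applied, so the choice of test is an assumption here; the two data sets below are synthetic stand-ins for illustration, not the tax or payroll data named above.

```python
import math
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]
CHI2_CRITICAL_8DF_5PCT = 15.507  # standard table value: 8 degrees of freedom, 5% level

def leading_digit(x: float) -> int:
    """First significant digit of a nonzero number."""
    return int(f"{abs(x):.15e}"[0])

def benford_chi_square(amounts) -> float:
    """Chi-square statistic comparing first-digit counts with Benford's Law."""
    values = [a for a in amounts if a != 0]
    counts = Counter(leading_digit(a) for a in values)
    n = len(values)
    return sum((counts[d] - n * p) ** 2 / (n * p)
               for d, p in zip(range(1, 10), BENFORD))

# Synthetic examples (assumptions for illustration only):
conforming = [10 ** (i / 500) for i in range(500)]      # evenly spread logarithms
invented = [float(d * 111) for d in range(1, 10)] * 50  # every first digit equally often

for name, data in (("conforming", conforming), ("invented", invented)):
    stat = benford_chi_square(data)
    verdict = "suspicious" if stat > CHI2_CRITICAL_8DF_5PCT else "consistent with Benford"
    print(f"{name}: chi-square = {stat:.2f} ({verdict})")
```

A large statistic only flags data for a closer look; it does not prove fraud, since many legitimate data sets do not follow Benford's Law.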


Testing Mathematical Models

Suppose past census data follows Benford’s Law closely

A model that predicts the same census data should also follow Benford’s Law

http://www.math.gatech.edu/~hill/publications/cv.dir/1st-dig.pdf


Computer Design

Donald Knuth, Stanford University

If the numbers computers process are not uniformly distributed, then computers can be re-designed to:

Minimize storage space

Maximize calculation rate

http://www.math.gatech.edu/~hill/publications/cv.dir/1st-dig.pdf


“Everyday numbers obey a law so unexpected it is hard to believe it's true.”

New Scientist - 10 July 1999

[Slide background: a scattering of assorted everyday numbers, such as 1973, 43.5, 15,584, 1826, 98,817, and 476,140,375]