Lotka’s Law (and Other Power Laws) T.C. Scott presented at Near India Pvt Ltd. No. 71/72, Jyoti Nivas College Road, Koramangala, Bangalore 560095, India May 27, 2017 Tony C. Scott Lotka’s Law


Lotka’s Law (and Other Power Laws)

T.C. Scott

presented at Near India Pvt Ltd. No. 71/72, Jyoti Nivas College Road, Koramangala, Bangalore 560095, India

May 27, 2017

Tony C. Scott Lotka’s Law


Outline of Presentation: Explanation through Examples

1. Definitions

2. Power Law: Underlying mechanism behind Pareto Principle

3. Two examples of publications.

4. Artwork

5. Other Power laws. E.g. Zipf’s law.
   5.1 Word frequency
   5.2 Company size
   5.3 Investment
   5.4 Gutenberg-Richter law: seismology
   5.5 Moore’s Law: exponential growth
   5.6 Death statistics
   5.7 Population vs. e.g. city ranking
   5.8 General cases

6. Artifacts/Comments

7. Conclusions


Lotka’s Law

• Alfred J. Lotka (1926). Statistician, physicist.

• Describes author publication frequency in any given field.

• Number of authors writing X papers is $\propto 1/X^n$.

• n ≈ 2: inverse square law, but not always. Determined by fitting.

• Gist: As the number of articles published increases, authors producing that many publications become less frequent.

• “Success breeds success.”

• “Disaster compounds disaster.”


Applications of Power Law

Publications (of course) but also:

Art productivity: e.g. number X of statues of Venus vs. number Y of artists who depicted Venus.

Geological: Sizes of earthquakes, of craters on the moon and of solar flares.

Nature: Foraging patterns of various species.

Military: Fighter pilot kills.

Languages: Frequencies of words in documents; of family names.

“Human affairs”: Sizes of power outages and wars.

Business rankings: Company size based on e.g. number of employees, VC capital investment.

etc.


Formula

Lotka’s law formula:

$$C = X^n Y \iff Y = \frac{C}{X^n}$$

where

• X: number of publications.

• Y: number of authors writing X publications.

• n ≈ 2 but not always.

• Applies to any field (if the model holds).

• Applies well for relatively short time windows.


Get a Feel for it

Assume n = 2.

For example, we have:

1. Say 60% of authors will have just one publication (C = 0.6).

2. 15% will have 2 publications ($1/2^2 \times 0.60$).

3. About 7% of authors will have 3 publications ($1/3^2 \times 0.60 \approx 0.067$).

4. etc.
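
The percentages above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original talk: the function name `lotka_fraction` is hypothetical, and the final line rests on the added assumption that the fractions over X = 1, 2, 3, ... sum to 1, in which case n = 2 gives C = 6/π², close to the 60% used on the slide.

```python
import math

def lotka_fraction(x, n=2.0, c=0.6):
    """Fraction of authors with exactly x publications under Y = C / X^n."""
    return c / x ** n

# Fractions for the first few publication counts, using the slide's n = 2, C = 0.6.
for x in range(1, 6):
    print(f"{x} publication(s): {lotka_fraction(x):.1%} of authors")

# Assumption: the fractions over x = 1, 2, 3, ... must sum to 1.
# For n = 2 the sum of 1/x^2 is pi^2 / 6, so C = 6 / pi^2.
c_normalised = 6 / math.pi ** 2
print(f"normalised C for n = 2: {c_normalised:.3f}")
```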


Power Law Curve Features: Divide into Two equal areas

Figure: On the right is the long tail; on the left is the head (also known as the 80-20 rule or Pareto Principle).


Power Law Curve Features: More “commercial” version

Figure: From Chris Anderson, editor-in-chief of Wired Magazine.

Pareto Principle or 80-20 rule

The Law of the Vital Few, or principle of factor sparsity:

• 20% of people hold 80% of wealth.

• 20% of inventory occupies 80% of warehouse.

• 20% of sales staff make 80% of revenue.

• 20% of products make 80% of sales.

• 20% of criminals cause 80% of crimes.

• 20% of hazards cause 80% of injuries.

• 20% of patients use 80% of health care resources.

Not exact. The idea works but the percentages vary, e.g. 30% vs. 70%.


Manipulation: put data on a Log-Log scale

Take logs on both sides of $C = X^n Y$:

$$\log(C) = n \log(X) + \log(Y)$$

$$\Rightarrow \log(Y) = -n \log(X) + \log(C)$$

• Get a linear relation F = m x + b on a log-log scale, with slope m = −n and intercept b = log(C).

• Verify with Pearson correlation.

• Least-squares curve fit.

• (Non-linear) curve fit.
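
The least-squares step on log data can be sketched as follows. This is an illustrative sketch only: the function name `fit_power_law` and the synthetic data (generated from Y = 600/X²) are assumptions, not values from the talk, and the fit is the ordinary straight-line fit to (log X, log Y) rather than a non-linear fit.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares line through (log x, log y); returns (C, n) for Y = C / X^n."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx                    # slope m = -n
    intercept = mean_y - slope * mean_x  # intercept b = log(C)
    return math.exp(intercept), -slope

# Synthetic, noise-free data generated from Y = 600 / X^2 (illustrative values only).
xs = [1, 2, 3, 4, 5, 10, 20]
ys = [600 / x ** 2 for x in xs]
C, n = fit_power_law(xs, ys)
print(f"C = {C:.2f}, n = {n:.2f}")
```

With real, noisy counts the recovered C and n depend on how the sparse large-X points are treated, which is the motivation for the refinements discussed near the end of the deck.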


Useful Tool: Pearson Correlation

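Think of the angle θ between 2 vectors u and v:

$$\cos(\theta) = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\| \, \|\mathbf{v}\|}$$

Now subtract out the mean (or average) from each vector:

$$\mathbf{u} \leftarrow \mathbf{u} - \bar{u}, \qquad \mathbf{v} \leftarrow \mathbf{v} - \bar{v}$$

Obtain:

$$\rho(\mathbf{u}, \mathbf{v}) = \frac{\sum_{i=1}^{N} (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum_{i=1}^{N} (u_i - \bar{u})^2} \, \sqrt{\sum_{i=1}^{N} (v_i - \bar{v})^2}}$$

⇒ Calculate ρ(log(X), log(Y)).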
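
The construction above (centre both vectors, then take the cosine of the angle between them) translates directly into code. A minimal sketch, with a hypothetical function name `pearson` and synthetic data that follow Y = 600/X² exactly, so the correlation of the log data should come out at essentially −1:

```python
import math

def pearson(u, v):
    """Pearson correlation as the cosine of the angle between mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    uc = [a - mean_u for a in u]   # u <- u - mean(u)
    vc = [b - mean_v for b in v]   # v <- v - mean(v)
    dot = sum(a * b for a, b in zip(uc, vc))
    norm_u = math.sqrt(sum(a * a for a in uc))
    norm_v = math.sqrt(sum(b * b for b in vc))
    return dot / (norm_u * norm_v)

# Synthetic data following Y = 600 / X^2 exactly (illustrative values only).
xs = [1, 2, 3, 4, 5, 10, 20]
ys = [600 / x ** 2 for x in xs]
rho = pearson([math.log(x) for x in xs], [math.log(y) for y in ys])
print(f"rho(log X, log Y) = {rho:.3f}")
```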


Log-Log Plot


First Test Cases

Reference: William G. Potter, “Lotka’s law revisited”, Library Trends, 30, 21-39 (1981).

1. Dickson H. Leavens in 1953 based his study of econometricians on the work of 721 authors who presented papers at meetings of the Econometric Society or had articles published in the first twenty volumes of Econometrica (1933-52) - Table 3 of Potter’s paper.

2. University of Illinois at Urbana-Champaign Library, study of personal authors in the card catalog. The Illinois catalog contains records for about 2.5 million titles. A random sample of 2345 personal authors was drawn - Table 5 of Potter’s paper.


Test Case 1

Table: Data from Table 3 of W.G. Potter Reference

No. Contributions    No. Contributors
1                    436
2                    107
3                    61
4                    40
5                    14
...                  ...
30                   1
37                   1
46                   1


Test Case 1 - Log-Log Scale

Figure: log(No. Contributors) vs. log(No. Contributions). Correlation of log data = -0.94, C = 275.99, n = -1.7.


Test Case 1 - Linear Scale

Figure: No. Contributors vs. No. Contributions on a linear scale. Linear data vs. Y = C*X^n = 275.99*X^(-1.7).


Test Case 2

Table: Data from Table 5 of W.G. Potter Reference

No. Works    No. Authors
1            1489
2            343
3            160
4            92
5            44
...          ...
58           1
63           1
66           1


Test Case 2 - Log-Log Scale

Figure: log(No. Authors) vs. log(No. Works). Correlation of log data = -0.95, C = 731.61, n = -1.8.


Test Case 2 - Linear Scale

Figure: No. Authors vs. No. Works on a linear scale. Linear data vs. Y = C*X^n = 731.61*X^(-1.8).


Lotka’s Law of Productivity in Artwork

Sources:

• “Distant viewing in art history - A case study of artistic productivity” by K. Bender, International Journal for Digital Art History, June 2015, Issue no. 1: 100-110.

• Adjacent paper: “Quantitative Comparison between the Italian and French Iconography of Venus from the Middle Ages to Modern Times” by K. Bender.

• Fig. 11 of Adjacent paper generated from the two Topical Catalogs “The French Venus” and “The Venus of the Low Countries”.


Lotka’s Law Example - Venus Iconography

Figure: Fig. 11 of Adjacent paper generated from the two Topical Catalogs “The French Venus” and “The Venus of the Low Countries”.


Lotka’s Law Example - “Italian Job”


Related metrics - Generalizations

Zipf’s Law: frequency of any word is inversely proportional to its rank in the frequency table (Lotka’s law is a special application).

Bradford’s Law: estimates exponentially diminishing returns of extending a search for references in science journals.

Benford’s Law: also called the first-digit law of data:

1. 30% chance lead digit is 1,
2. 18% chance lead digit is 2,
3. larger digits occur with diminishing frequency, following the logarithmic rule $P(d) = \log_{10}(1 + 1/d)$ (associated with scale-invariant, inverse-power-law data).

Gutenberg-Richter law: for seismology.

Moore’s Law: not a power law but an exponential law.


Zipf’s Law

Named after American linguist George Kingsley Zipf (1902-1950).

Let:

• N be the number of elements and k be their rank,

• s be the value of the exponent characterizing the distribution.

Zipf’s law predicts that out of a population of N elements, the frequency of elements of rank k, f(k; s, N), is:

$$f(k; s, N) = \frac{1/k^s}{\sum_{n=1}^{N} (1/n^s)} \qquad (1)$$

Zipf’s law holds if the number of elements with a given frequency is a random variable with power law distribution $p(f) = \alpha f^{-1-1/s}$. Often written in terms of the Riemann ζ function or zeta distribution.

Note: Lotka’s law is a special case of Zipf’s law. Comparing $p(f) \propto f^{-1-1/s}$ with $Y = C/X^n$ gives $n = 1 + 1/s$ (the log-log slope is $-n = -1 - 1/s$), so s = 1 corresponds to n = 2.
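
Equation (1) is easy to evaluate directly. The sketch below is illustrative only: the function name `zipf_frequency` and the choice s = 1, N = 100 are assumptions made for the example. It checks two properties that follow from the formula: the frequencies sum to 1, and for s = 1 the rank-1 element is twice as frequent as the rank-2 element.

```python
def zipf_frequency(k, s, N):
    """Normalised Zipf frequency f(k; s, N) of the element with rank k."""
    norm = sum(1 / n ** s for n in range(1, N + 1))
    return (1 / k ** s) / norm

# Hypothetical example: N = 100 elements with exponent s = 1.
N, s = 100, 1.0
freqs = [zipf_frequency(k, s, N) for k in range(1, N + 1)]
print(f"sum of frequencies: {sum(freqs):.6f}")
print(f"f(1) / f(2): {freqs[0] / freqs[1]:.3f}")
```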


Zipf’s Law Example: Word frequency of Moby Dick

Figure: From Whim Online Magazine


Another Zipf’s Law example

From “CONVERSABLE ECONOMIST” by Timothy Taylor

An example looking at the distribution of the size of US firms:

• Size, measured by the number of employees, is on the horizontal axis.

• The number of firms of this size is on the vertical axis.

• Both axes are measured in powers of 10.

• Again, the end result is very closely described by Zipf’s Law.


Zipf’s Law Example: Firm Size


Peter Thiel’s CS183: Startup - Class 7 Notes Essay

Question: What does the distribution of returns in a venture fund look like?

Naive Answer: rank companies from best to worst according to their return in multiple of dollars invested:

• People tend to group investments into three buckets.

• The bad companies go to zero.

• The mediocre ones do maybe 1x (stock with multiple of 1 in “liquidation preference”), so you don’t lose much or gain much.

• And then the great companies do maybe 3-10x.

This model misses a key insight: actual returns are incredibly skewed.


Peter Thiel’s CS183: Continued

• Look at Founders Fund’s 2005 fund.

• The best investment ended up being worth about as much as all the rest combined.

• The investment in the second best company was about as valuable as number three through the rest.

• This same dynamic generally held true throughout the fund: this is a power law distribution.

• To a first approximation, a VC portfolio will only make money if your best company investment ends up being worth more than your whole fund.

• In practice, it’s quite hard to be profitable as a VC if you don’t get to those numbers.


Power Law Example: Peter Thiel’s CS183


Seismology: Unified scaling law for earthquakes

The Gutenberg-Richter law (GR law) expresses the relationship between the magnitude and the total number of earthquakes of at least that magnitude in any given region and time period.

$$\log_{10} N = a - b\,m \qquad (2)$$

or

$$N = 10^{a - b m} \qquad (3)$$

where

• N is the number of events having a magnitude ≥ m.

• a and b are constants.
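
Equation (3) can be evaluated directly to see what the constants mean. A minimal sketch: the function name `gr_event_count` and the values a = 5, b = 1 are hypothetical, chosen only for illustration. With b = 1, each unit increase in magnitude cuts the expected count by a factor of 10.

```python
def gr_event_count(m, a, b):
    """Number of events with magnitude >= m under N = 10^(a - b*m)."""
    return 10 ** (a - b * m)

# Hypothetical constants, for illustration only.
a, b = 5.0, 1.0
for m in (2, 3, 4, 5):
    print(f"magnitude >= {m}: {gr_event_count(m, a, b):.0f} events")

# Ratio between successive magnitudes is 10^b.
ratio = gr_event_count(4, a, b) / gr_event_count(5, a, b)
```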


GR law: Example - Data over 6 decades

From Kim Christensen, Proceedings of the National Academy of Sciences of the USA, vol. 99 no. suppl 1, 2509-2513.

Figure: Gutenberg-Richter exponent b = 1 (dashed red line). Roll-off for m < 2 is due to difficulties with detecting very small earthquakes.


Moore’s Law

• Observation that the number of transistors in a dense integrated circuit doubles approximately every two years.

• Named after Gordon Moore, co-founder of Fairchild Semiconductor and Intel (1965 paper).

• Moore’s prediction proved accurate for several decades, and has been used in the semiconductor industry to guide long-term planning and to set targets for research and development.

• Main reason is the invention of the integrated circuit.

• Not a power law but an exponential law: see patterns on a semi-log scale.
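
The doubling rule can be written as count(t) = count₀ · 2^((t − t₀)/2). The sketch below is illustrative: the function name `moore_projection` is hypothetical, and the starting point (2,300 transistors in 1971) is read off the axis labels of the transistor-count figure in this deck. Taking log₂ of the projection gives a straight line in t, which is why the pattern shows up on a semi-log rather than a log-log scale.

```python
import math

def moore_projection(count0, year0, year, doubling_years=2.0):
    """Projected count assuming a doubling every `doubling_years` years."""
    return count0 * 2 ** ((year - year0) / doubling_years)

# Starting point taken from the figure's axis labels: 2,300 transistors in 1971.
projected_2011 = moore_projection(2300, 1971, 2011)
print(f"projected transistor count in 2011: {projected_2011:,.0f}")

# On a semi-log scale the growth is linear: log2(count) rises by 1 every 2 years.
step = math.log2(moore_projection(2300, 1971, 1973)) - math.log2(2300)
```

The projection lands in the same order of magnitude as the 2,600,000,000 tick at the top of the figure's vertical axis.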


Moore’s Law: Example

Figure: Plot of CPU transistor counts vs. dates of introduction (“Microprocessor Transistor Counts 1971-2011 & Moore’s Law”). Vertical axis: transistor count on a log scale, from 2,300 to 2,600,000,000. Horizontal axis: date of introduction, 1971 to 2011. The curve shows transistor count doubling every two years.


Figure: Updated version of Moore’s Law over 120 Years (by Ray Kurzweil). The 7 most recent data points are all NVIDIA GPUs.


Overall Growth of Data

• Complementary relationship to Moore’s law: exponential growth of (Big) Data.

• The Digital Universe: 50-fold growth from the beginning of 2010 to the end of 2020.

• Source: International Data Corporation (IDC), an American market research, analysis and advisory firm, Dec. 2012.

• IDC says that in 2005 there were 130 exabytes of data created, about 2,837 exabytes in 2012, and a forecast of 40,000 exabytes by 2020.

• IDC says the Digital Universe will be 35 zettabytes by 2020. 1 zettabyte = 1000 exabytes = 1 billion terabytes.

• Also an exponential law.
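
For an exponential law, the quoted volumes imply a constant yearly growth factor, which can be compared with the factor implied by doubling every two years. A rough sketch using only the IDC figures quoted above; the function names are hypothetical and the comparison assumes smooth exponential growth between the quoted years.

```python
import math

def annual_growth_factor(v0, v1, years):
    """Constant yearly factor g such that v0 * g**years == v1."""
    return (v1 / v0) ** (1 / years)

def doubling_time(factor):
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(factor)

# IDC figures quoted on the slide (exabytes created per year).
g_2005_2012 = annual_growth_factor(130, 2837, 2012 - 2005)
g_2012_2020 = annual_growth_factor(2837, 40000, 2020 - 2012)
moore_factor = 2 ** (1 / 2)  # doubling every two years

for label, g in [("2005-2012", g_2005_2012),
                 ("2012-2020 (forecast)", g_2012_2020),
                 ("Moore's law", moore_factor)]:
    print(f"{label}: x{g:.2f} per year, doubling every {doubling_time(g):.1f} years")
```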


Digital Universe

Figure: 1 exabyte = $10^{18}$ bytes = 1000 petabytes = 1 million terabytes = 1 billion gigabytes.


How much Data can be analysed?

1. The rate of data growth is outstripping Moore’s Law;

2. The amount of data we want to store would be too expensive with current technology;

3. The speed at which read-heads in large disk drives can retrieve data is not keeping up with data growth, meaning that retrieval times are simply too long; and

4. Newly available semi-structured and unstructured data can yield new insights with the new technology.

IDC estimates that only 12% of data today is being analyzed; that 3% of data today is tagged and could be analyzed; and that of the 2,837 exabytes created in 2012, 23% would be useful if tagged and analyzed. Clearly a great deal of potential value is being lost.


Data Volume vs. Moore’s Law

Figure: “Data is growing faster than Moore’s Law”. Source: Capgemini Consulting Technology Outsourcing, June 23, 2015.


Examples from “The Challenge Network”

Death Statistics: Imagine terrorist incidents and their consequences. If you plot the number of terrorist incidents of a certain lethality against the frequency with which such incidents occur in time, you get a scatter that forms a nearly straight line when plotted on a log scale.

Other Examples: How probable is it that a meteor of a given size will hit the moon over such-and-so range of time? How likely is a financial crash at some arbitrary period in time?

Again Zipf distributions apply.


Power Law Example: Death Statistics


Rank-Size distribution

Rank-size distribution is the distribution of size by rank, in decreasing order of size.

• E.g. if a data set consists of items of sizes 5, 100, 5, and 8, the rank-size distribution is 100, 8, 5, 5 (ranks 1 through 4).

• Also known as the rank-frequency distribution, when the source data are from a frequency distribution.

• These distributions frequently follow a power law distribution, or less well-known ones such as a stretched exponential function ($f_\beta(t) = \exp(-t^\beta)$) or a parabolic fractal distribution, at least approximately for certain ranges of ranks.
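
Building a rank-size distribution is just a descending sort with ranks attached. A minimal sketch using the slide's own example values; the function name `rank_size` is hypothetical.

```python
def rank_size(sizes):
    """Return (rank, size) pairs with sizes in decreasing order, ranks from 1."""
    ordered = sorted(sizes, reverse=True)
    return list(enumerate(ordered, start=1))

# Example from the slide: sizes 5, 100, 5 and 8.
ranked = rank_size([5, 100, 5, 8])
print(ranked)  # [(1, 100), (2, 8), (3, 5), (4, 5)]
```

Plotting log(size) against log(rank) for such pairs is then the usual first check for a power law.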


Country ranking from Wikipedia

Figure: Rank-size distribution of the population of countries follows a stretched exponential distribution except for China and India.


Zipf’s Law and World city populations

Figure: From AI and Social Science by Brendan O’Connor, posted May 24, 2009, using data from PopulationDate.net.


General Power Law Examples


Get the idea?

We get power laws or nearly so for:

• Laws of Physics ($k/r^2$): Kepler, Newton, electrostatic Coulomb law

• Publications

• Bug frequencies

• Frequencies governing city and company sizes

• Markers for trends, distributions, predictions

• Markers for success, victories, accomplishments

• Markers for droughts, diseases, disasters!


Why do power law relationships show up so often?

• Say you start off with a random distribution of something.

• The different points in your data all tend to experience proportional growth (positive or negative).

• However, this particular data (like city size) can’t turn negative.

• In addition, the total size of the system can’t grow in an unbounded way.

• With a few additional assumptions resulting from the actual dynamics, this process will result in the kind of straight-line power laws shown here.

Of course, this kind of explanation then needs to be adapted and applied to each specific situation.


Artifacts

Note those points for large X:

• Lotka formula underestimates the number of works of more prolific authors but applies fairly well for the less prolific.

• Known as the long tail of frequency distributions.

• Martinez-Mekler (2009), Egghe (2010) alternative:

$$Y = c \, \frac{(X_{\max} + 1 - X)^b}{X^a}$$

where a, b and c are parameters and

• $X_{\max}$ is the maximum number of works produced by the authors, and for X = 1 this gives $C = c X_{\max}^b$.
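
The alternative formula can be compared with the plain Lotka form in a few lines. A sketch under stated assumptions: the function names and the parameter values (c = 1, a = 2, b = 0.5, X_max = 46) are hypothetical and chosen only to illustrate the two properties noted above, namely that X = 1 gives C = c·X_max^b, and that b = 0 recovers Y = c/X^a.

```python
def lotka(x, C, n):
    """Plain Lotka form Y = C / X^n."""
    return C / x ** n

def two_exponent_form(x, c, a, b, x_max):
    """Alternative form Y = c * (X_max + 1 - X)^b / X^a."""
    return c * (x_max + 1 - x) ** b / x ** a

# Hypothetical parameters, for illustration only.
c, a, b, x_max = 1.0, 2.0, 0.5, 46

# At X = 1 the alternative form equals C = c * X_max^b.
C_at_one = two_exponent_form(1, c, a, b, x_max)
print(f"Y(1) = {C_at_one:.4f}, c * X_max^b = {c * x_max ** b:.4f}")

# With b = 0 the extra factor is 1 and the plain Lotka form is recovered.
same_as_lotka = all(
    abs(two_exponent_form(x, c, a, 0.0, x_max) - lotka(x, c, a)) < 1e-12
    for x in range(1, x_max + 1)
)
```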


Estimating exponent n

Non-linear fitting: instead of linear fit of log data.

Two-point fitting method: use cumulative frequency, i.e. “bundling” of frequencies w.r.t. a reference value. Smoother functions.

Maximum likelihood: gives the observed data the greatest probability.

Kolmogorov-Smirnov (K-S): Recommended: does not assume independent and identically distributed data.


Improvements: Kolmogorov-Smirnov

Minimize the Kolmogorov-Smirnov (K-S) statistic:

$$n = \arg\min_{n} D_n$$

i.e. find n such that $D_n$ is smallest, where

$$D_n = \max_{X} \left| P_{\text{data}}(X) - P_{\text{Lotka}}(X) \right|$$

• cmds: cumulative distribution for exponent n.

• $P_{\text{data}}$: cmds of the data.

• $P_{\text{Lotka}}$: cmds of Lotka’s model.
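
One simple way to carry out this minimisation is a grid search over candidate exponents. The sketch below is an assumption-laden illustration, not the method used in the talk: the function names, the candidate grid and the synthetic data (generated from Y = 1000/X^1.8) are all hypothetical, and the model's cumulative distribution is normalised over the observed X values only.

```python
def cumulative(weights):
    """Cumulative distribution built from a list of non-negative weights."""
    total = sum(weights)
    running, cdf = 0.0, []
    for w in weights:
        running += w
        cdf.append(running / total)
    return cdf

def ks_distance(xs, ys, n):
    """D_n = max over X of |P_data(X) - P_Lotka(X)| on the observed X values."""
    p_data = cumulative(ys)
    p_model = cumulative([x ** -n for x in xs])
    return max(abs(d - m) for d, m in zip(p_data, p_model))

def fit_exponent_ks(xs, ys, candidates):
    """Return the candidate exponent n with the smallest K-S distance."""
    return min(candidates, key=lambda n: ks_distance(xs, ys, n))

# Synthetic data from Y = 1000 / X^1.8 (illustrative values only).
xs = list(range(1, 51))
ys = [1000 / x ** 1.8 for x in xs]
candidates = [1.0 + 0.01 * i for i in range(201)]  # n from 1.00 to 3.00
n_best = fit_exponent_ks(xs, ys, candidates)
print(f"n minimising D_n: {n_best:.2f}")
```

A finer grid, or a scalar minimiser, could replace the candidate list when more precision is needed.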


Conclusions/Recommendations

• Power laws like Lotka’s law appear to be ubiquitous in nature and even in subjective man-made data, more than many people realize.

• The potential for applications of these power laws is underestimated and far from exhausted.

• When in doubt: put the frequency statistics on a log-log scale.

• Accuracy can be estimated in advance using Pearson correlation on log data.

• Threshold at e.g. |ρ| > 75%.

• ⇒ If the threshold is met: produce the Lotka (power) plot.

• Deal with refinements of the metric later.

• Keep the Pareto principle in mind for all data.
