Death and Edits

Miles Lincoln

LIS590MT

Wikipedia!

You probably already know what it is!

Bursts in social networks

Bursts of edits on Wikipedia in particular

When do those occur?

What can we learn by looking at spikes in edit frequency?

How have edit spikes changed over Wikipedia’s ten years of existence?

Does the size of an edit spike correlate with anything?

Bursts in other social networks

Google Trends

Celebrity deaths!

Revision history

But first…

We need to process the data so that we can answer that question

Perl

Regular Expressions (Regex)

The Perl script uses regular expressions to find and output matching pieces of text.

In this case, I am pulling out dates in Wikipedia’s day-month-year format and rewriting them in a more machine-readable MM/DD/YYYY format.

11/08/2011
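A minimal sketch of this step, written in Python rather than Perl, since the original script is not shown in these slides. The pattern and the function name `to_mmddyyyy` are assumptions made for illustration; the sketch expects dates written like `8 November 2011`.

```python
import re

# Month names as they appear in Wikipedia's day-month-year dates.
MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}

# One or two digits, a month name, then a four-digit year.
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def to_mmddyyyy(text):
    """Return every date found in text, rewritten as MM/DD/YYYY."""
    return [
        f"{MONTHS[month]:02d}/{int(day):02d}/{year}"
        for day, month, year in DATE_RE.findall(text)
    ]


# A made-up revision-history line, used only as sample input.
print(to_mmddyyyy("14:32, 8 November 2011 ExampleUser (talk | contribs)"))
```

Each edit in the pasted history contributes one date to the returned list, so a day with many edits appears many times.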

Data manipulation

Copy/paste the revision history of wiki pages into a text document, which I feed to my Perl script

Results in a list with one date entry for every edit made on that date

Copying/pasting isn’t super elegant, but I haven’t gotten the LWP::UserAgent stuff to work yet

Excel!

Throw my lists of dates into a pivot table, which shows me how often each date occurs

Some VLOOKUP magic allows me to combine these edit frequencies for individual actors into one big list covering every day from 6/1/2001 to the present
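The pivot-table and VLOOKUP steps can also be sketched in Python. This illustrates the idea under assumptions and is not the workflow used for these slides: `daily_counts` and `combine` are hypothetical names, the actor labels are placeholders, and the input is a list of MM/DD/YYYY strings with one entry per edit.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(date_strings):
    """Count edits per day (the pivot-table step)."""
    return Counter(datetime.strptime(s, "%m/%d/%Y").date() for s in date_strings)


def combine(counts_by_actor, start, end):
    """Build one row per day from start to end, holding one count per actor
    (the VLOOKUP step). Days with no edits get 0."""
    actors = sorted(counts_by_actor)
    rows = []
    day = start
    while day <= end:
        rows.append((day, [counts_by_actor[a][day] for a in actors]))
        day += timedelta(days=1)
    return actors, rows


# Invented sample data, only to show the shape of the result.
counts = {
    "actor_a": daily_counts(["06/01/2001", "06/01/2001", "06/03/2001"]),
    "actor_b": daily_counts(["06/02/2001"]),
}
actors, rows = combine(counts, date(2001, 6, 1), date(2001, 6, 3))
for day, values in rows:
    print(day.strftime("%m/%d/%Y"), values)
```

A `Counter` returns 0 for days it has never seen, which plays the role of the empty cells that the lookup would otherwise have to fill.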

Et voilà!

Problems

9 actors over 10 years means close to 100k cells

Excel is not built for speed

MATLAB might work better

What does the data look like over time?

Yearly windows running 6/1 to 5/31, from 2001 (when Wikipedia’s current edit records begin) to 2010 (when all of the bursts have settled down)

6/1/2001-5/31/2002

[Chart: x-axis dates 6/1/01 to 5/1/02, monthly ticks; y-axis 0 to 1.2; series Series1 to Series9]

6/1/2002-5/31/2003

[Chart: x-axis dates 6/1/02 to 5/1/03, monthly ticks; y-axis 0 to 14; series Series1 to Series9]

6/1/2003-5/31/2004

[Chart: x-axis dates 6/1/03 to 5/1/04, monthly ticks; y-axis 0 to 30; series Series1 to Series9]

6/1/2004-5/31/2005

[Chart: x-axis dates 6/1/04 to 5/1/05, monthly ticks; y-axis 0 to 60; series Series1 to Series9]

6/1/2005-5/31/2006

[Chart: x-axis dates 6/1/05 to 5/1/06, monthly ticks; y-axis 0 to 30; series Series1 to Series9]

6/1/2006-5/31/2007

[Chart: x-axis dates 6/1/06 to 5/1/07, monthly ticks; y-axis 0 to 50; series Series1 to Series9]

6/1/2007-5/31/2008

[Chart: x-axis dates 6/1/07 to 5/1/08, monthly ticks; y-axis 0 to 400; series Series1 to Series9]

6/1/2008-5/31/2009

[Chart: x-axis dates 6/1/08 to 5/1/09, monthly ticks; y-axis 0 to 80; series Series1 to Series9]

6/1/2009-5/31/2010

[Chart: x-axis dates 6/1/09 to 5/1/10, monthly ticks; y-axis 0 to 200; series Series1 to Series10]

Spike sizes over the years

[Chart: x-axis years 2002 to 2009; y-axis 0 to 400; series Series2]

Let’s take a closer look at the more interesting actors

Actors #4-9 6/1/2008-5/31/2009

[Chart: x-axis dates 6/1/08 to 5/1/09, monthly ticks; y-axis 0 to 80; series Series1 to Series6]

Actors #4-9 6/1/2008-5/31/2009 -log

[Chart: x-axis dates 6/1/08 to 5/1/09, monthly ticks; y-axis 0 to 2; series Series1 to Series6]

One actor at a time ~10 years

Actor #1 (date of death: 6/27/2001) -edits/day

[Chart: x-axis dates 6/28/01 to 6/28/11, yearly ticks; y-axis 0 to 14; series Series1]

Actor #1 –log(edits)/day

[Chart: x-axis dates 6/28/01 to 6/28/11, yearly ticks; y-axis 0 to 1.2; series Series1]

Actor #7 -edits/day

[Chart: x-axis dates 9/24/03 to 9/24/11, yearly ticks; y-axis 0 to 100; series Series1]

Actor #7 –log(edits)/day

[Chart: x-axis dates 9/24/03 to 9/24/11, yearly ticks; y-axis 0 to 2.5; series Series1]

Actor #8 -edits/day

[Chart: x-axis dates 12/10/03 to 12/10/10, yearly ticks; y-axis 0 to 400; series Series1]

Actor #8 –log(edits)/day

[Chart: x-axis dates 12/10/03 to 12/10/10, yearly ticks; y-axis 0 to 3; series Series1]

Actor #9 –edits/day

[Chart: x-axis dates 2/28/04 to 2/28/11, yearly ticks; y-axis 0 to 200; series Series1]

Actor #9 –log(edits)/day

[Chart: x-axis dates 2/28/04 to 2/28/11, yearly ticks; y-axis 0 to 2.5; series Series1]

If we tweak the data to take importance into consideration…

Average gross, adjusted for inflation*

Only available for a small number of the actors in the sample set

Taken from boxofficemojo.com

Extremely reliable source

Actor #8 vs. Actor #9

[Chart: x-axis 1 to 361, ticks every 8; y-axis 0 to 400; series ledger, swayze]

Actor #8 vs. Actor #9 (adjusted)

[Chart: x-axis 1 to 361, ticks every 8; y-axis 0 to 400; series ledger, swayze adjusted]

Actor #8 vs. Actor #9 (adjusted, log)

[Chart: x-axis 1 to 361, ticks every 9; y-axis 0 to 3; series ledger log, swayze adjusted log]

The same data on Google Trends

-10 days to +40 days (log)

[Chart: x-axis 1 to 50; y-axis 0 to 3; series coburn log, peck log, brando log, davis log, palance log, goulet log, ledger log, swayze log]
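Lining several actors up on the same -10 to +40 day axis means cutting a window around each date of death and then taking logs. The sketch below shows one way that could be done. The slides do not say which log base was used or how zero-count days were handled, so base 10 and the `None` placeholder are assumptions, as are the function names and the invented numbers.

```python
import math
from datetime import date, timedelta


def window_around(counts, event_day, before=10, after=40):
    """Return the counts from `before` days before event_day through
    `after` days after it, in date order. Missing days count as 0."""
    return [
        counts.get(event_day + timedelta(days=offset), 0)
        for offset in range(-before, after + 1)
    ]


def log10_or_none(values):
    """Base-10 log of each value; zero has no logarithm, so it maps to None."""
    return [math.log10(v) if v > 0 else None for v in values]


# Invented numbers, only to show the shapes involved.
event_day = date(2010, 1, 15)
counts = {date(2010, 1, 14): 1, date(2010, 1, 15): 100, date(2010, 1, 16): 10}

window = window_around(counts, event_day)
print(len(window))                    # number of days in the window
print(log10_or_none(window)[9:12])    # the day before, the day of, the day after
```

Because every window is indexed by offset from the event rather than by calendar date, actors who died years apart can share one x-axis.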

Other things I should consider

Age at death

Cause of death

Were they still acting?

Future directions

New sample of Wikipedia pages

Need to compare more contemporary pages

Need new metrics for comparison

Better workflows

Thanks!

Questions?

http://www.slideshare.net/mlincol2/informetrics
