Death and edits Miles Lincoln LIS590MT


Page 1: Death and Edits

Death and edits

Miles Lincoln

LIS590MT

Page 2: Death and Edits

Wikipedia!

You probably already know what it is!

Page 3: Death and Edits

Bursts in social networks

Bursts of edits on Wikipedia in particular

When do those occur?

Page 4: Death and Edits

What can we learn by looking at spikes in edit frequency?

How have edit spikes changed over Wikipedia’s ten years of existence?

Does the size of an edit spike correlate with anything?

Page 5: Death and Edits

Bursts in other social networks

Page 6: Death and Edits

Google Trends

Page 7: Death and Edits

Celebrity deaths!

Page 8: Death and Edits

Revision history

Page 9: Death and Edits

Revision history

Page 10: Death and Edits

But first…

We need to process the data so that we can answer that question

Page 11: Death and Edits

Perl

Page 12: Death and Edits

Regular Expressions (Regex)

A Perl script uses regular expressions to find and output matching pieces of text.

In this case, I am pulling out dates in Wikipedia’s day-month-year format and rewriting them in a more machine-readable MM/DD/YYYY format.

11/08/2011
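A minimal sketch of the date rewrite described on this slide. It is written in Python rather than the Perl used for the actual script, and the pattern, the MONTHS table, and the sample line are assumptions for illustration, not the original code.

```python
import re

# Month names as they appear in Wikipedia's day-month-year timestamps.
MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}

# Matches dates such as "8 November 2011" anywhere in a line.
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def extract_dates(text):
    """Return every day-month-year date in text as an MM/DD/YYYY string."""
    return [
        f"{MONTHS[month]:02d}/{int(day):02d}/{year}"
        for day, month, year in DATE_RE.findall(text)
    ]


# Hypothetical line in the style of a pasted revision history.
sample = "(cur | prev) 19:32, 8 November 2011 ExampleUser (talk | contribs)"
print(extract_dates(sample))  # ['11/08/2011']
```

Running this over a whole pasted revision history would give one MM/DD/YYYY string per edit, which is the list the later steps count.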

Page 13: Death and Edits

Data manipulation

Copy/paste the revision history of wiki pages into a text document, which I feed to my Perl script

Results in lists containing one date entry for every edit that occurred on that date

Copying/pasting isn’t super elegant, but I haven’t gotten the LWP::UserAgent stuff to work yet
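One possible way to avoid the copy/paste step, sketched in Python with the standard library instead of Perl's LWP::UserAgent. This is not the workflow used for these slides. It assumes the MediaWiki Action API revisions query on English Wikipedia; the function names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def revision_params(title, cont=None):
    """Build query parameters asking for revision timestamps of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp",
        "rvlimit": "max",
        "format": "json",
    }
    if cont:
        # Continuation values returned by the previous response.
        params.update(cont)
    return params


def timestamps_from_response(data):
    """Pull the ISO timestamps out of one decoded API response."""
    pages = data.get("query", {}).get("pages", {})
    return [
        rev["timestamp"]
        for page in pages.values()
        for rev in page.get("revisions", [])
    ]


def fetch_all_timestamps(title, user_agent="edit-burst-sketch/0.1 (example)"):
    """Follow continuation until every revision timestamp has been collected."""
    stamps, cont = [], None
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(revision_params(title, cont))
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        stamps.extend(timestamps_from_response(data))
        cont = data.get("continue")
        if not cont:
            return stamps
```

Calling fetch_all_timestamps with an article title would return ISO timestamps rather than day-month-year text, so the regex step would not be needed for data fetched this way.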

Page 14: Death and Edits

Excel!

Throw my lists of dates into a pivot table, which shows me how often each date occurs

Some VLOOKUP magic allows me to combine the edit frequencies of the individual actors into one big list covering every day from 6/1/2001 to the present
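The pivot-table and lookup steps can be sketched outside Excel as well. This is a Python illustration under assumptions, not the spreadsheet itself: the function names, the actor labels, and the three-day date range are invented for the example.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(date_strings):
    """Count how often each MM/DD/YYYY string occurs (the pivot-table step)."""
    return Counter(datetime.strptime(s, "%m/%d/%Y").date() for s in date_strings)


def combine(per_actor, start, end):
    """Build one row per calendar day with one count per actor (the lookup step)."""
    names = sorted(per_actor)
    rows = []
    day = start
    while day <= end:
        # Days with no edits for an actor get a count of 0.
        rows.append((day, [per_actor[name].get(day, 0) for name in names]))
        day += timedelta(days=1)
    return names, rows


# Made-up edit dates for two hypothetical actors.
edits = {
    "actor_a": ["06/01/2001", "06/01/2001", "06/03/2001"],
    "actor_b": ["06/02/2001"],
}
counts = {name: daily_counts(dates) for name, dates in edits.items()}
names, rows = combine(counts, date(2001, 6, 1), date(2001, 6, 3))
for day, values in rows:
    print(day.isoformat(), values)
```

Each row holds one calendar day and one edit count per actor, which matches the "one big list covering every day" layout.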

Page 15: Death and Edits

Et Voila!

Page 16: Death and Edits

Problems

9 actors over 10 years means close to 100k cells

Excel is not built for speed

MATLAB might work better

Page 17: Death and Edits

What does the data look like over time?

Yearly windows running 6/1 to 5/31, from 2001 (when Wikipedia’s current edit numbers begin) to 2010 (when all of the bursts have settled down)

Page 18: Death and Edits

6/1/2001-5/31/2002

[Chart: Series1 to Series9 by date, 6/1/01 to 5/1/02; y-axis 0 to 1.2]

Page 19: Death and Edits

6/1/2002-5/31/2003

[Chart: Series1 to Series9 by date, 6/1/02 to 5/1/03; y-axis 0 to 14]

Page 20: Death and Edits

6/1/2003-5/31/2004

[Chart: Series1 to Series9 by date, 6/1/03 to 5/1/04; y-axis 0 to 30]

Page 21: Death and Edits

6/1/2004-5/31/2005

[Chart: Series1 to Series9 by date, 6/1/04 to 5/1/05; y-axis 0 to 60]

Page 22: Death and Edits

6/1/2005-5/31/2006

[Chart: Series1 to Series9 by date, 6/1/05 to 5/1/06; y-axis 0 to 30]

Page 23: Death and Edits

6/1/2006-5/31/2007

[Chart: Series1 to Series9 by date, 6/1/06 to 5/1/07; y-axis 0 to 50]

Page 24: Death and Edits

6/1/2007-5/31/2008

[Chart: Series1 to Series9 by date, 6/1/07 to 5/1/08; y-axis 0 to 400]

Page 25: Death and Edits

6/1/2008-5/31/2009

[Chart: Series1 to Series9 by date, 6/1/08 to 5/1/09; y-axis 0 to 80]

Page 26: Death and Edits

6/1/2009-5/31/2010

[Chart: Series1 to Series10 by date, 6/1/09 to 5/1/10; y-axis 0 to 200]

Page 27: Death and Edits

Spike sizes over the years

[Chart: Series2 by year, 2002 to 2009; y-axis 0 to 400]

Page 28: Death and Edits

Let’s take a closer look at the more interesting actors

Page 29: Death and Edits

Actors #4-9 6/1/2008-5/31/2009

[Chart: Series1 to Series6 by date, 6/1/08 to 5/1/09; y-axis 0 to 80]

Page 30: Death and Edits

Actors #4-9 6/1/2008-5/31/2009 -log

[Chart: Series1 to Series6 by date, 6/1/08 to 5/1/09; log y-axis 0 to 2]
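The log charts compress large spikes so that quieter days stay visible. A small Python sketch of one way to do that follows. The slides do not say how days with zero edits were handled, so the base-10 log of (count + 1) used here is an assumption, chosen so that a zero-edit day plots at 0.

```python
import math


def log_scale(counts):
    """Map daily edit counts to log10(count + 1); zero-edit days stay at 0."""
    return [math.log10(c + 1) for c in counts]


# Made-up daily counts, not values from the charts.
print([round(v, 2) for v in log_scale([0, 9, 99, 399])])  # [0.0, 1.0, 2.0, 2.6]
```

On this scale a spike of a few hundred edits lands between 2 and 3, which is in line with the 0 to 3 axis range on the log charts.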

Page 31: Death and Edits

One actor at a time ~10 years

Page 32: Death and Edits

Actor #1 DoD: 6/27/2001 -edits/day

[Chart: Series1 by date, 6/28/01 to 6/28/11; y-axis 0 to 14]

Page 33: Death and Edits

Actor #1 –log(edits)/day

[Chart: Series1 by date, 6/28/01 to 6/28/11; log y-axis 0 to 1.2]

Page 34: Death and Edits

Actor #7 -edits/day

[Chart: Series1 by date, 9/24/03 to 9/24/11; y-axis 0 to 100]

Page 35: Death and Edits

Actor #7 –log(edits)/day

[Chart: Series1 by date, 9/24/03 to 9/24/11; log y-axis 0 to 2.5]

Page 36: Death and Edits

Actor #8 -edits/day

[Chart: Series1 by date, 12/10/03 to 12/10/10; y-axis 0 to 400]

Page 37: Death and Edits

Actor #8 –log(edits)/day

[Chart: Series1 by date, 12/10/03 to 12/10/10; log y-axis 0 to 3]

Page 38: Death and Edits

Actor #9 –edits/day

[Chart: Series1 by date, 2/28/04 to 2/28/11; y-axis 0 to 200]

Page 39: Death and Edits

Actor #9 –log(edits)/day

[Chart: Series1 by date, 2/28/04 to 2/28/11; log y-axis 0 to 2.5]

Page 40: Death and Edits

If we tweak the data to take importance into consideration…

Average gross, adjusted for inflation*

Only available for a small number of the actors in the sample set

Taken from boxofficemojo.com

Extremely reliable source
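The slides do not spell out how the gross figures were applied to the edit counts. One simple possibility, shown here purely as an assumption, is to rescale one actor's daily counts by the ratio of a reference actor's average gross to their own. The function name and all numbers below are placeholders, not real box-office figures or real edit counts.

```python
def adjust_for_importance(counts, own_gross, reference_gross):
    """Rescale counts as if the actor had the reference actor's average gross."""
    if own_gross <= 0:
        raise ValueError("own_gross must be positive")
    factor = reference_gross / own_gross
    return [c * factor for c in counts]


# Placeholder values only.
print(adjust_for_importance([10, 40, 20], own_gross=50.0, reference_gross=100.0))
# [20.0, 80.0, 40.0]
```

With this kind of scaling only one of the two series changes, which fits the chart legends where a single series is labeled "adjusted".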

Page 41: Death and Edits

Actor #8 vs. Actor #9

[Chart: series "ledger" and "swayze"; x-axis 1 to 361; y-axis 0 to 400]

Page 42: Death and Edits

Actor #8 vs. Actor #9 (adjusted)

[Chart: series "ledger" and "swayze adjusted"; x-axis 1 to 361; y-axis 0 to 400]

Page 43: Death and Edits

Actor #8 vs. Actor #9 (adjusted)

[Chart: series "ledger log" and "swayze adjusted log"; x-axis 1 to 361; log y-axis 0 to 3]

Page 44: Death and Edits

The same data on Google Trends

Page 45: Death and Edits

-10 days to +40 days (log)

[Chart: series "coburn log", "peck log", "brando log", "davis log", "palance log", "goulet log", "ledger log", "swayze log"; x-axis 1 to 50; log y-axis 0 to 3]
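Lining actors up on a shared axis means re-indexing each series relative to the date of death. A Python sketch of that alignment follows; the function name, the inclusive endpoints, and the sample data are assumptions. An inclusive window from -10 to +40 gives 51 values, while the chart axis runs 1 to 50, so the original window may have dropped one endpoint.

```python
from datetime import date, timedelta


def window_around(counts_by_day, event_day, before=10, after=40):
    """Return counts for event_day - before through event_day + after, one per day."""
    return [
        counts_by_day.get(event_day + timedelta(days=offset), 0)
        for offset in range(-before, after + 1)
    ]


# Made-up counts around a hypothetical event date.
counts = {date(2010, 1, 10): 5, date(2010, 1, 11): 7}
print(window_around(counts, date(2010, 1, 10), before=2, after=3))
# [0, 0, 5, 7, 0, 0]
```

Once every actor's series is cut this way, position 0 in each list is the same number of days before the event, so the curves can be overlaid directly.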

Page 46: Death and Edits

Other things I should consider

Age at death

Cause of death

Were they still acting?

Page 47: Death and Edits

Future directions

New sample of Wikipedia pages

Need to compare more contemporary pages

Need new metrics for comparison

Better workflows

Page 48: Death and Edits

Thanks!

Questions?

http://www.slideshare.net/mlincol2/informetrics