Analysing the Usage of Wikipedia on Twitter: Understanding Inter-Language Links

HICSS 49, January 8th, 2016

Eva Zangerle, Georg Schmidhammer, Günther Specht
University of Innsbruck, Austria



Motivation: Why this work does matter

• Wikipedia central source of information

• 450 million users per month, 277 editions

• Research focused on intrinsic factors
  • community
  • content
  • quality

• What about extrinsic factors?

Our Vision: Extrinsic Quality-Measures

• Inter-language Link Analysis

Previous Research

Eva Zangerle, Georg Schmidhammer and Günther Specht. #Wikipedia on Twitter: Analyzing Tweets About Wikipedia. In Proceedings of the 11th International Symposium on Open Collaboration, OpenSym ’15, pages 14:1–14:8, New York, NY, USA, 2015. ACM.

• Extrinsic view on Wikipedia via Twitter

• 20% of all tweets link to a Wikipedia edition in a language other than the tweet's (except for English and Japanese)

Research Questions

RQ1: How are inter-language links distributed among the different Wikipedias?

RQ2: What are the causes for users to link to a Wikipedia other than the one of their language?

[Figure: processing pipeline with the stages Crawl Twitter, Crawl Wikipedia, Clean Data, Quality Analyses, Extract Links]

Crawling

• Twitter API
  • Search for keyword "wikipedia"

• 2014/10/20 – 2015/04/28

• 6,415,762 tweets in total

• Extraction of links from tweets
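The link-extraction step could look roughly like the following. This is a minimal sketch and not the authors' implementation: the regular expression, the function name `extract_wikipedia_links`, and the assumption that the tweet text already contains expanded (not shortened) URLs are all illustrative.

```python
import re
from urllib.parse import unquote, urlparse

# Hypothetical pattern: desktop and mobile Wikipedia article URLs.
WIKI_URL = re.compile(
    r"https?://[a-z\-]+(?:\.m)?\.wikipedia\.org/wiki/[^\s]+",
    re.IGNORECASE,
)


def extract_wikipedia_links(tweet_text):
    """Return (language_edition, article_title) pairs for each Wikipedia URL."""
    links = []
    for url in WIKI_URL.findall(tweet_text):
        parsed = urlparse(url)
        # The first host label is the language edition, e.g. "en" or "es".
        edition = parsed.netloc.split(".")[0].lower()
        # Everything after "/wiki/" is the (percent-decoded) article title.
        title = unquote(parsed.path[len("/wiki/"):])
        links.append((edition, title))
    return links


if __name__ == "__main__":
    text = "Crash of 1987: https://en.wikipedia.org/wiki/Black_Monday_(1987) wow"
    print(extract_wikipedia_links(text))
```

Comparing the extracted edition with the tweet's language would then identify inter-language links. Trailing punctuation directly attached to a URL is not stripped in this sketch.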

Cleaning Data

• Filter tweets with no Wikipedia URL contained

• Bots contained in dataset
  • 99th percentile (>130 tweets)
  • BotOrNot Detection Service for 1,083 accounts
  • users and tweets deleted from dataset
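The percentile-based pre-selection of suspicious accounts might be sketched as follows. The function name `heavy_posters` and the nearest-rank percentile definition are assumptions made for illustration; the slides only state that accounts above the 99th percentile (>130 tweets) were examined. The subsequent check with the BotOrNot service is not modelled here.

```python
import math
from collections import Counter


def heavy_posters(user_ids, percentile=99):
    """Return (threshold, users whose tweet count exceeds the percentile).

    user_ids holds one entry per tweet (the author of that tweet).
    Uses the nearest-rank percentile, which is an assumption.
    """
    counts = Counter(user_ids)
    ordered = sorted(counts.values())
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    threshold = ordered[rank - 1]
    flagged = {user for user, n in counts.items() if n > threshold}
    return threshold, flagged


if __name__ == "__main__":
    # 99 ordinary users with one tweet each, one account with 200 tweets.
    authors = [f"user{i}" for i in range(99)] + ["bot"] * 200
    print(heavy_posters(authors))
```

Only the flagged accounts would then need to be passed on to a bot-detection step, which keeps the number of external lookups small.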

Cleaning Data: raw vs. cleaned dataset

Feature              Raw          Cleaned
Tweets               6,415,762    2,844,399
Retweets             2,040,816      855,959
Distinct Users       2,287,430    1,092,732
Mentions             4,673,284    2,437,092
Distinct Hashtags      213,574      127,958
Hashtag Usages       2,283,535      788,210
Distinct URLs        1,976,479    1,179,288
URL Usages           4,825,230    3,130,420

Crawling Wikipedia

• MediaWiki API

• Resolution of the revision ID that was current at the time the tweet was sent

• Crawling of
  • article
  • headings
  • wikilinks
  • references
  • images

• Last 500 edits
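Resolving the revision that was current when a tweet was sent can be done with the MediaWiki `revisions` query: listing revisions backwards from the tweet's timestamp with a limit of one returns the newest revision at or before that time. The sketch below only builds the request URL and parses a response; the helper names and the choice of `formatversion=2` are assumptions, and the authors' actual queries are not given on the slides.

```python
from urllib.parse import urlencode


def revision_query_url(lang, title, tweet_time_iso):
    """Build a MediaWiki API URL for the revision current at tweet_time_iso."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,                # only the newest matching revision
        "rvstart": tweet_time_iso,   # start listing at the tweet's timestamp
        "rvdir": "older",            # and walk backwards in time
        "rvprop": "ids|timestamp",
        "format": "json",
        "formatversion": 2,
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def revision_id_from_response(response):
    """Extract the revision ID from a formatversion=2 JSON response, or None."""
    for page in response.get("query", {}).get("pages", []):
        revisions = page.get("revisions", [])
        if revisions:
            return revisions[0]["revid"]
    return None


if __name__ == "__main__":
    print(revision_query_url("en", "Black_Monday_(1987)", "2015-01-15T12:00:00Z"))
```

The returned revision ID can then be used to fetch the article content as it existed at tweet time, so that quality measures reflect what the user actually saw.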

Quality Measures

1. Article length
2. Number of references (absolute)
3. Number of references (relative)
4. Diversity
5. Number of headings (absolute)
6. Number of headings (relative)
7. Informativeness
8. Number of images (relative)
9. Number of wikilinks (relative)
10. Currency
11. HasInfoBox
12. Complexity (Flesch Kincaid)

Warncke-Wang, M., Cosley, D., and Riedl, J. "Tell Me More: An Actionable Quality Model for Wikipedia", in Proceedings of WikiSym 2013
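For the complexity measure, the slides name Flesch Kincaid without saying which variant was used. As an illustration only, the sketch below computes the Flesch-Kincaid grade level from pre-computed counts; the function name is hypothetical, and the constants are the standard ones calibrated for English text, so applying them to other language editions is an assumption.

```python
def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid grade level from word, sentence and syllable counts."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    # 100 words in 5 sentences with 150 syllables in total.
    print(round(flesch_kincaid_grade(100, 5, 150), 2))
```

Longer sentences and longer words both raise the score, so a higher value indicates text that is harder to read.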


Results

RQ1: Distribution of (Inter-language) links

Top 3 inter-language targets:

• 62.68% English
• 6.26% Japanese
• 5.76% Spanish

RQ2: Causes for Inter-language Links

85% do not have a counterpart in the tweet's language (out of 691,424 inter-language links)

RQ2: Causes for Inter-language Links

Remaining 15%: Could article quality be an issue?

originally posted: https://en.wikipedia.org/wiki/Black_Monday_(1987)

counterpart: https://es.wikipedia.org/wiki/Lunes_negro_(1987)

RQ2: Causes for Inter-language Links

• Remaining 99,776 articles: apply the 12 quality measures to all originally posted articles and their counterparts

• Group articles into language pairs (original and counterpart language)

• For each article in a language pair, count the number of measures on which the original article performs better than its counterpart and vice versa (result: two vectors)

• Wilcoxon signed rank test for each language pair
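The counting and testing steps above can be sketched as follows. This is not the authors' code: it assumes that a higher value means "better" for every measure (which need not hold for measures such as complexity), and it uses a plain normal approximation without tie or continuity correction for the p-value. In practice an established statistics library would be the safer choice.

```python
import math


def count_wins(original, counterpart):
    """Count the measures on which each article of a pair scores higher."""
    orig_wins = sum(o > c for o, c in zip(original, counterpart))
    cp_wins = sum(c > o for o, c in zip(original, counterpart))
    return orig_wins, cp_wins


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-, two-sided p) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
    return w_plus, w_minus, p_value


if __name__ == "__main__":
    # Per-article win counts for one hypothetical language pair.
    original_wins = [9, 7, 8, 10, 6, 11]
    counterpart_wins = [3, 5, 4, 2, 6, 1]
    print(wilcoxon_signed_rank(original_wins, counterpart_wins))
```

For one language pair, `count_wins` would be applied to every article pair to fill the two vectors, which are then passed to the test as `x` and `y`.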

RQ2: Causes for Inter-language Links

For 58% of all language combinations, the articles in the tweeted language are of significantly better quality (p < 0.05)

Dominating Languages

Target       Count   Better than (p < 0.05)
English      28      Spanish, Japanese, French, Korean, Italian, German, Arabic,
                     Indonesian, Portuguese, Dutch, Turkish, Swedish, Thai, Polish,
                     Romanian, Finnish, Danish, Norwegian, Farsi, Welsh, Hindi,
                     Bulgarian, Latvian, Bosnian, Slovak, Hungarian, Slovenian,
                     Lithuanian
French       3       English, Japanese, Spanish
Spanish      2       English, Italian
Catalan      2       English, Portuguese
German       1       English
Japanese     1       German
Portuguese   1       Spanish
Turkish      1       English

Dominating Languages

• Most dominating target languages are English, Spanish, Japanese
  • most extensive Wikipedias
  • most active Wikipedias

• More elaborate, mature articles than in the user's language

Quality Measures

• 66% of all articles tweeted feature a significantly higher quality for all twelve quality measures (p < 0.001)

• 97% of all articles tweeted feature a significantly higher quality for more than six quality measures (p < 0.001)

Conclusion

85% of all inter-language links: no counterpart available

Articles tweeted are of significantly higher quality (with English, Japanese and German dominating)

Users deliberately tweet articles of higher quality


Questions?

any coffee break

Contact

www.evazangerle.at

http://dbis-informatik.uibk.ac.at

https://www.facebook.com/dbisibk