Cross-language Wikipedia Editing of Okinawa, Japan

Scott A. Hale
Oxford Internet Institute

http://www.scotthale.net/pubs/?chi2015

@computermacgyve

20 April 2015


How can user-generated content platforms be multilingual without fragmenting users too thinly across languages?

Large differences in the information available in different languages (Hecht & Gergle, 2010; Hong, Convertino, & Chi, 2011)

Language is a large barrier to the spread of local information (Sen et al., 2015)

Do multilingual users bridge language divides? (Hale, 2014b)

15% of Wikipedia users edit multiple language editions

Multilingual users are more active than monolingual users but mainly more active in their first/primary language

Unclear how much they transfer information between languages (nearly half edit different articles in different languages)


Research questions

1 What articles do multilinguals edit in their non-primary languages?

2 What types of edits do multilingual users make in their non-primary languages?

3 How valuable are the contributions by multilingual users in their non-primary languages?


Data

Full edit histories of Okinawa-related Wikipedia articles

Japanese and English editions

Articles on the same concept connected via inter-language links (Wikidata)

Users connected across languages via global accounts (CentralAuth database)

Bots & malicious users removed based on userpage content, user groups, or being banned within one year of data collection


Okinawa

[Figure: map locating Okinawa relative to Japan, China, South Korea, and Taiwan.]

Independent kingdom until the late 1800s

Administered by the US 1945–1972

Large number of US military, contractors, and dependents living on the islands today

Article landscape

Sample          en-only   ja-only    Both

Geotag               52       185     152
Category            156     2,819     707
Article link      3,411     9,984   5,567

Table: The number of unique concepts in each sample. The majority of concepts have an article either only in the English edition or only in the Japanese edition (en-only or ja-only), while a smaller number of concepts have articles in both the English and Japanese editions (Both).
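
The claim in the caption can be checked directly from the counts in the table. The short sketch below recomputes, for each sample, the share of concepts that have an article in both editions. The dictionary and function names are illustrative only and do not come from the original analysis.

```python
# Counts copied from the table: (en-only, ja-only, both) per sample.
SAMPLES = {
    "Geotag": (52, 185, 152),
    "Category": (156, 2819, 707),
    "Article link": (3411, 9984, 5567),
}


def share_in_both(en_only, ja_only, both):
    """Fraction of unique concepts that have an article in both editions."""
    return both / (en_only + ja_only + both)


shares = {name: share_in_both(*counts) for name, counts in SAMPLES.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} of concepts exist in both editions")
```

In every sample the share is below one half (roughly 39%, 19%, and 29%), which is consistent with the caption's statement that most concepts exist in only one edition.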


Data overview

                     Total users           Total articles edited
                     Count         %       Count            %

English edition
  Anonymous          192,839    73.4%        216,840    46.15%
  Local account       15,008     5.7%         58,689    12.49%
  Pri. English        50,038    19.0%        179,951    38.30%
  Pri. Japanese          466     0.2%          1,488     0.32%
  Pri. Other           4,341     1.7%         12,911     2.75%
  Totals             262,692   100.0%        469,879    100.0%

Japanese edition
  Anonymous          372,852    88.4%        717,608    62.74%
  Local account        9,945     2.4%        109,765     9.60%
  Pri. English           558     0.1%          5,531     0.48%
  Pri. Japanese       37,191     8.8%        301,980    26.40%
  Pri. Other           1,174     0.3%          8,954     0.78%
  Totals             421,720   100.0%      1,143,838    100.0%


Multilinguals edit articles with versions in both languages

[Figure: percentage split, for each user group in each edition, between articles that exist only in that edition and articles that exist in both editions.]

English users editing the Japanese edition are far less likely than other users to edit articles that only appear in Japanese. Similarly, Japanese users editing the English edition are far less likely than other users to edit articles that only appear in English.


Edit popular articles

                            # of Japanese users          # of English users
                            editing English              editing Japanese
                            Estimate (Standard error)    Estimate (Standard error)

Exists in both languages    0.641*** (0.024)             3.285*** (0.034)
Total number of editors     0.001*** (0.0001)            0.003*** (0.0001)
PageRank                    0.014*** (0.0005)            0.245*** (0.006)
Number of images            0.003*** (0.001)             0.054*** (0.002)
Number of external links    0.001*** (0.0003)            −0.0003 (0.0004)
Constant                    0.008 (0.015)                0.029 (0.019)

Observations                5,441                        14,825
Adjusted R²                 0.348                        0.572
Residual Std. Error         0.849 (df = 5435)            1.828 (df = 14819)

*p<0.1; **p<0.05; ***p<0.01

Table: Linear regression results fitting the number of primary Japanese users editing each English article and the number of primary English users editing each Japanese article.
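
To make the coefficients easier to read, the sketch below plugs them into the fitted linear equation for a hypothetical article. The coefficient values are copied from the table. The article's feature values (100 editors, PageRank 1.0, 5 images, 10 external links) are invented for illustration, and the scale of the PageRank variable in the original data is not stated on the slide, so the absolute predictions should not be taken literally.

```python
# Coefficients copied from the regression table. The model keys and
# feature names are illustrative labels, not names from the original study.
COEFFICIENTS = {
    "ja_users_on_en_article": {
        "constant": 0.008, "exists_in_both": 0.641, "total_editors": 0.001,
        "pagerank": 0.014, "images": 0.003, "external_links": 0.001,
    },
    "en_users_on_ja_article": {
        "constant": 0.029, "exists_in_both": 3.285, "total_editors": 0.003,
        "pagerank": 0.245, "images": 0.054, "external_links": -0.0003,
    },
}


def predicted_editors(model, exists_in_both, total_editors, pagerank,
                      images, external_links):
    """Evaluate the fitted linear equation for one article."""
    c = COEFFICIENTS[model]
    return (c["constant"]
            + c["exists_in_both"] * exists_in_both
            + c["total_editors"] * total_editors
            + c["pagerank"] * pagerank
            + c["images"] * images
            + c["external_links"] * external_links)


# Hypothetical English article, with and without a Japanese counterpart.
features = dict(total_editors=100, pagerank=1.0, images=5, external_links=10)
with_both = predicted_editors("ja_users_on_en_article", 1, **features)
without_both = predicted_editors("ja_users_on_en_article", 0, **features)
print(f"with counterpart: {with_both:.3f}, without: {without_both:.3f}")
```

Holding the other features fixed, the difference between the two predictions equals the "Exists in both languages" coefficient, which is how that row of the table should be read.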


Research questions

1 What articles do multilinguals edit in their non-primary languages?

2 What types of edits do multilingual users make in their non-primary languages?

  1 Edit size

  2 Content changes

3 How valuable are the contributions by multilingual users in their non-primary languages?


Edit size

Measuring size

More than byte difference

Use sub-score from WikiTrust algorithms (Adler, Chatterjee, et al., 2008; Adler & Alfaro, 2007; Adler, Alfaro, Pye, & Raman, 2008)

MeCab to determine Japanese word boundaries

1 point for each word added or deleted

0.5 point for each edited word

0 < x < 1 point for moving a word a fraction x of the normalized page length
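
The point scheme above can be sketched in a few lines. This is a simplified illustration, not the WikiTrust implementation: it uses Python's standard `difflib.SequenceMatcher` to align two word lists, treats aligned replacements as "edited" words, and does not detect moves itself. Move distances, if known, are passed in through the hypothetical `move_fractions` argument. Input is assumed to be already tokenized into words (for Japanese, after word segmentation).

```python
from difflib import SequenceMatcher


def edit_size(old_words, new_words, move_fractions=()):
    """Score an edit: 1 per added/deleted word, 0.5 per edited word,
    plus a fraction x (0 < x < 1) for each moved word."""
    score = 0.0
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed, added = i2 - i1, j2 - j1
        if tag == "replace":
            # Words replaced one-for-one count as edited (0.5 each);
            # any surplus on either side is a plain addition or deletion.
            edited = min(removed, added)
            score += 0.5 * edited + abs(removed - added)
        elif tag in ("insert", "delete"):
            score += removed + added
    # Moves are supplied by the caller; difflib does not identify them.
    score += sum(move_fractions)
    return score


old = "the castle was built in 1429".split()
new = "the castle was rebuilt in 1429 by the king".split()
print(edit_size(old, new))
```

In this example one word is edited ("built" to "rebuilt") and three words are appended, so the score is 0.5 + 3 = 3.5. A byte difference would instead weight the edit by character counts, which is why the slides describe the measure as "more than byte difference".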


Edit size

Figure: Density plots for non-anonymous users editing articles in the Japanese (left) and English (right) editions grouped by their primary language editions. Vertical lines indicate distribution means.


Content changes: Exploratory qualitative coding of edits

Up to 5 randomly chosen edits from 70 randomly chosen multilingual users (35 primary editors of English and 35 primary editors of Japanese)

Emergent coding of edits into 6 non-exclusive categories

Addition: Adding new text or references to an existing article or creating a new article

Maintenance: Adding, removing, or adjusting templates, categories, links in a “See Also” section, or whitespace changes that did not alter text

Deletion/reversion: Reverting an edit or deleting text from an article

Image-related: Adding, altering, or removing an image

Interlanguage-links: Altering interlanguage links

Change: Edits that changed existing text such as correcting spelling errors or updating facts that had changed like the latest winner of an annual sports tournament


Content changes

Edit category           Pri. lang.     Non-pri. lang.    p-val†

Addition                 97   31%        47   26%         0.25
Maintenance             103   33%        44   24%         0.04
Deletion/Reversion       37   12%        11    6%         0.03
Image-related            27    9%        32   18%         0.01
Interlanguage links       8    3%        32   18%         0.00
Change                   65   21%        34   19%         0.62

Total edits‡            315             181

Table: Exploratory, qualitative coding of edits in users’ primary languages (pri. lang.) and non-primary languages (non-pri. lang.).
†p-values are for two-tailed t-tests on difference of percentage means.
‡Some edits are assigned to multiple categories and, therefore, the column sums are greater than the total number of edits reported.
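
The percentages in the table follow from the counts and the two column totals (315 and 181). The sketch below recomputes them and attaches an approximate two-tailed p-value to each difference. The test used here is a normal approximation with unpooled variances, chosen as an assumption for illustration. The slide reports t-tests without giving further detail, so these values should land close to, but need not exactly match, the reported ones.

```python
from math import erfc, sqrt

PRI_TOTAL, NON_PRI_TOTAL = 315, 181

# (primary-language count, non-primary-language count), from the table.
COUNTS = {
    "Addition": (97, 47),
    "Maintenance": (103, 44),
    "Deletion/Reversion": (37, 11),
    "Image-related": (27, 32),
    "Interlanguage links": (8, 32),
    "Change": (65, 34),
}


def compare(pri_count, non_pri_count):
    """Return both proportions and an approximate two-tailed p-value."""
    p1 = pri_count / PRI_TOTAL
    p2 = non_pri_count / NON_PRI_TOTAL
    # Unpooled standard error of the difference in proportions.
    se = sqrt(p1 * (1 - p1) / PRI_TOTAL + p2 * (1 - p2) / NON_PRI_TOTAL)
    z = (p1 - p2) / se
    # Two-tailed tail probability of a standard normal variable.
    return p1, p2, erfc(abs(z) / sqrt(2))


results = {name: compare(*pair) for name, pair in COUNTS.items()}

for name, (p1, p2, p_value) in results.items():
    print(f"{name:<20} {p1:6.1%} {p2:6.1%}  p = {p_value:.3f}")
```

The largest gap is for interlanguage links (about 3% of primary-language edits versus about 18% of non-primary-language edits), matching the table's near-zero p-value for that row.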


Edit value

Measuring edit value

Many ways users contribute value

Simple, quantitative measure is how much of an edit is retained by subsequent editors

Computed using WikiTrust edit survival scores for next six edits (Adler, Chatterjee, et al., 2008; Adler & Alfaro, 2007; Adler, Alfaro, et al., 2008)

No significant difference

Text from edits made by non-primary editors survived at a similar rate to the text from edits made by users who primarily edited each edition.
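
The idea of measuring value by retention can be illustrated with a much cruder proxy than the WikiTrust score itself. The sketch below is an assumption-laden simplification: it treats each revision as a bag of words, finds the words an edit added, and averages the share of those words still present over up to six later revisions. WikiTrust's actual survival score is based on edit distances between revisions and is not reproduced here. The function name and `horizon` parameter are hypothetical.

```python
from collections import Counter


def surviving_fraction(before, after, later_revisions, horizon=6):
    """Average share of the words an edit added that persist later.

    Returns None when the edit added no words or there are no later
    revisions to judge it against.
    """
    baseline = Counter(before)
    added = Counter(after) - baseline          # words introduced by the edit
    total_added = sum(added.values())
    judges = later_revisions[:horizon]         # at most the next six revisions
    if total_added == 0 or not judges:
        return None
    fractions = []
    for revision in judges:
        still_new = Counter(revision) - baseline
        kept = added & still_new               # added words that remain
        fractions.append(sum(kept.values()) / total_added)
    return sum(fractions) / len(fractions)


before = "shuri castle is in naha".split()
after = "shuri castle is in naha it was rebuilt in 1992".split()
later = [after, "shuri castle is in naha it was rebuilt".split()]
print(f"{surviving_fraction(before, after, later):.2f}")
```

Here the edit adds five words; all five survive the first later revision and three survive the second, giving an average of 0.80. Comparing the distribution of such scores between primary and non-primary editors is the kind of comparison the slide summarizes as showing no significant difference.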


Research questions

1 What articles do multilinguals edit in their non-primary languages?

Articles that also exist in their primary language

Articles that have more edits / editors overall

Articles with more images

2 What types of edits do multilingual users make in their non-primary languages?

Smaller-sized edits

More image-related and interlanguage link-related edits

Fewer deletion/reversion and maintenance edits

Similar amounts of additions and change edits

Unique contributions related to language (e.g., 15% of edits by Japanese users in the English edition added or corrected Japanese characters/romanizations)

3 How valuable are the contributions by multilingual users in their non-primary languages?

No significant differences.


Implications and future directions

Future

Additional languages, larger-scale classification of edit types

Okinawa is probably a “hard case”

Very different languages, writing systems

Japanese users have consistently been observed to engage less with other-language content (Hale, 2014a, 2014b)

Implications

Importance of holistic, cross-language measurement of reputation for awarding badges, etc.

Images/multimedia as good cross-language starter tasks

Discovery of related other-language content is a barrier?

No multilingual search or recommendation on Wikipedia

Multilingual users are good candidates for recommendation given they overlap to some extent with “power users” (Huang, Suh, Hill, & Hsieh, 2015)

Implications for the translation environment being developed by the Wikimedia Foundation


I would like to thank Eric T. Meyer, Taha Yasseri, and Alolita Sharma as well as the anonymous CHI reviewers who provided helpful comments on previous versions of this article.

Adler, B. T., & Alfaro, L. de. (2007). A content-driven reputation system for the Wikipedia. In Proceedings of the 16th international conference on World Wide Web (pp. 261–270). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1242572.1242608

Adler, B. T., Alfaro, L. de, Pye, I., & Raman, V. (2008). Measuring author contributions to the Wikipedia. In Proceedings of the 4th international symposium on wikis (pp. 15:1–15:10). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1822258.1822279

Adler, B. T., Chatterjee, K., Alfaro, L. de, Faella, M., Pye, I., & Raman, V. (2008). Assigning trust to Wikipedia content. In Proceedings of the 4th international symposium on wikis (pp. 26:1–26:12). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1822258.1822293

Hale, S. A. (2014a). Global connectivity and multilinguals in the Twitter network. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 833–842). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/2556288.2557203

Hale, S. A. (2014b). Multilinguals and Wikipedia editing. In Proceedings of the 6th annual ACM web science conference. New York, NY, USA: ACM. Available from http://arxiv.org/abs/1312.0976

Hecht, B., & Gergle, D. (2010). The Tower of Babel meets Web 2.0: User-generated content and its applications in a multilingual context. In Proceedings of the 28th international conference on human factors in computing systems (pp. 291–300). New York, NY, USA: ACM. Available from http://doi.acm.org/10.1145/1753326.1753370

Hong, L., Convertino, G., & Chi, E. (2011). Language matters in Twitter: A large scale study. In International AAAI conference on weblogs and social media (pp. 518–521). Available from http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2856

Huang, S.-W., Suh, M., Hill, B. M., & Hsieh, G. (2015). How activists are both born and made: An analysis of users on change.org. In Proceedings of the 29th international conference on human factors in computing systems.

Sen, S., Ford, H., Musicant, D., Graham, M., Keyes, O. S., & Hecht, B. (2015). Barriers to the localness of volunteered geographic information. In Proceedings of the 29th international conference on human factors in computing systems.