CIKM 2011 | Invited Talk Model-Driven Research in Social Computing Ed H. Chi Google Research Work done while at Palo Alto Research Center (PARC) 2011-10-27 CIKM 2011 Invited Talk 1

CIKM 2011 Social Computing Industry Invited Talk


Page 1: CIKM 2011 Social Computing Industry Invited Talk

CIKM 2011 | Invited Talk

Model-Driven Research in Social Computing

Ed H. Chi

Google Research

Work done while at Palo Alto Research Center (PARC)

2011-10-27 CIKM 2011 Invited Talk 1

Page 2: CIKM 2011 Social Computing Industry Invited Talk

Some Google Social Stats

• 250,000 words are written each minute on Blogger; that’s 360 million words a day
• Every 16 seconds, people view enough photos from Picasa Web Albums to cover an entire football field
• Every 8 minutes, more photos are viewed on Picasa Web Albums than exist in the entire Time-LIFE photo collection

Page 3: CIKM 2011 Social Computing Industry Invited Talk

YouTube Stats

• 150 years of YouTube video are watched every day on Facebook (up 2.5x y/y)
• Every minute, 400+ tweets contain YouTube links (up 3x y/y) [Q1 2011]
• 100M+ people take a social action with YouTube (likes, shares, comments, etc.) every week (10/15/10)

Page 4: CIKM 2011 Social Computing Industry Invited Talk

Google+ Stats

• 40 million people have joined Google+ since launch.
• People are 2x-3x more likely to share content with one of their circles than to make a public post.

Page 5: CIKM 2011 Social Computing Industry Invited Talk

Social Stream Research

• Analytics
  – Factors impacting retweetability [Suh et al., IEEE Social Computing 2010]
  – Location field of user profiles [Hecht et al., CHI 2011]
  – Organic Q&A behaviors [Paul et al., ICWSM’11]
  – Languages used in Twitter [Hong et al., ICWSM’11]
• Improving Stream Experience
  – Topic-based summarization & browsing of tweets [Bernstein et al., UIST 2010]
  – Tweet recommendation [Chen et al., CHI 2010 & CHI 2011]

Page 6: CIKM 2011 Social Computing Industry Invited Talk

Invisible Brokerage Signals across Language Barriers

Joint work w/ Lichan Hong, Gregorio Convertino [Hong et al., ICWSM July 2011]

Page 7: CIKM 2011 Social Computing Industry Invited Talk

Motivation for Studying Languages

• Twitter is an international phenomenon
  – Most research focused on English users
  – Question about generalization to non-English
  – Understand cross-language usage differences
  – Design implications for international users
• Research Questions:
  – What is the language distribution in Twitter?
  – How do users of different languages use Twitter?
  – How do bilingual users spread information across languages?

Page 8: CIKM 2011 Social Computing Industry Invited Talk

Data Collection & Processing

[Pipeline diagram; recoverable labels:]
• Twitter stream, 04/18/10-05/16/10 (4 weeks)
• 62M tweets
• Google Language API & LingPipe
• 104 languages
• Top 10 languages

Page 9: CIKM 2011 Social Computing Industry Invited Talk

Top 10 Languages in Twitter

Language     Tweets       %      Users
English      31,952,964   51.1   5,282,657
Japanese     11,975,429   19.1   1,335,074
Portuguese   5,993,584    9.6    993,083
Indonesian   3,483,842    5.6    338,116
Spanish      2,931,025    4.7    706,522
Dutch        883,942      1.4    247,529
Korean       754,189      1.2    116,506
French       603,706      1.0    261,481
German       588,409      1.0    192,477
Malay        559,381      0.9    180,147

Page 10: CIKM 2011 Social Computing Industry Invited Talk

Human-Coding Study

• 2,000 random tweets from 62M tweets
• 2 human judges for each of the top 10 languages
  – native speakers or proficient
  – discuss to resolve disagreement
• Hard to find Indonesian & Malay judges
• Presented 2,000 tweets to each judge
• Judge selected tweets in his/her language

Page 11: CIKM 2011 Social Computing Industry Invited Talk

Machine vs. Human

Language     T-P    T-N     F-N   F-P   Cohen’s Kappa
English      974    971     20    35    0.95
Japanese     370    1,595   0     35    0.94
Portuguese   170    1,803   19    8     0.92
Indonesian   106    1,875   15    4     0.91
Spanish      96     1,889   11    4     0.92
Dutch        18     1,978   2     2     0.90
Korean       24     1,976   0     0     1.00
French       13     1,980   0     7     0.79
German       12     1,979   2     7     0.72
Malay        8      1,979   4     9     0.55

T-P: true positive, T-N: true negative, F-N: false negative, F-P: false positive
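The kappa column can be approximately recomputed from the four counts in each row. The sketch below uses the standard two-rater Cohen's kappa formula and assumes "machine" and "human" are the two raters of a yes/no decision per tweet; the slides do not state how the reported values were rounded, so small differences in the last digit are possible.

```python
def cohens_kappa(tp: int, tn: int, fn: int, fp: int) -> float:
    """Cohen's kappa for a 2x2 machine-vs-human agreement table."""
    total = tp + tn + fn + fp
    observed = (tp + tn) / total  # share of tweets where both raters agree
    # Chance agreement, computed from each rater's yes/no marginals.
    machine_yes, machine_no = tp + fp, tn + fn
    human_yes, human_no = tp + fn, tn + fp
    expected = (machine_yes * human_yes + machine_no * human_no) / total**2
    return (observed - expected) / (1 - expected)


# Rows copied from the table above, in the order (T-P, T-N, F-N, F-P).
rows = {
    "Japanese": (370, 1595, 0, 35),
    "Korean": (24, 1976, 0, 0),
    "Malay": (8, 1979, 4, 9),
}
for language, counts in rows.items():
    print(language, round(cohens_kappa(*counts), 2))
```

Because nearly all of the 2,000 tweets are true negatives for the smaller languages, chance agreement is already very high there, which is why a handful of errors pulls kappa down sharply for Malay.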

Page 12: CIKM 2011 Social Computing Industry Invited Talk

Accuracy of Language Detection

• Two Types of Errors
  – Got ur dirct msg.i’m lukng 4wrd 2 twt wit u too.so,wat doing ha… (detected as Afrikaans)
  – High error rate for tweets of 1~2 words

Page 13: CIKM 2011 Social Computing Industry Invited Talk

Machine vs. Human

Language     T-P    T-N     F-N   F-P   Cohen’s Kappa
French       13     1,980   0     7     0.79
German       12     1,979   2     7     0.72
Malay        8      1,979   4     9     0.55

• French: 5/7 F-Ps have 2 words
• German: 1/2 F-Ns has 1 word; 6/7 F-Ps are in English
• Malay: 3/4 F-Ns & 7/9 F-Ps are in Indonesian

Page 14: CIKM 2011 Social Computing Industry Invited Talk

Common Twitter Conventions

[Annotated example; recoverable labels:]
• hashtag
• URL
• mention
• reply (per-tweet metadata)
• retweet

Page 15: CIKM 2011 Social Computing Industry Invited Talk

Use of URLs in 62M Tweets

Language     URLs
All          21%
English      25%
Japanese     13%
Portuguese   13%
Indonesian   13%
Spanish      15%
Dutch        17%
Korean       17%
French       37%
German       39%
Malay        17%

• Chi-square tests confirmed that differences by language are significant.

Page 16: CIKM 2011 Social Computing Industry Invited Talk

Significant Cross-Language Differences

Language     URLs   Hashtags   Mentions   Replies   Retweets
All          21%    11%        49%        31%       13%
English      25%    14%        47%        29%       13%
Japanese     13%    5%         43%        33%       7%
Portuguese   13%    12%        50%        32%       12%
Indonesian   13%    5%         72%        20%       39%
Spanish      15%    11%        58%        39%       14%
Dutch        17%    13%        50%        35%       11%
Korean       17%    11%        73%        59%       11%
French       37%    12%        48%        36%       9%
German       39%    18%        36%        25%       8%
Malay        17%    5%         62%        23%       29%

• Chi-square tests confirmed that differences by language are significant.
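The slides do not say which chi-square test was run or on what counts. As a rough illustration only, the sketch below computes a Pearson chi-square statistic by hand on a "language x (has URL / no URL)" table. The counts are approximations reconstructed here from the per-language tweet totals and the rounded URL percentages reported in the slides, so the statistic is indicative rather than a reproduction of the original analysis.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of language and URL use.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Assumption: counts rebuilt from tweet totals x rounded URL percentages.
tweets = {"English": 31_952_964, "Japanese": 11_975_429, "German": 588_409}
url_share = {"English": 0.25, "Japanese": 0.13, "German": 0.39}

table = []
for language, n_tweets in tweets.items():
    with_url = round(n_tweets * url_share[language])
    table.append([with_url, n_tweets - with_url])

statistic, dof = chi_square_statistic(table)
print(f"chi-square = {statistic:.0f}, dof = {dof}")
```

With samples of this size, the statistic far exceeds the 5% critical value for 2 degrees of freedom (about 5.99), so even modest percentage gaps register as significant; the size of the gap matters more than the p-value here.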

Page 17: CIKM 2011 Social Computing Industry Invited Talk

Implications

Language     URLs   Hashtags   Mentions   Replies   Retweets
All          21%    11%        49%        31%       13%
Korean       17%    11%        73%        59%       11%
German       39%    18%        36%        25%       8%

• Use of Twitter for social networking vs. information sharing differs across languages
• Design of recommendation engines
  – Korean users: promote conversational tweets
  – German users: promote tweets with URLs

Page 18: CIKM 2011 Social Computing Industry Invited Talk

Studying Bilingual Brokers

• Importance of brokers
  – Structural holes (Burt’92), LiveJournal (Herring et al’07)
• Define bilingual brokers as users who tweeted in a pair of languages
• Caveat
  – Under-estimated due to 4-week time limit
  – Over-estimated due to language detection errors
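The broker definition translates directly into a counting procedure. The sketch below is a minimal illustration, assuming the input is one (user, detected language) record per tweet; the record layout and the function name are hypothetical, not taken from the study.

```python
from collections import defaultdict
from itertools import combinations


def count_bilingual_brokers(tweets):
    """Count users per language pair.

    tweets: iterable of (user_id, language) pairs, one per tweet.
    A user counts as a broker for every pair of languages they tweeted in.
    """
    languages_by_user = defaultdict(set)
    for user_id, language in tweets:
        languages_by_user[user_id].add(language)

    brokers = defaultdict(int)
    for languages in languages_by_user.values():
        # sorted() gives each unordered pair one canonical key.
        for pair in combinations(sorted(languages), 2):
            brokers[pair] += 1
    return dict(brokers)


sample = [
    ("u1", "en"), ("u1", "ja"), ("u1", "en"),
    ("u2", "en"),
    ("u3", "en"), ("u3", "ja"), ("u3", "pt"),
]
print(count_bilingual_brokers(sample))
```

Counted this way, a single misdetected tweet is enough to make a user a broker, which is the over-estimation caveat; a user who happens to use only one language during the collection window is missed, which is the under-estimation caveat.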

Page 19: CIKM 2011 Social Computing Industry Invited Talk

Number of Bilingual Brokers

(E: English, J: Japanese, P: Portuguese, I: Indonesian, S: Spanish, D: Dutch, K: Korean, F: French, G: German, M: Malay)

     E         J        P         I         S        D        K       F        G
J    140,730
P    488,545   13,228
I    230,023   4,825    29,405
S    359,117   10,139   112,524   36,068
D    150,041   6,383    30,855    34,906    30,916
K    19,722    6,384    906       2,014     1,109    972
F    194,931   10,463   53,607    34,586    49,445   33,568   1,244
G    110,748   6,053    22,106    21,471    21,989   22,162   786     24,763
M    148,365   4,208    31,184    135,427   31,967   29,331   1,518   30,257   18,301

Page 20: CIKM 2011 Social Computing Industry Invited Talk

Sharing URLs Across Languages

     E        J       P        I     S       D       K     F       G       M
E    -        3,013   18,399   985   4,986   1,144   212   1,791   1,647   540
J    3,013    -       77       37    58      29      43    59      46      18
P    18,399   77      -        74    1,644   198     2     453     168     123
I    985      37      74       -     67      64      1     53      38      279
S    4,986    58      1,644    67    -       139     0     286     139     53
D    1,144    29      198      64    139     -       2     112     126     48
K    212      43      2        1     0       2       -     3       3       1
F    1,791    59      453      53    286     112     3     -       157     53
G    1,647    46      168      38    139     126     3     157     -       40
M    540      18      123      279   53      48      1     53      40      -

Page 21: CIKM 2011 Social Computing Industry Invited Talk

Sharing Hashtags Across Languages

     E        J       P        I        S        D       K     F       G       M
E    -        8,178   33,197   14,969   27,284   6,685   798   9,410   7,208   5,517
J    8,178    -       331      135      351      218     149   352     260     100
P    33,197   331     -        535      4,682    604     13    1,231   580     400
I    14,969   135     535      -        762      684     25    713     415     6,046
S    27,284   351     4,682    762      -        819     28    1,468   708     463
D    6,685    218     604      684      819      -       26    851     769     424
K    798      149     13       25       28       26      -     25      18      20
F    9,410    352     1,231    713      1,468    851     25    -       879     411
G    7,208    260     580      415      708      769     18    879     -       265
M    5,517    100     400      6,046    463      424     20    411     265     -

Page 22: CIKM 2011 Social Computing Industry Invited Talk

Implications

• Indicators of connection strength between languages
  – Number of bilingual brokers
  – Acts of brokerage: sharing URLs & hashtags
• English well connected to others, and may function as a hub
• Need to improve cross-language communications

Page 23: CIKM 2011 Social Computing Industry Invited Talk

Visible Social Signals from Shared Items

Kudos to Jilin Chen, Rowan Nairn

[Chen et al., CHI 2010] [Chen et al., CHI 2011]

Page 24: CIKM 2011 Social Computing Industry Invited Talk

Eddi: Summarizing Social Streams

Page 25: CIKM 2011 Social Computing Industry Invited Talk

Information Gathering/Seeking

• The Filtering Problem:
  – “I get 1,000+ items in my stream daily but only have time to read 10 of them. Which ones should I read?”
• The Discovery Problem:
  – “There are millions of URLs posted daily on Twitter. Am I missing something important there outside my own Twitter stream?”

Page 26: CIKM 2011 Social Computing Industry Invited Talk

Stream Recommender

• Zerozero88.com
  – Twitter as the platform
  – URLs as the medium
  – Produces your personal headlines

Page 27: CIKM 2011 Social Computing Industry Invited Talk

[Architecture diagram; recoverable components:]
• URL Sources
• User Topic Profiles
• Local Social Network
• Topic Relevance Scores
• Social Network Scores
• Recommendation Engine
  – Multiply scores
  – Rank URLs using multiplied scores
  – Recommend highest ranked URLs
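The three engine steps (multiply, rank, recommend) can be sketched in a few lines. This is a minimal illustration rather than the zerozero88 implementation: the function name, the dictionary inputs, and the choice to treat a missing score as 1.0 (so that a disabled signal leaves the product unchanged) are all assumptions.

```python
def recommend(candidates, topic_scores, social_scores, k=5):
    """Rank candidate URLs by the product of their topic and social scores."""

    def combined(url):
        # Assumption: a missing score acts as 1.0, i.e. that signal is switched off.
        return topic_scores.get(url, 1.0) * social_scores.get(url, 1.0)

    ranked = sorted(candidates, key=combined, reverse=True)
    return ranked[:k]


candidates = ["url-a", "url-b", "url-c"]
topic_scores = {"url-a": 0.9, "url-b": 0.2, "url-c": 0.5}
social_scores = {"url-a": 1.0, "url-b": 8.0, "url-c": 3.0}

top_two = recommend(candidates, topic_scores, social_scores, k=2)
print(top_two)
```

Multiplying rather than adding means a URL needs to do reasonably well on both signals to rank highly; a near-zero topic score cannot be rescued by a large vote count.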

Page 28: CIKM 2011 Social Computing Industry Invited Talk

URL Sources

• Considering all URLs was impossible
• FoF: URLs from followee-of-followees
  – Social Local News is Better
• Popular: URLs that are popular across whole Twitter
  – Popular News is Better

Component     Possible Design Choices
URL Sources   FoF (followee-of-followees), Popular

Page 29: CIKM 2011 Social Computing Industry Invited Talk

[Architecture diagram repeated from Page 27.]

Page 30: CIKM 2011 Social Computing Industry Invited Talk

Topic Relevance Scores

[Figure: two example term-weight vectors]
• Funny 1.3, YouTube 5.5, Video 0.5
• Funny 4.0, Game 2.1, …

Page 31: CIKM 2011 Social Computing Industry Invited Talk

Topic Profile of URLs

• Built from tweets that contain the URL
• However, tweets are short
  – term vectors for URLs are often too sparse
• Adopt a term expansion technique using a search engine

Example: “Best of Show CES 2011: The Motorola Atrix http://tcrn.ch/e0g3Oh”
  – Expanded terms: smartphone, mobility, …
  – Add to Profile
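A URL profile of this kind can be sketched as a term-frequency vector over the tweets that mention the URL. In the sketch below, `expand` is a hypothetical callable standing in for the search-engine term expansion, and `fake_expand` is a hard-coded stub that simply returns the two example terms from the slide; the tokenizer and the raw-count weighting are also assumptions, since the slides do not specify them.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, with URLs stripped first."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z0-9]+", text)


def url_topic_profile(tweets, expand=None):
    """Term-frequency profile of a URL, built from the tweets that mention it.

    expand: optional callable (tweet text -> list of extra terms); a
    hypothetical hook for search-engine-based term expansion.
    """
    profile = Counter()
    for tweet in tweets:
        profile.update(tokenize(tweet))
        if expand is not None:
            profile.update(expand(tweet))
    return profile


def fake_expand(tweet):
    # Stub only: a real system would query a search engine here.
    return ["smartphone", "mobility"] if "atrix" in tweet.lower() else []


tweet = "Best of Show CES 2011: The Motorola Atrix http://tcrn.ch/e0g3Oh"
profile = url_topic_profile([tweet], expand=fake_expand)
print(profile.most_common())
```

Without expansion the profile contains only the eight words of the tweet itself, which illustrates the sparsity problem: a user interested in "smartphone" would never match this URL.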

Page 32: CIKM 2011 Social Computing Industry Invited Talk

Topic Profile of Users

• Self-Topic: content profile based on my posts
  – My Interest as Information Producer
• Followee-Topic: content profile based on my followees’ posts
  – My Interest as Information Gatherer
• None, for comparison purpose

Component                Possible Design Choices
Topic Relevance Scores   Self-Topic, Followee-Topic, None

Page 33: CIKM 2011 Social Computing Industry Invited Talk

[Diagram: building the Followee-Topic profile; recoverable labels:]
• My Followees: collect & profile each followee
• Find Top Key Terms of each followee profile
• Aggregate Profile
• A term is weighted higher in your profile if more of your followees have the term as their top key terms
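The aggregation rule on this slide can be sketched as a count of followees per term. The code below assumes each followee profile is a term-to-weight mapping and that "top key terms" means the n highest-weighted terms; both the cutoff n and the function names are illustrative assumptions.

```python
from collections import Counter


def top_key_terms(profile, n=3):
    """The n highest-weighted terms of one followee's profile."""
    return [term for term, _ in Counter(profile).most_common(n)]


def followee_topic_profile(followee_profiles, n=3):
    """Weight each term by how many followees have it among their top key terms."""
    aggregate = Counter()
    for profile in followee_profiles:
        # set() ensures each followee contributes at most one count per term.
        aggregate.update(set(top_key_terms(profile, n)))
    return aggregate


followees = [
    {"python": 5, "ml": 3, "coffee": 1},
    {"ml": 4, "search": 2, "python": 1},
    {"ml": 6, "soccer": 2, "coffee": 1},
]
my_profile = followee_topic_profile(followees, n=2)
print(my_profile.most_common())
```

Counting followees instead of summing raw term weights keeps one very prolific followee from dominating the aggregate profile.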

Page 34: CIKM 2011 Social Computing Industry Invited Talk

[Architecture diagram repeated from Page 27.]

Page 35: CIKM 2011 Social Computing Industry Invited Talk

Social Network Scores

• “Popular Vote” among my followees-of-followees
  – People “vote” for a URL by tweeting it
  – URLs with more votes in total are assigned a higher score
  – Votes are weighted using social network structure
• None, for comparison purpose

Component               Possible Design Choices
Social Network Scores   Social Voting, None

Page 36: CIKM 2011 Social Computing Industry Invited Talk

The Intuition: Local Influence

[Diagram; recoverable labels: Me, 15 People, 5 People, follow/follows edges]

Whose URLs should be weighted higher?
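One way to read the local-influence intuition is that a vote counts for more when the person casting it is followed by more of my own followees. The sketch below implements that reading; the slides only say that votes are weighted "using social network structure", so the exact weight (a plain count of my followees who follow the poster) and all names here are assumptions.

```python
def social_vote_scores(my_followees, follows, tweeted_urls):
    """Score URLs by weighted votes from followees-of-followees.

    follows: dict mapping a user to the set of accounts that user follows.
    tweeted_urls: dict mapping a user to the URLs that user tweeted.
    """
    scores = {}
    for poster, urls in tweeted_urls.items():
        # Assumed vote weight: how many of my followees follow this poster.
        weight = sum(1 for f in my_followees if poster in follows.get(f, set()))
        if weight == 0:
            continue  # poster is not a followee-of-followee
        for url in set(urls):  # one vote per poster per URL
            scores[url] = scores.get(url, 0) + weight
    return scores


my_followees = {"a", "b", "c"}
follows = {"a": {"x", "y"}, "b": {"x"}, "c": {"x"}}
tweeted_urls = {"x": ["u1"], "y": ["u1", "u2"], "z": ["u3"]}

scores = social_vote_scores(my_followees, follows, tweeted_urls)
print(scores)
```

Here the poster followed by three of my followees contributes three times the weight of the poster followed by one, and a poster outside my followee-of-followee neighborhood contributes nothing.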

Page 37: CIKM 2011 Social Computing Industry Invited Talk

Possible Recommender Designs

Component                Possible Design Choices
URL Sources              FoF (followee-of-followees), Popular
Topic Relevance Scores   Self-Topic, Followee-Topic, None
Social Network Scores    Social Voting, None

• 2 (URL source) x 3 (topic score) x 2 (social score) = 12 possible algorithm designs in total
• Random selection if for both scores we chose None

Recommendation Engine
  – Multiply scores
  – Rank URLs using multiplied scores
  – Recommend highest ranked URLs
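The 2 x 3 x 2 design space can be enumerated mechanically. The sketch below uses the option labels from the table; the variable names are illustrative.

```python
from itertools import product

URL_SOURCES = ["FoF", "Popular"]
TOPIC_SCORES = ["Self-Topic", "Followee-Topic", "None"]
SOCIAL_SCORES = ["Social Voting", "None"]

# Every combination of one choice per component is one algorithm design.
designs = list(product(URL_SOURCES, TOPIC_SCORES, SOCIAL_SCORES))

# With both scores set to None there is nothing to rank by, so these
# designs fall back to random selection from the URL source.
random_baselines = [d for d in designs if d[1] == "None" and d[2] == "None"]

print(len(designs))
for design in random_baselines:
    print(design)
```

The two random-selection designs (one per URL source) serve as baselines against which the scored designs can be compared.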

Page 38: CIKM 2011 Social Computing Industry Invited Talk

Study Design

• Within-subject design
• Each subject evaluated 5 URL recommendations from each of the 12 algorithms
  – Show 60 URLs in random order, and ask for binary rating
  – 60 ratings x 44 subjects = 2,640 ratings in total
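As a small illustration of the within-subject setup, the sketch below builds one subject's rating sheet by pooling 5 recommendations from each of 12 algorithms and shuffling them, so the rater cannot tell which algorithm produced which URL. The algorithm names and URLs are placeholders, not study data.

```python
import random


def build_rating_sheet(recommendations_by_algorithm, rng):
    """Pool per-algorithm recommendations and shuffle their order."""
    sheet = [
        (algorithm, url)
        for algorithm, urls in recommendations_by_algorithm.items()
        for url in urls
    ]
    rng.shuffle(sheet)
    return sheet


# Placeholder data: 12 algorithms x 5 recommended URLs each.
recs = {f"algo{i}": [f"url-{i}-{j}" for j in range(5)] for i in range(12)}
sheet = build_rating_sheet(recs, random.Random(0))

ratings_per_subject = len(sheet)
total_ratings = ratings_per_subject * 44  # 44 subjects
print(ratings_per_subject, total_ratings)
```

Keeping the algorithm label alongside each URL lets the binary ratings be attributed back to designs afterwards, while the subject only ever sees the shuffled URLs.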

Page 39: CIKM 2011 Social Computing Industry Invited Talk

Summary of Results

[Results chart; recoverable labels: FoF URLs, Popular URLs, Social Vote Only, Best Performing]

Page 40: CIKM 2011 Social Computing Industry Invited Talk

Algorithms Differ Not Only in Accuracy!

• Relevance vs. Serendipity in recommendations
• From a subject in the pilot interview of zerozero88:
  – “There is a tension between the discovery and the affirming aspect of things. I am getting tweets about things that I am already interested in. Something I crave …, is an element of surprise or whimsy. ... I am getting a lot of things I am interested in, but that is not necessarily a good thing for me personally”

Page 41: CIKM 2011 Social Computing Industry Invited Talk

Design Rule

• Interaction costs determine number of people who participate
  – Surplus of attention & motivation at small transaction costs
• Therefore:
• Important to keep interaction costs low
  – Recommendation
  – Summarization
• Or bring new benefits

[Chart: # People willing to participate vs. Cost of participation]

2008-05-13 CSCL 2011 Keynote

Page 42: CIKM 2011 Social Computing Industry Invited Talk

Thank you!

• [email protected]
• http://edchi.net