73
Integra(ng Social with Search Edward Chang Director, Google Research, Beijing WISE Keynote 1 2010-12-13

Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Integra(ng  Social  with  Search  

Edward  Chang  

Director,  Google  Research,  Beijing�

WISE Keynote 1 2010-12-13

Page 2: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Related  Papers �•  AdHeat  (Social  Ads):  

–  AdHeat:  An  Influence-­‐based  Diffusion  Model  for  PropagaAng  Hints  to  Match  Ads,    H.J.  Bao  and  E.  Y.  Chang,  WWW  2010    (best  paper  candidate),  April  2010.  

–  Parallel  Spectral  Clustering  in  Distributed  Systems,    Wen-­‐Yen  Chen,  Yangqiu  Song,  Hongjie  Bai,  Chih-­‐Jen  Lin,  and  E.  Y.  Chang,    IEEE  TransacAons  on  PaTern  Analysis  and  Machine  Intelligence  (PAMI),  2010.  

•  UserRank:  –  Confucius  and  its  Intelligent  Disciples,  X.  Si,  E.  Y.  Chang,  Z.  Gyongyi,  VLDB,  September  2010  .  

–  Topic-­‐dependent  User  Rank,  Xiance  Si,  Z.  Gyongyi,  E.  Y.  Chang,  and  M.S.  Sun,  Google  Technical  Report.  

•  Large-­‐scale  Collabora;ve  Filtering:  –  PLDA+:  Parallel  Latent  Dirichlet  AllocaAon  for  Large-­‐Scale  ApplicaAons,  ACM  TransacAons  on  

Internet  Technology,  2011.  –  CollaboraAve  Filtering  for  Orkut  CommuniAes:  Discovery  of  User  Latent  Behavior,  W.-­‐Y.  Chen,  

J.  Chu,  E.  Y.  Chang,  WWW  2009:  681-­‐690.  –  CombinaAonal  CollaboraAve  Filtering  for  Personalized  Community  RecommendaAon,  W.-­‐Y.  

Chen,  E.  Y.  Chang,  KDD  2008:  115-­‐123.  –  PSVM:  Parallelizing  SVMs  on  distributed  machines,  E.  Y.  Chang,  et  al.,  NIPS  2007.  

WISE Keynote 2 2010-12-13

Page 3: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Web  1.0  

WISE Keynote 3

.htm

.htm

.htm

.jpg

.jpg

.doc

.htm

.msg

.htm

.htm

2010-12-13

Page 4: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Web  2.0  -­‐-­‐-­‐  Web  with  People  

WISE Keynote 4

.htm

.jpg

.doc

.xls

.msg .htm

.htm

.jpg

.msg

.htm

2010-12-13

Page 5: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 5 2010-12-13

Page 6: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 6 2010-12-13

Page 7: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Outline�

•  Search  +  Social  Synergy  •  Search    Social  

•  Social    Search  

•  Scalability�

WISE Keynote 7 2010-12-13

Page 8: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Google  Q&A  (Confucius)  

•  Developed  from  2007  All  now  @  Beijing  •  Launched  in  more  than  60  courAers  

– Russia  – HK  – Southeast  Asia  – Arab  World  

– Sub-­‐Saharan  Africa  (Baraza)  

WISE Keynote 8 2010-12-13

Page 9: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 9

Query:  What  are  must-­‐see  a<rac(ons  at  Yellowstone  

2010-12-13

Page 10: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 10

Query:  What  are  must-­‐see  a<rac(ons  at  Yellowstone  

2010-12-13

Page 11: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 11

Query:  What  are  must-­‐see  a<rac(ons  at  Yosemite  

2010-12-13

Page 12: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 12

Query:  What  are  must-­‐see  a<rac(ons  at  Beijing  

Hotel  ads  

2010-12-13

Page 13: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 13

Search Quality at Stake.

Strategic  Issue  

13  17  June  2008  -­‐  Knowledge  Search  Strategy  

%  of  Referral  Traffic  From  1st  Page  Sent  to  Yahoo  /  Baidu  %  of  First  Result  Page  with  >=1  Q&A  Result  from  Yahoo  or  Baidu  

61  countries  have  Q&A  or  advanced  forums  as  top  10  most  clicked  des;na;on    (out  of  115  countries  with  more  than  1M  session)  

2010-12-13

Page 14: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

SNS  &  Mobile  Also  Need  Q&A  

•  Social  Networks  – Difficult  to  find  user  intent  to  match  ads  

– Q&A  is  a  perfect  app  to  learn  users’  problems  

•  Mobile  Search  – Voice  is  the  most  convenient  user  interface  – Succinct  search  result  (or  rich  snippets)  is  desirable    

WISE Keynote 14 2010-12-13

Page 15: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Confucius:  Google  Q&A  Providing  High-­‐Quality  Answers  in  a  Timely  Fashion  

  Trigger  a  discussion/quesAon  session  during  search    Provide  labels  to  a  post  (semi-­‐automaAcally)    Given  a  post,  find  similar  posts  (automaAcally)    Evaluate  quality  of  a  post,  relevance  and  originality    Evaluate  user  credenAals  in  a  topic  sensiAve  way    Route  quesAons  to  experts    Provide  most  relevant,  high-­‐quality  content  for  Search  to  index    Generate  answers  using  NLP  

WISE Keynote 15

U  

Search  

S  U  

Community  

C  

CCS/Q&A  

Discussion/Ques;on  

DC  

Differen;ated  Content  

2010-12-13

Page 16: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Confucius:  Google  Q&A  Providing  High-­‐Quality  Answers  in  a  Timely  Fashion  

  Trigger  a  discussion/quesAon  session  during  search    Provide  labels  to  a  quesAon  (semi-­‐automaAcally)    Given  a  quesAon,  find  similar  quesAons  (automaAcally)    Evaluate  quality  of  an  answer,  relevance  and  originality    Evaluate  user  credenAals  in  a  topic  sensiAve  way    Route  quesAons  to  experts    Provide  most  relevant,  high-­‐quality  content  for  Search  to  index    Generate  answers  using  NLP  

WISE Keynote 16

U  

Search  

S  U  

Community  

C  

CCS/Q&A  

Discussion/Ques;on  

DC  

Differen;ated  Content  

2010-12-13

Page 17: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 17

Q&A  Uses  Machine  Learning  

Label suggestion using LDA algorithm.

• Real Time topic-to-topic (T2T) recommendation using LDA algorithm.

• Gives out related high quality links to previous questions before human answer appear.

2010-12-13

Page 18: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 18

1 1 1 1 1 1 1

1 1 1 1

1 1 1 1

1 1 1 1 1

1 1 1 1 1 1 1 1 1

1 1 1

Que

stio

ns

Labels/Qs

Based on membership so far, and memberships of others

Predict further membership

CollaboraAve  Filtering  

2010-12-13

Page 19: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

FIM-­‐based  RecommendaAon  

WISE Keynote 19 2010-12-13

Page 20: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Distributed  Latent  Dirichlet  AllocaAon  (LDA)  

WISE Keynote 20

•  Other  CollaboraAve  Filtering  Apps  –  Recommend  Users    Users  –  Recommend  Music    Users  –  Recommend  Ads    Users  –  Recommend  Answers    Q  

•  Predict  the  ?  In  the  light-­‐blue  cells  

Users/Music/Ads/Question

Use

rs/M

usic

/Ads

/Ans

wer

s

1 recipe pastry for a 9 inch double crust 9 apples, 2/1 cup, brown sugar

How to install apps on Apple mobile phones?

Documents

Topic Distribution

Topic Distribution

User quries iPhone  crack Apple  pie  

•  Search  –  Construct  a  latent  layer  for  beTer  

for  semanAc  matching  

•  Example:  –  iPhone  crack  –  Apple  pie  

2010-12-13

Page 21: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Latent  Dirichlet  AllocaAon  [D.  Blei,  M.  Jordan  04]  

•  α:  uniform  Dirichlet  φ  prior  for  per  document  d  topic  distribuAon  (corpus  level  parameter)  

•  β:  uniform  Dirichlet  φ  prior  for  per  topic  z  word  distribuAon  (corpus  level  parameter)  

•  θd  is  the  topic  distribuAon  of  document    d  (document  level)  

•  zdj  the  topic  if  the  jth  word  in  d,  wdj  the  specific  word  (word  level)    

WISE Keynote 21

θ

z

w

Nm

M

α

βφ

K

2010-12-13

Page 22: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

CombinaAonal  CollaboraAve  Filtering  Model  (CCF)  [W.-­‐Y.  Chen,  et  al,  KDD2008]  

WISE Keynote 22

Communities

users descriptions users descriptions

Communities Communities

Word 1

Word 2

Word 3

2010-12-13

Page 23: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 23 2010-12-13

Page 24: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Confucius:  Google  Q&A  

  Trigger  a  discussion/quesAon  session  during  search    Provide  labels  to  a  post  (semi-­‐automaAcally)    Given  a  post,  find  similar  posts  (automaAcally)    Evaluate  quality  of  a  post,  relevance  and  originality    Evaluate  user  credenAals  in  a  topic  sensiAve  way    Route  quesAons  to  experts    Provide  most  relevant,  high-­‐quality  content  for  Search  to  index    NLQA    

WISE Keynote 24

U  

Search  

S  U  

Community  

C  

CCS/Q&A  

Discussion/Ques;on  

DC  

Differen;ated  Content  

2010-12-13

Page 25: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

UserRank  

WISE Keynote 25

•  Rank  users  by  quanAty  (number  of  links)  and  quality  (weights  on  links)  of  contribuAons  

•  Quality  include:  –  Relevance.  Is  an  answer  relevant  to  the  

Q?  Measured  by  KL  divergence  between  latent-­‐topic  vectors  of  A  and  Q    

–  Coverage.  Compared  among  different  answers  

–  Originality.  Detect  potenAal  plagiarism  and  spam  

–  Promptness.  Time  between  Q  and  A  posAng  Ame  

2010-12-13

Page 26: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Outline�

•  Search  +  Social  Synergy  •  Social    Search  

•  Search    Social  

•  Scalability   �

WISE Keynote 26 2010-12-13

Page 27: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Outline�

•  Search  +  Social  Synergy  •  Social    Search  

•  Search    Social  

•  Scalability�

WISE Keynote 27 2010-12-13

Page 28: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Social?  

•  ConnecAng  to  friends  •  Knowing  what  friends  are  up  to  •  ConnecAng  to  strangers  

– DaAng,  Gaming  – Shopping  

•  Making  recommendaAons  based  on  acAviAes  

WISE Keynote 28 2010-12-13

Page 29: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

User  Latent  Model  

•  α:  uniform  Dirichlet  φ  prior  for  per  user  u  interest  distribuAon  (populaAon  level  parameter)  

•  β:  uniform  Dirichlet  φ  prior  for  per  interest  z  acAvity  distribuAon  (populaAon  level  parameter)  

•  θd  is  the  interest  distribuAon  of  user  u  (user  level)  

•  zuj  the  interest  of  the  jth  acAvity  in  u,  wuj  the  specific  acAvity  (acAvity  level)    

WISE Keynote 29

θ

z

w

Nm

M

α

βφ

K

2010-12-13

Page 30: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 30 2010-12-13

Page 31: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

CombinaAonal  CollaboraAve  Filtering  Model  (CCF)  

WISE Keynote 31

Users

Keywords Videos/Photos Keywords Videos/Photos

Users Users

2010-12-13

Page 32: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Interest  Networks  

WISE Keynote 32 2010-12-13

Page 33: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Outline�

•  Search  +  Social  Synergy  •  Social    Search  

– Mobilize  users  to  improve  search  quality  – Google  Q&A,  Facebook  Like  

•  Search    Social  – Use  query  log  to  help  social  

•  AcAviAes    Interests    Social  •  Groupcom  

•  Scalability�

WISE Keynote 33 2010-12-13

Page 34: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

34 WISE Keynote 2010-12-13

Page 35: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

35 WISE Keynote 2010-12-13

Page 36: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

36 WISE Keynote 2010-12-13

Page 37: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

37 WISE Keynote 2010-12-13

Page 38: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

38 WISE Keynote 2010-12-13

Page 39: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

39 WISE Keynote 2010-12-13

Page 40: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

40 WISE Keynote 2010-12-13

Page 41: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

User  Latent  Model  

•  α:  uniform  Dirichlet  φ  prior  for  per  user  u  interest  distribuAon  (populaAon  level  parameter)  

•  β:  uniform  Dirichlet  φ  prior  for  per  interest  z  acAvity  distribuAon  (populaAon  level  parameter)  

•  θd  is  the  interest  distribuAon  of  user  u  (user  level)  

•  zuj  the  interest  of  the  jth  acAvity  in  u,  wuj  the  specific  acAvity  (acAvity  level)    

WISE Keynote 41

θ

z

w

Nm

M

α

βφ

K

2010-12-13

Page 42: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

LDA  Gibbs  Sampling:  Inputs  &  Outputs  

WISE Keynote 42

users

words topics

words topics

users

Inputs:    1.   training  data:  users  as  bags  of  

words  2.  parameter:  the  number  of  

topics  

Outputs:    1. model  parameters:  a  co-­‐

occurrence  matrix  of  topics  and  words.  

2.   by-­‐product:  a  co-­‐occurrence  matrix  of  topics  and  users.  

2010-12-13

Page 43: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Parallel  Gibbs  Sampling  

WISE Keynote 43

Inputs:    1.   training  data:  users  as  bags  of  

words  2.  parameter:  the  number  of  

topics  

Outputs:    1. model  parameters:  a  co-­‐

occurrence  matrix  of  topics  and  words.  

2.   by-­‐product:  a  co-­‐occurrence  matrix  of  topics  and  users.  

users

words topics

words topics

users

2010-12-13

Page 44: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

ObservaAons  

WISE Keynote 44

Master Node

•  BoTleneck:  CommunicaAon    

•  Amdahl’s  law  caps  speedup  

•  Words  in  a  bag  have        no  order  

•  Words  on  a  computer  node  can  be  reordered  

2010-12-13

Page 45: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Example  Bags  /  Node  A  

•  Bag  #1          w1,  w2,  w3,  w1,  w2,  w3,  w1,  w2,  w3  •  Bag  #2          w1,  w2,  w1,  w2,  w1,  w2,  w1,  w2  •  Bag  #3          w3,  w1,  w3,  w1,  w3,  w1,  w3,  w1  

•  Bundle  #1    w1,  w1,  w1,  w1,  w1,  w1,  …  •  Bundle  #2    w2,  w2,  w2,…  ,    •  Bundle  #3    w3,  w3,  w3,…  

WISE Keynote 45 2010-12-13

Page 46: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Two  Nodes  

Node  A   Node  B  

W1   W2  

W2   W3  

W3   W1  

WISE Keynote 46 2010-12-13

Page 47: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Parallel  Gibbs  Sampling  

WISE Keynote 47

Inputs:    1.   training  data:  documents  as  

bags  of  words  2.  parameter:  the  number  of  

topics  

Outputs:    1. model  parameters:  a  co-­‐

occurrence  matrix  of  topics  and  words.  

2.   by-­‐product:  a  co-­‐occurrence  matrix  of  topics  and  documents.  

docs

words topics

words topics

docs

2010-12-13

Page 48: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

PLDA  -­‐-­‐  enhanced  parallel  LDA�

•  Take  advantage  of  bag  of  words  modeling:  each  Pw  machine  processes  vocabulary  in  a  word  order  

•  Pipelining:  fetching  the  updated  topic  distribuAon  matrix  while  doing  Gibbs  sampling  

WISE Keynote �48 2010-12-13

Page 49: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Speedup    1,500x  using  2,000  machines �

WISE Keynote 49 2010-12-13

Page 50: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Lessons  Learned  

•  BoTleneck  MaTers  •  Inter-­‐iteraAon  MaTers  

WISE Keynote 50 2010-12-13

Page 51: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

MapReduce�

WISE Keynote 51

map

map

map

map

map

Reduce

Data Block 1

Data Block 2

Data Block 3

Data Block 4

Data Block 5

Data datadatadatadatadatadatadatadatadatadata

Results datadatadatadatadatadata

2010-12-13

Page 52: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

 Parallel  Programming  Models  

WISE Keynote 52

MapReduce Project + MPI

GFS/IO and task rescheduling overhead between iterations

Yes No +1

No +1

Flexibility of computation model AllReduce only +0.5

? +1

Flexible +1

Efficient AllReduce Yes +1

Yes +1

Yes +1

Recover from faults between iterations

Yes +1

Yes +1

Apps

Recover from faults within each iteration

Yes +1

Yes +1

Apps

Final Score for scalable machine learning

3.5 5 4

2010-12-13

Page 53: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

53

SVM  BoTlenecks  Time  consuming  –  1M  dataset,  8  days  

Memory  consuming  –  1M  dataset,  10G  

... ... ..

.

WISE Keynote 2010-12-13

Page 54: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

54

Matrix  FactorizaAon  AlternaAves  

exact

approximate

WISE Keynote 2010-12-13

Page 55: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

55

Raw Data Matrix Multiplication ICF

Incremental Data

Kernel Matrix

Incremental Kernel Matrix

Incremental ICF

Incremental Matrix Multiplication

Incremental Linear System Solving

Parallelizing  SVM  [E.  Chang,  et  al,  NIPS  07]  

WISE Keynote 2010-12-13

Page 56: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

56

Incomplete  Cholesky  FactorizaAon  (ICF)  

n x n n x p p x n

WISE Keynote 2010-12-13

Page 57: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

57

Raw Data Matrix Multiplication ICF

Incremental Data

Kernel Matrix

Incremental Kernel Matrix

Incremental ICF

Incremental Matrix Multiplication

Incremental Linear System Solving

PSVM  

WISE Keynote 2010-12-13

Page 58: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

58

Matrix  Product  

=

p x n n x p p x p

WISE Keynote 2010-12-13

Page 59: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

59

PSVM  [E.  Chang,  et  al,  NIPS  07]  

•  Column-­‐based  ICF  – Slower  than  row-­‐based  on  single  machine  

– Parallelizable  on  mulAple  machines  

•  Changing  IPM  computaAon  order  to  achieve  parallelizaAon  

WISE Keynote 2010-12-13

Page 60: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

60

Overheads  

WISE Keynote 2010-12-13

Page 61: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

61

Speedup  

WISE Keynote 2010-12-13

Page 62: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Scalability  

•  ComputaAon  – ParallelizaAon  – ApproximaAon  

•  File  Systems  – Latency  – Recovery  

•  Power  Management  

WISE Keynote 62 2010-12-13

Page 63: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Sample  Plaxorms �

WISE Keynote 63 2010-12-13

Page 64: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Sample  Hierarchy�

•  Server  – 16GB  DRAM;  160MB  Flash;  5  x  1TB  disk  

•  Rack  – 40  servers  – 48  port  Gigabit  Ethernet  switch  

•  Warehouse  – 10,000  servers  (250  racks)  – 2K  port  Gigabit  Ethernet  switch �

64 WISE Keynote 2010-12-13

Page 65: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Storage  -­‐-­‐-­‐  One  Server�

65 WISE Keynote 2010-12-13

Page 66: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Storage  -­‐-­‐-­‐  One  Rack�

66 WISE Keynote 2010-12-13

Page 67: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Storage  -­‐-­‐-­‐  One  Center�

67 WISE Keynote 2010-12-13

Page 68: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Google  File  System  (GFS)  

•  Master  manages  metadata  •  Data  transfers  happen  directly  between  clients/chunkservers  •  Files  broken  into  chunks  (typically  64  MB)  •  Chunks  triplicated  across  three  machines  for  safety  •  See  SOSP^03  paper  at  hTp://labs.google.com/papers/gfs.html  

Rep

licas

Master GFS Master

GFS Master Client

Client

C1 C0 C0

C3 C3 C4

C1

C5

C3

C4

WISE Keynote 68 2010-12-13

Page 69: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote � 69 2010-12-13

Page 70: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Win  in  Scale�

•  Google  Translate  •  Voice  •  Trend  PredicAon    

– An  example  benefits  society�

WISE Keynote � 70 2010-12-13

Page 71: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

H1N1  United  NaAon  Report �

WISE Keynote � 71 2010-12-13

Page 72: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

Concluding  Remarks  •  Search  +  Social  •  Increasing  quanAty  and  complexity  of  data  demands  scalable  

soluAons  •  Have  parallelized  key  subrouAnes  for  mining  massive  data  sets  

–  Spectral  Clustering  [ECML  08]  –  Frequent  Itemset  Mining  [ACM  RS  08]  –  PLSA  [KDD  08]    –  LDA  [WWW  09,  AAIM  09]  –  UserRank  [Google  TR  09]  –  Support  Vector  Machines  [NIPS  07]  

•  Launched  Google  Q&A  (Confucius)  in  60+  countries  •  Relevant  papers  

–  hTp://infolab.stanford.edu/~echang/  •  Open  Source  PSVM,  PLDA  

–  hTp://code.google.com/p/psvm/  –  hTp://code.google.com/p/plda/  

WISE Keynote 72 2010-12-13

Page 73: Integra(ng)Social)with)Search) - Stanford Universityinfolab.stanford.edu/~echang/Wise2010-Keynote.pdf · Related&Papers • AdHeat’(Social’Ads): – AdHeat:&An&Influence9based&Diffusion&Model&for&PropagaAng&Hints&to&Match&Ads,&

WISE Keynote 73

Models  of  InnovaAon  •   Ivory  tower  

•  Only  consider  theory  but  not  applicaAon  

•   Build  it  and  they  will  come  •  ScienAsts  drives  product  development  

•   “Research  for  sale”  •  Research  funded  by:  

–  product  groups  or  customers  

•   Research  &  development  as  equals  •  Research  “sells”  innovaAon;    •  Product  “requests”  innovaAon  

•   Google-­‐style  innovaAon  

R

R

D

R

D R

R=D

D?

2010-12-13