LinkedIn Endorsements: Reputation, Virality, and Social Tagging O’Reilly Strata - February 28, 2013 Sam Shah @sam_shah Pete Skomoroch @peteskomoroch ©2012 LinkedIn Corporation. All Rights Reserved.

LinkedIn Endorsements: Reputation, Virality, and Social Tagging

DESCRIPTION

Endorsements are a one-click system to recognize someone for their skills and expertise on LinkedIn, the largest professional online social network. This is one of the latest “data features” in LinkedIn’s portfolio, and the endorsement ecosystem generates a large graph of reputation signals and viral user activity. In this talk, we’ll examine the practical aspects of building a data feature like Endorsements. We’ll talk about marrying product design and data, deep diving into several of the lessons we’ve learned along the way - all using skills & endorsements as an empirical case study. We’ll include technical detail on our approaches and how we combine crowdsourcing, machine learning, and large scale distributed systems to recommend topics to users.

Page 1: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

©2012 LinkedIn Corporation. All Rights Reserved.

LinkedIn Endorsements: Reputation, Virality, and Social Tagging
O’Reilly Strata - February 28, 2013

Sam Shah @sam_shah

Pete Skomoroch @peteskomoroch

Page 2: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Sam Shah
Principal Engineer and Engineering Manager
@sam_shah
www.linkedin.com/in/shahsam

Peter Skomoroch
Principal Data Scientist
@peteskomoroch
www.linkedin.com/in/peterskomoroch

Page 3: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

LinkedIn: The Professional Profile of Record

200+M Members
200M Member Profiles

Page 4: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

LinkedIn’s Latest Data Product: Skill Endorsements

Page 5: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Viral Growth: 800M Endorsements in 4 Months

Page 6: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Data Amplifies Desire

1. Desire + Social Proof

2. Viral Loops + Network Effects

3. Data Foundation + Recommendation Algorithms

Page 7: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

1) Desire & Social Proof

Page 8: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

2) Viral Loops & Network Effects

A endorses B
B notified
B “accepts” endorsement
B endorses C
B endorses D

Endorsement recommendations
Email Notification
News Feed

Page 9: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

3) Data Foundation: Skills & Suggested Skills

Page 10: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Data Foundation: LinkedIn Skills

Page 11: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Social Tagging Accelerates Adoption

Suggested endorsements

Skill recommendations

Skill marketing

Virality only

Page 12: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Outline

Skill discovery

Skill tagging

Skill recommendations

Suggested endorsements

Page 13: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Unsupervised Topic Discovery from Profiles

Extract

Page 14: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

What is the skills dictionary?

– A growing taxonomy of skills

– Generated by mining profiles and maintained by the Skills team at LinkedIn

– Created using clustering and crowdsourcing.

– Multiple phrases, acronyms, and misspellings map to a single standardized skill.

250+ different phrases map to “Microsoft Office”

Building the Skills Dictionary

Profile (specialties) → Tokenization → Clustering → Crowdsourcing → Taxonomy

Page 15: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Topic Clustering & Phrase Sense Disambiguation

Page 16: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Skills Dictionary: Microsoft Office

– ms office
– ms office suite
– computer skills including ms office
– office 97
– microsoft office user
– mac office
– microsoft office 2003 & 2007
– microsoft office suits
– microsoft ofice
– microsoft ofiice
– ms office certified
– office 98
– …

→ Microsoft Office (Skill ID = 366)
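The many-to-one mapping on this slide can be sketched as a simple lookup table. This is only an illustration, not LinkedIn's implementation: the table layout and the names `VARIANTS`, `normalize`, and `standardize` are assumptions, and only the variant phrases and the skill ID 366 come from the slide.

```python
import re

# Hypothetical slice of a skills dictionary: variant phrase -> (skill ID, standardized name).
# Only the ID 366 and the variant phrases come from the slide; the layout is assumed.
MICROSOFT_OFFICE = (366, "Microsoft Office")
VARIANTS = {
    "ms office": MICROSOFT_OFFICE,
    "ms office suite": MICROSOFT_OFFICE,
    "office 97": MICROSOFT_OFFICE,
    "mac office": MICROSOFT_OFFICE,
    "microsoft office suits": MICROSOFT_OFFICE,
    "microsoft ofice": MICROSOFT_OFFICE,
    "microsoft ofiice": MICROSOFT_OFFICE,
}


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore trivial formatting."""
    return re.sub(r"\s+", " ", phrase.strip().lower())


def standardize(phrase: str):
    """Return (skill_id, name) for a known variant, or None if the phrase is unknown."""
    return VARIANTS.get(normalize(phrase))


print(standardize("  MS   Office Suite "))
print(standardize("underwater basket weaving"))
```

A known variant resolves to the single standardized skill, and an unknown phrase returns None so it can be routed to clustering or crowdsourcing instead of being guessed at.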

Page 17: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Deduplication Signals from Mechanical Turk

Page 18: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Sample Task for Mechanical Turk Workers

Page 19: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Skill Phrase Deduplication

Page 20: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Outline

Skill discovery

Skill tagging

Skill recommendations

Suggested endorsements

Page 21: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Skills Classification

– Use skill dictionary metadata to tag, standardize and infer skills
– Run classifiers for each skill on member profiles

Public Speaking

Ruby on Rails

Entrepreneurship

Microsoft Office

AP Style

Page 22: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Tagging Skill Phrases

– Tagging: Extract potential skill phrases from text
– Standardize unambiguous phrase variants

Example profile text: Lead designer and engineer for the implementation of a user-centric, fully-configurable UI for data aggregation and reporting. Developed over 20 SaaS custom applications using Python, Javascript and RoR.

Tagged phrases: JavaScript, RoR, SaaS, Python

Phrase variants standardized to Ruby on Rails:

– ror
– rubyonrails
– ruby on rails development
– ruby rails
– ruby on rail

Document (ex: Profile) → Tokenization → Phrases (up to 6 words) → Skills Tagger → Skills (unordered) → Skills Classifier → Skills (ranked by relevance)
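The tagging and standardization steps on this slide can be sketched as a greedy longest-match scan over word n-grams. This is a sketch under stated assumptions: the slide gives only the six-word cap and the example variants, so the greedy matching strategy, the tokenizer, and the names `VARIANTS`, `tokenize`, and `tag_skills` are all hypothetical.

```python
import re

# Hypothetical variant table: normalized phrase -> standardized skill name.
# The Ruby on Rails variants come from the slide; the rest is assumed.
VARIANTS = {
    "ror": "Ruby on Rails",
    "rubyonrails": "Ruby on Rails",
    "ruby on rails development": "Ruby on Rails",
    "ruby rails": "Ruby on Rails",
    "ruby on rail": "Ruby on Rails",
    "ruby on rails": "Ruby on Rails",
    "python": "Python",
    "javascript": "JavaScript",
    "saas": "SaaS",
}
MAX_WORDS = 6  # the slide caps candidate phrases at six words


def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, '+', '#')."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tag_skills(text):
    """Greedy scan: at each position, try the longest n-gram first."""
    tokens = tokenize(text)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in VARIANTS:
                skill = VARIANTS[phrase]
                if skill not in found:  # keep first-seen order, no duplicates
                    found.append(skill)
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no phrase starts here; move on one token
    return found


text = ("Developed over 20 SaaS custom applications "
        "using Python, Javascript and RoR.")
print(tag_skills(text))
```

Trying the longest phrase first means "ruby on rails development" is consumed as one variant rather than matching "ruby on rails" and leaving "development" behind. The result is an unordered set of standardized skills; ranking is left to the classifier stage.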

Page 23: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Outline

Skill discovery

Skill tagging

Skill recommendations

Suggested endorsements

Page 24: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Skills Classification on Member Profiles

The skills classifier computes the likelihood that a member has a skill, based on the member’s profile, other profiles that share common attributes, and their connections.

1. Tagging: Tokenize free text into phrase tags
2. Standardization: Transform tags into potential skills
3. Inference: Rank skills by likelihood

Inputs: Profile text; Profile attributes & network signals

Page 25: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Skill Inference

How suggested/inferred skills work:

– Profiles with skills help build a massive dataset of (attribute: skills).

Example with a title:

Title              Skill  Occurrences
Software Engineer  Java   100 000
Software Engineer  C++    88 000
…

Profile → Extract attributes (Company ID, Title ID, Groups ID, Industry ID, …) → Feature Vectors → Skills Classifier → Skills (ranked by likelihood)
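A title/skill occurrence table like the one on this slide can be built by counting co-occurrences across profiles that already list skills. The sketch below uses a few made-up toy profiles; the data, the field names, and the helper `p_skill_given_title` are assumptions for illustration only.

```python
from collections import Counter

# Toy profiles (hypothetical data): each has a title and self-listed skills.
profiles = [
    {"title": "Software Engineer", "skills": ["Java", "C++"]},
    {"title": "Software Engineer", "skills": ["Java"]},
    {"title": "Software Engineer", "skills": ["C++", "Python"]},
    {"title": "Editor", "skills": ["AP Style"]},
]

pair_counts = Counter()   # (title, skill) -> number of profiles with both
title_counts = Counter()  # title -> number of profiles with that title

for p in profiles:
    title_counts[p["title"]] += 1
    for skill in set(p["skills"]):  # count each skill once per profile
        pair_counts[(p["title"], skill)] += 1


def p_skill_given_title(skill, title):
    """Fraction of profiles with this title that list the skill."""
    if title_counts[title] == 0:
        return 0.0
    return pair_counts[(title, skill)] / title_counts[title]


print(pair_counts[("Software Engineer", "Java")])
print(p_skill_given_title("Java", "Software Engineer"))
```

The same counting applies to any other attribute (company, group, industry) in place of the title, which is what makes the (attribute: skills) dataset reusable across features.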

Page 26: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Skill Inference

How suggested/inferred skills work:

– The skill likelihood is a conditional model

– Probabilities are combined using a Naïve Bayes Classifier

If you are an engineer at Apple, you probably know about iPhone Development.

Profile → Extract attributes (Company ID, Title ID, Groups ID, Industry ID, …) → Feature Vectors → Skills Classifier → Skills (ranked by likelihood)
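Combining per-attribute evidence with Naive Bayes can be sketched as follows. The slide names the classifier but gives no details, so the toy training data, the add-alpha smoothing, and the names `log_score` and `rank_skills` are assumptions rather than a description of LinkedIn's model.

```python
import math
from collections import Counter, defaultdict

# Toy training data (hypothetical): (profile attributes, skills listed on the profile).
TRAIN = [
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development", "Objective-C"}),
    ({"company": "Apple", "title": "Engineer"}, {"iPhone Development"}),
    ({"company": "Apple", "title": "Recruiter"}, {"Recruiting"}),
    ({"company": "Acme", "title": "Engineer"}, {"Java"}),
    ({"company": "Acme", "title": "Recruiter"}, {"Recruiting"}),
]

skill_counts = Counter()            # skill -> number of profiles listing it
attr_counts = defaultdict(Counter)  # skill -> Counter of (attribute name, value)
attr_values = defaultdict(set)      # attribute name -> distinct values seen

for attrs, skills in TRAIN:
    for name, value in attrs.items():
        attr_values[name].add(value)
    for skill in skills:
        skill_counts[skill] += 1
        for item in attrs.items():
            attr_counts[skill][item] += 1


def log_score(skill, attrs, alpha=1.0):
    """log P(skill) + sum of log P(attribute | skill), with add-alpha smoothing."""
    total = sum(skill_counts.values())
    score = math.log(skill_counts[skill] / total)
    for name, value in attrs.items():
        num = attr_counts[skill][(name, value)] + alpha
        den = skill_counts[skill] + alpha * len(attr_values[name])
        score += math.log(num / den)
    return score


def rank_skills(attrs):
    """Return all known skills, most likely first."""
    return sorted(skill_counts, key=lambda s: log_score(s, attrs), reverse=True)


print(rank_skills({"company": "Apple", "title": "Engineer"}))
```

On this toy data, an engineer at Apple ranks "iPhone Development" first, matching the intuition stated on the slide. Working in log space avoids underflow once many attributes are multiplied together, and the smoothing keeps an unseen attribute value from zeroing out a skill entirely.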

Page 27: LinkedIn Endorsements: Reputation, Virality, and Social Tagging
Page 28: LinkedIn Endorsements: Reputation, Virality, and Social Tagging
Page 29: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Skill Suggestions for Your LinkedIn Profile

49% Conversion

4% Conversion

Page 30: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Outline

Skill discovery

Skill tagging

Skill recommendations

Suggested endorsements

Page 31: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Social Tagging via Skill Endorsements

Page 32: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Suggesting Endorsements

– People-skill combinations in a member’s network
– Binary classification
– Features:
  – Skill inference score
  – Company overlap
  – School overlap
  – Group overlap
  – Industry and functional area similarity
  – Title similarity
  – Site interactions
  – Co-interactions

Candidate generation (Company, Title, Groups, Industry, …) → Feature Vectors → Classifier → Suggested Endorsements (ranked by likelihood)
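The candidate generation and binary classification flow above can be sketched with a small logistic scorer. Everything specific here is an assumption: the slide does not say which classifier is used, the weights are hand-picked rather than learned, Jaccard overlap is one possible way to measure company, school, and group overlap, and only four of the listed features are included.

```python
import math

# Hypothetical weights for a logistic model. In practice these would be learned
# from labeled data, for example which suggested endorsements members accepted.
WEIGHTS = {"skill_score": 2.0, "company_overlap": 1.5,
           "school_overlap": 0.8, "group_overlap": 0.5}
BIAS = -2.0


def jaccard(a, b):
    """Set overlap in [0, 1]; 0 when both sets are empty."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def features(viewer, candidate, skill):
    """Build a feature vector for one (viewer, candidate, skill) triple."""
    return {
        "skill_score": candidate["skill_scores"].get(skill, 0.0),
        "company_overlap": jaccard(viewer["companies"], candidate["companies"]),
        "school_overlap": jaccard(viewer["schools"], candidate["schools"]),
        "group_overlap": jaccard(viewer["groups"], candidate["groups"]),
    }


def probability(fv):
    """Logistic function applied to the weighted feature sum."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in fv.items())
    return 1.0 / (1.0 + math.exp(-z))


def suggest(viewer, connections, top_k=3):
    """Generate (connection, skill) candidates, score them, return the top few."""
    scored = []
    for c in connections:
        for skill in c["skill_scores"]:
            scored.append((probability(features(viewer, c, skill)), c["name"], skill))
    scored.sort(reverse=True)
    return [(name, skill) for _, name, skill in scored[:top_k]]


viewer = {"companies": ["LinkedIn"], "schools": ["MIT"], "groups": ["Hadoop Users"]}
connections = [
    {"name": "Ana", "companies": ["LinkedIn"], "schools": ["MIT"],
     "groups": ["Hadoop Users"], "skill_scores": {"Hadoop": 0.9, "Public Speaking": 0.2}},
    {"name": "Bo", "companies": ["Acme"], "schools": [], "groups": [],
     "skill_scores": {"Java": 0.9}},
]
print(suggest(viewer, connections, top_k=2))
```

The skill inference score enters as just another feature, so a connection with strong shared context (same company, school, and groups) can outrank one with a higher skill score but no shared context.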

Page 33: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Social Tagging Accelerates Adoption

Skill endorsements

Skill recommendations

Skill marketing

Page 34: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Can We Find Influencers In Venture Capital?

Page 35: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Which Skills Are Important for a Data Scientist?

Page 36: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

What Technologies are Professionals Adopting?

Page 37: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Data Amplifies Desire

1. Desire + Social Proof

2. Viral Loops + Network Effects

3. Data Catalyst + Recommendation Algorithms

Page 38: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Infrastructure

• Apache Hadoop: Parallel processing architecture
• Apache Kafka: Ingress pipes
• Azkaban: Hadoop scheduler
• Voldemort: Egress database
• Apache Pig: High-level MR language
• DataFu: Convenience routines

http://data.linkedin.com

R. Sumbaly, J. Kreps, and S. Shah. “The ‘Big Data’ ecosystem at LinkedIn”. In SIGMOD 2013 (to appear).

Page 39: LinkedIn Endorsements: Reputation, Virality, and Social Tagging

Learning More

data.linkedin.com