(To see the animations, please download the presentation.)

Endorsements are a one-click system to recognize someone for their skills and expertise on LinkedIn, the largest professional online social network. This is one of the latest “data features” in LinkedIn’s portfolio, and the endorsement ecosystem generates a large graph of reputation signals and viral user activity. Underneath this feature, there are several interesting and difficult data questions:

1. How do you automatically create a taxonomy of skills in the professional context?
2. How do you disambiguate between different contexts of skills? For instance, “search” could mean information retrieval, search & seizure, search & rescue, among others.
3. How can you leverage data to determine someone’s authoritativeness in a skill?
4. How do you use that authoritativeness to recommend people to endorse?
5. How do you optimize a complex large scale machine learning system for viral growth & engagement?

In this talk, we’ll examine the practical aspects of building a data feature like Endorsements. We’ll talk about marrying product design and data, deep diving into several of the lessons we’ve learned along the way - all using skills & endorsements as an empirical case study. We’ll include technical detail on our approaches and how we combine crowdsourcing, machine learning, and large scale distributed systems to recommend topics to users. We’ll also show interesting results on how members are using the endorsements feature and how it’s spread across the network.
LinkedIn Endorsements: Reputation, Virality, and Social Tagging
O'Reilly Strata - February 28, 2013
Sam Shah @sam_shah
Pete Skomoroch @peteskomoroch
©2012 LinkedIn Corporation. All Rights Reserved.
Sam Shah, Principal Engineer and Engineering Manager, @sam_shah, www.linkedin.com/in/shahsam
Peter Skomoroch, Principal Data Scientist, @peteskomoroch, www.linkedin.com/in/peterskomoroch
LinkedIn: The Professional Profile of Record
200+M Members
200M Member Profiles
LinkedIn's Latest Data Product: Skill Endorsements
Viral Growth: 800M Endorsements in 4 Months
Data Amplifies Desire
1. Desire + Social Proof
2. Viral Loops + Network Effects
3. Data Foundation + Recommendation Algorithms
1) Desire & Social Proof
2) Viral Loops & Network Effects
[Diagram labels: A endorses B; B is notified (Email, News Feed, Notification); B accepts endorsement; Endorsement recommendations shown to B; B endorses C; B endorses D]
3) Data Foundation: Skills & Suggested Skills
Data Foundation: LinkedIn Skills
Social Tagging Accelerates Adoption
[Chart labels: Skill marketing; Skill recommendations; Virality only; Suggested endorsements]
Unsupervised Topic Discovery from Profiles
Building the Skills Dictionary
What is the skills dictionary?
A growing taxonomy of skills.
Generated by mining profiles and maintained by the Skills team at LinkedIn.
Created using clustering and crowdsourcing.
Multiple phrases, acronyms, and misspellings map to a single standardized skill.
250+ different phrases map to Microsoft Office.
[Diagram labels: Profile (specialties); Extract; Tokenization; Clustering; Crowdsourcing; Taxonomy]
Topic Clustering & Phrase Sense Disambiguation
Skills Dictionary: Microsoft Office
Standardized skill: Microsoft Office (Skill ID = 366)
Phrase variants: ms office; ms office suite; computer skills including ms office; office 97; microsoft office user; mac office; microsoft office 2003 & 2007; microsoft office suits; microsoft ofice; microsoft ofiice; ms office certified; office 98
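The many-to-one mapping on this slide can be pictured as a lookup table. The following is only a minimal sketch: the phrase variants and the ID 366 come from the slide, while the table layout and the `normalize` and `standardize` helpers are hypothetical names introduced for illustration, not LinkedIn's implementation.

```python
import re

# Hypothetical slice of a skills dictionary: phrase variant -> (skill ID, standardized name).
# Variants and the ID 366 are taken from the slide; the canonical name is added as a variant too.
MICROSOFT_OFFICE = (366, "Microsoft Office")
SKILLS_DICTIONARY = {
    variant: MICROSOFT_OFFICE
    for variant in [
        "microsoft office", "ms office", "ms office suite",
        "computer skills including ms office", "office 97",
        "microsoft office user", "mac office", "microsoft office 2003 & 2007",
        "microsoft office suits", "microsoft ofice", "microsoft ofiice",
        "ms office certified", "office 98",
    ]
}


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so lookups ignore casing and spacing."""
    return re.sub(r"\s+", " ", phrase.strip().lower())


def standardize(phrase: str):
    """Return (skill_id, name) for a known variant, or None for an unknown phrase."""
    return SKILLS_DICTIONARY.get(normalize(phrase))


print(standardize("  MS  Office Suite "))        # (366, 'Microsoft Office')
print(standardize("underwater basket weaving"))  # None
```

An exact-match table like this only covers variants that have already been collected; per the slides, the variants themselves come from clustering and crowdsourcing.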
Skills Classification
Use skill dictionary metadata to tag, standardize and infer skills.
Run classifiers for each skill on member profiles.
Example skills: Public Speaking; Ruby on Rails; Entrepreneurship; Microsoft Office; AP Style
Tagging Skill Phrases
Tagging: Extract potential skill phrases from text.
Document (ex: Profile): "Lead designer and engineer for the implementation of a user-centric, fully-configurable UI for data aggregation and reporting. Developed over 20 SaaS custom applications using Python, Javascript and RoR."
Tokenization: Phrases (up to 6 words), e.g. JavaScript, RoR, SaaS, Python
Skills Tagger: Standardize unambiguous phrase variants (unordered), e.g. ror, rubyonrails, ruby on rails development, ruby rails, ruby on rail -> Ruby on Rails
Skills Classifier: Skills (ranked by relevance)
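The tokenization and tagging steps on this slide could look roughly like the sketch below. It assumes a small hand-written variant table (the Ruby on Rails variants are from the slide; the other entries and all function names are hypothetical) and treats "unordered" as matching on the sorted tokens of each candidate phrase.

```python
import re

# Hypothetical variant table. Ruby on Rails variants are from the slide;
# the remaining entries are added so the example sentence produces tags.
VARIANTS = {
    "ror": "Ruby on Rails",
    "rubyonrails": "Ruby on Rails",
    "ruby on rails development": "Ruby on Rails",
    "ruby rails": "Ruby on Rails",
    "ruby on rail": "Ruby on Rails",
    "ruby on rails": "Ruby on Rails",
    "python": "Python",
    "javascript": "JavaScript",
    "saas": "SaaS",
}
MAX_PHRASE_LEN = 6  # phrases of up to 6 words, as on the slide


def unordered_key(tokens):
    """Sorted token tuple, so word order does not affect matching."""
    return tuple(sorted(tokens))


UNORDERED_INDEX = {unordered_key(v.split()): skill for v, skill in VARIANTS.items()}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tag_skills(text):
    """Return standardized skills found in any window of 1..MAX_PHRASE_LEN tokens."""
    tokens = tokenize(text)
    found = []
    for n in range(1, MAX_PHRASE_LEN + 1):
        for i in range(len(tokens) - n + 1):
            skill = UNORDERED_INDEX.get(unordered_key(tokens[i:i + n]))
            if skill and skill not in found:
                found.append(skill)
    return found


text = "Developed over 20 SaaS custom applications using Python, Javascript and RoR."
print(tag_skills(text))  # ['SaaS', 'Python', 'JavaScript', 'Ruby on Rails']
```

This sketch only tags; ranking the tagged skills by relevance is the job of the classifier stage, which is not modeled here.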
Skills Classification on Member Profiles
The skills classifier computes the likelihood of a member to have a skill based on the member's profile, other profiles which share common attributes, and their connections.
Tagging: Tokenize free text into phrase tags
Standardization: Transform tags into potential skills
Inference: Rank skills by likelihood
Inputs: Profile text; Profile attributes & network signals
Skill Inference
How suggested/inferred skills work:
Profiles with skills help build a massive dataset of (attribute: skills).
Example with a title:

Title              Skill  Occurrences
Software Engineer  Java   100 000
Software Engineer  C++    88 000

[Diagram labels: Profile; Extract attributes; Feature Vectors (Company ID, Title ID, Groups ID, Industry ID, Skills); Classifier; Skills (ranked by likelihood)]
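One way to picture the (attribute: skills) dataset is as co-occurrence counts between an attribute value and each skill. The sketch below uses a handful of made-up toy profiles and hypothetical function names; it is not the production pipeline, which the slides describe as running on Hadoop.

```python
from collections import Counter, defaultdict

# Toy profiles (hypothetical data): a title plus a list of standardized skills.
profiles = [
    {"title": "Software Engineer", "skills": ["Java", "C++"]},
    {"title": "Software Engineer", "skills": ["Java"]},
    {"title": "Software Engineer", "skills": ["C++", "Python"]},
    {"title": "Recruiter", "skills": ["Sourcing"]},
]


def count_cooccurrences(profiles):
    """Count how often each skill appears on profiles with each title."""
    pair_counts = defaultdict(Counter)  # title -> Counter(skill -> occurrences)
    title_counts = Counter()            # title -> number of profiles
    for p in profiles:
        title_counts[p["title"]] += 1
        # dict.fromkeys removes duplicate skills while keeping a stable order
        for skill in dict.fromkeys(p["skills"]):
            pair_counts[p["title"]][skill] += 1
    return pair_counts, title_counts


def skill_given_title(pair_counts, title_counts, title):
    """Estimate P(skill | title) as occurrences / profiles with that title, ranked."""
    total = title_counts[title]
    return [(skill, n / total) for skill, n in pair_counts[title].most_common()]


pair_counts, title_counts = count_cooccurrences(profiles)
for skill, prob in skill_given_title(pair_counts, title_counts, "Software Engineer"):
    print(f"{skill}: {prob:.2f}")
```

The same counting would apply to the other attributes listed in the diagram (company, groups, industry), one table per attribute.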
Skill Inference
How suggested/inferred skills work:
The skill likelihood is a conditional model.
Probabilities are combined using a Naive Bayes Classifier.
If you are an engineer at Apple, you probably know about iPhone Development.
[Diagram labels: Profile; Extract attributes; Feature Vectors (Company ID, Title ID, Groups ID, Industry ID, Skills); Classifier; Skills (ranked by likelihood)]
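As a rough illustration of combining per-attribute probabilities with Naive Bayes, the sketch below scores each skill as log P(skill) plus the sum of log P(attribute value | skill), with Laplace smoothing. The training data, the smoothing constant, and the class and method names are all assumptions made for this example; the slides do not give the exact model formulation.

```python
import math
from collections import Counter, defaultdict

# Toy training data (hypothetical): profile attributes and the skills listed on the profile.
TRAINING = [
    ({"company": "Apple", "title": "Engineer"}, ["iPhone Development", "Objective-C"]),
    ({"company": "Apple", "title": "Engineer"}, ["iPhone Development"]),
    ({"company": "Apple", "title": "Designer"}, ["Photoshop"]),
    ({"company": "Acme", "title": "Engineer"}, ["Java"]),
    ({"company": "Acme", "title": "Accountant"}, ["Microsoft Office"]),
]


class NaiveBayesSkillModel:
    """Naive Bayes over categorical profile attributes, with Laplace smoothing."""

    def __init__(self, examples, alpha=1.0):
        self.alpha = alpha
        self.skill_counts = Counter()            # skill -> number of mentions
        self.attr_counts = defaultdict(Counter)  # (skill, field) -> Counter(value)
        self.values = defaultdict(set)           # field -> distinct values seen
        self.total = 0
        for attrs, skills in examples:
            for skill in skills:
                self.skill_counts[skill] += 1
                self.total += 1
                for field, value in attrs.items():
                    self.attr_counts[(skill, field)][value] += 1
                    self.values[field].add(value)

    def log_score(self, skill, attrs):
        """log P(skill) + sum over fields of log P(value | skill)."""
        score = math.log(self.skill_counts[skill] / self.total)
        for field, value in attrs.items():
            count = self.attr_counts[(skill, field)][value]
            denom = self.skill_counts[skill] + self.alpha * len(self.values[field])
            score += math.log((count + self.alpha) / denom)
        return score

    def rank(self, attrs):
        """Skills ordered from most to least likely for the given attributes."""
        return sorted(self.skill_counts, key=lambda s: self.log_score(s, attrs), reverse=True)


model = NaiveBayesSkillModel(TRAINING)
print(model.rank({"company": "Apple", "title": "Engineer"})[:2])
```

With this toy data, an engineer at Apple ranks iPhone Development first, which mirrors the example on the slide. Working in log space avoids numeric underflow when many attributes are multiplied together.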
Skill Suggestions for Your LinkedIn Profile
[Screenshot labels: 4% Conversion; 49% Conversion]
Suggesting Endorsements
Candidate generation: People-skill combinations in a member's network
Binary classification
Features: Skill inference score; Company overlap; School overlap; Group overlap; Industry and functional area similarity; Title similarity; Site interactions; Co-interactions
[Diagram labels: Feature Vectors (Company, Title, Groups, Industry); Classifier; Suggested Endorsements (ranked by likelihood)]
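The candidate generation and binary classification steps might be sketched as follows. The slide does not say which classifier is used, so this example assumes a logistic model; the feature names follow the slide, but the weights, bias, input layout, and function names are invented purely for illustration.

```python
import math

# Feature names follow the slide; the weights and bias are made up for illustration.
WEIGHTS = {
    "skill_inference_score": 2.5,
    "company_overlap": 1.0,
    "school_overlap": 0.5,
    "group_overlap": 0.4,
    "title_similarity": 0.8,
    "site_interactions": 1.2,
}
BIAS = -3.0


def endorse_probability(features):
    """Logistic model: estimated probability that the member endorses this candidate."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def generate_candidates(network):
    """Yield (person, skill, features) for every people-skill pair in the network."""
    for person, skills in network.items():
        for skill, features in skills.items():
            yield person, skill, features


def suggest_endorsements(network, top_k=3):
    """Score all candidates and return the top_k, ranked by likelihood."""
    scored = [(endorse_probability(f), person, skill)
              for person, skill, f in generate_candidates(network)]
    scored.sort(reverse=True)
    return [(person, skill, round(p, 3)) for p, person, skill in scored[:top_k]]


# Hypothetical network of one member: connection -> skill -> feature values.
network = {
    "B": {
        "Hadoop": {"skill_inference_score": 0.9, "company_overlap": 1, "site_interactions": 1},
        "Public Speaking": {"skill_inference_score": 0.2},
    },
    "C": {"Java": {"skill_inference_score": 0.7, "school_overlap": 1}},
}
for suggestion in suggest_endorsements(network):
    print(suggestion)
```

In a real system the weights would be learned from past endorsement behavior rather than set by hand.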
Social Tagging Accelerates Adoption
[Chart labels: Skill marketing; Skill recommendations; Skill endorsements]
Can We Find Influencers In Venture Capital?
Which Skills Are Important for a Data Scientist?
What Technologies are Professionals Adopting?
Data Amplifies Desire
1. Desire + Social Proof
2. Viral Loops + Network Effects
3. Data Catalyst + Recommendation Algorithms
Infrastructure
Apache Hadoop: Parallel processing architecture
Apache Kafka: Ingress pipes
Azkaban: Hadoop scheduler
Voldemort: Egress database
Apache Pig: High-level MR language
DataFu: Convenience routines
http://data.linkedin.com
R. Sumbaly, J. Kreps, and S. Shah. The Big Data ecosystem at LinkedIn. In SIGMOD 2013 (to appear).