LinkedIn Skills: Large-Scale Topic Extraction
and Inference
Mathieu Bastian
LinkedIn Corporation ©2014 All Rights Reserved
The World’s Largest Professional Network
313M+ Members Worldwide
2 new Members Per Second
100M+ Monthly Unique Visitors
3M+ Company Pages
Connecting Talent with Opportunity. At scale…
LinkedIn Profile
313M+ profiles in 200+ countries
Organized into sections
– Standardized: Companies, Titles, Industry,
Location etc.
– Unstandardized: Text (Summary, Position
description, specialties)
Skills & Endorsements section
– Introduced in 2011
– Limited to 50 skills per profile
Skills at LinkedIn
Key component of the
professional identity
Dictionary of 45k+ skills in
English
Members have diverse skills
– Java Programming
– Ballet
– Politics
– Bow Hunting
Many of these are long-tail
Figure: Example of a Skills section on a LinkedIn profile
Folksonomy creation
Create a folksonomy of skills based on LinkedIn profiles
Leverage the “specialties” section
Detect comma-separated lists and extract skill phrases
Use stop-list and exclude other entities (e.g. companies, titles,
degrees)
150k skill phrases extracted after removing long-tail noise
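The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the production pipeline: the stop-list, the entity list, the `min_items` and `min_count` thresholds, and the function names are all assumptions made for the example.

```python
from collections import Counter
import re

# Hypothetical stop-list and entity list, for illustration only.
STOP_PHRASES = {"etc", "and more", "others"}
KNOWN_ENTITIES = {"linkedin", "product manager", "mba"}  # companies, titles, degrees


def extract_phrases(specialties, min_items=3):
    """Return candidate skill phrases if the text looks like a comma-separated list."""
    items = [re.sub(r"\s+", " ", part).strip().lower() for part in specialties.split(",")]
    items = [item for item in items if item]
    # Treat the text as a list only if it has enough items and every item is short.
    if len(items) < min_items or any(len(item.split()) > 4 for item in items):
        return []
    return [i for i in items if i not in STOP_PHRASES and i not in KNOWN_ENTITIES]


def build_folksonomy(profiles, min_count=2):
    """Count each phrase once per profile and drop the long tail below min_count."""
    counts = Counter(p for text in profiles for p in set(extract_phrases(text)))
    return {phrase: n for phrase, n in counts.items() if n >= min_count}


profiles = [
    "Java Programming, Hadoop, Machine Learning, LinkedIn",
    "machine learning, Java programming, Ballet",
    "I am a passionate leader who enjoys building great teams.",
    "Hadoop, Bow Hunting, MBA, etc",
]
folksonomy = build_folksonomy(profiles)
```

On this toy input only the phrases seen on at least two profiles survive; "ballet" and "bow hunting" are dropped as long-tail noise, and the free-text sentence is ignored because it is not a list.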
Disambiguation
Need to add context to differentiate skill phrases with multiple
meanings (e.g. NLP = Natural Language Processing,
NLP = Neuro-linguistic programming)
Different meanings have different sets of related phrases
Use Jaccard similarity on LinkedIn profiles to find related phrases,
then SVD + KMeans to identify clusters of phrases
References: R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern information retrieval, volume 463
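A small sketch of the idea that different meanings have different sets of related phrases. It computes Jaccard similarity over the sets of profiles mentioning each phrase. In place of the SVD + KMeans step named on the slide, it uses a much simpler stand-in (connected components of a thresholded similarity graph), purely to keep the example dependency-free. The data, the threshold, and the function names are assumptions.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data: phrase -> IDs of profiles mentioning it (hypothetical values).
profiles_by_phrase = {
    "nlp": {1, 2, 3, 4},
    "machine learning": {1, 2, 5},
    "text mining": {1, 2},
    "hypnotherapy": {3, 4},
    "coaching": {3, 4, 6},
}


def sense_groups(phrase, threshold=0.3):
    """Group the phrases related to `phrase` into candidate senses.

    Stand-in for SVD + KMeans: related phrases that are similar to each
    other (Jaccard >= threshold) end up in the same group.
    """
    target = profiles_by_phrase[phrase]
    related = [p for p in profiles_by_phrase
               if p != phrase and jaccard(profiles_by_phrase[p], target) >= threshold]
    groups = []
    for p in related:
        linked = [g for g in groups
                  if any(jaccard(profiles_by_phrase[p], profiles_by_phrase[q]) >= threshold
                         for q in g)]
        merged = {p}.union(*linked)
        groups = [g for g in groups if g not in linked] + [merged]
    return groups


groups = sense_groups("nlp")
```

Here the phrases related to "nlp" separate into two groups, one around machine learning and text mining and one around hypnotherapy and coaching, which is the signal used to split the ambiguous phrase into two skills.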
De-duplication
Need to group phrases with similar meaning together. Examples:
– Acronyms: B2B, Business to Business
– Synonyms: Java Programming, Java Development
– Typos: Government Liason (for Government Liaison)
Many of the skill phrases could be tied to a Wikipedia page
Built Mechanical Turk (www.mturk.com) task to find the Wikipedia
page associated with a skill phrase
Example cluster: “Java programming”, “Java development” and “Java” all map to
http://en.wikipedia.org/wiki/Java_(programming_language)
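Once each phrase has been labelled with a Wikipedia page, de-duplication amounts to grouping phrases by that page. A minimal sketch, assuming the labels are available as a phrase-to-page mapping; the mapping below is illustrative (page titles rather than full URLs) and is not data from the talk.

```python
from collections import defaultdict

# Hypothetical labels, e.g. as collected from a crowdsourcing task.
page_for_phrase = {
    "Java programming": "Java_(programming_language)",
    "Java development": "Java_(programming_language)",
    "Java": "Java_(programming_language)",
    "B2B": "Business-to-business",
    "Business to Business": "Business-to-business",
}


def cluster_by_page(labels):
    """Group skill phrases that were mapped to the same Wikipedia page."""
    clusters = defaultdict(list)
    for phrase, page in labels.items():
        clusters[page].append(phrase)
    return dict(clusters)


clusters = cluster_by_page(page_for_phrase)
```

Each resulting cluster then becomes a single entry in the skills list, with the member phrases kept as its synonyms.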
Folksonomy creation summary
Extraction based on 12M LinkedIn profiles with “specialties”
Extracted 150k skill phrases
Clustered related phrases, adding the industry context to ambiguous
phrases
De-duplication using MTurk
Final master list contains 50k skills
Examples of synonyms of
“Microsoft Office”
Skills Inference and Recommendation
Goal: boost skills adoption with a recommender system
(“suggested skills”)
Infer the skills members have, similar to discovering latent
attributes in profiles
Develop a collaborative filtering solution using profile attributes
References: A. Mislove et al. You are who you know: Inferring user profiles in online social networks.
R. Jäschke et al. Tag recommendations in folksonomies.
Skills Typeahead on LinkedIn
Suggested Skills
Features
Large number of standardized profile attributes (i.e. can be
represented by a unique identifier)
Members with similar profile attributes are likely to have similar
skills (e.g. if you work at Apple, you probably know “Mac OS”)

Type                          Example                     Cardinality
Title (Headline)              Product Manager             Thousands
Function                      Engineering                 Dozens
Industry                      Healthcare                  Dozens
Title (Employment Position)   Product Manager             Thousands
Company                       LinkedIn                    Millions
Group membership              Healthcare Professionals    Millions
Skills                        Matlab                      Thousands
Problem
Calculate the likelihood that a member has a given
skill, given their profile attributes
Given: the set of profile attributes and the folksonomy of skills
No direct user similarity metric
Large number of features (e.g. 3M companies) and 50k classes
Naïve Bayes Classifier
Used a Naïve Bayes classifier to produce inferred skills
Training data: members who already list skills
Result is a ranking of inferred skills, which can be used directly in
“suggested skills”
Evaluation methodology
– AUC for each skill
– P@k and Recall for evaluating the recommendations
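To make the ranking idea concrete, here is a toy Naïve Bayes scorer over profile attributes. The exact formulation used in the talk is not recoverable from the slides, so everything here is an assumption: the class name, the attribute encoding as `type:value` strings, the training data, and the choice of a smoothed presence-only model scoring log P(skill) + Σ log P(attribute | skill).

```python
import math
from collections import Counter, defaultdict


class SkillNaiveBayes:
    """Toy Naive Bayes ranker: score(skill) = log P(skill) + sum of log P(attr | skill)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing strength
        self.skill_counts = Counter()             # number of members listing each skill
        self.pair_counts = defaultdict(Counter)   # skill -> attribute -> co-occurrence count
        self.n_members = 0

    def fit(self, members):
        """members: iterable of (set of attributes, set of skills) pairs."""
        for attributes, skills in members:
            self.n_members += 1
            for skill in skills:
                self.skill_counts[skill] += 1
                self.pair_counts[skill].update(attributes)
        return self

    def score(self, attributes, skill):
        n_skill = self.skill_counts[skill]
        log_p = math.log(n_skill / self.n_members)
        for attribute in attributes:
            # Smoothed estimate of P(attribute present | skill); two outcomes, hence 2 * alpha.
            count = self.pair_counts[skill][attribute]
            log_p += math.log((count + self.alpha) / (n_skill + 2 * self.alpha))
        return log_p

    def rank(self, attributes, top_k=3):
        """Return the top_k skills seen in training, best score first."""
        ranked = sorted(self.skill_counts, key=lambda s: self.score(attributes, s), reverse=True)
        return ranked[:top_k]


# Hypothetical training data: members who already list skills.
members = [
    ({"company:apple", "title:software engineer"}, {"Mac OS", "Objective-C"}),
    ({"company:apple", "title:product manager"}, {"Mac OS", "Product Management"}),
    ({"company:linkedin", "title:software engineer"}, {"Hadoop", "Java"}),
    ({"company:linkedin", "title:data scientist"}, {"Hadoop", "Machine Learning"}),
]
model = SkillNaiveBayes().fit(members)
suggested = model.rank({"company:apple"}, top_k=1)
```

Because the model only needs per-skill attribute counts, training is a single counting pass, which is one plausible reason such a model suits millions of features and tens of thousands of classes.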
Evaluation
Evaluate how well we can predict the skills members have
Figures: ROC of skill “Hadoop”; distribution of ROC across all skills
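The metrics named in the evaluation methodology can be computed as in the sketch below. These are the standard definitions, written in plain Python for illustration; the function names and sample values are assumptions, not the evaluation code or data from the talk.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended skills that the member actually has."""
    return sum(1 for skill in ranked[:k] if skill in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the member's skills that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    return sum(1 for skill in ranked[:k] if skill in relevant) / len(relevant)


def auc(scores, labels):
    """Area under the ROC curve for one skill.

    Computed as the probability that a random positive member is scored
    above a random negative member, with ties counted as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical recommendations for one member, and scores for one skill.
ranked = ["Hadoop", "Java", "Pig", "Ballet"]
relevant = {"Hadoop", "Pig", "Hive"}
p_at_2 = precision_at_k(ranked, relevant, 2)
r_at_2 = recall_at_k(ranked, relevant, 2)
skill_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise AUC above is quadratic in the number of members, which is fine for a toy example; a rank-based computation would be the usual choice at scale.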
Results
12X improvement in conversion using “suggested skills”
Figure: comparison without and with “suggested skills”
Our Contributions
End-to-end creation of a skills folksonomy based on the free-text
specialties section
Efficient inferred skills model with good offline performance
Skills recommender system based on profile attributes