LinkedIn Skills: Large-Scale Topic Extraction and Inference

Mathieu Bastian

LinkedIn Corporation ©2014 All Rights Reserved

LinkedIn Skills: RecSys Conference 2014

The World’s Largest Professional Network

313M+ Members Worldwide

2 New Members Per Second

100M+ Monthly Unique Visitors

3M+ Company Pages

Connecting Talent with Opportunity. At scale…

LinkedIn Profile

313M+ profiles in 200+ countries

Organized into sections

– Standardized: Companies, Titles, Industry, Location etc.

– Unstandardized: Text (Summary, Position description, Specialties)

Skills & Endorsements section

– Introduced in 2011

– Limited to 50 skills per profile

Skills at LinkedIn

Key component of the professional identity

Dictionary of 45k+ skills in English

Members have diverse skills

– Java Programming

– Ballet

– Politics

– Bow Hunting

Many of these are long-tail

Figure: Example of a Skills section on a LinkedIn profile

Folksonomy creation

Create a folksonomy of skills based on LinkedIn profiles

Leverage the “specialties” section

Detect comma-separated lists and extract skill phrases

Use a stop-list and exclude other entities (e.g. companies, titles, degrees)

150k skill phrases extracted after removing long-tail noise
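
The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not LinkedIn's pipeline: the stop-list, the entity list, the "at least two commas" list detector, the four-word length cap and the `min_count` cutoff for long-tail noise are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical lists; the real stop-list and entity dictionaries are not in the slides.
STOP_PHRASES = {"etc", "and more", "other"}
KNOWN_ENTITIES = {"linkedin", "mba", "product manager"}  # companies, degrees, titles


def extract_phrases(specialties: str) -> list[str]:
    """Split a comma-separated specialties field into candidate skill phrases."""
    # Assumed heuristic: fewer than two commas means free prose, not a list.
    if specialties.count(",") < 2:
        return []
    phrases = []
    for part in specialties.split(","):
        # Normalize whitespace, trim punctuation, lowercase.
        phrase = re.sub(r"\s+", " ", part).strip(" .;:").lower()
        # Assumed cap: very long items are unlikely to be skill phrases.
        if not phrase or len(phrase.split()) > 4:
            continue
        if phrase in STOP_PHRASES or phrase in KNOWN_ENTITIES:
            continue
        phrases.append(phrase)
    return phrases


def build_candidates(profiles: list[str], min_count: int = 2) -> dict[str, int]:
    """Count each phrase once per profile and drop phrases seen too rarely."""
    counts = Counter()
    for text in profiles:
        counts.update(set(extract_phrases(text)))
    return {p: c for p, c in counts.items() if c >= min_count}


profiles = [
    "Java Programming, Hadoop, MBA, Machine Learning",
    "java programming, Ballet, etc",
    "I enjoy building products",
    "Hadoop, Java Programming, Bow Hunting",
]
candidates = build_candidates(profiles)
```

On this toy input only "java programming" and "hadoop" survive, because every other phrase appears on a single profile.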

Disambiguation

Need to add context to differentiate skill phrases with multiple meanings (e.g. NLP = Natural Language Processing, NLP = Neuro-linguistic programming)

Different meanings have different sets of related phrases

Use Jaccard Similarity on LinkedIn profiles for related phrases and then SVD + KMeans to identify clusters of phrases

Reference: R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern Information Retrieval, volume 463.
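
As a rough illustration of the first half of this step, the sketch below computes Jaccard similarity between phrases from the sets of profiles that mention them. The input layout (`phrase_to_profiles`), the threshold and the toy profile ids are assumptions; the SVD + KMeans clustering of the resulting related-phrase sets is not shown.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_phrases(phrase_to_profiles: dict[str, set[int]], threshold: float = 0.2):
    """Return {phrase: {other_phrase: similarity}} for pairs at or above the threshold."""
    related = {p: {} for p in phrase_to_profiles}
    for p, q in combinations(phrase_to_profiles, 2):
        sim = jaccard(phrase_to_profiles[p], phrase_to_profiles[q])
        if sim >= threshold:
            related[p][q] = sim
            related[q][p] = sim
    return related


# Toy data: profile ids on which each phrase appears (hypothetical).
index = {
    "nlp": {1, 2, 3, 4},
    "machine learning": {1, 2, 5},
    "hypnosis": {3, 4, 6},
    "ballet": {7},
}
related = related_phrases(index)
```

Here "nlp" is related to both "machine learning" and "hypnosis", while those two are unrelated to each other. That split in the related phrases is the signal a clustering step could use to separate the two meanings.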

De-duplication

Need to group phrases with similar meaning together. Examples:

– Acronyms: B2B, Business to Business

– Synonyms: Java Programming, Java Development

– Typos: Government Liason

Many of the skill phrases could be tied to a Wikipedia page

Built a Mechanical Turk (www.mturk.com) task to find the Wikipedia page associated with a skill phrase

Figure: the phrases “Java programming”, “Java development” and “Java” form one cluster mapped to http://en.wikipedia.org/wiki/Java_(programming_language)
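
Once each phrase has been labeled with a page, grouping duplicates is a simple inversion of that mapping. The sketch below assumes the crowdsourced labels are already available as a `phrase -> page` dictionary; the function name and the placeholder identifier for the B2B page are hypothetical.

```python
from collections import defaultdict

JAVA_PAGE = "http://en.wikipedia.org/wiki/Java_(programming_language)"
B2B_PAGE = "example://business-to-business"  # placeholder, not a real URL


def group_by_page(phrase_to_page: dict[str, str]) -> dict[str, list[str]]:
    """Group skill phrases that were labeled with the same page."""
    clusters = defaultdict(list)
    for phrase, page in phrase_to_page.items():
        clusters[page].append(phrase)
    # Sort for a deterministic order inside each cluster.
    return {page: sorted(phrases) for page, phrases in clusters.items()}


labels = {
    "Java programming": JAVA_PAGE,
    "Java development": JAVA_PAGE,
    "Java": JAVA_PAGE,
    "B2B": B2B_PAGE,
    "Business to Business": B2B_PAGE,
}
clusters = group_by_page(labels)
```

Each resulting cluster then corresponds to one entry in the de-duplicated skill list.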

Folksonomy creation summary

Extraction based on 12M LinkedIn profiles with “specialties”

Extracted 150k skill phrases

Clustered related phrases, adding the industry context to ambiguous phrases

De-duplication using MTurk

Final master list contains 50k skills

Figure: Examples of synonyms of “Microsoft Office”

Inference and Recommendation

Skills Inference and Recommendation

Goal was boosting skills adoption with a recommender system: “suggested skills”

Inferring the skills members have, similar to discovering latent attributes in profiles

Develop a collaborative filtering solution using profile attributes

References:

A. Mislove et al. You are who you know: Inferring user profiles in online social networks.

R. Jäschke et al. Tag recommendations in folksonomies.

Skills Typeahead on LinkedIn

Suggested Skills

Features

Large number of standardized profile attributes (i.e. can be represented by a unique identifier)

Members with similar profile attributes are likely to have similar skills (e.g. if you work at Apple, you probably know “Mac OS”)

Type                         Example                   Cardinality
Title (Headline)             Product Manager           Thousands
Function                     Engineering               Dozens
Industry                     Healthcare                Dozens
Title (Employment Position)  Product Manager           Thousands
Company                      LinkedIn                  Millions
Group membership             Healthcare Professionals  Millions
Skills                       Matlab                    Thousands

Problem

Calculate the likelihood that a member has a given skill, given his profile attributes

No direct user similarity metric

Large number of features (e.g. 3M companies) and 50k classes

Given: the set of profile attributes; the folksonomy of skills
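
In symbols, the problem can be written as below. The notation is an assumption introduced for illustration, not taken from the slides: $X$ is the set of profile attributes, $S$ is the folksonomy of skills, and $x_1, \dots, x_n \in X$ are the attributes present on one member's profile.

```latex
\text{for each } s \in S:\quad \text{estimate } P(s \mid x_1, \dots, x_n),
\qquad x_1, \dots, x_n \in X
```

With roughly 50k skills in $S$ and millions of possible attribute values in $X$, this is a large multi-label estimation problem.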

Naïve Bayes Classifier

Used a Naïve Bayes Classifier to produce inferred skills

Training data based on members who already have skills

Result is a ranking of inferred skills, which can directly be used in “suggested skills”

Evaluation methodology

– AUC for each skill

– P@k and Recall for evaluating the recommendations
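
A minimal sketch of how a Naïve Bayes ranking of skills could look is given below. It is not the production model: the class name, the `(type, value)` feature encoding, the Laplace smoothing constant, the toy training data and the choice to score only the attributes present on a profile are all assumptions. Each skill `s` is scored as `log P(s) + sum_i log P(x_i | s)`, which relies on the Naïve Bayes assumption that attributes are conditionally independent given the skill.

```python
import math
from collections import Counter, defaultdict


class SkillNaiveBayes:
    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                          # Laplace smoothing (assumed)
        self.skill_counts = Counter()               # skill -> number of members
        self.feature_counts = defaultdict(Counter)  # skill -> feature -> count
        self.n_members = 0

    def fit(self, members):
        """members: iterable of (features, skills) pairs, each a set."""
        for features, skills in members:
            self.n_members += 1
            for skill in skills:
                self.skill_counts[skill] += 1
                self.feature_counts[skill].update(features)
        return self

    def score(self, features, skill):
        """log P(skill) + sum of log P(feature | skill) over present features."""
        n_skill = self.skill_counts[skill]
        log_p = math.log(n_skill / self.n_members)
        for f in features:
            # Smoothed estimate of P(feature present | skill).
            p = (self.feature_counts[skill][f] + self.alpha) / (n_skill + 2 * self.alpha)
            log_p += math.log(p)
        return log_p

    def rank(self, features, exclude=()):
        """Skills seen in training, best first, skipping ones the member already has."""
        candidates = [s for s in self.skill_counts if s not in exclude]
        return sorted(candidates, key=lambda s: self.score(features, s), reverse=True)


# Toy training set: members who already list skills (hypothetical data).
train = [
    ({("company", "Apple"), ("title", "Engineer")}, {"Mac OS", "Objective-C"}),
    ({("company", "Apple"), ("title", "Designer")}, {"Mac OS"}),
    ({("company", "Cloudera"), ("title", "Engineer")}, {"Hadoop"}),
    ({("company", "Cloudera"), ("title", "Engineer")}, {"Hadoop", "Java"}),
]
model = SkillNaiveBayes().fit(train)
suggested = model.rank({("company", "Apple")})
```

On this toy data a member whose only attribute is the company Apple gets "Mac OS" as the top suggestion. Working in log space avoids numeric underflow when a profile has many attributes.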

Evaluation

Evaluate how well we can predict the skills members have

Figure: ROC of skill “Hadoop”

Figure: Distribution of ROC across all skills
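
For reference, the offline metrics named in the evaluation methodology can be computed as in the sketch below. The function names and the toy inputs are illustrative; the AUC here is the plain pairwise definition, which is quadratic in the number of examples and only suitable for small samples.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall of the top-k suggestions against a member's actual skills."""
    if k <= 0 or not relevant:
        raise ValueError("k must be positive and relevant must be non-empty")
    hits = sum(1 for skill in ranked[:k] if skill in relevant)
    return hits / k, hits / len(relevant)


# Per-skill AUC: model scores for one skill and whether each member has it.
hadoop_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])

# Per-member P@k and recall: ranked suggestions against held-out skills.
precision, recall = precision_recall_at_k(
    ["Hadoop", "Java", "Ballet", "MapReduce"], {"Hadoop", "MapReduce", "Pig"}, k=2
)
```

AUC is computed once per skill across members, while P@k and recall are computed once per member across the ranked suggestions.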

Results

12X improvement in conversion using “suggested skills”

Figure: Without “suggested skills” vs. with “suggested skills”

Our Contributions

End-to-end creation of a skills folksonomy based on the free-text specialties section

Efficient inferred skills model with good offline performance

Skills recommender system based on profile attributes

Thank You