Recommendations@LinkedIn

DESCRIPTION

I gave this talk earlier this week at Hadoop World (http://www.hadoopworld.com/sessi...), a conference that evangelizes Hadoop by highlighting how people across the industry are solving big business challenges with it. I am posting the slides here with an approximate transcript of my talk.

Page 1: Recommendations@LinkedIn

1

Recommendations @ LinkedIn

Page 2: Recommendations@LinkedIn

2

Think Platform

Leverage Hadoop

Page 3: Recommendations@LinkedIn

The world’s largest professional network

Over 50% of members are now international

3

LinkedIn Members (Millions)

2004: 2
2005: 4
2006: 8
2007: 17
2008: 32
2009: 55
2010: 90

135M+*

75% of Fortune 100 Companies use LinkedIn to hire

Company Pages: >2M**

New Members joining: ~2/sec

*as of Nov 4, 2011
**as of June 30, 2011

Page 4: Recommendations@LinkedIn

4

Recommendations Opportunity

Page 5: Recommendations@LinkedIn

5

Page 6: Recommendations@LinkedIn

6

Page 7: Recommendations@LinkedIn

7

Page 8: Recommendations@LinkedIn

8

Page 9: Recommendations@LinkedIn

9

Page 10: Recommendations@LinkedIn

10

Page 11: Recommendations@LinkedIn

The Recommendations Opportunity

11

Pandora Search for People

Events You May Be Interested In

Groups browse maps

Page 12: Recommendations@LinkedIn

12

50%

Page 13: Recommendations@LinkedIn

13

Positions

Education

Summary

Experience

Skills

Page 14: Recommendations@LinkedIn

Are all titles the same?

- Software Engineer
- Technical Yahoo
- Member Technical Staff
- Software Development Engineer
- SDE

Page 15: Recommendations@LinkedIn

Are all companies the same?

‘IBM’ has 8000+ variations
- ibm – ireland
- ibm research
- T J Watson Labs
- International Bus. Machines
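The two slides above raise a standardization problem: many surface forms refer to the same title or company. As a rough illustration only, the sketch below resolves the variants listed on the slides through a lookup table. The alias tables and the `normalize` and `canonicalize` helpers are hypothetical; the talk does not show how LinkedIn actually standardizes these fields, and a table this small would not cover 8000+ variations.

```python
import re

# Hypothetical alias tables for illustration only; the talk does not show
# how LinkedIn actually standardizes titles or company names.
TITLE_ALIASES = {
    "software engineer": "software engineer",
    "technical yahoo": "software engineer",
    "member technical staff": "software engineer",
    "software development engineer": "software engineer",
    "sde": "software engineer",
}

COMPANY_ALIASES = {
    "ibm": "ibm",
    "ibm ireland": "ibm",
    "ibm research": "ibm",
    "t j watson labs": "ibm",
    "international bus machines": "ibm",
}


def normalize(text: str) -> str:
    """Lowercase, replace runs of punctuation with a space, and trim."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return cleaned.strip()


def canonicalize(text: str, aliases: dict) -> str:
    """Map a raw string to its canonical form, or fall back to the normalized text."""
    key = normalize(text)
    return aliases.get(key, key)
```

Calling `canonicalize("International Bus. Machines", COMPANY_ALIASES)` resolves to the same canonical key as `"ibm research"`, while an unknown name simply comes back normalized.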

Page 16: Recommendations@LinkedIn

Recommendation Trade-offs

The need for a common platform

16

Real Time

Time Independent

Page 17: Recommendations@LinkedIn

Recommendation Trade-offs

The need for a common platform

17

Content Analysis

Collaborative

Page 18: Recommendations@LinkedIn

Recommendation Trade-offs

The need for a common platform

18

Recall

Precision

Page 19: Recommendations@LinkedIn

Title, Specialty, Education, Experience, Location, Industry, Seniority, Skills

Related Titles, Related Companies, Related Industries

Specialty -> Specialty
Seniority -> Seniority
Skills -> Skills
Title -> Title
Summary -> Summary
Title -> Related Title
Education -> Education
...

Binary

Exact match

Exact match in bucket

Soft Match

$v_1 = \mathrm{tf} \cdot \mathrm{idf}$

$\cos\Theta = \dfrac{v_1 \cdot v_2}{|v_1|\,|v_2|}$

Matching

0.58, 0.94, 0.26, 0.18, 0.98, 0.16, 0.40
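The soft match on this slide weights terms by tf * idf and compares two fields by the cosine of the angle between their vectors. A minimal sketch of that computation follows. The function names, the sparse dictionary representation, and the smoothed idf formula are assumptions made for illustration; the slide gives only the two formulas.

```python
import math
from collections import Counter


def tf_idf_vector(tokens, doc_freq, num_docs):
    """Build a sparse tf-idf vector (term -> weight) for one field."""
    counts = Counter(tokens)
    vector = {}
    for term, tf in counts.items():
        # Smoothed idf (an assumption); terms unseen in the corpus get df = 0.
        idf = math.log((1 + num_docs) / (1 + doc_freq.get(term, 0))) + 1.0
        vector[term] = tf * idf
    return vector


def cosine(v1, v2):
    """cos(theta) = (v1 . v2) / (|v1| * |v2|); returns 0.0 if either vector is empty."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)
```

For a Skills -> Skills pair, one would build a vector from each side's skill tokens and call `cosine` on the two; identical fields score 1 and fields with no shared terms score 0.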

Page 20: Recommendations@LinkedIn

Importance weight vector (Skills -> Skills)

Similarity score vector (Skills -> Skills)

Normalization, Scoring & Ranking

Filtering

Location, Company, Industry

Feedback

0.94

0.70
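This slide combines an importance weight vector with a similarity score vector, then normalizes, scores, ranks, and filters. One plausible reading is a weighted average of per-pair similarities followed by a sort. That reading is sketched below. The linear combination, the `score` and `rank` helpers, and the candidate dictionary layout are all assumptions, since the slide does not state how the two vectors are combined.

```python
def score(similarities, weights):
    """Weighted average of per-feature-pair similarity scores, in [0, 1]."""
    total_weight = sum(weights.get(pair, 0.0) for pair in similarities)
    if total_weight == 0.0:
        return 0.0
    weighted = sum(weights.get(pair, 0.0) * s for pair, s in similarities.items())
    return weighted / total_weight


def rank(candidates, weights, keep=lambda c: True):
    """Filter candidates, score them, and return (id, score) pairs, best first."""
    scored = [
        (c["id"], score(c["similarities"], weights))
        for c in candidates
        if keep(c)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The `keep` predicate stands in for the filtering stage, for example dropping candidates outside a given location before they are scored.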

Page 21: Recommendations@LinkedIn

Technologies

Page 22: Recommendations@LinkedIn

22

Hadoop Case Studies

• Scaling
• Blending Recommendation Algorithms
• Grandfathering
• Model Selection
• A/B Testing
• Tracking and Reporting

Page 23: Recommendations@LinkedIn

23

Scaling

Billions of Recommendations

Latency > 1 sec

Latency < 1 sec

Recall = Low

Latency < 1 sec

Recall = High

Minhashing
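The slide names minhashing as a scaling technique without detail. As a reminder of the general idea, the sketch below builds MinHash signatures, estimates Jaccard similarity from them, and splits a signature into bands so that only items sharing a band key need to be compared. The hash choice, the number of hash functions, the band count, and all function names are assumptions; this is not a description of LinkedIn's Hadoop jobs.

```python
import hashlib


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over the token set."""
    token_set = set(tokens)
    if not token_set:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(seed, t) for t in token_set) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def lsh_buckets(signature, bands=16):
    """Split a signature into bands; items sharing any band key become candidates."""
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]
```

In a batch setting, each `(band index, band key)` pair could serve as a grouping key, so that expensive pairwise scoring runs only within groups rather than across all pairs.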

Page 24: Recommendations@LinkedIn

24

Hadoop Case Studies

• Scaling ✔
• Blending Recommendation Algorithms
• Grandfathering
• Model Selection
• A/B Testing
• Tracking and Reporting

Page 25: Recommendations@LinkedIn

25

Blending Recommendation Algorithms

Co-View Impact Latency ~ Minutes

Complexity = High

Co-View Impact Latency ~ Hours

Complexity = Low

Page 26: Recommendations@LinkedIn

26

Hadoop Case Studies

• Scaling ✔
• Blending Recommendation Algorithms ✔
• Grandfathering
• Model Selection
• A/B Testing
• Tracking and Reporting

Page 27: Recommendations@LinkedIn

27

Grandfathering

Adding and Changing Features

No Time Guarantees

Minimal Disruption

Next Profile Edit

Time ~ Week

Significant Systems Work

Parallel Feature Extraction Pipeline

Time ~ Hour

Minimal Disruption

Grandfather When Ready

Page 28: Recommendations@LinkedIn

28

Hadoop Case Studies

• Scaling ✔
• Blending Recommendation Algorithms ✔
• Grandfathering ✔
• Model Selection
• A/B Testing
• Tracking and Reporting

Page 29: Recommendations@LinkedIn

29

Model Selection

• Features
• Models
• Parameters

SVM

Logistic Regression

Content, Collaborative

SVM

Decision Trees

L1+L2 Regularization

Page 30: Recommendations@LinkedIn

30

Hadoop Case Studies

• Scaling ✔
• Blending Recommendation Algorithms ✔
• Grandfathering ✔
• Model Selection ✔
• A/B Testing
• Tracking and Reporting

Page 31: Recommendations@LinkedIn

31

A/B Testing

Is Option A Better Than Option B? Let’s Test

10%

90%

New Model

Old Model

A

B

Traffic

Send 10% of members who have more than 100 connections AND who have logged in within the past week AND who are based in Europe
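The targeting rule on this slide can be read as a segment predicate plus a stable percentage split. A minimal sketch of that reading follows. The `Member` fields, the experiment name, and the hash-based bucketing are assumptions introduced for illustration; the slide states only the rule and the 10% / 90% split.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Member:
    member_id: int
    connections: int
    days_since_login: int
    region: str


def in_target_segment(m: Member) -> bool:
    """Segment from the slide: >100 connections AND active in the past week AND in Europe."""
    return m.connections > 100 and m.days_since_login <= 7 and m.region == "Europe"


def bucket(member_id: int, experiment: str) -> int:
    """Stable bucket in [0, 100) derived from a hash of experiment name and member id."""
    digest = hashlib.sha256(f"{experiment}:{member_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def assign_model(m: Member, experiment: str = "new-model-test", percent: int = 10) -> str:
    """Send `percent`% of the target segment to the new model; everyone else gets the old one."""
    if in_target_segment(m) and bucket(m.member_id, experiment) < percent:
        return "new"
    return "old"
```

Hashing the experiment name together with the member id keeps each member's assignment stable across visits while letting different experiments split the population independently.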

Page 32: Recommendations@LinkedIn

32

Hadoop Case Studies

• Scaling ✔
• Blending Recommendation Algorithms ✔
• Grandfathering ✔
• Model Selection ✔
• A/B Testing ✔
• Tracking and Reporting

Page 33: Recommendations@LinkedIn

33

Tracking and Reporting

K-way joins across billions of rows

Up to the minute reporting

Nearsightedness

K-way join complexity

Lacks up to the minute reporting

Simple k-way joins
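A k-way join for tracking data can be expressed as a group-by-key over tagged records, which is the shape a reduce-side join takes in MapReduce. The in-memory sketch below shows that shape only. The function name and the event sources used in the usage note are hypothetical, and a real job over billions of rows would run on the cluster rather than in one process.

```python
from collections import defaultdict


def k_way_join(sources):
    """Join k datasets on a shared key.

    `sources` maps a source name to an iterable of (key, value) pairs.
    Returns {key: {source_name: [values]}} for keys present in every source.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    # "Map" phase: tag each record with its source and group by join key.
    for name, records in sources.items():
        for key, value in records:
            grouped[key][name].append(value)
    # "Reduce" phase: keep keys seen in all k sources (an inner join).
    names = set(sources)
    return {
        key: {name: by_source[name] for name in sources}
        for key, by_source in grouped.items()
        if set(by_source) == names
    }
```

Joining, say, impression, click, and application logs on a recommendation id then takes one grouping pass regardless of k.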

Page 34: Recommendations@LinkedIn

34

Think Platform

Leverage Hadoop

Page 35: Recommendations@LinkedIn

35

Come work with us at LinkedIn

LinkedIn

Applied Research

Engineer

You