Job recommendations are significantly different:
- Rapid inventory growth: millions of new jobs are discovered every day
- ~1.5 million new users visit Indeed every day
- The average lifespan of a job is ~30 days
- One job posting is usually meant to hire one individual
Compute similarity for every pair of users:
for u_i in Users:
    for u_j in Users:
        SIM[i, j] = compute_similarity(u_i, u_j)
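Jaccard similarity:
$$J(U_i, U_j) = \frac{|\mathrm{Items}[U_i] \cap \mathrm{Items}[U_j]|}{|\mathrm{Items}[U_i] \cup \mathrm{Items}[U_j]|}$$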
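For reference, here is a minimal Python sketch of the exact, all-pairs approach. It computes the Jaccard similarity of each user's item set for every pair of users. The names `jaccard` and `all_pairs_similarity` are our own and do not come from the talk. With n users this makes n(n - 1)/2 comparisons, so the cost grows quadratically with the number of users.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def all_pairs_similarity(items: dict) -> dict:
    """Naive approach: compare every pair of users, n * (n - 1) / 2 comparisons."""
    return {
        (u, v): jaccard(items[u], items[v])
        for u, v in combinations(sorted(items), 2)
    }


# Example user -> jobs sets (taken from the cluster example later in the deck).
items = {
    "user1": {"job1", "job2"},
    "user2": {"job2", "job3", "job5"},
}
print(all_pairs_similarity(items))  # {('user1', 'user2'): 0.25}
```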
$$\mathrm{Items}[U_i] = \{x_1, x_2, \ldots, x_n\}$$
For a hash function H:
$$\mathrm{minhash}_H(U_i) = \min_{x \in \mathrm{Items}[U_i]} H(x)$$
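Here is a minimal sketch of the minhash definition. It assumes a salted SHA-1 digest as the hash function H. The talk does not say which hash family is used, so `make_hash` is a hypothetical stand-in.

```python
import hashlib


def make_hash(seed: int):
    """Hypothetical hash function H: item -> 64-bit int, salted by `seed`."""
    def h(item: str) -> int:
        digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return h


def minhash(items: set, h) -> int:
    """minhash_H(U) = min over x in Items[U] of H(x); Items[U] must be non-empty."""
    return min(h(x) for x in items)


H = make_hash(seed=1)
u1 = {"job1", "job2"}
u2 = {"job2", "job3", "job5"}
print(minhash(u1, H) == minhash(u2, H))  # True or False depending on H
```

Because a minhash is a minimum, it composes over unions: minhash_H(A ∪ B) = min(minhash_H(A), minhash_H(B)).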
$$\mathrm{Similarity}(U_1, U_2) = \begin{cases} 1 & \text{if } \mathrm{minhash}(U_1) = \mathrm{minhash}(U_2) \\ 0 & \text{otherwise} \end{cases}$$
This is an unbiased estimator of the Jaccard similarity J(U_1, U_2).
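The following sketch shows why the 0/1 estimator is unbiased. It relies on the usual idealization that H behaves like a uniformly random permutation of the items (min-wise independent, no ties). The variance line further assumes that the k hash functions are chosen independently.

```latex
\begin{aligned}
&A = \mathrm{Items}[U_1], \qquad B = \mathrm{Items}[U_2], \qquad x^{*} = \arg\min_{x \in A \cup B} H(x) \\
&x^{*} \text{ is uniformly distributed over } A \cup B \\
&\mathrm{minhash}_H(U_1) = \mathrm{minhash}_H(U_2) \iff x^{*} \in A \cap B \\
&\mathbb{E}[\mathrm{Similarity}(U_1, U_2)] = 1 \cdot \Pr[x^{*} \in A \cap B] + 0 \cdot \Pr[x^{*} \notin A \cap B] = \frac{|A \cap B|}{|A \cup B|} = J(U_1, U_2) \\
&\hat{J}_k = \frac{1}{k} \sum_{m=1}^{k} \mathrm{Similarity}_{H_m}(U_1, U_2), \qquad \mathbb{E}[\hat{J}_k] = J(U_1, U_2), \qquad \mathrm{Var}[\hat{J}_k] = \frac{J(1 - J)}{k}
\end{aligned}
```

If x* lies in A ∩ B, both minhashes equal H(x*). Otherwise x* belongs to only one of the sets, which therefore has the strictly smaller minhash.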
Over the random choice of the hash function H:
$$\Pr[\mathrm{minhash}_H(U_i) = \mathrm{minhash}_H(U_j)] = J(U_i, U_j)$$
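The probability statement can be checked empirically. This sketch draws each hash function as a uniformly random permutation of a small integer universe, which is exactly min-wise independent (unlike practical hash families). It then reports the fraction of hash functions whose minhashes agree. The item sets and parameters are made up for illustration.

```python
import random


def estimate_jaccard(a: set, b: set, k: int, universe: int, seed: int = 0) -> float:
    """Fraction of k random hash functions under which minhash(a) == minhash(b).

    Each hash function is a uniformly random permutation of range(universe),
    i.e. H(x) = perm[x]. Random permutations are exactly min-wise independent,
    so each 0/1 comparison has expectation J(a, b).
    """
    rng = random.Random(seed)
    matches = 0
    for _ in range(k):
        perm = list(range(universe))
        rng.shuffle(perm)
        if min(perm[x] for x in a) == min(perm[x] for x in b):
            matches += 1
    return matches / k


a = set(range(0, 10))   # items 0..9
b = set(range(5, 15))   # items 5..14
exact = len(a & b) / len(a | b)   # 5 / 15
approx = estimate_jaccard(a, b, k=2000, universe=20)
print(f"exact={exact:.3f} estimate={approx:.3f}")
```

With k = 2000 the standard deviation of the estimate is sqrt(J(1 - J)/k) ≈ 0.011, so the estimate should land within a few hundredths of 1/3.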
user → {job1, job2, job3, ...}
$$H = \{H_1, H_2, \ldots, H_{20}\}$$
for user in Users:
    for hash in H:
        minhash[user][hash] = min(hash(x) for x in Items[user])
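Here is a runnable version of the signature loop. It assumes a salted SHA-1 hash family (hypothetical, since the talk does not specify one) and keeps the slide's 20 hash functions. The result is keyed by user as well as by hash function, so each user ends up with a 20-value signature. The values are 64-bit integers, so they will not match the small example numbers on the slides.

```python
import hashlib

NUM_HASHES = 20  # H = {H_1, ..., H_20}, as on the slide


def h(index: int, item: str) -> int:
    """Hypothetical hash family: H_index(item) = first 64 bits of SHA-1("index:item")."""
    digest = hashlib.sha1(f"{index}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signatures(user_items: dict, num_hashes: int = NUM_HASHES) -> dict:
    """user -> [minhash_{H_1}(user), ..., minhash_{H_k}(user)]."""
    return {
        user: [min(h(i, x) for x in items) for i in range(num_hashes)]
        for user, items in user_items.items()
        if items  # minhash is undefined for a user with no items
    }


user_items = {
    "user1": {"job1", "job2"},
    "user2": {"job2", "job3", "job5"},
}
sigs = signatures(user_items)
shared = sum(x == y for x, y in zip(sigs["user1"], sigs["user2"]))
print(f"{shared} of {NUM_HASHES} minhashes agree")
```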
user1 → (111, 123, 134, 148, ..129)
user2 → (101, 123, 139, 148, ..135)
user3 → (191, 103, 126, 108, ..119)
user4 → (191, 103, 126, 108, ..129)
...
Each minhash value serves as a cluster ID, so user → {cluster} inverts to cluster → {users}:
123 → (user1, user2)
148 → (user1, user2)
129 → (user1, user4)
191 → (user3, user4)
...
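Inverting the signatures into cluster → users takes one pass over the data and no pairwise comparisons. This sketch uses only the signature values printed on the slide and leaves out the elided ".." values. `invert` is a hypothetical helper name.

```python
from collections import defaultdict


def invert(user_clusters: dict) -> dict:
    """Turn user -> {cluster} into cluster -> {users}; each minhash value is a cluster ID."""
    index = defaultdict(set)
    for user, clusters in user_clusters.items():
        for cluster in clusters:
            index[cluster].add(user)
    return dict(index)


# Signature values shown on the slide; the elided ".." values are left out.
user_clusters = {
    "user1": [111, 123, 134, 148, 129],
    "user2": [101, 123, 139, 148, 135],
    "user3": [191, 103, 126, 108, 119],
    "user4": [191, 103, 126, 108, 129],
}
index = invert(user_clusters)
print(sorted(index[123]))  # ['user1', 'user2']
print(sorted(index[129]))  # ['user1', 'user4']
```

Following the slide, the raw minhash value is the cluster key. A real system might instead key on (hash index, value) so that equal values from different hash functions do not fall into the same cluster. That is our assumption, not something the talk states.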
user1 → {job1, job2}
user2 → {job2, job3, job5}
123 → {user1, user2}
123 → {job1, job2, job3, job5} (the union of its users' jobs)
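A cluster's jobs are the union of its members' jobs. Here is a minimal sketch using the example data from the slide; `cluster_jobs` is a hypothetical name.

```python
def cluster_jobs(cluster_users: dict, user_jobs: dict) -> dict:
    """cluster -> union of the jobs of every user in that cluster."""
    return {
        cluster: set().union(*(user_jobs.get(u, set()) for u in users))
        for cluster, users in cluster_users.items()
    }


user_jobs = {
    "user1": {"job1", "job2"},
    "user2": {"job2", "job3", "job5"},
}
cluster_users = {123: {"user1", "user2"}}
print(sorted(cluster_jobs(cluster_users, user_jobs)[123]))
# ['job1', 'job2', 'job3', 'job5']
```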
Recommending jobs for a user:
1. user → {cluster}: user1 → {111, 123, ..}
2. cluster → {jobs}: 111 → {job5, job2, job9}, 123 → {job1, job2, job3, job5}
3. Merge: user1 → {job2, job5, job9, job1, job3} (job2 and job5, which appear in both clusters, come first)
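Recommendation then merges the jobs of every cluster the user belongs to. In this sketch, ranking jobs by how many of the user's clusters contain them is our assumption. It puts job2 and job5 first, as on the slide, but breaks ties alphabetically, so the order of the remaining jobs differs from the slide's. `recommend` is a hypothetical name.

```python
from collections import Counter


def recommend(user: str, user_clusters: dict, cluster_jobs: dict) -> list:
    """Merge the jobs of every cluster the user belongs to.

    Assumption (not from the talk): rank by how many of the user's clusters
    contain each job, breaking ties alphabetically so the output is deterministic.
    """
    counts = Counter()
    for cluster in user_clusters.get(user, []):
        counts.update(cluster_jobs.get(cluster, []))
    return sorted(counts, key=lambda job: (-counts[job], job))


user_clusters = {"user1": [111, 123]}  # the slide's ".." clusters are left out
cluster_jobs = {
    111: ["job5", "job2", "job9"],
    123: ["job1", "job2", "job3", "job5"],
}
print(recommend("user1", user_clusters, cluster_jobs))
# ['job2', 'job5', 'job1', 'job3', 'job9']
```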
→ {101, 121}
{“Software Engineer”, “Java Developer”, “Python Developer”}
minhash({“Software Engineer”, “Java Developer”, “Python Developer”}) → {99, 135}
→ {99, 121}
minhash({“Software Engineer”, “Java Developer”, “Python Developer”}) → {99, 121}
99 → add {“Software Engineer”, “Java Developer”, “Python Developer”}
121 → add {“Software Engineer”, “Java Developer”, “Python Developer”}
{“Software Engineer”, “Java Developer”, “Python Developer”} {99, 121}
99 → {“Software Engineer”, “Java Developer”, “Python Developer”}
→ {99, 131}
{“Software Engineer”, “Java Developer”, “Python Developer”}
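These slides appear to apply the same minhash idea to sets of job titles. The set is minhashed into cluster IDs, the titles are added under each ID, and titles are later read back by cluster ID. This sketch is one hypothetical reading of that flow. The class name, the salted SHA-1 hash, and the choice of two hash functions (the slides show two IDs per set) are all assumptions.

```python
import hashlib
from collections import defaultdict


def h(index: int, item: str) -> int:
    """Hypothetical hash family: first 64 bits of SHA-1("index:item")."""
    digest = hashlib.sha1(f"{index}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TitleClusters:
    """Hypothetical index from cluster ID to a set of job titles."""

    def __init__(self, num_hashes: int = 2):
        self.num_hashes = num_hashes
        self.index = defaultdict(set)

    def cluster_ids(self, titles: set) -> list:
        """minhash(titles) under each hash function, one cluster ID per hash."""
        return [min(h(i, t) for t in titles) for i in range(self.num_hashes)]

    def add(self, titles: set) -> list:
        """Store the titles under every cluster ID they hash to ("99 -> add {...}")."""
        ids = self.cluster_ids(titles)
        for cid in ids:
            self.index[cid] |= titles
        return ids

    def lookup(self, ids) -> set:
        """Union of the titles stored under any of the given cluster IDs."""
        return set().union(*(self.index.get(cid, set()) for cid in ids))


titles = {"Software Engineer", "Java Developer", "Python Developer"}
clusters = TitleClusters()
ids = clusters.add(titles)
print(clusters.lookup(ids[:1]) == titles)  # True
```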
1. http://go.indeed.com/docservice
Engineering blog & talks http://indeed.tech
Open Source http://opensource.indeedeng.io
Careers http://indeed.jobs
Twitter @IndeedEng