Better Search Through Query Understanding

DESCRIPTION

Presented as a Data Talk at Intuit on April 22, 2014.

Search is a fundamental problem of our time: we use search engines daily to satisfy a variety of personal and professional information needs. But search engine development still feels stuck in an information retrieval paradigm that focuses on result ranking. In this talk, I’ll advocate an emphasis on query understanding. I’ll talk about how we implement query understanding at LinkedIn, and I’ll present examples from the broader web. Hopefully you’ll come out with a different perspective on search and share my appreciation for how we can improve search through query understanding.

About the Speaker

Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.

Daniel Tunkelang
Head, Query Understanding

better search through query understanding

overview

query understanding: what is it?
how we do query understanding at LinkedIn
some other thoughts from search in the wild

what I’m not going to cover:

bird’s-eye view of how a search engine works

user: information need → query → select from results
system: rank using IR model (tf-idf, PageRank)

user: information need → query → select from results
system: query understanding → rank using IR model (tf-idf, PageRank)

search is a communication problem

tag: skill OR title
related skills: search, ranking, …

tag: company
id: 1337
industry: internet

verticals: people, jobs

intent: exploratory

query understanding pipeline

raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
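As a rough illustration of the pipeline's shape (not LinkedIn's implementation), each stage can be modeled as a function that enriches a shared query object. The `StructuredQuery` class, the stage functions, and the toy lookup tables below are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StructuredQuery:
    """Hypothetical container: query text plus annotations added by each stage."""
    text: str
    annotations: Dict[str, object] = field(default_factory=dict)


# Toy lookup tables, invented for illustration only.
SPELLING = {"manger": "manager"}
TITLES = {"product manager", "software engineer"}
SYNONYMS = {"software engineer": ["software developer"]}


def spellcheck(q: StructuredQuery) -> StructuredQuery:
    # Lowercase the query and replace known misspellings token by token.
    q.text = " ".join(SPELLING.get(tok, tok) for tok in q.text.lower().split())
    return q


def tag(q: StructuredQuery) -> StructuredQuery:
    # Whole-query dictionary lookup; a real tagger would label each segment.
    q.annotations["tag"] = "TITLE" if q.text in TITLES else None
    return q


def predict_verticals(q: StructuredQuery) -> StructuredQuery:
    # Placeholder rule: a title query could mean people or jobs.
    is_title = q.annotations["tag"] == "TITLE"
    q.annotations["verticals"] = ["people", "jobs"] if is_title else ["people"]
    return q


def expand(q: StructuredQuery) -> StructuredQuery:
    q.annotations["expansions"] = SYNONYMS.get(q.text, [])
    return q


PIPELINE: List[Callable[[StructuredQuery], StructuredQuery]] = [
    spellcheck, tag, predict_verticals, expand,
]


def understand(raw_query: str) -> StructuredQuery:
    """Run the raw query through every stage in order."""
    q = StructuredQuery(text=raw_query)
    for stage in PIPELINE:
        q = stage(q)
    return q


print(understand("Product Manger"))
```

Keeping the stages as a plain list of functions makes it easy to reorder or swap them, which matters because later stages depend on the output of earlier ones (tagging a misspelled query is harder than tagging a corrected one).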

spelling correction

fix obvious typos
help users spell names

spelling out the details

PEOPLE NAMES
COMPANIES
TITLES
PAST QUERIES

n-grams: marissa => ma ar ri is ss sa
metaphone: mark/marc => MRK
co-occurrence counts: marissa:mayer = 1000

marisa meyer yahoo

marissa / marisa
meyer / mayer
yahoo
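The n-gram and co-occurrence signals above can be combined along these lines. This is a minimal sketch under stated assumptions: candidates are generated by character-bigram Jaccard similarity, and the best combination is the one with the highest summed co-occurrence count between adjacent words. The similarity threshold, the toy vocabulary, and the `mayer`/`yahoo` count are made up; only `marissa:mayer = 1000` comes from the slide. Metaphone matching is left out.

```python
from collections import defaultdict
from itertools import product


def bigrams(word):
    """Character bigrams, e.g. 'marissa' -> {'ma', 'ar', 'ri', 'is', 'ss', 'sa'}."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def build_index(vocabulary):
    """Inverted index from bigram to the vocabulary words that contain it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in bigrams(word):
            index[gram].add(word)
    return index


def candidates(token, index, min_overlap=0.3):
    """The token itself plus vocabulary words with bigram Jaccard >= min_overlap."""
    grams = bigrams(token)
    found = {token}
    for word in {w for g in grams for w in index.get(g, ())}:
        other = bigrams(word)
        if len(grams & other) / len(grams | other) >= min_overlap:
            found.add(word)
    return found


def best_correction(query, index, cooccurrence):
    """Pick the candidate combination with the highest adjacent co-occurrence."""
    options = [sorted(candidates(tok, index)) for tok in query.split()]

    def score(combo):
        return sum(cooccurrence.get(pair, 0) for pair in zip(combo, combo[1:]))

    return " ".join(max(product(*options), key=score))


# Toy data: vocabulary drawn from names, companies, titles, and past queries.
index = build_index(["marissa", "marisa", "meyer", "mayer", "yahoo"])
counts = {("marissa", "mayer"): 1000, ("mayer", "yahoo"): 500}  # second count is invented
print(best_correction("marisa meyer yahoo", index, counts))  # marissa mayer yahoo
```

Note that `meyer` and `mayer` share only two of six bigrams, so a pure n-gram match needs a loose threshold to connect them. That is one reason a phonetic key such as metaphone is a useful second signal.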

spelling out the details

problem: corpus as well as query logs contain many spelling errors

certain spelling errors are quite frequent

while genuine words (especially names) might be infrequent

spelling out the details

problem: corpus & query logs contain spelling errors

solution: use query chains to infer correct spelling

[product manger] → [product manager] → CLICK

[marissa mayer] CLICK
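One way to mine query chains for corrections is sketched below. It assumes a hypothetical session log in which each session is a time-ordered list of `(query, clicked)` pairs, and it treats "no click, then a different query with a click" as evidence of a rewrite. The log format, the `min_count` threshold, and the function name are assumptions, not details from the talk.

```python
from collections import Counter, defaultdict


def learn_corrections(sessions, min_count=2):
    """Map a query to its most frequent clicked rewrite, if seen often enough.

    sessions: iterable of lists of (query, clicked) pairs in time order.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            # Reformulation signal: no click on q1, then a click on q2.
            if q1 != q2 and not clicked1 and clicked2:
                counts[q1][q2] += 1

    corrections = {}
    for q1, rewrites in counts.items():
        q2, n = rewrites.most_common(1)[0]
        if n >= min_count:
            corrections[q1] = q2
    return corrections


sessions = [
    [("product manger", False), ("product manager", True)],
    [("product manger", False), ("product manager", True)],
    [("product manger", False), ("project manager", True)],
    [("marissa mayer", True)],
]
print(learn_corrections(sessions))  # {'product manger': 'product manager'}
```

A production system would likely also require the two queries to be close in edit distance, so that ordinary topic changes are not mistaken for spelling fixes.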

query tagging: identifying entities in the query

TITLE CO GEO

TITLE-237: software engineer, software developer, programmer, …
CO-1441: Google Inc. (Industry: Internet)
GEO-7583: Country: US, Lat: 42.3482 N, Long: 75.1890 W

(RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)

query tagging: identifying entities in the query

TITLE CO GEO

MORE PRECISE MATCHING WITH DOCUMENTS

entity-based filtering

[screenshots: two BEFORE / AFTER comparisons]

entity-based suggestions

query tagging: sequential model

TRAINING

EMISSION PROBABILITIES (learned from user profiles)
TRANSITION PROBABILITIES (learned from query logs)

INFERENCE

given a query, find the most likely sequence of tags
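The slides do not name the model, but emission and transition probabilities with a "most likely sequence" objective are the ingredients of a hidden Markov model decoded with the Viterbi algorithm. The sketch below assumes exactly that. All probabilities are toy values invented for illustration, and unseen events get a small floor probability instead of zero.

```python
import math


def viterbi(tokens, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Most likely tag sequence for a non-empty token list, in log space."""
    def lp(p):
        return math.log(max(p, floor))

    # best[i][t] = (log-prob of the best path ending in tag t at position i,
    #               tag at position i - 1 on that path)
    best = [{
        t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(tokens[0], 0)), None)
        for t in tags
    }]
    for tok in tokens[1:]:
        row = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            row[t] = (score + lp(emit_p[t].get(tok, 0)), prev)
        best.append(row)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]


TAGS = ["TITLE", "CO", "GEO"]
START = {"TITLE": 0.6, "CO": 0.3, "GEO": 0.1}
TRANS = {  # would be learned from query logs
    "TITLE": {"TITLE": 0.5, "CO": 0.3, "GEO": 0.2},
    "CO": {"TITLE": 0.2, "CO": 0.4, "GEO": 0.4},
    "GEO": {"TITLE": 0.3, "CO": 0.3, "GEO": 0.4},
}
EMIT = {  # would be learned from user profiles
    "TITLE": {"software": 0.4, "engineer": 0.5},
    "CO": {"google": 0.8, "software": 0.05},
    "GEO": {"new": 0.3, "york": 0.3},
}
print(viterbi(["software", "engineer", "google"], TAGS, START, TRANS, EMIT))
# ['TITLE', 'TITLE', 'CO']
```

Working in log space avoids numeric underflow on longer queries, and the floor keeps a single out-of-vocabulary token from zeroing out every path.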

vertical intent prediction: distribution

JOBS

PEOPLE

COMPANIES

(probability distribution over verticals)
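The talk does not say how this distribution is estimated. As one simple possibility, the sketch below derives it from per-vertical click counts for a query, with additive smoothing so that unseen queries fall back to a uniform distribution. The click-count input, the `alpha` parameter, and the function name are assumptions.

```python
VERTICALS = ("people", "jobs", "companies")


def vertical_distribution(click_counts, alpha=1.0):
    """Smoothed P(vertical | query) from per-vertical click counts."""
    total = sum(click_counts.get(v, 0) for v in VERTICALS) + alpha * len(VERTICALS)
    return {v: (click_counts.get(v, 0) + alpha) / total for v in VERTICALS}


# Hypothetical counts for a single query.
print(vertical_distribution({"jobs": 70, "people": 20, "companies": 7}))
# With no click history, every vertical gets the same probability.
print(vertical_distribution({}))
```

Returning the full distribution rather than a single best vertical lets the results page blend verticals in proportion to the predicted intent.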

vertical intent prediction: relevance

[company]

[employees]

[jobs]

[name search]

query expansion: name synonyms

query expansion: job title synonyms

query expansion: signals

trained using query chains:
[jon] → [jonathan] → CLICK
[programmer] → [developer] → CLICK

symmetric but not transitive!
[francis] ⇔ [frank]
[franklin] ⇔ [frank]
[francis] ≠ [franklin]

context based!
[software engineer] → [software developer] → CLICK
[software engineer] => [software developer]
[civil engineer] ≠ [civil developer]
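Both constraints above can be respected with a simple data structure, sketched here under assumptions. Synonym pairs are stored in both directions (symmetric), only direct neighbors are returned (no transitive closure), and the keys are whole phrases rather than single words (context based). The function names and the pair list are illustrative only.

```python
from collections import defaultdict


def build_synonyms(pairs):
    """Symmetric adjacency map: each pair is stored in both directions."""
    table = defaultdict(set)
    for a, b in pairs:
        table[a].add(b)
        table[b].add(a)
    return table


def expand(query, table):
    """The query plus its direct synonyms only; neighbors of neighbors are excluded."""
    return [query] + sorted(table.get(query, ()))


table = build_synonyms([
    ("francis", "frank"),
    ("franklin", "frank"),
    ("jon", "jonathan"),
    ("programmer", "developer"),
    ("software engineer", "software developer"),  # keyed on the full phrase
])
print(expand("francis", table))         # ['francis', 'frank']
print(expand("frank", table))           # ['frank', 'francis', 'franklin']
print(expand("civil engineer", table))  # ['civil engineer']
```

Because the table is keyed on `software engineer` as a phrase and not on the word `engineer`, the `civil engineer` query is never rewritten to `civil developer`.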

what else can we learn from search in the wild?

don’t guess when it’s better to ask

clarify then refine

computers books

give users transparency, guidance, and control

think beyond individual search queries

Gene Golovchinsky, FXPAL

know when you don’t know

Claudia Hauff, Query Difficulty for Digital Libraries [2009]

Daniel Tunkelang
dtunkelang@linkedin.com
https://linkedin.com/in/dtunkelang
