The SearchMaster's Toolbox ECIR Industry Day 01 Apr 2010 David Hawking


The SearchMaster's Toolbox

ECIR Industry Day 01 Apr 2010

David Hawking


UK Customers

• From 2004/5: Staffordshire University, Scottish Care Commission
• From 2009: The Electoral Commission, Digital UK, Hargreaves Lansdown
• From 2010: London School of Economics and Political Science, Incisive Media, British Medical Journal, East Ayrshire Council, ...


“Search is life”


Costs of poor search

• Butler Group: Up to 10% of salary costs wasted through ineffective search
• IDC: A company with 1000 information workers can expect to waste more than $5M p.a. due to poor search
• Accenture: A survey of 1000 middle managers found that they spend as long as 2 hrs/day searching for information.


Who's the SearchMaster in your organisation?


Stakeholders expect every SearchMaster to do her duty!

• To make external website search work
– Sales conversions
– Information dissemination
– Reduced inquiry handling load
• To provide effective search of corporate information
– Happy, productive employees (plus students and other stakeholders)


Give them the tools and they will do the job!

• SearchMaster
• End-user

• Simple
• Powerful


1. The basic search tool

• Should:
– Have good performance out of the box, without weeks of implementation.
– Be simple to configure
– Avoid features which are too complex to use or set up.
– Be able to cover your content and scale to the necessary level


2. FineTuner

• Every search deployment is different
– Web, database, fileshare, Lotus
• The weighting of ranking features must adapt to these differences
• Manual tweaking is fraught with danger
– Fix one query, break a dozen
• Make a testfile and use a tuning tool to learn feature weightings


Testfile Desiderata

• Representative of real workload
– Need an unbiased sample
• Many queries (typically >> 100)
• Multiple weighted answers (where applicable)
• Redirects
• Equivalent answers
• See es.csiro.au/C-TEST/


Academic Research on Evaluation

• Masses of academic research
• How does it translate to tuning an enterprise search system?
– Setting good defaults
– Tuning to specific characteristics in hundreds of customer deployments

• Note: the system starts with no user interaction data.

• Creation of testfiles must be affordable.


Spreadsheet testfile

Query                    Answer URL
employment               health.gov.au/health-career-vacant.htm
jobs                     health.gov.au/health-career-vacant.htm
vacancies                health.gov.au/health-career-vacant.htm
recruitment              health.gov.au/health-career-vacant.htm
tenders                  health.gov.au/list-of-tenders-and-grants.htm
grants                   health.gov.au/list-of-tenders-and-grants.htm
tenders                  health.gov.au/list-of-tenders-and-grants.htm
mental health            health.gov.au/mental-health-and-wellbeing
mental health strategy   health.gov.au/mental-strategy
aged care                health.gov.au/aged-care.htm
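A query/answer testfile of this kind can be scored automatically. The sketch below is one possible reading, not the tool used in the talk: it assumes tab-separated rows, uses "success at k" as the measure (the slides do not name a metric), and `TOY_RESULTS` / `toy_search` are hypothetical stand-ins for a real search engine.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# A few rows from the slide, assumed here to be tab-separated: query, answer URL.
TESTFILE = (
    "employment\thealth.gov.au/health-career-vacant.htm\n"
    "jobs\thealth.gov.au/health-career-vacant.htm\n"
    "tenders\thealth.gov.au/list-of-tenders-and-grants.htm\n"
    "mental health\thealth.gov.au/mental-health-and-wellbeing\n"
    "aged care\thealth.gov.au/aged-care.htm\n"
)


def load_testfile(text: str) -> Dict[str, Set[str]]:
    """Map each query to the set of URLs accepted as correct answers."""
    answers: Dict[str, Set[str]] = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        query, url = line.split("\t", 1)
        answers[query.strip()].add(url.strip())
    return dict(answers)


def success_at_k(
    answers: Dict[str, Set[str]],
    search: Callable[[str], List[str]],
    k: int = 10,
) -> float:
    """Fraction of queries with at least one correct URL in the top k results."""
    if not answers:
        return 0.0
    hits = sum(1 for query, good in answers.items() if good & set(search(query)[:k]))
    return hits / len(answers)


# Hypothetical ranked results standing in for a real engine.
TOY_RESULTS = {
    "employment": ["health.gov.au/health-career-vacant.htm"],
    "jobs": ["health.gov.au/aged-care.htm", "health.gov.au/health-career-vacant.htm"],
    "tenders": ["health.gov.au/list-of-tenders-and-grants.htm"],
    "mental health": ["health.gov.au/mental-strategy"],
    "aged care": ["health.gov.au/aged-care.htm"],
}


def toy_search(query: str) -> List[str]:
    return TOY_RESULTS.get(query, [])


if __name__ == "__main__":
    testfile = load_testfile(TESTFILE)
    print(success_at_k(testfile, toy_search, k=1))
    print(success_at_k(testfile, toy_search, k=2))
```

Several rows with the same query simply add to that query's answer set, which is one simple way to handle equivalent answers; weighted answers would need a third column.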


LSE Case Study


Sources of testfiles at LSE

• A-Z Sitemap (>500 entries)
– Biased toward anchortext
• Keymatches file (>500 entries)
– Pessimistic
• Click data (>250 queries with > t clicks)
– Biased toward clicks – 100% success!

• Pop/crit queries (134 manually judged)

All biased – Use a sampling tool!
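The slides do not describe the sampling tool itself. As a minimal sketch of the idea, assuming a raw query log is available as a list of submitted query strings, drawing submissions uniformly at random gives each query a chance of selection proportional to how often it is actually asked. The function name and normalisation step are illustrative assumptions.

```python
import random
from typing import List


def sample_workload(query_log: List[str], n: int, seed: int = 0) -> List[str]:
    """Draw up to n query submissions uniformly at random, without replacement.

    Sampling submissions (rather than distinct queries) means frequent
    queries appear in the sample in proportion to their real workload share.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    normalised = [q.strip().lower() for q in query_log if q.strip()]
    return rng.sample(normalised, min(n, len(normalised)))


if __name__ == "__main__":
    # Made-up log: 50 submissions of "jobs", 30 of "tenders", 20 of "aged care".
    log = ["jobs"] * 50 + ["tenders"] * 30 + ["aged care"] * 20
    for query in sample_workload(log, 10):
        print(query)
```

The sampled queries would then be judged manually to build the testfile, which is the step that has to stay affordable.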


Dimension-at-a-time tuning

[Figure: points 1, 2 and 3 plotted against axes dim1 and dim2]
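Dimension-at-a-time tuning can be read as a coordinate-wise search: vary one ranking weight while holding the others fixed, keep the best value, then move to the next dimension. The sketch below is a generic version of that idea, not Funnelback's implementation; the `score` callback, the candidate grid and the toy objective are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def tune_dimension_at_a_time(
    score: Callable[[Sequence[float]], float],
    start: Sequence[float],
    candidates: Sequence[float],
    max_sweeps: int = 10,
) -> List[float]:
    """Optimise one weight at a time, holding all other weights fixed."""
    weights = list(start)
    best = score(weights)
    for _ in range(max_sweeps):
        improved = False
        for dim in range(len(weights)):
            best_value = weights[dim]
            for value in candidates:
                weights[dim] = value
                current = score(weights)
                if current > best:
                    best, best_value, improved = current, value, True
            # Keep the best value found for this dimension (or the old one).
            weights[dim] = best_value
        if not improved:
            break  # a full sweep changed nothing, so stop
    return weights


if __name__ == "__main__":
    # Toy objective with its peak at (0.6, 0.2). A real score would run the
    # whole testfile against the engine for each candidate weighting.
    def toy_score(w: Sequence[float]) -> float:
        return -((w[0] - 0.6) ** 2 + (w[1] - 0.2) ** 2)

    grid = [i / 10 for i in range(11)]
    print(tune_dimension_at_a_time(toy_score, [0.0, 0.0], grid))
```

Each call to `score` stands for a full pass over the testfile, so the number of query executions grows as dimensions × candidate values × sweeps × queries.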


[Figure: bar chart for the Popular/Critical Set comparing five configurations: Out of box; As configured; -daat (tuned); -daat20000 (tuned); -daat0/TAAT (tuned). Axis scale 0 to 30.]


Fine Tuning Summary

• Tuning a large number of dimensions (Funnelback FineTune covers 38)
• Millions of query executions
• Achieves substantial gains


But why do queries still fail?

• Misspelled
– Europian Conferense on information retreival
• Query words don't match document
– “door” or “MOPEM” v. “manually operated personnel egress mechanism”
• There is no answer to that question.
– Maybe there should be
– Scope issues.


Need more tools!


3. Spelling suggestion tools

• Suggestions may be useful even if words are correctly spelled:
– Carlton furball club → Carlton football club

• Suggestions based on whole query, not word-by-word

• Don't suggest queries which make no sense in the collection being searched

• Autocompletion: Guide users to the best query

• Context is king
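Two of the points above (match the whole query, and only suggest queries that make sense in the collection) can be illustrated with Python's standard `difflib`. This is a minimal sketch, not the approach described in the talk: `known_queries` is assumed to be a list of queries or phrases known to return results in the collection, and the 0.8 cutoff is an arbitrary choice.

```python
import difflib
from typing import List, Optional


def suggest_query(
    query: str, known_queries: List[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the closest known query, or None if there is nothing to suggest.

    The whole query string is compared at once, so a correctly spelled but
    unlikely word can still be replaced when the query as a whole is close
    to something the collection knows about.
    """
    normalised = " ".join(query.lower().split())
    if normalised in known_queries:
        return None  # already a sensible query for this collection
    matches = difflib.get_close_matches(normalised, known_queries, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Hypothetical list of queries known to work in the collection.
    known = [
        "carlton football club",
        "carlton cricket club",
        "collingwood football club",
    ]
    print(suggest_query("Carlton furball club", known))
```

Because candidates come only from `known_queries`, the function cannot suggest a query that is meaningless in the collection being searched.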


4. Query expansion tools

• Manual rules:
– Rego → [registration rego]
– MOPEM → [“manually operated personnel egress mechanism door”]
• Related queries (automatic)
– Based on co-clicking
• Contextual navigation (on-the-fly)
– Finding superphrases in a deep result set

• Faceting (semi-automatic)
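The first two tool types can be sketched in a few lines. The code below is an illustration under stated assumptions, not the product's behaviour: manual rules are modelled as a dictionary from a term to a group of alternatives (mirroring the bracketed lists on the slide), and co-clicking is modelled as "two queries led to clicks on the same URL". The click log format and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Manual rules copied from the slide; each term maps to its alternatives.
EXPANSION_RULES: Dict[str, List[str]] = {
    "rego": ["registration", "rego"],
    "mopem": ['"manually operated personnel egress mechanism door"'],
}


def expand_query(query: str, rules: Dict[str, List[str]]) -> List[List[str]]:
    """Turn each query term into a group of alternatives (any one may match)."""
    return [rules.get(term, [term]) for term in query.lower().split()]


def related_queries(clicks: Iterable[Tuple[str, str]], query: str) -> List[str]:
    """Rank other queries by how many clicked URLs they share with `query`."""
    urls_by_query: Dict[str, Set[str]] = defaultdict(set)
    for q, url in clicks:
        urls_by_query[q].add(url)
    mine = urls_by_query.get(query, set())
    shared = {
        other: len(mine & urls)
        for other, urls in urls_by_query.items()
        if other != query and mine & urls
    }
    # Most shared clicks first; ties broken alphabetically for stable output.
    return sorted(shared, key=lambda other: (-shared[other], other))


if __name__ == "__main__":
    print(expand_query("car rego", EXPANSION_RULES))
    # Made-up click log of (query, clicked URL) pairs.
    log = [
        ("jobs", "/vacancies"), ("jobs", "/careers"),
        ("employment", "/vacancies"), ("employment", "/careers"),
        ("recruitment", "/vacancies"), ("tenders", "/tenders"),
    ]
    print(related_queries(log, "jobs"))
```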


5. Reporting and alerting tools

• Reporting on queries which:
– Produced no results
– Logged behaviour suggestive of unfulfilment
• Alerting when:
– Submissions of a query (or group of related queries) sharply increase in frequency
• For:
– Business intelligence
– Triggering creation of, or changes to, content
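Both the zero-result report and the spike alert reduce to simple counting over a query log. The sketch below is illustrative only: the log format, the "three times the recent daily average" rule and the minimum count are assumptions, not thresholds taken from the talk.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple


def zero_result_report(log: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Count submissions of each query that returned no results, most frequent first."""
    counts = Counter(query for query, n_results in log if n_results == 0)
    return counts.most_common()


def spiking_queries(
    history: Dict[str, List[int]],
    today: Dict[str, int],
    factor: float = 3.0,
    min_count: int = 10,
) -> List[str]:
    """Flag queries whose count today far exceeds their recent daily average."""
    alerts = []
    for query, count in today.items():
        baseline = mean(history.get(query) or [0])  # unseen queries start at 0
        # Treat baselines below 1 as 1 so rare queries need a real burst.
        if count >= min_count and count > factor * max(baseline, 1.0):
            alerts.append(query)
    return sorted(alerts)


if __name__ == "__main__":
    # Made-up data: (query, number of results) and daily submission counts.
    log = [("mopem", 0), ("mopem", 0), ("jobs", 25), ("parking permit", 0)]
    print(zero_result_report(log))

    history = {"flu vaccine": [2, 3, 1, 2], "jobs": [40, 38, 42, 41]}
    today = {"flu vaccine": 60, "jobs": 45, "new topic": 3}
    print(spiking_queries(history, today))
```

A query that appears in either report is a candidate for new content, a manual expansion rule, or a closer look by the business.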


Query Spike Alerting


Conclusions

• Search is important
• Organisations benefit when someone takes responsibility for effective search – the SearchMaster.

• Academic research into evaluation needs careful translation for use in enterprise search tuning.

• Further tools are needed to overcome poor queries and missing content.

Thanks to Mike Swanson of Oxfam Australia for the Ned Kelly line.
