Site-wide Search

Upgrade and new features

Jon Warbrick

University of Cambridge Computing Service

jw35@cam.ac.uk

Site-wide search

● web-search.cam.ac.uk

● Ultraseek, from Infoseek -> Inktomi -> Verity -> Autonomy

● Currently indexing

– ~600 servers

– ~1.2 million documents

– ~2.5 million URLs

Site-wide search

● Indexes 'more-or-less official' servers

● Maintains two indexes

– 'internal' and 'external'

– automatically routes queries

● Services for University Webmasters

– Add/delete/re-index

– Packaged searches

2006 Upgrade

● Improved resilience

● Case-inSenSITIVE matching

● Quick Links

● Passage-based summaries

● Grouping by location

● [ All terms matching ]

2006 Upgrade

● More indexing (dynamic pages + https + JavaScript)

● Sources of indexing requests

– s1.web-search.cam.ac.uk to s6.web-search.cam.ac.uk

– an address in the range 192.153.213.0-255

● Backup search engines

– Add URL, Revisit Site, etc.
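A webmaster who wants to recognise indexing requests in server logs could test the client address against the range quoted above. The following is a minimal sketch, not part of the original slides: the function name `is_indexer_address` is hypothetical, and it assumes the range 192.153.213.0-255 can be treated as the single network 192.153.213.0/24.

```python
import ipaddress

# Assumption: the range 192.153.213.0-255 from the slide is one /24 network.
INDEXER_NETWORK = ipaddress.ip_network("192.153.213.0/24")


def is_indexer_address(client_ip: str) -> bool:
    """Return True if client_ip lies inside the indexer address range.

    Strings that are not valid IP addresses are treated as non-matching.
    """
    try:
        return ipaddress.ip_address(client_ip) in INDEXER_NETWORK
    except ValueError:
        return False


if __name__ == "__main__":
    for ip in ("192.153.213.17", "192.153.214.1"):
        print(ip, is_indexer_address(ip))
```

Matching on the address range rather than on the six host names avoids a reverse DNS lookup per request, at the cost of needing an update if the range ever changes.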

Problems with dynamic content

● Randomly permuted query arguments

● Gratuitously-varying detail

● Variant pages

● Calendars linking to other pages

● Cache-busting headers

● Frames hiding real URL

● Junk path info

● 'Success' error pages

● Lack of Last Modification time stamp

● Inconsistent URLs
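To illustrate the first problem in the list above: if a site emits the same query arguments in varying order, an indexer sees several distinct URLs for one page. The sketch below shows one way a site could normalise its own links so that each page has a single URL. It is an illustration only, not something taken from the slides; `canonical_url` and the `www.example.org` addresses are hypothetical, and it assumes argument order carries no meaning for the application.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return url with its query arguments sorted and any fragment dropped.

    Assumption: the application does not depend on argument order.
    """
    parts = urlsplit(url)
    # Parse into (name, value) pairs, keeping blank values, then sort them.
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(pairs), ""))


if __name__ == "__main__":
    a = canonical_url("http://www.example.org/list?page=2&sort=date")
    b = canonical_url("http://www.example.org/list?sort=date&page=2")
    print(a == b)
```

Generating links through a helper like this means the two example URLs collapse to one, so the page is fetched and indexed once rather than once per permutation.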

Further information

● Notes for webmasters: http://www.cam.ac.uk/cs/web-search/

● Details of recent changes: http://www.cam.ac.uk/cs/web-search/changes-200608.html

● Help and advice: web-support@ucs.cam.ac.uk

If you have been, thanks for listening

I wonder if anyone will ask...

“Why don't you use Google?”