Divide and Conquer: Challenges in Scaling Federated Search Presented by Abe Lederman, President and CTO Deep Web Technologies, LLC SearchEngine Meeting 24 April 2006 Boston, MA


Page 1

Divide and Conquer: Challenges in Scaling Federated Search

Presented by Abe Lederman, President and CTO

Deep Web Technologies, LLC

SearchEngine Meeting 24 April 2006 Boston, MA

Page 2

SEARCH ALL OF THESE SOURCES ONE AT A TIME

Page 3

OR SEARCH THEM ALL AT ONCE

Page 4

Finding the Gold Hidden in the World Wide Web

“Google-type” search engines “pan” the surface web for gold

“Deep Web” search engines go mining for gold

Page 6

Challenges Overview

• Managing a large number of sources

• Searching a large number of sources in parallel

• Organizing and ranking the results returned

Page 7

Challenges of Managing Thousands of Data Sources

• Locate Reliable Sources

• Categorize Sources by Content

• Configure Sources for Searching

• Maintain Sources


Page 8

Challenges in Searching Thousands of Sources

• Automatically Select Sources to Search

• Retrieve Results from Cache

• Perform Many Searches in Parallel

• Bring Back Best Results
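The "Perform Many Searches in Parallel" item can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the source names, the simulated delays, and the `fake_search` function standing in for a real connector are hypothetical and do not come from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical sources: name -> simulated response delay in seconds.
SOURCES = {"source_a": 0.01, "source_b": 0.02, "source_c": 0.03}

def fake_search(name, delay, query):
    """Stand-in for a real connector: wait, then return two fake hits."""
    time.sleep(delay)
    return [f"{name}: {query} hit {i}" for i in (1, 2)]

def search_all(query, sources, max_workers=8):
    """Query every source in parallel; collect results and per-source errors."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(fake_search, name, delay, query): name
            for name, delay in sources.items()
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # One failing source should not sink the whole search.
                errors[name] = exc
    return results, errors

results, errors = search_all("federated search", SOURCES)
```

With a thread pool, the total wait is close to that of the slowest source rather than the sum of all sources, which is the point of searching in parallel.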

Page 9

Source Selection Optimizer

[Diagram: Search Conductor, Source Selection Optimizer, Source Descriptions, Previous Results]
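One way to picture a selector that draws on source descriptions and previous results is the sketch below. It is not the optimizer described in the talk: the catalog, the `past_yield` figures, the scoring formula, and the `history_weight` parameter are all hypothetical.

```python
# Hypothetical catalog: a keyword description per source, plus the share of
# past searches (0.0 to 1.0) in which the source returned useful results.
CATALOG = {
    "physics_db": {"description": "physics particle energy research papers", "past_yield": 0.8},
    "bio_db": {"description": "biology genome protein research papers", "past_yield": 0.6},
    "news_db": {"description": "news headlines press releases", "past_yield": 0.3},
}

def score_source(query, entry, history_weight=0.5):
    """Blend description overlap with the yield seen in previous results."""
    query_terms = set(query.lower().split())
    description_terms = set(entry["description"].lower().split())
    overlap = len(query_terms & description_terms) / len(query_terms) if query_terms else 0.0
    return (1 - history_weight) * overlap + history_weight * entry["past_yield"]

def select_sources(query, catalog, top_k=2):
    """Return the names of the top_k highest-scoring sources."""
    ranked = sorted(catalog, key=lambda name: score_source(query, catalog[name]), reverse=True)
    return ranked[:top_k]

selected = select_sources("protein research", CATALOG)
```

Here `bio_db` matches both query terms, `physics_db` matches one but has a strong history, and `news_db` matches neither, so only the first two would be searched.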

Page 10

Caching of Search Results

Reduces the load (cost) of accessing sources

CHALLENGES

• Requires a large database

• Need to determine how often to update the cache

• Works best with lots of users doing similar searches
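The update-frequency and similar-searches challenges can be made concrete with a small time-to-live cache. This is only a sketch under stated assumptions: the `ResultCache` class, the term-sorting normalization, and the 60-second lifetime are hypothetical, and an in-memory dictionary stands in for the large database the slide mentions.

```python
import time

class ResultCache:
    """In-memory result cache with a time-to-live (TTL) per entry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}

    @staticmethod
    def _key(query):
        # Normalize case and term order so similar searches share an entry.
        return " ".join(sorted(query.lower().split()))

    def put(self, query, results):
        self._store[self._key(query)] = (self.clock(), results)

    def get(self, query):
        key = self._key(query)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]    # stale: force a fresh search of the sources
            return None
        return results

now = [0.0]                          # fake clock for the demonstration
cache = ResultCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("Deep Web search", ["r1", "r2"])
hit = cache.get("search deep web")   # same terms, different order and case
now[0] = 61.0
expired = cache.get("Deep Web search")
```

A short TTL keeps results fresh but sends more traffic to the sources; a long TTL does the opposite, which is the trade-off behind the second bullet.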

Page 11

We Address Scalability Through a Grid-Based Solution

• Uses open standards (Web Services, WSDL, SOAP, XML)

• Runs on distributed nodes

• Is platform independent (Java based)

• Is very flexible, providing a framework for integrating various filtering and analysis tools

Page 12

Distributing the Workload as Grid Services

[Diagram: grid nodes grouped by service type]

• Information Services: IS0, IS1, IS2, IS3

• Filtering Services: F0, F0, F0, F0

• Aggregation Services: A0, A0, A1

• Presentation Services: P0

Page 13

Search Conductor

[Flowchart]

• Select sources to search

• Perform Search

• Enough good results? (YES / NO)

• Can I get more results from “good” sources? (YES / NO)

• Get Next Results

• Deliver results to user
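The flowchart labels can be read as a loop, sketched below. The arrows did not survive extraction, so the wiring is an assumption: results are delivered once enough good ones exist; otherwise more pages are pulled from sources that already produced good results; failing that, more sources are selected. The function name, the page-based source model, and `batch_size` are hypothetical.

```python
def conduct_search(sources, wanted, is_good, batch_size=2):
    """sources maps a name to a list of result pages (each page is a list)."""
    pending = list(sources)      # sources not yet searched
    next_page = {}               # source name -> index of the next page
    good_sources = set()         # sources that produced at least one good result
    good = []

    def fetch(name):
        """Fetch the next page from a source and keep its good results."""
        if next_page[name] >= len(sources[name]):
            return
        page = sources[name][next_page[name]]
        next_page[name] += 1
        hits = [r for r in page if is_good(r)]
        if hits:
            good_sources.add(name)
        good.extend(hits)

    while len(good) < wanted:                        # "Enough good results?"
        more = [n for n in sources
                if n in good_sources and next_page[n] < len(sources[n])]
        if more:                                     # more from "good" sources?
            for name in more:                        # "Get Next Results"
                fetch(name)
        elif pending:                                # "Select sources to search"
            batch, pending = pending[:batch_size], pending[batch_size:]
            for name in batch:                       # "Perform Search"
                next_page[name] = 0
                fetch(name)
        else:
            break                                    # nothing left to try
    return good[:wanted]                             # "Deliver results to user"

demo_sources = {
    "a": [[("a1", 0.9), ("a2", 0.2)], [("a3", 0.8)]],
    "b": [[("b1", 0.1)]],
    "c": [[("c1", 0.7)]],
}
delivered = conduct_search(demo_sources, wanted=3, is_good=lambda r: r[1] >= 0.5)
```

In the demonstration, source `a` yields a good result on its first page and is asked for a second page before the conductor falls back to the unsearched source `c`.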

Page 14

Searching a large number of sources can lead to a flood of results

Page 15

Challenges in Organizing and Ranking Results

• Multi-tier Relevance Ranking

• User-driven Ranking

• Clustering of Results

Page 16

Multi-tier Relevance Ranking

• QuickRank – Ranks results based on occurrence of search terms in title, author, and snippet

• MetaRank – Ranks results utilizing custom algorithms applied to meta-data

• DeepRank – Downloads and indexes full-text documents

HEAVY LIFTING REQUIRED!
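The first tier, ranking by occurrence of search terms in title, author, and snippet, can be sketched as follows. This is not Deep Web Technologies' actual QuickRank: the field weights, the whitespace tokenization, and the function names are assumptions chosen for illustration.

```python
# Assumed field weights; the talk does not say how the fields are weighted.
FIELD_WEIGHTS = {"title": 3.0, "author": 2.0, "snippet": 1.0}

def quick_score(query, result):
    """Count query-term occurrences per field, scaled by the field weight."""
    terms = query.lower().split()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = result.get(field, "").lower().split()
        score += weight * sum(words.count(term) for term in terms)
    return score

def quick_rank(query, results):
    """Order results from highest to lowest score."""
    return sorted(results, key=lambda r: quick_score(query, r), reverse=True)

sample = [
    {"title": "Grid computing basics", "author": "A. Smith",
     "snippet": "An overview of federated search on a grid."},
    {"title": "Federated search at scale", "author": "B. Jones",
     "snippet": "Scaling federated search to many sources."},
    {"title": "Cooking with gold", "author": "C. Baker",
     "snippet": "No relevant terms here."},
]
ranked = quick_rank("federated search", sample)
```

A tier like this needs only the fields already returned by each source, which is why it is cheap compared with downloading and indexing full text.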

Page 17

User-driven Ranking

• Credibility of source

• Date range

• Document length

• Document type

• Geographic proximity

• Popularity of document

• Reading level

• Relevance

Desired: Blending (weighting) of the above criteria
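Blending could be as simple as a weighted average of per-criterion scores, with the weights chosen by the user. The sketch below assumes each criterion has already been scaled to the range 0 to 1; the documents, scores, and weights are hypothetical.

```python
def blended_score(doc, weights):
    """Weighted average of per-criterion scores, each scaled to 0..1."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * doc.get(criterion, 0.0) for criterion, w in weights.items()) / total

docs = [
    {"id": "d1", "relevance": 0.9, "credibility": 0.4, "popularity": 0.2},
    {"id": "d2", "relevance": 0.6, "credibility": 0.9, "popularity": 0.7},
]

# One user cares mostly about relevance, another mostly about credibility.
relevance_first = {"relevance": 3, "credibility": 1}
credibility_first = {"relevance": 1, "credibility": 3}

by_relevance = sorted(docs, key=lambda d: blended_score(d, relevance_first), reverse=True)
by_credibility = sorted(docs, key=lambda d: blended_score(d, credibility_first), reverse=True)
```

The same two documents swap places depending on whose weights are applied, which is what makes the ranking user-driven.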

Page 18

Clustering

Page 19

A Grand Challenge for Federated Search

Source: Walter Warnick, Ph.D., DOE OSTI. Global Discovery: Increasing the Pace of Knowledge Diffusion to Increase the Pace of Science. Presented at the Annual Meeting of the American Association for the Advancement of Science, February 16-20, 2006.

Page 20

Knowledge Diffusion in Action

[Diagram: Global Discovery, Search Portal, and the three communities below]

Math Community

• Mathematician’s Scientific Discovery

• Math Databases: Research Papers, Correspondence, Conferences

Biology Community

• Biology Researcher’s Scientific Discovery

• Biology Databases: Research Papers, Correspondence, Conferences

Physics Community

• Physics Scientific Discovery

• Physics Databases: Research Papers, Correspondence, Conferences

Page 21

Grid of Grids

Each circle = a portal with 10-100 sources

End result is thousands of sources in 2 hops

Scaling to the Next Level
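The "thousands of sources in 2 hops" figure follows from multiplying fan-outs. The sketch below assumes the first hop goes from one portal to its peer portals and the second from each peer to its own sources; the number of peer portals is not given in the slides, so the portal counts used here are hypothetical.

```python
import math

def reachable_sources(fanout_per_hop):
    """Multiply the fan-out at each hop, e.g. [portals, sources per portal]."""
    return math.prod(fanout_per_hop)

# Sources per portal (10 to 100) comes from the slide; portal counts are assumed.
low = reachable_sources([10, 10])       # 10 portals x 10 sources each
mid = reachable_sources([50, 50])       # 50 portals x 50 sources each
high = reachable_sources([100, 100])    # 100 portals x 100 sources each
```

Under these assumptions the reach ranges from 100 to 10,000 sources, and a few dozen portals of moderate size are enough to reach the thousands.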

Page 22

Abe Lederman

122 Longview Drive

Los Alamos, NM 87544

[email protected]

www.deepwebtech.com


Thank You!