Search and Data Management Rakesh Agrawal MSR Search Lab


Search and Data Management

Rakesh Agrawal
MSR Search Lab


Current Focus & Direction

• Understand the virtuous cycle between search and data and ways to accelerate it

• New search-centric applications
  – Personal data mining (Health)
  – Distributed Knowledge creation (Education)


Search & Data: Virtuous Cycle

[Figure: cycle diagram connecting Search, Data, and Insights]

Diagram labels: Queries, Clicks; Web Pages, Feeds; Mining; Relevance

Cycle: Better Search Results ► More Data ► Greater Insights ► Better Search Results

Further labels: Intents, Behaviors, Connections, Popularity, Trends


Related Searches (aka Query Suggestions)

• Most popular queries containing the current query

• Analysis of how users reformulated their queries

• Query click graph to find related queries

Examples:
  – Football / Soccer (whole query)
  – Wildflower cafe / Wildflower bakery (piecewise)
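One way to read the click-graph bullet above is that two queries are related when users click the same pages for both. The sketch below is a minimal illustration under that assumption, not the method used in production: the function name `related_queries`, the Jaccard overlap score, and the example URLs are all hypothetical.

```python
from collections import defaultdict

def related_queries(click_log, query, top_k=3):
    """Rank other queries by overlap between their clicked URLs and those of `query`."""
    urls_by_query = defaultdict(set)
    for q, url in click_log:
        urls_by_query[q].add(url)

    target = urls_by_query.get(query, set())
    scores = {}
    for other, urls in urls_by_query.items():
        if other == query:
            continue
        overlap = len(target & urls)
        if overlap:
            # Jaccard similarity of the two clicked-URL sets (an assumed scoring choice).
            scores[other] = overlap / len(target | urls)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda q: (-scores[q], q))[:top_k]

# Hypothetical (query, clicked URL) pairs.
click_log = [
    ("football", "espn.example/football"),
    ("football", "fifa.example"),
    ("soccer", "fifa.example"),
    ("soccer", "soccer.example"),
    ("wildflower cafe", "wildflower.example"),
    ("wildflower bakery", "wildflower.example"),
]

print(related_queries(click_log, "football"))
print(related_queries(click_log, "wildflower cafe"))
```

A real system would presumably weight clicks by frequency and walk more than one hop through the graph; the set overlap here is only the simplest instance of the idea.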


Result Diversification

• Ideas from portfolio theory to allocate space to different result types

• Marginal utility of adding a document decreases if the result set already contains high quality documents of the same type

• Query and document classification using merged click logs
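The diminishing-marginal-utility bullet above can be sketched as a greedy selection in which a document's value is discounted by the quality of same-type documents already chosen. This is an illustrative sketch, not the talk's actual algorithm: the name `diversify`, the multiplicative discount, and the quality scores are assumptions.

```python
def diversify(candidates, k):
    """Greedily pick k documents.

    candidates: list of (doc_id, doc_type, quality) with quality in [0, 1].
    The marginal utility of a document is its quality times the share of its
    type that is still "unmet" by documents already selected.
    """
    remaining = list(candidates)
    unmet = {}      # doc_type -> remaining unmet share, implicitly 1.0 at start
    selected = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda d: d[2] * unmet.get(d[1], 1.0))
        doc_id, doc_type, quality = best
        selected.append(doc_id)
        # A high-quality pick sharply reduces the value of more of the same type.
        unmet[doc_type] = unmet.get(doc_type, 1.0) * (1.0 - quality)
        remaining.remove(best)
    return selected

# Hypothetical candidates: two strong news results, weaker video and image results.
candidates = [
    ("news-1", "news", 0.9),
    ("news-2", "news", 0.8),
    ("video-1", "video", 0.5),
    ("image-1", "image", 0.4),
]

print(diversify(candidates, 3))
```

In this example the second news document loses out to lower-quality documents of other types, because the first news pick has already covered most of the value for that type.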


Classification Using Click Graph

[Figure: click graph showing seed documents, ANIMALS documents, and ANIMALS queries]

Algorithm: Random walk with absorbing states
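A random walk with absorbing states on a query-document click graph can be sketched as follows. Seed nodes are absorbing, and every other node is scored by the probability that a walk starting there is absorbed at an in-class seed. The details are assumptions rather than the talk's implementation: the function name, the use of negative seeds (scored 0.0) alongside ANIMALS seeds (scored 1.0), click counts as edge weights, and the fixed iteration count.

```python
from collections import defaultdict

def absorption_probabilities(clicks, seeds, iterations=200):
    """Estimate, per node, the probability of absorption at an in-class seed.

    clicks: list of (query, doc, click_count) edges of the bipartite click graph.
    seeds:  dict mapping a node, ("q", text) or ("d", url), to 1.0 (in class)
            or 0.0 (out of class). Seed nodes are absorbing states.
    """
    neighbors = defaultdict(dict)
    for query, doc, count in clicks:
        q, d = ("q", query), ("d", doc)
        neighbors[q][d] = neighbors[q].get(d, 0) + count
        neighbors[d][q] = neighbors[d].get(q, 0) + count

    prob = {node: seeds.get(node, 0.0) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, edges in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # absorbing: value never changes
            else:
                total = sum(edges.values())
                # One walk step: click-weighted average over neighbors.
                updated[node] = sum(w * prob[n] for n, w in edges.items()) / total
        prob = updated
    return prob

# Hypothetical click log and seeds.
clicks = [
    ("lion facts", "zoo.example", 3),
    ("lion facts", "wildlife.example", 1),
    ("big cats", "wildlife.example", 2),
    ("big cats", "cars.example/jaguar", 2),
    ("jaguar price", "cars.example/jaguar", 4),
    ("jaguar price", "dealer.example", 1),
]
seeds = {("d", "zoo.example"): 1.0, ("d", "dealer.example"): 0.0}

prob = absorption_probabilities(clicks, seeds)
for node in sorted(prob):
    print(node, round(prob[node], 3))
```

Queries and documents whose score exceeds a chosen threshold would then be labeled ANIMALS. The threshold is a tuning choice that this sketch leaves open.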


Changing Nature of Disease

[Figure: number of people with chronic conditions (millions), by year]

Year      1995  2000  2005  2010  2015  2020  2025  2030
Millions  118   125   133   141   149   157   164   171

• New challenge: chronic conditions, that is, illnesses and impairments that are expected to last a year or more, limit what one can do, and may require ongoing care.

• In 2005, 133 million Americans lived with a chronic condition (up from 118 million in 1995).

Infectious Diseases


Technology Trends

• Tremendous simplification in the technologies for capturing useful personal information

• Dramatic reduction in the cost and form factor for personal storage

• Cloud Computing


Personal Health Analytics


Personal Data Mining

Charts for appropriate demographics?

Optimum level for Asian Indians: 150 mg/dL (much lower than 200 mg/dL for Westerners)

Due to elevated levels of lipoprotein(a)*

Computation and selection across millions of data sources

Privacy and security

*Enas et al. Coronary Artery Disease In Asian Indians. Internet J. Cardiology. 2001.
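The question of charts for appropriate demographics can be made concrete with a small lookup: the same reading is interpreted against a different optimum depending on the person's group. This sketch uses only the two thresholds quoted on the slide. The group keys, the function name `flag_reading`, and the output wording are hypothetical, and the slide does not name the measurement, so the code leaves it unnamed as well.

```python
# Optimum levels in mg/dL as quoted on the slide; the dictionary keys are hypothetical.
OPTIMUM_MG_DL = {
    "asian_indian": 150,
    "western": 200,
}

def flag_reading(level_mg_dl, group):
    """Compare a reading against the optimum level for the given demographic group."""
    limit = OPTIMUM_MG_DL[group]
    return "above optimum" if level_mg_dl > limit else "within optimum"

# The same reading lands on different sides of the threshold for the two groups.
reading = 180
for group in OPTIMUM_MG_DL:
    print(group, flag_reading(reading, group))
```

At the scale the slide describes (millions of data sources), the hard parts are choosing the right reference population and protecting privacy, neither of which a fixed table addresses.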


Collaborative Knowledge Creation (Educational Material)

• More than 3.5 million articles in 75 languages

• Fashioned by more than 25,000 writers

• 1 million articles in English (80,000 in Encyclopedia Britannica)

• Inspired by Wikipedia

• But multiple viewpoints rather than one consensus version!

• How to personalize search to find the material suitable for one’s own style of teaching?

• Management of trust and authoritativeness?


Summary

• Web search is a “data management and creating value from data” problem

• New search-centric applications can provide rich fodder for future database research.