
Predictive Parallelization: Taming Tail Latencies in Web Search


Page 1: Predictive Parallelization: Taming Tail Latencies  in Web Search

Predictive Parallelization: Taming Tail Latencies in Web Search

Myeongjae Jeon, Saehoon Kim, Seung-won Hwang, Yuxiong He, Sameh Elnikety, Alan L. Cox, Scott Rixner

Microsoft Research, POSTECH, Rice University

Page 2: Predictive Parallelization: Taming Tail Latencies  in Web Search

Performance of Web Search

1) Query response time
   – Answer users quickly (e.g., within 300 ms)

2) Response quality (relevance)
   – Provide highly relevant web pages
   – Improves with the resources and time consumed

Focus: improving response time without compromising quality

Page 3: Predictive Parallelization: Taming Tail Latencies  in Web Search

Background: Query Processing Stages

[Figure: query processing pipeline under a latency SLA of, for example, 300 ms]
Query → Doc. index search → 2nd-phase ranking → Snippet generator → Response
  – Doc. index search returns 100s – 1000s of good matching docs
  – 2nd-phase ranking keeps 10s of the best matching docs
  – Snippet generator produces a few sentences for each doc

Focus: Stage 1 (doc. index search)

Page 4: Predictive Parallelization: Taming Tail Latencies  in Web Search

Goal

Speed up index search (stage 1) without compromising result quality
  – Improve user experience
  – Serve a larger index
  – Enable a more sophisticated 2nd phase

[Figure: the same query processing pipeline as before, with a 300 ms latency SLA]

Page 5: Predictive Parallelization: Taming Tail Latencies  in Web Search

How Index Search Works

• Partition all web pages across index servers (massively parallel)
• Distribute query processing (embarrassingly parallel)
• Aggregate top-k relevant pages

[Figure: an aggregator sends the query to index servers, one per partition of all
web pages; each server returns its top-k pages, and the aggregator merges them]

Problem: a slow server makes the entire cluster slow
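To make the partition-and-aggregate pattern concrete, here is a toy sketch (not the production system): per-partition searchers score only their own documents and return local top-k hits, and the aggregator merges them. The IndexServer class and its term-count scoring are illustrative placeholders.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

class IndexServer:
    """Hypothetical index server holding one document partition."""
    def __init__(self, partition):
        self.partition = partition  # dict: doc_id -> document text

    def search(self, query, k):
        # Toy relevance score: number of query terms appearing in the doc.
        terms = query.lower().split()
        scored = ((sum(t in text.lower() for t in terms), doc_id)
                  for doc_id, text in self.partition.items())
        return heapq.nlargest(k, scored)  # local top-k for this partition only

def aggregate_top_k(servers, query, k):
    """Aggregator: query all partitions in parallel, then merge local top-k lists."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partial = list(pool.map(lambda s: s.search(query, k), servers))
    # The global top-k is drawn from the union of per-partition top-k lists.
    return heapq.nlargest(k, (hit for hits in partial for hit in hits))

servers = [IndexServer({1: "web search tail latency"}),
           IndexServer({2: "predictive parallelization of queries"})]
print(aggregate_top_k(servers, "tail latency", k=2))
```

Note that the aggregator cannot return until every partition has answered, which is exactly why one slow server slows the whole cluster.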

Page 6: Predictive Parallelization: Taming Tail Latencies  in Web Search

Observation

• Query processing runs on every server, so response time is determined by the slowest one
• We need to reduce the tail latencies of individual servers

Page 7: Predictive Parallelization: Taming Tail Latencies  in Web Search

Examples

[Figure: two aggregator-plus-index-server clusters; a long-query outlier on a single
index server turns a fast response into a slow response]

• Terminating a long query in the middle of processing
  → Fast response, but a quality drop

Page 8: Predictive Parallelization: Taming Tail Latencies  in Web Search

Parallelism for Tail Reduction

Opportunity
  – Available idle cores
  – CPU-intensive workloads

Challenge
  – Tails are few
  – Tails are very long

Latency breakdown at the 99th percentile:
  Breakdown   Latency
  Network     4.26 ms
  Queueing    0.15 ms
  I/O         4.70 ms
  CPU         194.95 ms

Latency distribution:
  Percentile   Latency      Scale
  50th         7.83 ms      x1
  75th         12.51 ms     x1.6
  95th         57.15 ms     x7.3
  99th         204.06 ms    x26.1
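As a quick sanity check, the Scale column appears to be each percentile's latency divided by the median; the snippet below reproduces it under that assumption.

```python
median = 7.83  # 50th-percentile latency in ms
for pct, latency_ms in [("75th", 12.51), ("95th", 57.15), ("99th", 204.06)]:
    print(f"{pct}: x{latency_ms / median:.1f}")  # prints x1.6, x7.3, x26.1
```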

Page 9: Predictive Parallelization: Taming Tail Latencies  in Web Search

Predictive Parallelism for Tail Reduction

• Short queries
  – Many
  – Almost no speedup

• Long queries
  – Few
  – Good speedup

[Figure: execution time and speedup vs. parallelism degree (1 to 6). Short queries
(< 30 ms) drop only from about 5.2 ms to 4.5 ms, while long queries (> 80 ms)
drop from about 169 ms to 41 ms.]

Page 10: Predictive Parallelization: Taming Tail Latencies  in Web Search

Predictive Parallelization Workflow

Query → execution time predictor → index server

• Predict the (sequential) execution time of the query with high accuracy

Page 11: Predictive Parallelization: Taming Tail Latencies  in Web Search

Predictive Parallelization Workflow

Query → execution time predictor → resource manager → index server
  – Predicted short: run sequentially
  – Predicted long: parallelize

• Using the predicted time, selectively parallelize long queries
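A minimal sketch of this decision on an index server, assuming hypothetical predictor and index-server APIs: the 80 ms cutoff matches the slides' definition of a long query, but the parallelism degree of 4 is purely illustrative.

```python
LONG_QUERY_THRESHOLD_MS = 80   # long-query cutoff used throughout the talk
LONG_QUERY_PARALLELISM = 4     # illustrative degree, not the paper's tuned value

def process_query(query, predictor, index_server):
    """Predict sequential execution time, then pick a parallelism degree."""
    predicted_ms = predictor.predict(query)      # hypothetical predictor API
    if predicted_ms >= LONG_QUERY_THRESHOLD_MS:
        degree = LONG_QUERY_PARALLELISM          # long query: parallelize
    else:
        degree = 1                               # short query: run sequentially
    return index_server.execute(query, parallelism=degree)  # hypothetical API
```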

Page 12: Predictive Parallelization: Taming Tail Latencies  in Web Search

Predictive Parallelization

• Focus of today's talk
  1. Prediction: identify long queries through machine learning
  2. Parallelization: parallelize long queries with high efficiency

Page 13: Predictive Parallelization: Taming Tail Latencies  in Web Search

Brief Overview of the Predictor

Accuracy
  – High recall, to guarantee the 99th-percentile reduction
  – In our workload, 4% of queries take > 80 ms; at least 3% of queries must be identified (75% recall)

Cost
  – Low prediction overhead and low misprediction cost
  – Prediction overhead of 0.75 ms or less, with high precision

Existing approaches: lower accuracy and higher cost

Page 14: Predictive Parallelization: Taming Tail Latencies  in Web Search

Accuracy: Predicting Early Termination

• Only a limited portion of the index contributes to the top-k relevant results
• How large that portion is depends on the keyword (more exactly, on its score distribution)

[Figure: inverted index for "SIGIR" over web documents Doc 1 ... Doc N, sorted by
static rank from highest to lowest; only a prefix is processed, the rest is not evaluated]
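To make the early-termination idea concrete, here is a toy scan over a posting list sorted by static rank; it stops once the current top-k cannot be improved. The upper-bound function is an assumed simplification of what a production engine would use, not the paper's mechanism.

```python
import heapq

def top_k_with_early_termination(postings, k, score_fn, upper_bound_fn):
    """postings: doc ids sorted by static rank, highest first."""
    heap = []  # min-heap of (score, doc_id) holding the current top-k
    for rank_pos, doc_id in enumerate(postings):
        # If even an optimistic score cannot beat the current k-th best, stop early.
        if len(heap) == k and upper_bound_fn(rank_pos) <= heap[0][0]:
            break
        score = score_fn(doc_id)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)
```

How far the scan runs before terminating depends on the term's score distribution, which is why execution time varies so widely across queries and is worth predicting.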

Page 15: Predictive Parallelization: Taming Tail Latencies  in Web Search

Space of Features

• Term features [Macdonald et al., SIGIR 12]
  – IDF, NumPostings
  – Score (arithmetic, geometric, and harmonic means, max, variance, gradient)

• Query features
  – NumTerms (before and after rewriting)
  – Relaxed
  – Language

Page 16: Predictive Parallelization: Taming Tail Latencies  in Web Search

New Features: Query

• Rich clues from queries in modern search engines

<Fields related to the query execution plan>
rank=BM25F enablefresh=1 partialmatch=1 language=en location=us ...

<Fields related to search keywords>
SIGIR (Queensland or QLD)
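A rough sketch of turning such execution-plan annotations into predictor inputs; the field names follow the example above, but the function and its dictionary format are assumptions for illustration.

```python
def query_plan_features(annotations):
    """annotations: dict of execution-plan fields attached to the query,
    e.g. {"rank": "BM25F", "enablefresh": "1", "partialmatch": "1",
          "language": "en", "location": "us"}."""
    return {
        "uses_bm25f": annotations.get("rank") == "BM25F",
        "fresh_enabled": annotations.get("enablefresh") == "1",
        "partial_match": annotations.get("partialmatch") == "1",
        "is_english": annotations.get("language") == "en",
    }
```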

Page 17: Predictive Parallelization: Taming Tail Latencies  in Web Search

Space of Features

• Term features [Macdonald et al., SIGIR 12]
  – IDF, NumPostings
  – Score (arithmetic, geometric, and harmonic means, max, variance, gradient)

• Query features
  – NumTerms (before and after rewriting)
  – Relaxed
  – Language

Page 18: Predictive Parallelization: Taming Tail Latencies  in Web Search

Space of Features

  Category             Features
  Term features (14)   AMeanScore, GMeanScore, HMeanScore, MaxScore, EMaxScore,
                       VarScore, NumPostings, GAvgMaxima, MaxNumPostings,
                       In5%MaxNum, ThresProK, IDF
  Query features (6)   English, NumAugTerm, Complexity, RelaxCount, NumBefore,
                       NumAfter

• All features are cached in memory to ensure responsiveness (avoiding disk access)
• Term features require a 4.47 GB memory footprint (for 100M terms)
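A minimal sketch of the in-memory term-feature cache implied here, with a plain dict standing in for whatever compact structure the real system uses; the max-IDF and summed-postings aggregations are assumptions, not the paper's exact feature definitions.

```python
class TermFeatureCache:
    """Keeps per-term features in memory so prediction never touches disk."""
    def __init__(self, features_by_term):
        self._features = features_by_term  # dict: term -> dict of feature values

    def features_for_query(self, terms):
        # Missing terms fall back to empty rows so prediction still proceeds.
        rows = [self._features.get(t, {}) for t in terms]
        return {
            "max_idf": max((r.get("IDF", 0.0) for r in rows), default=0.0),
            "sum_postings": sum(r.get("NumPostings", 0) for r in rows),
        }
```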

Page 19: Predictive Parallelization: Taming Tail Latencies  in Web Search

Feature Analysis and Selection

• Accuracy gain per feature from the boosted regression tree suggests a cheaper feature subset

[Figure: recall vs. number of features, sorted by importance, compared against
using all features; recall axis spans roughly 0.60 to 0.85]
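The selection step can be sketched as below, using scikit-learn's GradientBoostingRegressor as a stand-in for the boosted regression tree; the array inputs, the 80 ms cutoff, and the retraining loop are assumptions about how such a recall-vs-features curve could be produced, not the paper's exact procedure.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def recall_by_feature_count(X_train, y_train, X_val, y_val, long_ms=80):
    """Train on all features, then re-train on the top-n most important ones.
    X_* are numpy feature matrices; y_* are execution times in ms."""
    full = GradientBoostingRegressor().fit(X_train, y_train)
    order = np.argsort(full.feature_importances_)[::-1]  # most important first
    actual_long = y_val >= long_ms
    recalls = []
    for n in range(1, len(order) + 1):
        cols = order[:n]
        model = GradientBoostingRegressor().fit(X_train[:, cols], y_train)
        predicted_long = model.predict(X_val[:, cols]) >= long_ms
        recalls.append((predicted_long & actual_long).sum() / actual_long.sum())
    return recalls
```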

Page 20: Predictive Parallelization: Taming Tail Latencies  in Web Search

Prediction Performance

• Query features are important
• Using cheap features is advantageous
  – IDF from the keyword features + query features
  – Much smaller overhead (90+% less)
  – Accuracy similar to using all features

  80 ms threshold     Precision (|A∩P|/|P|)   Recall (|A∩P|/|A|)   Cost
  Keyword features    0.76                     0.64                 High
  All features        0.89                     0.84                 High
  Cheap features      0.86                     0.80                 Low

  A = actual long queries, P = predicted long queries
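With A and P defined as above, precision and recall reduce to simple set ratios; a minimal sketch:

```python
def precision_recall(actual_long, predicted_long):
    """actual_long, predicted_long: sets of query ids (A and P in the table)."""
    hit = len(actual_long & predicted_long)                     # |A ∩ P|
    precision = hit / len(predicted_long) if predicted_long else 0.0
    recall = hit / len(actual_long) if actual_long else 0.0
    return precision, recall
```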

Page 21: Predictive Parallelization: Taming Tail Latencies  in Web Search

Algorithms

• Classification vs. regression
  – Comparable accuracy
  – Flexibility

• Algorithms evaluated
  – Linear regression
  – Gaussian process regression
  – Boosted regression tree

Page 22: Predictive Parallelization: Taming Tail Latencies  in Web Search

Accuracy of Algorithms

• Summary
  – 80% of long queries (> 80 ms) identified
  – 0.6% of short queries mispredicted
  – 0.55 ms prediction time, with low memory overhead

Page 23: Predictive Parallelization: Taming Tail Latencies  in Web Search

Predictive Parallelism

• Key idea
  – Parallelize only long queries
  – Use a threshold on the predicted execution time

• Evaluation
  – Compare Predictive to the other baselines: Sequential, Fixed, and Adaptive (see the sketch below)
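The baselines differ only in how they choose a parallelism degree per query. The comparison below is schematic: the fixed degree of 3 comes from the slides, but the adaptive load rule, the degree of 4, and the function signatures are assumptions, not the rules used in the paper or in [Raman et al., PLDI 11].

```python
def degree_sequential(predicted_ms, system_load):
    return 1                                    # never parallelize

def degree_fixed(predicted_ms, system_load, fixed=3):
    return fixed                                # parallelize every query the same way

def degree_adaptive(predicted_ms, system_load, max_degree=4):
    # Parallelize all queries, backing off as load rises (schematic rule only).
    return max(1, round(max_degree * (1.0 - system_load)))

def degree_predictive(predicted_ms, system_load, threshold_ms=80, long_degree=4):
    # Parallelize only queries predicted to be long.
    return long_degree if predicted_ms >= threshold_ms else 1
```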

Page 24: Predictive Parallelization: Taming Tail Latencies  in Web Search

99th-Percentile Response Time

• Predictive parallelization outperforms "parallelize all"

[Figure: 99th-percentile response time (ms) vs. query arrival rate (QPS) for
Sequential, Fixed (degree = 3), Predictive, and Adaptive; Predictive yields a
50% throughput increase]

Page 25: Predictive Parallelization: Taming Tail Latencies  in Web Search

Related Work

• Search query parallelism
  – Fixed parallelization [Frachtenberg, WWWJ 09]
  – Adaptive parallelization using system load only [Raman et al., PLDI 11]
  → High overhead due to parallelizing all queries

• Execution time prediction
  – Keyword-specific features only [Macdonald et al., SIGIR 12]
  → Lower accuracy and high memory overhead for our target problem

Page 26: Predictive Parallelization: Taming Tail Latencies  in Web Search

Your query to Bing is now parallelized if it is predicted to be long.

Thank You!

[Figure: recap of the workflow, query → execution time predictor → resource manager;
predicted short queries run sequentially, predicted long queries are parallelized]