Validating an Access Cost Model for Wide Area Applications

Louiqa Raschid
University of Maryland
CoopIS 2001
Co-authors V. Zadorozhny, T. Zhan and L. Bright



L. Raschid — University of Maryland, CoopIS01

Scalable Wide-Area Applications

Problems
• Wide area environment is dynamic (noisy)
• Wide variability in latency (end-to-end delay)
• Network and server workloads are unknown
• Time and Day dependencies impact latency
• Dynamic environment - constantly monitored

Research Objective: Use query feedback to monitor and learn behavior, and to predict access cost distributions that may be Time and Day dependent


Talk Outline
• Architecture for Wide Area Applications
• WebPT: Tool to predict access costs
• WebPT based Access Cost Catalog
• Grouping of WebSources based on observable WebSource characteristics
• Hypothesis to test WebPT based Catalog - High Prediction Accuracy versus Low Prediction Accuracy
• Validation based on experimental case study


Architecture for WebPT based Catalog


Predicting Response Times for Accessing WebSources

Problem: Difficulty in determining evaluation costs
• Physical implementation details unknown
• Load on network and WebSource unknown

Objective:
• Use query feedback to learn access costs
• Exploit Time of day, Day of week, etc., to predict costs
• Identify easily observable WebSource characteristics
• Determine prediction accuracy for WebSources based on WebSource characteristics


Metrics in WebPT Access Cost Model
• WebSource and Network Costs
  • Query Processing at WebSource
  • Downloading data from WebSource (extraction cost)
• Wrapper Statistics
  • Number of Pages Accessed
  • Cardinality of Result
• Statistics may be dependent on value of query binding
• WebPT - a tool for learning using query feedback and predicting access cost based on parameters such as Day, Time, Qty of data, Cardinality, etc.
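One way to picture a single item of query feedback is as a record that carries the parameters listed above. The sketch below is illustrative only: the class name `QueryFeedback`, its field names, and the helper `from_timestamp` are assumptions made for this example, not part of WebPT.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class QueryFeedback:
    """One observed query against a WebSource (hypothetical record layout)."""
    day: int                 # day of week, Monday = 0 (datetime.weekday convention)
    hour: int                # hour of day, 0-23
    bytes_downloaded: int    # quantity of data returned
    pages_accessed: int      # wrapper statistic: number of pages accessed
    cardinality: int         # wrapper statistic: cardinality of the result
    response_time_s: float   # observed access cost in seconds

    @classmethod
    def from_timestamp(cls, when: datetime, **measurements) -> "QueryFeedback":
        # Derive the Day and Time dimensions from the submission timestamp.
        return cls(day=when.weekday(), hour=when.hour, **measurements)


feedback = QueryFeedback.from_timestamp(
    datetime(2000, 1, 3, 14, 30),
    bytes_downloaded=2048,
    pages_accessed=3,
    cardinality=40,
    response_time_s=1.5,
)
```

A learner would consume a stream of such records and index them by whichever dimensions it is configured for.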


WebPT Learning


WebPT based Prediction
• WebPT is configured for some hierarchy of dimensions
  • Quantity, Day, Time, Cardinality
• WebPT Learning algorithm
  • Cell splitting
  • Smoothing
  • Estimate response time and confidence
  • Similar to CART (regression versus heuristics)
  • Cell merging
• Heuristics used in calibration of each cell
  • Dimension - min/max/scale
  • Allowed deviation
  • Confidence window
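The cell-splitting idea can be sketched in one dimension. This is a toy illustration and not the WebPT algorithm itself: it covers a single dimension (hour of day), splits a cell in half when the spread of its observed response times exceeds an allowed deviation, and omits smoothing, cell merging, and the confidence window. The class name `Cell` and the parameters `allowed_deviation`, `min_width`, and `min_samples` are assumptions chosen for the example.

```python
from statistics import mean, pstdev


class Cell:
    """A range [lo, hi) of one dimension holding observed response times."""

    def __init__(self, lo, hi, allowed_deviation=0.2, min_width=1.0, min_samples=4):
        self.lo, self.hi = lo, hi
        self.allowed_deviation = allowed_deviation  # relative spread tolerated (assumed heuristic)
        self.min_width = min_width                  # do not split below this cell width
        self.min_samples = min_samples              # observations needed before judging spread
        self.samples = []                           # list of (x, response_time)
        self.children = None

    def _child_for(self, x):
        left, right = self.children
        return left if x < left.hi else right

    def learn(self, x, response_time):
        if self.children is not None:
            self._child_for(x).learn(x, response_time)
            return
        self.samples.append((x, response_time))
        times = [t for _, t in self.samples]
        noisy = (len(times) >= self.min_samples
                 and pstdev(times) > self.allowed_deviation * mean(times))
        if noisy and (self.hi - self.lo) / 2 >= self.min_width:
            self._split()

    def _split(self):
        # Halve the cell and hand the stored observations to the two children.
        mid = (self.lo + self.hi) / 2
        args = (self.allowed_deviation, self.min_width, self.min_samples)
        self.children = (Cell(self.lo, mid, *args), Cell(mid, self.hi, *args))
        pending, self.samples = self.samples, []
        for x, t in pending:
            self._child_for(x).samples.append((x, t))

    def predict(self, x):
        """Return (estimated response time, deviation), or None if the cell is empty."""
        if self.children is not None:
            return self._child_for(x).predict(x)
        if not self.samples:
            return None
        times = [t for _, t in self.samples]
        return mean(times), pstdev(times)


# Fast responses at night, slow responses in the afternoon.
root = Cell(0, 24)
for hour, seconds in [(2, 1.0), (3, 1.0), (14, 10.0), (15, 10.0)]:
    root.learn(hour, seconds)
night_estimate = root.predict(4)
afternoon_estimate = root.predict(16)
```

With these four observations the root cell is too noisy to keep, so it splits at hour 12 and each half then predicts from its own, much more consistent, samples.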


Prediction Accuracy of WebPT based Cost Model is strongly correlated with the following:
• Observable WebSource Characteristics
  • Significance of Time and Day in predicting workload at the server and on the network
  • Variance (noise) in accessing server
• Quality of available statistics - cardinality
  • Random bindings - large variance in cardinality
  • Fixed bindings - better estimation of cardinality


Case Study: Data gathering and Experiment
• 6 data sources in the public domain
• Data gathered for several weeks in 1999, 2000
• Queries submitted to WebSources periodically
• Recorded TTF, TTL
• Query bindings affected result cardinality
  • Random bindings - >50 bindings
  • Fixed bindings - 2 bindings each for [S,M,L]
• Mediator queries - simple scan to complex 5 way join over data in 5 WebSources (not reported)


Observable WebSource Characteristics


Grouping of WebSources based on Characteristics

• G1: T and D significant; Noise can vary
• G2: Noise High
• G3: T, D not significant; Noise Low - EMPTY
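The grouping rule above can be written down as a small function. The sketch below makes two assumptions that the slides do not state: noise is measured as the coefficient of variation of observed response times, and "high" means above a hypothetical threshold of 0.5. The function names are also illustrative. Groups are returned as a set because a source can fall into both G1 and G2.

```python
from statistics import mean, pstdev


def coefficient_of_variation(response_times):
    """Relative spread of observed response times (assumed noise measure)."""
    return pstdev(response_times) / mean(response_times)


def assign_groups(time_day_significant, noise, noise_threshold=0.5):
    """Return the set of groups (G1, G2, G3) a WebSource belongs to."""
    groups = set()
    if time_day_significant:
        groups.add("G1")          # T and D significant; noise can vary
    if noise > noise_threshold:
        groups.add("G2")          # noise high
    if not groups:
        groups.add("G3")          # T, D not significant and noise low
    return groups


steady_source = assign_groups(False, coefficient_of_variation([2.0, 2.0, 2.0]))
noisy_periodic_source = assign_groups(True, 0.9)
```

In the case study G3 turned out to be empty, so every source there would come back with G1, G2, or both.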


Hypothesis to test WebPT based Access Cost Catalog
• H1: High prediction Accuracy for the following
  • T, D are significant and Low Noise
  • Sources are in G1 but not in G2
• H2: Catalog will improve prediction accuracy for the following WebSources
  • T, D are significant independent of noise
  • Group G1
• H3: Statistics may be dependent on value of query binding
  • Prediction accuracy improves with learning on fixed bindings
  • Sources in both groups


Prediction Accuracy for WebSources

WebPT(Lo) - Random bindings


WebSource Characteristics and Correlation with Prediction Accuracy


Groupings of WebSources and Correlation with Prediction Accuracy

• G1: T and D significant
• G2: Noise High
• GNIS: High Pred Accuracy; G1 AND G2
• FAA; FishBase: Low Pred Accuracy while in G1; Noisy


Quantile Plots of Relative Error of Prediction for ACM, Aircraft


Quantile Plots of Relative Error of Prediction for FAA, GNIS
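For readers who want to reproduce this kind of plot, the points behind a quantile plot of relative error can be computed as sketched below. The slides do not define relative error, so the usual definition |predicted - actual| / actual is assumed here, and the plotting position (i + 0.5) / n is one common convention among several. Function names are illustrative.

```python
def relative_errors(actual, predicted):
    """Relative error per query, assuming |predicted - actual| / actual."""
    return [abs(p - a) / a for a, p in zip(actual, predicted)]


def quantile_points(errors):
    """Pair each sorted error with the fraction of the sample at or below it."""
    ordered = sorted(errors)
    n = len(ordered)
    return [((i + 0.5) / n, err) for i, err in enumerate(ordered)]


# Two observed response times (seconds) and the corresponding predictions.
errors = relative_errors(actual=[10.0, 20.0], predicted=[8.0, 25.0])
points = quantile_points(errors)
```

Plotting the fraction on the horizontal axis against the error on the vertical axis shows at a glance what share of predictions fall under a given relative error.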


Summary + Impact
• Unique Case Study: WebPT based Access Cost Catalog and Cost distributions
• Grouping of WebSources based on observable WebSource characteristics
• High Prediction Accuracy for some sources in G1 (T, D significant) with low noise
• High Prediction Accuracy for some sources in G1 and in G2 (High Noise)
• Similar results for Mediator cost model and complex N-way joins over multiple WebSources