Validating an Access Cost Model for Wide Area Applications
Louiqa Raschid
University of Maryland
CoopIS 2001
Co-authors V. Zadorozhny, T. Zhan and L. Bright
L. Raschid — University of Maryland, CoopIS01
Scalable Wide-Area Applications
Problems:
• Wide area environment is dynamic (noisy)
• Wide variability in latency (end-to-end delay)
• Network and server workloads are unknown
• Time and Day dependencies impact latency
• Dynamic environment - constantly monitored
Research Objective: Use query feedback to monitor and learn behavior, and to predict access cost distributions that may be Time and Day dependent
Talk Outline
• Architecture for Wide Area Applications
• WebPT: Tool to predict access costs
• WebPT based Access Cost Catalog
• Grouping of WebSources based on observable WebSource characteristics
• Hypothesis to test WebPT based Catalog -- High Prediction Accuracy versus Low Prediction Accuracy
• Validation based on experimental case study
Architecture for WebPT based Catalog
Predicting Response Times for Accessing WebSources
Problem: Difficulty in determining evaluation costs
• Physical implementation details unknown
• Load on network and WebSource unknown
Objective:
• Use query feedback to learn access costs
• Exploit Time of day, Day of week, etc., to predict costs
• Identify easily observable WebSource characteristics
• Determine prediction accuracy for WebSources based on WebSource characteristics
Metrics in WebPT Access Cost Model
• WebSource and Network Costs
  • Query Processing at WebSource
  • Downloading data from WebSource (extraction cost)
• Wrapper Statistics
  • Number of Pages Accessed
  • Cardinality of Result
• Statistics may be dependent on value of query binding
• WebPT - a tool for learning using query feedback and predicting access cost based on parameters such as Day, Time, Qty of data, Cardinality, etc.
WebPT Learning
WebPT based Prediction
• WebPT is configured for some hierarchy of dimensions
  • Quantity, Day, Time, Cardinality
• WebPT Learning algorithm
  • Cell splitting
  • Smoothing
  • Estimate response time and confidence
  • Similar to CART (regression versus heuristics)
  • Cell merging
• Heuristics used in calibration of each cell
  • Dimension - min/max/scale
  • Allowed deviation
  • Confidence window
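As a rough illustration of cell splitting and per-cell estimation, here is a minimal Python sketch over a single dimension (hour of day). It is not WebPT's actual algorithm: the `Cell` class, the `allowed_deviation` and `min_width` parameters, the split-at-midpoint rule, and the use of the sample count as a stand-in for confidence are all assumptions made for this example. Smoothing and cell merging are omitted.

```python
from statistics import mean, pstdev


class Cell:
    """One interval [lo, hi) along a single dimension (here: hour of day)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.samples = []      # (hour, response_time) observations
        self.children = None   # (left, right) sub-cells after a split

    def leaf_for(self, hour):
        """Walk down the splits to the leaf cell that covers `hour`."""
        cell = self
        while cell.children:
            left, right = cell.children
            cell = left if hour < left.hi else right
        return cell

    def learn(self, hour, response_time, allowed_deviation=0.3, min_width=1.0):
        """Record one observation; split the leaf if it is too noisy."""
        leaf = self.leaf_for(hour)
        leaf.samples.append((hour, response_time))
        times = [t for _, t in leaf.samples]
        if len(times) < 4 or leaf.hi - leaf.lo <= min_width:
            return
        avg = mean(times)
        # Hypothetical rule: split at the midpoint when the relative
        # spread of response times exceeds the allowed deviation.
        if avg > 0 and pstdev(times) / avg > allowed_deviation:
            mid = (leaf.lo + leaf.hi) / 2
            left, right = Cell(leaf.lo, mid), Cell(mid, leaf.hi)
            for h, t in leaf.samples:
                (left if h < mid else right).samples.append((h, t))
            leaf.children, leaf.samples = (left, right), []

    def predict(self, hour):
        """Return (estimated response time, sample count) for `hour`."""
        times = [t for _, t in self.leaf_for(hour).samples]
        if not times:
            return None, 0
        return mean(times), len(times)


root = Cell(0, 24)
for h in (2, 3, 4, 5):
    root.learn(h, 1.0)        # fast responses at night
for h in (14, 15, 16, 17):
    root.learn(h, 5.0)        # slow responses in the afternoon
print(root.predict(3))        # (1.0, 4)
print(root.predict(15))       # (5.0, 4)
```

With the mixed observations the root cell splits at hour 12, so the night and afternoon estimates no longer average into each other.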
Prediction Accuracy of WebPT based Cost Model is strongly correlated with the following:
• Observable WebSource Characteristics
  • Significance of Time and Day in predicting workload at the server and on the network
  • Variance (noise) in accessing server
• Quality of available statistics - cardinality
  • Random bindings - large variance in cardinality
  • Fixed bindings - better estimation of cardinality
Case Study: Data gathering and Experiment
• 6 data sources in the public domain
• Data gathered for several weeks in 1999, 2000
• Queries submitted to WebSources periodically
• Recorded TTF, TTL
• Query bindings affected result cardinality
  • Random bindings - >50 bindings
  • Fixed bindings - 2 bindings each for [S,M,L]
• Mediator queries - simple scan to complex 5 way join over data in 5 WebSources (not reported)
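The slides do not expand TTF and TTL. Reading them as time to first result and time to last result (an assumption), a measurement of this kind could be sketched as below. The function name `measure_ttf_ttl`, the chunk iterator, and the injectable `clock` are hypothetical and are not taken from the study's tooling.

```python
import time


def measure_ttf_ttl(chunks, clock=time.perf_counter):
    """Time an iterable of result chunks.

    Returns (ttf, ttl) in seconds since the call started: ttf when the
    first chunk arrived (None if nothing arrived), ttl when the iterable
    was exhausted.
    """
    start = clock()
    ttf = None
    for _ in chunks:
        if ttf is None:
            ttf = clock() - start
    ttl = clock() - start
    return ttf, ttl
```

Passing the clock as a parameter keeps the measurement deterministic under test; in a periodic probe, the default `time.perf_counter` would be used and `chunks` would be the response stream from the WebSource.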
Observable WebSource Characteristics
Grouping of WebSources based on Characteristics
• G1: T and D significant; Noise can vary
• G2: Noise High
• G3: T, D not significant; Noise Low - EMPTY
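The grouping rules above can be written as a small function. This sketch takes the two characteristics as already-decided booleans; how the study judged "significant" or "high" is not stated here, and the name `assign_groups` is made up for the example. Because G1 allows any noise level, a source can fall in both G1 and G2.

```python
def assign_groups(td_significant, noise_high):
    """Return the set of groups a WebSource belongs to."""
    groups = set()
    if td_significant:
        groups.add("G1")   # Time and Day significant; noise can vary
    if noise_high:
        groups.add("G2")   # high noise
    if not groups:
        groups.add("G3")   # T, D not significant and low noise
    return groups
```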
Hypothesis to test WebPT based Access Cost Catalog
• H1: High Prediction Accuracy for the following
  • T, D are significant and Low Noise
  • Sources are in G1 but not in G2
• H2: Catalog will improve prediction accuracy for the following WebSources
  • T, D are significant, independent of noise
  • Group G1
• H3: Statistics may be dependent on value of query binding
  • Prediction accuracy improves with learning on fixed bindings
  • Sources in both groups
Prediction Accuracy for WebSources
WebPT(Lo) - Random bindings
WebSource Characteristics and Correlation with Prediction Accuracy
Groupings of WebSources and Correlation with Prediction Accuracy
• G1: T and D significant
• G2: Noise High
• GNIS: High Pred Accuracy; G1 AND G2
• FAA; FishBase: Low Pred Accuracy while in G1; Noisy
Quantile Plots of Relative Error of Prediction for ACM, Aircraft
Quantile Plot of Relative Error of Prediction for FAA, GNIS
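For readers who want to reproduce this kind of plot from their own measurements, the following sketch computes relative errors and reads off quantiles. The definition used, |predicted - observed| / observed, and the nearest-rank quantile rule are assumptions; the slides do not state which definitions were used. The helper names are illustrative only.

```python
import math


def relative_errors(predicted, observed):
    """Relative error of each prediction against the observed value."""
    return [abs(p - o) / o for p, o in zip(predicted, observed)]


def quantile(values, q):
    """Nearest-rank quantile of `values` for 0 < q <= 1."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


errors = relative_errors([1.0, 2.0, 4.0, 10.0], [1.0, 4.0, 5.0, 8.0])
print(quantile(errors, 0.5))   # 0.2
print(quantile(errors, 0.9))   # 0.5
```

Plotting `quantile(errors, q)` against `q` for a range of `q` values gives a quantile curve for one source; a curve that stays low for most of the range indicates high prediction accuracy.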
Summary + Impact
• Unique Case Study: WebPT based Access Cost Catalog and Cost distributions
• Grouping of WebSources based on observable WebSource characteristics
• High Prediction Accuracy for some sources in G1 (T, D significant) with low noise
• High Prediction Accuracy for some sources in G1 and in G2 (High Noise)
• Similar results for Mediator cost model and complex N-way joins over multiple WebSources