Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning


By: Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Janet L. Wiener, Armando Fox, Michael Jordan, David Patterson

Problem:

• Predicting the performance (running time, resource usage) of a query before executing it helps with:

• Workload management and query scheduling

• System sizing: what does a system need in order to answer a query within a time constraint?

• Capacity planning: given an expected workload, does the system require an upgrade?

Why it is a hard problem

• Sources of uncertainty: skewed data distributions and inaccurate cardinality estimates

• Complex query plans, huge amounts of data, and different schemas across databases make using ML a big challenge

Solution

• The solution should simultaneously predict all performance metrics, using only information available prior to query execution, for both short- and long-running queries.

• Potential candidates:

• Cost models

– Manually model the performance of each operator for each configuration setting, then estimate the final value from the query plan.

– Estimation errors propagate through the plan.

• Machine learning

– Build a model from training data.

– Less sensitive to estimation error, since it works by similarity.

Experiment setup (data)

• Machines used to gather training and test query performance metrics:

• HP Neoview database system.

• Machines with 4, 8, 16, or 32 processors.

• Fixed memory allocated per CPU.

• Each CPU has its own disk, and data is partitioned roughly equally across all disks.

Experiment setup (query)

• Categorize queries by runtime:

• 0 min < feather < 3 min

• 3 min < golf ball < 30 min

• 30 min < bowling ball < 2 h

• Feathers were generated from the standard decision-support benchmark TPC-DS templates.

• For longer-running queries, new templates were written from real queries that took at least 4 hours to compute.

• Some feather queries in the training and test sets come from another database with a different schema.

• Producing queries with the intended performance was hard and time-consuming, since changing a single constant could turn a feather into a bowling ball or vice versa.

Independent modelling of performance metrics: regression

• Individually model each performance metric: y = a1*x1 + a2*x2 + … + an*xn

• Regression uses a different set of features for each performance metric, which makes it hard to unify all performance metrics in one model (a per-metric sketch follows below).
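A minimal sketch of this per-metric regression approach, assuming numpy is available; the feature values and metric values below are made up for illustration:

```python
import numpy as np

# Hypothetical training data: one row per query, columns are query features
# (e.g. number of joins, number of predicates, estimated cardinality).
X = np.array([[2, 5, 1.0e6],
              [0, 1, 3.0e3],
              [4, 9, 7.5e7]], dtype=float)

# Each performance metric gets its own target vector and its own model.
elapsed_time = np.array([120.0, 0.8, 2400.0])   # seconds
records_used = np.array([5.0e5, 1.2e3, 3.0e7])

def fit_linear_model(X, y):
    """Least-squares fit of y = a1*x1 + ... + an*xn + bias."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

time_model    = fit_linear_model(X, elapsed_time)
records_model = fit_linear_model(X, records_used)

# Predicting a new query means applying each per-metric model separately.
new_query = np.array([1, 3, 5.0e5, 1.0])  # features + bias term
print(new_query @ time_model, new_query @ records_model)
```

Note that each metric needs its own fitted coefficient vector (and possibly its own feature set), which is exactly why unifying all metrics in a single regression model is awkward.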

Joint modelling of performance metrics

• Clustering: groups the entries of a single dataset based on their similarity.

• PCA: projects a dataset onto the dimensions of maximal variance before clustering.

• (K)CCA: finds the dimensions of maximal correlation between a pair of datasets and maps each dataset onto those dimensions. With KCCA, the notion of similarity can be defined by the user through a kernel function.

[Diagram: query feature vectors (available before running) and performance feature vectors (available after running) are both fed into KCCA.]

KCCA

• We are given N queries.

• We produce two N×N matrices of similarities: one among the query feature vectors and one among the query performance feature vectors.
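A rough sketch of how the two similarity matrices could be constructed, assuming a Gaussian (RBF) kernel; the kernel choice, bandwidth, and random data here are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def rbf_kernel_matrix(V, gamma=0.1):
    """N x N Gaussian kernel matrix: K[i, j] = exp(-gamma * ||v_i - v_j||^2)."""
    sq_dists = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# N queries: one row per query in each view.
query_features = np.random.rand(50, 9)   # e.g. predicate/join/operator counts
perf_features  = np.random.rand(50, 6)   # elapsed time, disk I/O, messages, ...

K_query = rbf_kernel_matrix(query_features)   # N x N query similarities
K_perf  = rbf_kernel_matrix(perf_features)    # N x N performance similarities

# KCCA then solves a generalized eigenproblem over (K_query, K_perf) to find
# projections that maximize correlation between the two views.
print(K_query.shape, K_perf.shape)
```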

Prediction using KCCA

Evaluation

• Predictive risk

• Predictive risk close to 1 means a near-perfect prediction.

• This metric is very sensitive to outliers, and removing the top outliers can significantly improve predictive risk.
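A small sketch of computing such a score, assuming predictive risk is defined in the usual R²-style way (1 minus the ratio of squared prediction error to squared deviation from the mean); treat the exact formula as an assumption, and the numbers below are invented:

```python
import numpy as np

def predictive_risk(y_true, y_pred):
    """R^2-style score: 1 means perfect prediction, lower means worse.
    Assumed definition: 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_err = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_err / ss_tot

# A single large outlier drags the score down sharply.
y_true = [10, 20, 30, 600]          # last query is an outlier
y_pred = [11, 19, 31, 60]
print(predictive_risk(y_true, y_pred))           # poor score
print(predictive_risk(y_true[:3], y_pred[:3]))   # close to 1 once the outlier is removed
```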

Performance feature vector

• Performance features: 6 measures computed by the DBMS after running a query.

– Elapsed time

– Disk I/O

– Message count

– Message bytes

– Records accessed

– Records used

Query feature vector

• Information available prior to query execution:

1. SQL text of the query

• Number of nested sub-queries

• Total number of selection predicates

• Number of equality selection predicates

• Total number of join predicates

• Number of equi-join predicates

• Number of non-equi-join predicates

• Number of sort columns

• Number of aggregation columns

2. Query execution plan (a tree of query operators with estimated cardinalities)

• Instance count and cardinality sum for each operator.
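A toy sketch of extracting the SQL-text portion of such a feature vector with simple pattern matching; a real system would parse the query and its execution plan, and the regular expressions below are rough illustrative heuristics only:

```python
import re

def sql_text_features(sql: str) -> dict:
    """Count a few of the listed feature types from raw SQL text (rough heuristics)."""
    text = sql.upper()
    return {
        "nested_subqueries": text.count("(SELECT"),
        "equality_predicates": len(re.findall(r"=", text)),
        "join_predicates": text.count(" JOIN "),
        "sort_clauses": len(re.findall(r"ORDER BY", text)),
        "aggregation_calls": len(re.findall(r"\b(SUM|AVG|MIN|MAX|COUNT)\s*\(", text)),
    }

query = """SELECT c.name, SUM(o.total)
           FROM customers c JOIN orders o ON c.id = o.cust_id
           WHERE o.total > 100
           GROUP BY c.name ORDER BY SUM(o.total) DESC"""
print(sql_text_features(query))
```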

Prediction based on neighbours

• How to find the 'nearest' neighbour?

• Euclidean distance captures the magnitude-wise closest neighbour.

• Cosine distance captures the direction-wise closest neighbour.

• Experiments suggest that Euclidean distance provides better predictions (see the sketch below).
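A small sketch contrasting the two distance measures on (made-up) projected feature vectors:

```python
import numpy as np

def euclidean(a, b):
    """Magnitude-sensitive distance: small when vectors are close in absolute terms."""
    return np.linalg.norm(a - b)

def cosine_distance(a, b):
    """Direction-sensitive distance: small when vectors point the same way,
    regardless of their magnitudes."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

test     = np.array([1.0, 2.0, 3.0])
same_dir = np.array([10.0, 20.0, 30.0])   # same direction, 10x larger
nearby   = np.array([1.5, 2.5, 2.0])      # close in magnitude, different direction

print(euclidean(test, same_dir), euclidean(test, nearby))              # 'nearby' is closer
print(cosine_distance(test, same_dir), cosine_distance(test, nearby))  # 'same_dir' is closer
```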

Prediction based on neighbours

• How many neighbours should be considered when making a prediction?

• According to the experiments, using the 3 nearest neighbours provides a good trade-off.

Prediction based on neighbours

• How to map from the neighbours' performance metrics to the test query's performance metrics? Combine the neighbours' performance feature vectors (see the sketch below):

• Equally weighted

• 1:2:3 weighting based on distance ranking

• Weighting proportional to distance from the test query's feature vector
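A sketch of combining the 3 nearest neighbours' performance feature vectors under the listed weighting schemes; the inverse-distance variant is one plausible reading of the distance-based weighting and is an assumption here, as are the numbers:

```python
import numpy as np

# Performance feature vectors of the 3 nearest neighbours
# (e.g. elapsed time, disk I/O, message count), closest neighbour first.
neighbours = np.array([[100.0, 5.0e6, 2000.0],
                       [140.0, 7.0e6, 2600.0],
                       [300.0, 1.1e7, 5100.0]])
distances = np.array([0.2, 0.5, 0.9])   # distances in the projected space

def combine(neighbours, weights):
    """Weighted average of the neighbours' performance vectors."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ neighbours

equal_weighted = combine(neighbours, [1, 1, 1])
rank_weighted  = combine(neighbours, [3, 2, 1])        # closest neighbour weighted most
dist_weighted  = combine(neighbours, 1.0 / distances)  # assumed inverse-distance scheme

print(equal_weighted)
print(rank_weighted)
print(dist_weighted)
```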

Experiment design

• Experiment 1: train the model with a realistic mix of query types: 1027 queries (30 bowling balls + 230 golf balls + 767 feathers)

• Experiment 2: train the model with 30 queries of each type: 120 (30b + 30g + 30f)

• Experiment 3: 2-step prediction with query-type-specific models

• Experiment 4: training and testing on queries using different data tables and schemas

Experiment 1: Time

Experiment 1: Record usage

Experiment 1: Message count

Experiment 2: Time

Experiment 3: Time

Experiment 4: Time

Conclusions

• ML can predict performance metrics using only information available before executing the query.

• Such predictions can greatly improve system sizing, capacity planning, and workload management.

• I want to predict the percentage of up-to-date results for a query answered from a cache, based on the statistics of similar queries.
