Upload
others
View
9
Download
0
Embed Size (px)
Citation preview
Optimization of Common Table Expressions in MPP Database Systems
www.pivotal.io
I. Motivation!
Goals: - Cost-based inlining approach - Contextualized optimization - Pushing filters, ordering into CTEs - Deadlock-free execution
Example Query: WITH v as (SELECT i_brand, i_color FROM item WHERE i_current_price < 1000) SELECT v1.* FROM v v1, v v2, v v3 WHERE v1.i_brand = v2.i_brand AND v2.i_brand = v3.i_brand AND v3.i_color = ’red’; Index on i_color
Select i_current_price < 1000
TableScan(item)
v2
v1
v3
Join
Join
Select i_color=‘red’
v
Join
Join
Select i_current_price < 1000
TableScan(item)
Select i_current_price < 1000
TableScan(item) IndexScan(item) i_color=‘red’
Select i_current_price < 1000
Select i_current_price < 1000
TableScan(item)
v2
v1 Join
Join
v
IndexScan(item) i_color=‘red’
Select i_current_price < 1000
No inlining Inline all CTEs Partial Inlining
Challenges: - Deadlock hazard - Inlining heuristics à suboptimal plans - CTEs optimized in isolation from context
II. CTE Representation in Orca!WITH v AS (SELECT i_brand FROM item WHERE i_color = ’red’) SELECT * FROM v as v1, v as v2 WHERE v1.i_brand = v2.i_brand;
Select i_color=‘red’
TableScan(item)
CTEProducer(0) CTEAnchor(0)
Join
CTEConsumer(0) CTEConsumer(0)
Main Query Tree:
CTE Definition:
Select i_color=‘red’
TableScan(item)
Join
Select i_color=‘red’
TableScan(item) Select
i_color=‘red’
TableScan(item)
CTEProducer(0)
Sequence
Join
CTEConsumer(0) CTEConsumer(0) Inline all CTEs
No inlining
Possible alternative plans
III. Plan Enumeration!
Join
CTEAnchor(0) 0:
1:
CTEConsumer(0) 2:
CTEConsumer(0) 3:
Join
CTEAnchor(0)
CTEProducer(0)
Sequence NoOp
TableScan(item)
Select i_color=‘red’
0:
4:
5:
6:
1:
TableScan(item)
Select i_color=‘red’ CTEConsumer(0)
TableScan(item)
Select i_color=‘red’ CTEConsumer(0) 2: 3:
7: 8:
Join
Select i_color =‘red’
Select i_color =‘red’
TableScan (item)
TableScan (item)
Join CTEProducer(0)
Select i_color =‘red’
Select i_color =‘red’
TableScan (item)
TableScan (item)
Sequence
CTEConsumer(0)
Select i_color =‘red’
Join
CTEConsumer(0)
TableScan (item)
Join
CTEConsumer(0) CTEConsumer(0)
Join CTEProducer(0)
Select i_color =‘red’
TableScan (item)
Sequence
CTEConsumer(0) CTEConsumer(0)
Join CTEProducer(0)
Select i_color =‘red’
Select i_color =‘red’
Select i_color =‘red’
TableScan (item)
TableScan (item)
TableScan (item)
Sequence
Main query tree inserted into compressed MEMO structure Each operator initializes a new group in the MEMO
1 Transformation rules are applied to generate equivalent alternatives, which are also added to the MEMO
2 Different combinations of operators can be used together to form different plans Properties are used to ensure that only the valid plans are generated
3
IV. Predicate Push-down !
VII. Experimental Results!
!"#$%!&#$%!'#$%#$%'#$%&#$%"#$%(#$%
)##$%)'#$%
)% )#% )##% )###% )####%
!"#$%&'"
'()*
+,'-./%(*01"'*23'-4*
Performance improvement of Orca vs. Planner
1"
10"
100"
1000"
10000"
01" 02" 04" 05a" 11" 14a" 18" 18a" 22" 22a" 23" 24" 27" 30" 31" 36a" 39" 47" 51" 51a" 58" 59" 64" 67" 67a" 70" 70a" 74" 75" 80a" 83" 86a" 95" 97"
Execu&
on)Tim
e)(sec))
TPC2DS)Queries)
Basic"CTE"op6miza6on" Inline"all"CTEs" Cost>based"inlining"
Experimental setup: • 5TB TPC-DS benchmark • 8-node cluster, CPU: 3.33GHz, RAM: 48GB
• Compare new optimizer (Orca) vs old Planner • Compare Orca using different CTE inlining settings
V. Common Subexpression Elimination!
Join%
Select%count&>&10&
GbAgg%i_brand&
TableScan(item)&
Select%count&>&20&
GbAgg%i_brand&
TableScan(item)&
CTEProducer(0)&
GbAgg%i_brand&
TableScan(item)&
Join%
Select%count&>&10&
CTEConsumer(0)%
Select%count&>&20&
CTEConsumer(0)%
CTEAnchor(0)&
Query&with&common&expressions& Common&expressions&pulled&out&into&CTE&
• CTEs can be used to optimize the performance of queries with repeated expressions
• Similarly, they can be generated by the optimizer for expressions like window functions and distinct aggregates
Join CTEProducer(0)
Select i_current_price < 50
Sequence
TableScan (item)
Select i_color =‘red’
CTEConsumer(0) CTEConsumer(0)
Select i_color=‘blue’
Join
CTEProducer(0)
Select i_current_price < 50
Sequence
TableScan(item)
Select i_color =‘red’
CTEConsumer(0) CTEConsumer(0)
Select i_color=‘blue’
Select i_color =‘red’ OR i_color=‘blue’
VI. Execution in MPP Setting!
• CTEConsumers are instantiated as SharedScans
• CTEProducers are instantiated as Materialize • The Broadcast operator manages data
exchange between the shown two active processes
Performance comparison for different CTE inlining approaches
Hanging queries Slow execution
Cost-based inlining saves up to 55% of execution time
SharedScan(CTEId=0)
Process P2
Sequence
Materialize(CTEId=0) NL Join
Select i_color = ‘red’
TableScan(item)
SharedScan (CTEId=0)
Broadcast Process P1