© 2012 IBM Corporation
Choosing the Best Distribution Key
Eddie BK Tan
Technical Account Manager, IBM Asia-Pacific Netezza™
[email protected] +65 9877 0221
Information Management, A Killer Appl
Netezza Performance – the Big Three
DISTRIBUTE FOR COLLOCATION
– Look at the joins
ORDER DATA
– Look at the WHERE clauses
OPTIMIZE TABLE STRUCTURE
– Performance, space, maintainability
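The ORDER DATA point rests on zone maps: as we understand Netezza, each storage extent records the minimum and maximum of its columns, so a WHERE range can skip extents that cannot contain matching rows. The Python sketch below models only that idea; the 1,000-row extent size, the day-number column and the data volumes are hypothetical, not Netezza internals.

```python
"""Simplified zone-map model: per-extent min/max lets a range filter skip extents.

Assumptions (hypothetical, not Netezza internals): a fixed number of rows per
extent and a single integer column holding a day number.
"""
import random

ROWS_PER_EXTENT = 1_000  # hypothetical extent size, in rows


def build_zone_map(values, rows_per_extent=ROWS_PER_EXTENT):
    """Return (min, max) for each consecutive extent of the column values."""
    return [
        (min(chunk), max(chunk))
        for chunk in (values[i:i + rows_per_extent]
                      for i in range(0, len(values), rows_per_extent))
    ]


def extents_to_scan(zone_map, low, high):
    """Count extents whose [min, max] overlaps the predicate low <= x <= high."""
    return sum(1 for lo, hi in zone_map if hi >= low and lo <= high)


# 365 days of transactions, 1,000 rows per day (hypothetical volumes).
days = [d for d in range(365) for _ in range(1_000)]
shuffled = days[:]
random.Random(42).shuffle(shuffled)

ordered_map = build_zone_map(days)       # data loaded in date order
shuffled_map = build_zone_map(shuffled)  # same data, arbitrary order

# WHERE trans_day BETWEEN 100 AND 106  (one week)
print(extents_to_scan(ordered_map, 100, 106))   # 7 of 365 extents
print(extents_to_scan(shuffled_map, 100, 106))  # all, or nearly all, 365 extents
```

With the rows loaded in date order, one week of data touches 7 extents; the same rows in arbitrary order force a scan of essentially the whole table.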
Network Cost – Rough Estimates
Collocated tables: N/A (no data crosses the network)
Table redistribute: 23 MB / sec. per dataslice (TwinFin 12: about 2.1 GB / sec. in aggregate)
Broadcast table: 80 MB / sec. (DBOS)
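A back-of-envelope sketch of what these rates mean for a large table. It assumes a TwinFin 12 has 92 dataslices (the count that turns 23 MB / sec. per dataslice into the 2.1 GB / sec. quoted above), and the 100-byte row width is purely hypothetical.

```python
"""Rough network cost from the slide's rates.

Assumptions: 92 dataslices on a TwinFin 12 (23 MB/s x 92 is about 2.1 GB/s)
and a hypothetical 100-byte row width for the TRANSACTION table.
"""
REDISTRIBUTE_MB_PER_SEC_PER_SLICE = 23
BROADCAST_MB_PER_SEC = 80
TWINFIN12_DATASLICES = 92  # assumption that reproduces the 2.1 GB/s figure


def redistribute_seconds(table_mb, dataslices=TWINFIN12_DATASLICES):
    """All dataslices send in parallel, so the aggregate rate scales with slices."""
    return table_mb / (REDISTRIBUTE_MB_PER_SEC_PER_SLICE * dataslices)


def broadcast_seconds(table_mb):
    """A broadcast moves the whole table at the single broadcast rate."""
    return table_mb / BROADCAST_MB_PER_SEC


aggregate_mb = REDISTRIBUTE_MB_PER_SEC_PER_SLICE * TWINFIN12_DATASLICES
print(aggregate_mb)  # 2116 MB/s, i.e. about 2.1 GB/s

# One year of TRANSACTION rows: 1.3 billion rows at a hypothetical 100 bytes each.
transaction_mb = 1.3e9 * 100 / 1e6  # 130,000 MB
print(round(redistribute_seconds(transaction_mb)))  # 61 seconds per query
print(round(broadcast_seconds(transaction_mb)))     # 1625 seconds
```

Under the same hypothetical row width, 10,000 daily executions at about 61 seconds each would add up to roughly 170 hours of redistribution per day, which is more time than the day has.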
SKEW Considerations
Collocated Tables
– None, except for whatever skew the tables started out with
Table Redistribute
– Possible, if the values of the redistribution (join) column are unevenly spread
Broadcasted Table
– None, since all dataslices get an identical copy of the data
– But you don’t want to broadcast large volumes of data
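Why a poor key skews: rows land on a dataslice by hashing the key, so a key with few distinct values can never fill every slice. The Python sketch below uses `zlib.crc32` as a stand-in for Netezza's internal hash (an assumption) and a hypothetical 92 dataslices; the 50 distinct values mirror a 50-row lookup table such as PRODUCT.

```python
"""Hash-distribute rows to dataslices and measure skew.

zlib.crc32 stands in for Netezza's internal hash function (an assumption);
the point is only that a key with few distinct values cannot spread evenly.
"""
import zlib
from collections import Counter

DATASLICES = 92  # hypothetical slice count (TwinFin 12-sized)


def slice_for(key, dataslices=DATASLICES):
    """Map a distribution-key value to a dataslice by hashing it."""
    return zlib.crc32(str(key).encode()) % dataslices


def skew_ratio(keys, dataslices=DATASLICES):
    """Largest slice's row count divided by the average rows per slice."""
    counts = Counter(slice_for(k, dataslices) for k in keys)
    average = len(keys) / dataslices
    return max(counts.values()) / average


def used_slices(keys, dataslices=DATASLICES):
    """Number of dataslices that receive at least one row."""
    return len({slice_for(k, dataslices) for k in keys})


rows = 100_000
account_ids = list(range(rows))                # high-cardinality key
product_codes = [i % 50 for i in range(rows)]  # only 50 distinct values

print(used_slices(account_ids), round(skew_ratio(account_ids), 2))
print(used_slices(product_codes), round(skew_ratio(product_codes), 2))
```

The product-code key can occupy at most 50 of the 92 slices, so the busiest slice holds well over the average share and every query waits for it.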
Choosing Distribution Key
CUSTOMER (15 mil. rows): Cust Id (PK), Cust Name
– Id or Name?
ACCOUNT (50 mil. rows): Account Id (PK), Cust Id (FK), Product Code (FK), Last Trans Date, Amount Balance
– Acct or Cust Id? Or Product Code?
PRODUCT (50 rows): Product Code (PK), Product Desc
TRANSACTION (108 million rows per month, 1.3 billion rows per year): Transaction Id (PK), Account Id (FK), Transaction Type, Transaction Amount
– Trans or Acct Id?
Relationships: CUSTOMER is holder of ACCOUNT; ACCOUNT has activity of TRANSACTION; ACCOUNT has PRODUCT (via Product Code)
What if there are 10 queries having…
WHERE C.Cust_Id = A.Cust_Id and A.Account_Id = T.Account_Id
and there are 10,000 such queries executed daily?
Massive processing skew, because 1.3 billion TRANSACTION rows need to be moved around to join the 50-million-row ACCOUNT table.
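A minimal Python sketch of the collocation test behind this slide. The rule is simplified (an assumption, not the optimizer's full logic): a two-table equi-join runs locally when one of its equality predicates pairs the left table's distribution column with the right table's. The `BEFORE`/`AFTER` distribution keys are hypothetical, with TRANSACTION first distributed on its own Transaction_Id.

```python
"""Check whether a two-table equi-join can run without moving data.

Simplified rule (an assumption, not the optimizer's full logic): the join is
collocated when one of its equality predicates pairs the distribution column
of the left table with the distribution column of the right table.
"""

# Hypothetical distribution keys for the scenario on this slide.
BEFORE = {"CUSTOMER": "Cust_Id", "ACCOUNT": "Cust_Id", "TRANSACTION": "Transaction_Id"}
AFTER = {**BEFORE, "TRANSACTION": "Cust_Id"}


def is_collocated(left, right, column_pairs, distribution):
    """column_pairs holds (left_column, right_column) for each '=' predicate."""
    return (distribution[left], distribution[right]) in column_pairs


# C.Cust_Id = A.Cust_Id
print(is_collocated("CUSTOMER", "ACCOUNT", {("Cust_Id", "Cust_Id")}, BEFORE))  # True
# A.Account_Id = T.Account_Id: data must be redistributed
print(is_collocated("ACCOUNT", "TRANSACTION", {("Account_Id", "Account_Id")}, BEFORE))  # False
# A.Cust_Id = T.Cust_Id and A.Account_Id = T.Account_Id, TRANSACTION on Cust_Id
print(is_collocated("ACCOUNT", "TRANSACTION",
                    {("Cust_Id", "Cust_Id"), ("Account_Id", "Account_Id")}, AFTER))  # True
```

Only the ACCOUNT-to-TRANSACTION join fails the check, and it is the one that drags the 1.3 billion TRANSACTION rows across the network.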
The solution is: add Cust_Id to the TRANSACTION table and distribute TRANSACTION on Cust_Id
And change the join column to Cust_Id:
WHERE C.Cust_Id = A.Cust_Id and A.Cust_Id = T.Cust_Id and A.Account_Id = T.Account_Id
(The Account_Id predicate stays so that each transaction still matches only its own account.)
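Joining ACCOUNT and TRANSACTION on Cust_Id alone is not equivalent to the original Account_Id join: a customer with several accounts would see every transaction paired with every one of those accounts. The in-memory SQLite sketch below demonstrates this with hypothetical data (one customer, two accounts, one transaction per account); the table is named `txn` because TRANSACTION is an SQLite keyword.

```python
"""Why the Account_Id predicate must stay when joining on Cust_Id.

Tiny in-memory SQLite tables with hypothetical data: one customer with two
accounts and one transaction per account.
"""
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE account (account_id INTEGER, cust_id INTEGER);
    CREATE TABLE txn (transaction_id INTEGER, account_id INTEGER, cust_id INTEGER);
    INSERT INTO account VALUES (1, 100), (2, 100);
    INSERT INTO txn VALUES (10, 1, 100), (11, 2, 100);
""")

cust_only = con.execute("""
    SELECT COUNT(*) FROM account a JOIN txn t
    ON a.cust_id = t.cust_id
""").fetchone()[0]

cust_and_account = con.execute("""
    SELECT COUNT(*) FROM account a JOIN txn t
    ON a.cust_id = t.cust_id AND a.account_id = t.account_id
""").fetchone()[0]

print(cust_only)         # 4: every transaction pairs with every account of the customer
print(cust_and_account)  # 2: each transaction matches only its own account
```

Keeping both predicates preserves the original result, while the Cust_Id equality is what lets the join stay on each dataslice.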
And finally…
“A good question lights a thousand fires, a good answer merely permits savages to sleep.”
Mike Corbett (21st Century Games Player)
In closing…
We don’t have enough time to dive into the deeper portions of this pond, but feel free to contact me:
Eddie BK Tan, PMP
Technical Account Manager, Netezza Asia Pacific
IBM Software | Information Mgmt.
The IBM Place, 9 Changi Business Park Central 1
Singapore 486048
Tel +65 9877 0221
[email protected]