© 2012 IBM Corporation

Choosing the Best Distribution Key

Eddie BK Tan
Technical Account Manager, IBM Asia-Pacific, Netezza™

tanbke@sg.ibm.com, +65 9877 0221

Information Management, A Killer Appl

Netezza Performance – the Big Three

DISTRIBUTE FOR COLLOCATION

– Look at the joins

ORDER DATA

– Look at the WHERE clauses

OPTIMIZE TABLE STRUCTURE

– Performance, space, maintainability

Network Cost – Rough Estimates

Collocated Tables: N/A

Table Redistribute: 23 MB / dataslice / sec.

Broadcasted Table: 80 MB / sec. (dbos)

TwinFin 12 = 2.1 GB / sec.
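
The figures above can be turned into a rough back-of-the-envelope calculator. This is a minimal sketch, not a Netezza tool: the two throughput constants come from the slide, while the count of 92 dataslices for a TwinFin 12 is an assumption (it is not stated on the slide, but 23 MB × 92 ≈ 2.1 GB matches the slide's aggregate figure). The function names and the 10,000 MB example table are hypothetical.

```python
# Rough data-movement estimates based on the slide's throughput figures.
REDISTRIBUTE_MB_PER_SLICE_PER_SEC = 23  # from the slide
BROADCAST_MB_PER_SEC = 80               # from the slide
TWINFIN_12_DATASLICES = 92              # assumption: not stated on the slide


def aggregate_redistribute_mb_per_sec(dataslices=TWINFIN_12_DATASLICES):
    """Total redistribute throughput if every dataslice moves data in parallel."""
    return REDISTRIBUTE_MB_PER_SLICE_PER_SEC * dataslices


def redistribute_seconds(table_mb, dataslices=TWINFIN_12_DATASLICES):
    """Estimated seconds to redistribute a table of table_mb megabytes."""
    return table_mb / aggregate_redistribute_mb_per_sec(dataslices)


def broadcast_seconds(table_mb):
    """Estimated seconds to broadcast a table of table_mb megabytes."""
    return table_mb / BROADCAST_MB_PER_SEC


if __name__ == "__main__":
    table_mb = 10_000  # hypothetical 10,000 MB table
    print("aggregate MB/s:", aggregate_redistribute_mb_per_sec())
    print("redistribute s:", round(redistribute_seconds(table_mb), 1))
    print("broadcast s:   ", round(broadcast_seconds(table_mb), 1))
```

Under these assumptions, redistribution scales with the number of dataslices while broadcast does not, which is why broadcasting only suits small tables. Collocated joins avoid both costs.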

SKEW Considerations

Collocated Tables

– None, except for whatever skew the tables started out with

Table Redistribute

– Possible, depending on how evenly the redistribution key's values spread across dataslices

Broadcasted Table

– None, since all dataslices get an identical copy of the data

– But you don’t want to broadcast large volumes of data
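
To illustrate why a redistribute can introduce skew, here is a minimal sketch that hashes keys onto dataslices and compares a high-cardinality key with a low-cardinality one. The hash is a stand-in (SHA-256), not Netezza's actual distribution function, and the 92 dataslices, the row counts, and all function names are assumptions made for illustration.

```python
import hashlib
from collections import Counter

NUM_SLICES = 92  # assumption: dataslice count chosen for illustration


def dataslice_for(key, num_slices=NUM_SLICES):
    """Map a key to a dataslice using a stand-in hash (not Netezza's own)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def rows_per_slice(keys, num_slices=NUM_SLICES):
    """Count how many rows land on each dataslice for the given key values."""
    counts = Counter(dataslice_for(k, num_slices) for k in keys)
    return [counts.get(s, 0) for s in range(num_slices)]


def skew_ratio(per_slice):
    """Busiest slice divided by the average slice; 1.0 means perfectly even."""
    average = sum(per_slice) / len(per_slice)
    return max(per_slice) / average


# 100,000 rows keyed on a unique id versus a key with only 50 distinct values.
high_cardinality = rows_per_slice(range(100_000))
low_cardinality = rows_per_slice(i % 50 for i in range(100_000))

if __name__ == "__main__":
    print("unique key skew ratio:     ", round(skew_ratio(high_cardinality), 2))
    print("50-value key skew ratio:   ", round(skew_ratio(low_cardinality), 2))
    print("slices used by 50-value key:", sum(1 for c in low_cardinality if c))
```

With only 50 distinct values, at most 50 of the 92 dataslices can receive any rows, so the remaining slices sit idle while the busy ones do all the work.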

Choosing Distribution Key

CUSTOMER (15 mil. rows)

– Cust Id (PK)
– Cust Name
– Distribution key: Id or Name?

ACCOUNT (50 mil. rows)

– Account Id (PK)
– Cust Id (FK)
– Product Code (FK)
– Last Trans Date
– Amount Balance
– Distribution key: Acct or Cust Id? Or Product Code?

TRANSACTION (108 million rows per month, 1.3 billion rows per year)

– Transaction Id (PK)
– Account Id (FK)
– Transaction Type
– Transaction Amount
– Distribution key: Trans or Acct Id?

PRODUCT (50 rows)

– Product Code (PK)
– Product Desc

Relationships

– CUSTOMER is holder of ACCOUNT
– ACCOUNT has activity of TRANSACTION
– PRODUCT has ACCOUNT

What if there are 10 queries having…

WHERE C.Cust_Id = A.Cust_Id and A.Account_Id = T.Account_Id

and there are 10,000 such queries executed daily?

Massive Processing Skew because 1.3 billion TRANSACTION rows need to be moved around to join the 50-million-row ACCOUNT table.

The solution is: Add Cust_Id to TRANSACTION table

And change the join-column to Cust_Id

WHERE C.Cust_Id = A.Cust_Id and A.Cust_Id = T.Cust_Id
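
The effect of the proposed change can be sketched by counting how many TRANSACTION rows sit on a different dataslice from their ACCOUNT row. This is an illustrative simulation only. It assumes ACCOUNT is distributed on Cust_Id, uses a stand-in SHA-256 hash rather than Netezza's distribution function, and assumes 92 dataslices. The tiny data set and the names `accounts`, `transactions`, and `rows_to_move` are hypothetical.

```python
import hashlib

NUM_SLICES = 92  # assumption: dataslice count chosen for illustration


def dataslice_for(key, num_slices=NUM_SLICES):
    """Map a key to a dataslice using a stand-in hash (not Netezza's own)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


# Hypothetical data: 300 accounts, 3 per customer, and 3,000 transactions.
accounts = {account_id: account_id // 3 for account_id in range(300)}
transactions = [(trans_id, trans_id % 300) for trans_id in range(3_000)]


def rows_to_move(transaction_dist_key):
    """Count TRANSACTION rows not collocated with their ACCOUNT row.

    ACCOUNT is assumed to be distributed on cust_id in both scenarios.
    """
    moved = 0
    for _trans_id, account_id in transactions:
        cust_id = accounts[account_id]
        account_slice = dataslice_for(cust_id)
        if transaction_dist_key == "account_id":
            transaction_slice = dataslice_for(account_id)
        elif transaction_dist_key == "cust_id":
            # Only possible once Cust_Id has been added to TRANSACTION.
            transaction_slice = dataslice_for(cust_id)
        else:
            raise ValueError(f"unknown key: {transaction_dist_key}")
        if transaction_slice != account_slice:
            moved += 1
    return moved


if __name__ == "__main__":
    print("moved when distributed on Account_Id:", rows_to_move("account_id"))
    print("moved when distributed on Cust_Id:   ", rows_to_move("cust_id"))
```

When both tables hash on Cust_Id, every matching pair of rows already lives on the same dataslice, so nothing has to move. One caveat beyond the slide: a join on Cust_Id alone matches each transaction to every account of that customer, so a real query would likely keep A.Account_Id = T.Account_Id as an additional predicate.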

And finally…

“A good question lights a thousand fires, a good answer merely permits savages to sleep.”

Mike Corbett (21st Century Games Player)

In closing…

We don’t have enough time to dive into the deeper portions of this pond, but feel free to contact me:

Eddie BK Tan, PMP
Technical Account Manager, Netezza Asia Pacific
IBM Software | Information Mgmt.

The IBM Place
9 Changi Business Park Central 1
Singapore 486048

Tel +65 9877 0221
tanbke@sg.ibm.com