IBM offers four distinct data retrieval technologies: the traditional RDBMS, which primarily relies on indexes to speed access; the new BLU Acceleration columnar compression database; IBM PureData System for Analytics (IBM Netezza), which deploys racks of Field Programmable Gate Array (FPGA) processors to parse the data; and IBM InfoSphere BigInsights, which is the IBM distribution of Hadoop. Choices are good to have, but how do you choose which technology to apply to a particular business use case? In this session, you learn how these techniques differ, including their relative strengths and weaknesses, to help you make an informed choice.
1
2
3
4
5
6
http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.db2.luw.admin.perf.doc%2Fdoc%2Fc0005424.html
7
8
9
10
11
12
Netezza_under_the_hood, Feinsmith (page 8)
What we see here is how we increased the scan speeds. Basically, each drive in the N2001 is capable of delivering about 130 megabytes per second of throughput (compared to approximately 120 MB/sec in the N1001).
What we had before, with the 1 to 1 to 1 ratio (one drive to one FPGA core to one CPU core), was that the speed of the drive was the limiting factor in how fast the FPGA core could process the data, because the FPGA core could handle way more than that, and the CPU core even more than the FPGA core. So the speed of the drive was a limiting factor.
We now have more than one drive per FPGA core and per CPU core. Using basic math, we have about 2 1/2 drives per FPGA core and per CPU core. That drives up the speed at which data can be scanned and delivered to the FPGA for processing, to about 325 megabytes per second. If you add in the 4x compression, that's going to get up to around 1300 megabytes per second.
In the N2001 we now have faster FPGA cores that can process about 1000 megabytes per second. So we are no longer I/O starved.
So we're now delivering about 2 1/2 times as much data per second to the FPGA and to the CPU core, and that is fundamentally how we increased the scan speed and the performance of this system.
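A quick back-of-the-envelope sketch in Python of the arithmetic above, showing where the 325 and 1300 MB/sec figures come from; the constants are the numbers quoted in these notes, not measured values:

```python
# Scan-throughput arithmetic for the N2001, per the notes above.
drive_throughput_mb_s = 130      # per-drive scan rate (~120 in the N1001)
drives_per_fpga_core = 2.5       # N2001 ratio; the N1001 was 1:1:1
compression_ratio = 4            # typical 4x compression cited above
fpga_core_capacity_mb_s = 1000   # what one N2001 FPGA core can process

raw_scan_mb_s = drive_throughput_mb_s * drives_per_fpga_core   # ~325
effective_scan_mb_s = raw_scan_mb_s * compression_ratio        # ~1300

print(f"Raw scan per FPGA core:     {raw_scan_mb_s:.0f} MB/s")
print(f"Effective with compression: {effective_scan_mb_s:.0f} MB/s")
# Effective delivery (~1300 MB/s) now exceeds FPGA capacity (~1000 MB/s),
# so the drives are no longer the bottleneck: "no longer I/O starved".
print(f"Drives still the bottleneck: {effective_scan_mb_s < fpga_core_capacity_mb_s}")
```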
13
14
Netezza Bootcamp (page 62).
15
Netezza Bootcamp (page 133).
16
Netezza Bootcamp (page 227).
17
Netezza Bootcamp (page 228).
18
19
IZAS_zEnterprise_Analytics, Favero et al. (page 21)
20
Favero et al. (page 69)
21
22
23
Schiefer (page 4)
24
Schiefer (page 10)
25
Schiefer (page 16)
26
Schiefer (page 14)
27
Schiefer (page 18)
28
29
30
Positioning guidelines between Netezza and BLU are stated in slide 30, and I believe an additional factor is database size. Given that BLU is a single-node solution, it is unlikely that a DW bigger than 10 TB will fit in RAM. There are tables that can yield 10x compression savings, but in practice a user will likely see a lower compression ratio. A server with 1 TB of RAM is one of the larger configurations for BLU, and a DW bigger than 10 TB will probably not fit. Although BLU does not require that all data reside in RAM, performance will degrade if swapping occurs. – Nin Lei
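To make the sizing argument concrete, here is a minimal sketch of the arithmetic, assuming the 1 TB RAM configuration and 10 TB warehouse from the note; the 4x and 2x rows are illustrative lower compression ratios, not figures from the source:

```python
# Will a warehouse fit in a large single-node BLU server's RAM?
ram_tb = 1.0       # one of the larger single-node BLU configurations
raw_dw_tb = 10.0   # the warehouse size discussed in the note

for ratio in (10, 4, 2):   # best-case vs. more typical compression
    compressed_tb = raw_dw_tb / ratio
    verdict = "fits" if compressed_tb <= ram_tb else "does not fit"
    print(f"{ratio}x compression -> {compressed_tb:.1f} TB "
          f"({verdict} in {ram_tb:.0f} TB of RAM)")
# Only the best-case 10x ratio squeezes 10 TB into 1 TB of RAM; at
# lower ratios the working set spills, and swapping degrades performance.
```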
31
32
33
DW611 (page 2‐7)
34
DW611 (chapter 3)
35
DW611 (chapter 3)
36
DW611 (chapter 3)
37
DW611 (chapter 3)
Regarding slide 37, both DPF and MapReduce (MR) use a shared-nothing architecture. As such, both architectures reduce data movement by having the threads or map tasks primarily access data from the local node. Certainly there will be data redistribution in some queries, but both programming models attempt to minimize data movement. One of the issues with MR is that it takes 10 to 20 seconds to spawn MR jobs. Queries that take a second or two in DPF will take much longer in MR. That's the reason most vendors (IBM Big SQL, Cloudera Impala, EMC HAWQ, Hortonworks Stinger) have all abandoned MR and implemented their own data distribution mechanisms. – Nin Lei
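A toy latency model, not from the source, illustrating why the 10 to 20 second MR job-spawn cost dominates short queries while long scans amortize it; the scan times are hypothetical:

```python
# Toy model: query latency = fixed startup overhead + scan time.
MR_SPAWN_OVERHEAD_S = 15.0   # midpoint of the 10-20 s spawn cost cited above

def dpf_latency(scan_s):
    return scan_s                         # long-running threads, no spawn cost

def mr_latency(scan_s):
    return MR_SPAWN_OVERHEAD_S + scan_s   # pay the job-spawn cost every query

for scan_s in (1.0, 60.0, 600.0):         # short, medium, long queries
    slowdown = mr_latency(scan_s) / dpf_latency(scan_s)
    print(f"scan {scan_s:5.0f}s: DPF {dpf_latency(scan_s):6.1f}s, "
          f"MR {mr_latency(scan_s):6.1f}s ({slowdown:.1f}x slower)")
# A 1 s query becomes ~16x slower under MR; a 600 s scan barely notices.
```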
38
39
40
41
42
Frank Fillmore is the Founder and President of The Fillmore Group, Inc. (TFG), a Premier IBM Business Partner specializing in zAnalytics.
Since 1987, The Fillmore Group has delivered technical services to clients worldwide including government, commercial, and not‐for‐profit enterprises.
A knowledgeable and engaging speaker, Frank has presented at many regional and national events. In 1998 he became a DB2 Gold Consultant, and in 2009 was named an inaugural InfoSphere Information Champion.
Frank’s core areas of competency include replication, federation, data interoperability, InfoSphere, and technical project management. Frank oversees a staff of DB2 consultants and contributes his technical expertise to The Fillmore Group’s growing software sales business.
43