Query Processing on Smart SSDs
Opportunities and Challenges
Jaeyoung Do (1), Yang Seok Ki (2), Jignesh M. Patel (1), Chanik Park (2), Kwanghyun Park (1), David J. DeWitt (3)
(1) University of Wisconsin – Madison
(2) Samsung Semiconductor Inc.
(3) Microsoft Jim Gray Systems Lab
Contact: [email protected]
The opinions expressed here are those of the authors, and not necessarily of the organizations to which the authors belong.
Motivation
[Figure: system diagram with the labels CPU, caches, DRAM, Bus, NVRAM (e.g. SSDs), and Magnetic Hard Disk Drives]
Inside a Modern SSD
• CPU: embedded processors (low power)
• Memory: general-purpose memory
• I/O: high I/O bandwidth inside the flash chips
[Figure: Flash SSD Device. Block diagram labels: SSD Controller, Embedded Processor, DRAM Controller, Flash Memory Controllers, SRAM, Host Interface Controller, DRAM, and flash chips (F)]
[Figure: Bandwidth Relative to the I/O Interface Speed (log scale, 1 to 100) by Year (2007 to 2015), with two series: I/O Interface and Inside the SSD]
The Opportunity
• Push code through the “straw,” thereby drinking smaller amounts through the straw
• Philosophically similar to what Netezza and Exadata do today, but using “commodity” resources inside the SSD
Potential Advantages
• Higher performance
• Lower energy consumption (Joules/query)
• Lower cost when building balanced systems
• Simpler appliances
Current Challenges
• Programming tools for the SSD are primitive
• Not enough “spare” processing power
• Some hardware architectural limitations
• Deeper DBMS implications: query optimization, buffer pool caching, updates, transaction management, etc.
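[Figure: two query plans side by side. In the first, the Scan runs below the I/O interface and the Aggregate runs on the host. In the second, a Pre-Aggregate is added above the Scan below the I/O interface, and the Aggregate remains on the host.]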
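The split between a Pre-Aggregate step below the I/O interface and an Aggregate step on the host can be pictured with a small sketch. This is an illustration only, not the Smart SSD implementation: the names `PartialAvg`, `device_pre_aggregate`, and `host_aggregate` are hypothetical, and AVG is assumed to be decomposed into a partial sum and count so that only a few bytes cross the interface per batch.

```python
from dataclasses import dataclass


@dataclass
class PartialAvg:
    """Partial state for AVG: a running sum and a row count."""
    total: int = 0
    count: int = 0


def device_pre_aggregate(rows, threshold):
    """Hypothetical device-side step: scan, filter, fold into partial state."""
    state = PartialAvg()
    for col1, col2 in rows:
        if col1 < threshold:          # selection predicate on COLUMN_1
            state.total += col2       # accumulate COLUMN_2
            state.count += 1
    return state


def host_aggregate(partials):
    """Hypothetical host-side step: merge partial states and finalize AVG."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    return total / count if count else None


rows = [(1, 10), (5, 20), (9, 30), (2, 40)]
batches = [rows[:2], rows[2:]]        # pretend each batch is one device buffer
partials = [device_pre_aggregate(b, threshold=5) for b in batches]
result = host_aggregate(partials)
```

Only the `PartialAvg` values would travel to the host in this sketch, which is the sense in which less data is drawn through the "straw."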
Implementation Details
Samsung Smart SSD
• New Smart SSD APIs to run database operations inside the SSD
• Open(), Get(), Close(), SendToHost()
SQL Server
• Modified the query processor to hard-code selected Smart SSD plans
• Implemented Smart SSD executors
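Communication Flow – e.g. Scan
[Figure: message sequence between SQL Server and the Smart SSD]
• SQL Server: Open(), passing metadata. Smart SSD: OK
• Smart SSD: Scanning …
• SQL Server: Get(): Are there results? Smart SSD: NO
• SQL Server: Get(): Are there results? Smart SSD: YES, results
• SQL Server: Get(): Are there results? Smart SSD: FINISHED, scan is done
• SQL Server: Close(). Smart SSD: OK
• Timing annotation on the diagram: 5μsec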
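The Open/Get/Close exchange can be mimicked with a toy simulation. This is a sketch under stated assumptions, not the Smart SSD API: the class `SimulatedSmartSSD`, its method signatures, and the one-page-per-call scheduling are all hypothetical, and SendToHost() is not modeled. The point is only the polling pattern, in which the host keeps calling Get() until the device reports FINISHED.

```python
from collections import deque


class SimulatedSmartSSD:
    """Toy stand-in for the device side of an Open/Get/Close session."""

    def __init__(self, pages):
        self._pages = deque(pages)    # hypothetical: each page is a list of ints
        self._ready = deque()         # result batches waiting for the host
        self._predicate = None

    def open(self, predicate):
        """Open(): receive query metadata (here just a predicate)."""
        self._predicate = predicate
        return "OK"

    def _work(self):
        """Process one page per call, mimicking background scanning."""
        if self._pages:
            page = self._pages.popleft()
            matches = [v for v in page if self._predicate(v)]
            if matches:
                self._ready.append(matches)

    def get(self):
        """Get(): return (status, batch); status is NO, YES, or FINISHED."""
        self._work()
        if self._ready:
            return "YES", self._ready.popleft()
        if self._pages:
            return "NO", None
        return "FINISHED", None

    def close(self):
        """Close(): end the session."""
        self._predicate = None
        return "OK"


def run_scan(device, predicate):
    """Host side: open a session, poll Get() until FINISHED, then close."""
    assert device.open(predicate) == "OK"
    results, statuses = [], []
    while True:
        status, batch = device.get()
        statuses.append(status)
        if status == "FINISHED":
            break
        if status == "YES":
            results.extend(batch)
    assert device.close() == "OK"
    return results, statuses


results, statuses = run_scan(
    SimulatedSmartSSD([[50, 60], [1, 70], [2, 3]]), lambda v: v < 10
)
```

A real host would wait between polls rather than spin; the simulation omits any delay to stay deterministic.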
Data Flow inside the SSD (NSM format)
[Figure: Inside the Flash SSD. Block diagram labels: Embedded Processor, DRAM Controller, Flash Memory Controllers, SRAM, Host Interface Controller, DRAM, flash chips (F), and Data storage. Legend: data page, a tuple, a value of a column in a tuple. Example tuple columns: int, char, int, vchar. Processing step: if qualified, copy projected columns’ values to output buffer.]
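The "if qualified, copy projected columns' values to output buffer" step for a row-oriented (NSM) page can be sketched as follows. The tuple format (int, 4-byte char, int), the helper names, and the absence of any page header or slot array are simplifying assumptions for illustration; they are not the on-disk format used by SQL Server or the Smart SSD firmware.

```python
import struct

# Hypothetical fixed-width tuple: int, 4-byte char, int (little-endian, no padding).
TUPLE = struct.Struct("<i4si")


def build_nsm_page(rows):
    """Lay tuples out one after another, as in a row-store (NSM) page."""
    return b"".join(TUPLE.pack(*row) for row in rows)


def scan_nsm_page(page, threshold):
    """Filter on column 0 and copy the projected column 2 to an output buffer."""
    output = bytearray()
    for offset in range(0, len(page), TUPLE.size):
        # Every column of the tuple is decoded, even those the query ignores.
        col0, _col1, col2 = TUPLE.unpack_from(page, offset)
        if col0 < threshold:                      # "if qualified"
            output += struct.pack("<i", col2)     # copy projected value
    return bytes(output)


page = build_nsm_page([(1, b"aaaa", 100), (9, b"bbbb", 200), (3, b"cccc", 300)])
out = scan_nsm_page(page, threshold=5)
projected = [v for (v,) in struct.iter_unpack("<i", out)]
```

In this layout the scan has to step over whole tuples to reach the predicate column, so every byte of the page is touched.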
Storage Formats
• NSM layout
• PAX layout
[Figure: a data page shown in both layouts, with example columns int, char, date. Labels: Data Page, Tuple, Column.]
Data Flow inside the SSD with the PAX layout
[Figure: Inside the Flash SSD. Block diagram labels: Embedded Processor, DRAM Controller, Flash Memory Controllers, SRAM, Host Interface Controller, DRAM, flash chips (F), and Data storage. Legend: data page, a column, a value of a column in a tuple. Values are grouped by column (int int int, char char char). Annotation: 32 Bytes, LDM_CMP function. Processing step: if qualified, copy projected columns’ values to output buffer.]
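For contrast with a row-oriented page, a PAX-style page groups the values of each column together within the page. The sketch below is a simplified illustration with two integer columns and a made-up header holding only the tuple count; the function names and layout are hypothetical, and the LDM_CMP function mentioned on the slide is not modeled.

```python
import struct


def build_pax_page(rows):
    """Group values by column: all of column 0, then all of column 1."""
    n = len(rows)
    header = struct.pack("<I", n)     # hypothetical header: tuple count only
    col0 = struct.pack(f"<{n}i", *(r[0] for r in rows))
    col1 = struct.pack(f"<{n}i", *(r[1] for r in rows))
    return header + col0 + col1


def scan_pax_page(page, threshold):
    """Evaluate the predicate on column 0, then fetch column 1 for matches."""
    (n,) = struct.unpack_from("<I", page, 0)
    # First pass reads only the contiguous predicate column.
    col0 = struct.unpack_from(f"<{n}i", page, 4)
    qualified = [i for i, v in enumerate(col0) if v < threshold]
    # Second pass reads the projected column only at qualifying positions.
    col1_base = 4 + 4 * n
    return [
        struct.unpack_from("<i", page, col1_base + 4 * i)[0] for i in qualified
    ]


page = build_pax_page([(1, 100), (9, 200), (3, 300)])
projected = scan_pax_page(page, threshold=5)
```

Because the predicate column is contiguous, the comparison loop in this sketch reads a dense run of values instead of striding across tuples.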
System Configuration
SQL Server (host machine)
• 64-bit Windows 7
• 32GB DRAM (24GB dedicated to the DBMS)
• 2.66 GHz Dual Quad Core Intel Xeon5430, 32KB L1 cache, 6MB L2 cache
• Two 7.5K RPM SATA HDDs: one for the OS and one for the log
• LSI four-port SATA/SAS 6Gbps HBA (host bus adapter)
Flash SSD
• Samsung 400GB SAS SSD
• Two ARM cores
• 16KB SRAM for each core
• 256KB DRAM for each core
• 1MB DRAM output buffer
HDD
• HP 146GB 10K RPM SAS HDD
Upper Bound on Expected Performance Gains
[Figure: Speed of a Simple Sequential Scan Program. Bar chart of Maximum Sequential Read (MB/s): SAS HDD 80, SAS SSD 550, Smart SSD (Internal) 1560. Annotation: 20X.]
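The bandwidth figures on this slide bound the speedup a pure sequential scan could see if it ran entirely inside the device. The short calculation below uses only the three numbers reported above; reading the 20X annotation as the internal-bandwidth-to-HDD ratio is an assumption, since 1560 / 80 = 19.5.

```python
# Maximum sequential read bandwidth in MB/s, as reported on the slide.
bandwidth = {"SAS HDD": 80, "SAS SSD": 550, "Smart SSD (Internal)": 1560}


def ratio(fast, slow):
    """Bandwidth of `fast` relative to `slow`."""
    return bandwidth[fast] / bandwidth[slow]


internal_vs_hdd = ratio("Smart SSD (Internal)", "SAS HDD")   # 19.5
internal_vs_ssd = ratio("Smart SSD (Internal)", "SAS SSD")   # about 2.84
ssd_vs_hdd = ratio("SAS SSD", "SAS HDD")                     # 6.875
```

Against the same SSD accessed through its host interface, the upper bound on the gain from in-device scanning is therefore a little under 3X.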
Experimental Setup: Synthetic Data
Table         # Columns   Table Size (GB)   # Tuples (millions)   # Tuples/page
Synthetic4    4           10                400                   323
Synthetic16   16          30                400                   109
Synthetic64   64          110               400                   29
Each table has a number of integer-typed columns
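The table sizes and tuples-per-page figures can be cross-checked against each other. The sketch below assumes 8 KB data pages (the page size SQL Server uses) and decimal gigabytes; both are assumptions about how the slide's numbers were computed, and the helper name is hypothetical.

```python
import math

PAGE_BYTES = 8192            # assumption: 8 KB data pages
TUPLES = 400_000_000         # 400 million tuples per table, from the slide

# name -> (tuples per page, reported table size in GB), both from the slide
tables = {
    "Synthetic4": (323, 10),
    "Synthetic16": (109, 30),
    "Synthetic64": (29, 110),
}


def estimated_size_gb(tuples_per_page):
    """Pages needed for all tuples, converted to decimal gigabytes."""
    pages = math.ceil(TUPLES / tuples_per_page)
    return pages * PAGE_BYTES / 1e9


estimates = {name: estimated_size_gb(tpp) for name, (tpp, _) in tables.items()}
```

Under these assumptions the estimates come out at roughly 10.1, 30.1, and 113.0 GB, within a few percent of the reported sizes.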
Experimental Setup: TPC-H
Modified TPC-H Lineitem Table, 600M tuples, 94GB
Lineitem (
  L_ORDERKEY bigint,
  L_PARTKEY int,
  L_SUPPKEY int,
  L_LINENUMBER int,
  L_QUANTITY bigint,
  L_EXTENDEDPRICE bigint,
  L_DISCOUNT bigint,
  L_TAX bigint,
  L_SHIPDATE int,
  L_COMMITDATE int,
  L_RECEIPTDATE int,
  L_RETURNFLAG char(1),
  L_LINESTATUS char(1),
  L_SHIPINSTRUCT char(25),
  L_SHIPMODE char(10),
  L_COMMENT char(47)
)
10% Selection Query
[Figure: Total Energy (kJ), log scale, versus Elapsed Time (seconds), log scale, for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 15.5X and 16.0X.]
Query: SELECT COLUMN_2 FROM Synthetic64 WHERE COLUMN_1 < [VALUE]
10% Selection Query
[Figure: zoomed view of Total Energy (kJ), log scale, versus Elapsed Time (seconds), log scale, for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 2.4X and 2.3X.]
Query: SELECT COLUMN_2 FROM Synthetic64 WHERE COLUMN_1 < [VALUE]
10% Selection Query
[Figure: I/O Subsystem Energy (kJ) versus Elapsed Time (seconds) for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 23.0X and 16.0X.]
Query: SELECT COLUMN_2 FROM Synthetic64 WHERE COLUMN_1 < [VALUE]
10% Selection Query
[Figure: zoomed view of I/O Subsystem Energy (kJ) versus Elapsed Time (seconds) for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 1.8X and 2.3X.]
Query: SELECT COLUMN_2 FROM Synthetic64 WHERE COLUMN_1 < [VALUE]
Effect of the # of Tuples on a Page
[Figure: bar chart of Time (seconds) by Target Table (Synthetic4, Synthetic16, Synthetic64) for SAS SSD and Smart SSD (PAX). Annotations: “Smart SSD outperforms the regular SSD” and “Smart SSD underperforms compared to the regular SSD.”]
Query: SELECT COLUMN_2 FROM [Synthetic_table] WHERE COLUMN_1 < [VALUE]
• Synthetic4: 323 tuples/page
• Synthetic16: 109 tuples/page
• Synthetic64: 29 tuples/page
10% Aggregation Query
[Figure: Total Energy (kJ), log scale, versus Elapsed Time (seconds), log scale, for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 16.4X and 16.9X.]
Query: SELECT AVG(COLUMN_2) FROM Synthetic64 WHERE COLUMN_1 < [VALUE]
10% Aggregation Query
[Figure: zoomed view of Total Energy (kJ), log scale, versus Elapsed Time (seconds), log scale, for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 2.6X and 2.4X.]
Query: SELECT AVG(COLUMN_2) FROM Synthetic64 WHERE COLUMN_1 < [VALUE]
10% Aggregation Query
[Figure: I/O Subsystem Energy (kJ) versus Elapsed Time (seconds) for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 20.9X and 16.9X.]
Query: SELECT AVG(COLUMN_2) FROM Synthetic64 WHERE COLUMN_1 < [VALUE]
10% Aggregation Query
[Figure: zoomed view of I/O Subsystem Energy (kJ) versus Elapsed Time (seconds) for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 1.8X and 2.4X.]
Query: SELECT AVG(COLUMN_2) FROM Synthetic64 WHERE COLUMN_1 < [VALUE]
TPC-H Query 6
[Figure: Total Energy (kJ), log scale, versus Elapsed Time (seconds), log scale, for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 11.8X and 11.5X.]
Query: SELECT SUM(EXTENDEDPRICE*DISCOUNT) FROM LINEITEM WHERE SHIPDATE >= 1994-01-01 AND SHIPDATE < 1995-01-01 AND DISCOUNT > 0.05 AND DISCOUNT < 0.07 AND QUANTITY < 24
TPC-H Query 6
[Figure: zoomed view of Total Energy (kJ), log scale, versus Elapsed Time (seconds), log scale, for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 1.9X and 1.7X.]
Query: SELECT SUM(EXTENDEDPRICE*DISCOUNT) FROM LINEITEM WHERE SHIPDATE >= 1994-01-01 AND SHIPDATE < 1995-01-01 AND DISCOUNT > 0.05 AND DISCOUNT < 0.07 AND QUANTITY < 24
TPC-H Query 6
[Figure: I/O Subsystem Energy (kJ) versus Elapsed Time (seconds) for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 14.1X and 11.5X.]
Query: SELECT SUM(EXTENDEDPRICE*DISCOUNT) FROM LINEITEM WHERE SHIPDATE >= 1994-01-01 AND SHIPDATE < 1995-01-01 AND DISCOUNT > 0.05 AND DISCOUNT < 0.07 AND QUANTITY < 24
TPC-H Query 6
[Figure: zoomed view of I/O Subsystem Energy (kJ) versus Elapsed Time (seconds) for Smart SSD (PAX), Smart SSD (NSM), SAS SSD, and SAS HDD. Annotated ratios: 1.4X and 1.7X.]
Query: SELECT SUM(EXTENDEDPRICE*DISCOUNT) FROM LINEITEM WHERE SHIPDATE >= 1994-01-01 AND SHIPDATE < 1995-01-01 AND DISCOUNT > 0.05 AND DISCOUNT < 0.07 AND QUANTITY < 24
Conclusions and Directions for Future Work
Big Picture
• Computing and storage boundaries are no longer as rigid as they have been in the past
• “Commodity” computing in storage devices is here
• Communication buses evolve slower than other parts of the system
Opportunity
• Push code to data rather than pull data to the (main) processor
• Make general-purpose processing a commodity pre-built “into” the device
Challenges
• Need better programming tools
• Various SSD hardware architectural changes needed (but appear to be feasible, cheaply)
• DBMS internals also need significant changes
Co-dependency
Hardware
• More processing behind the bus
• Commodity processing
• Better programming tools
• Long-term roadmap
Software
• Caching implications
• Query optimization
• Consistency
• Concurrency
• Storage optimization
• Significant $ and time investment needed on both sides.
• Who is willing to take the long-term view and invest?