column-stores 2.0
prof. Stratos Idreos
class 5
http://daslab.seas.harvard.edu/classes/cs165/
CS165, Fall 2016 Stratos Idreos /282
what just happened?
where is my data?
email, cloud, social media, …
can we design systems that let us know what is going on?
worth thinking about…
cool papers 2.0
The Case for RodentStore: An Adaptive, Declarative Storage System. Philippe Cudré-Mauroux, Eugene Wu, Samuel Madden. In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2009
Abstraction Without Regret in Database Systems Building: a Manifesto. Christoph Koch. IEEE Data Eng. Bull. 37(1): 70-79 (2014)
dbTouch: Analytics at your Fingertips. Stratos Idreos and Erietta Liarou. In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2013
design doc
think, design, create 1-2 page PDF doc and ask for feedback
mandatory M1-M3, optional afterwards
submit through Canvas
do not worry about perfection: fail fast
wrong ideas ok if you eventually find out they are wrong :)
(holds for midterms as well)
Jim Gray
IBM, Tandem, DEC, Microsoft
ACM Turing award
ACM SIGMOD Edgar F. Codd Innovations Award
disk: 100Kx, Pluto, 2 years
memory: 100x, New York, 1.5 hours
on board cache: 10x, this building, 10 min
on chip cache: 2x, this room, 1 min
registers: my head, ~0
the way we store data defines the possible (efficient) access methods
free_offset, N, offset1-length1, offset2-length2,…
free space
slotted page
scan null
update var length
…
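The slotted page header above (free_offset, N, then one offset-length pair per record) can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the 4096-byte page size, the 16-bit fields, the choice to pack records from the end of the page backwards, and the names `page_init`, `page_insert` and `page_get` are all hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096  /* assumed page size */

typedef struct { uint16_t offset, length; } slot_t;         /* offsetK-lengthK */
typedef struct { uint16_t free_offset, n; } page_header_t;  /* free_offset, N */

typedef union {
    page_header_t hdr;         /* header sits at the start of the page */
    uint8_t bytes[PAGE_SIZE];  /* slot directory follows it, records sit at the end */
} page_t;

/* Byte position of slot i inside the page. */
static size_t slot_pos(uint16_t i) {
    return sizeof(page_header_t) + (size_t)i * sizeof(slot_t);
}

void page_init(page_t *p) {
    p->hdr.free_offset = PAGE_SIZE;  /* assumption: records occupy [free_offset, PAGE_SIZE) */
    p->hdr.n = 0;
}

/* Append a variable-length record; returns its slot id, or -1 if it does not fit. */
int page_insert(page_t *p, const void *rec, uint16_t len) {
    size_t dir_end = slot_pos((uint16_t)(p->hdr.n + 1));  /* directory end after adding a slot */
    if (dir_end + len > p->hdr.free_offset) return -1;    /* free space too small */
    p->hdr.free_offset -= len;
    memcpy(p->bytes + p->hdr.free_offset, rec, len);
    slot_t s = { p->hdr.free_offset, len };
    memcpy(p->bytes + slot_pos(p->hdr.n), &s, sizeof s);
    return p->hdr.n++;
}

/* Look up record i through its slot; the record length is returned in *len. */
const uint8_t *page_get(const page_t *p, uint16_t i, uint16_t *len) {
    assert(i < p->hdr.n);
    slot_t s;
    memcpy(&s, p->bytes + slot_pos(i), sizeof s);
    *len = s.length;
    return p->bytes + s.offset;
}
```

Every record access goes through one extra indirection (the slot), which is what makes variable-length updates possible and plain scans more expensive than over a dense array.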
row-store: ABCD
column-store: A B C D
a1 a2 a3 a4 a5 a6
b1 b2 b3 b4 b5 b6
c1 c2 c3 c4 c5 c6
virtual ids/ positional alignment
positional lookups/joins
A(i) = A + i * width(A)
tuple 1, tuple 2, tuple 3, tuple 4, tuple 5, tuple 6
A B C
fixed-width + dense
columns do not need to have the same width
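The positional lookup A(i) = A + i * width(A) is plain pointer arithmetic over a fixed-width, dense array. A minimal sketch, assuming a hypothetical `column_t` descriptor and `column_at` helper (neither name is from the slides):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *base;  /* A: start of the dense array */
    size_t width;         /* width(A): bytes per value */
    size_t count;         /* number of values, one per tuple */
} column_t;

/* A(i) = A + i * width(A): address of the value at position i. */
const void *column_at(const column_t *col, size_t i) {
    assert(i < col->count);
    return col->base + i * col->width;
}
```

Because position i identifies the same tuple in every column, reconstructing tuple i means calling `column_at` with the same i on each column, even when the columns have different widths.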
today: column-stores 2.0
select min(C) from R where A min
sequential access patterns, max 1 if
working over fixed width & dense columns
for (i=0; i<…; i++) if (A[i] … v) inter1[j++]=i
no function calls, no indirections, no auxiliary data, min ifs
easy to prefetch next data values
for (i=0; i<…
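The two loops above are cut off in the slides, so the following is only a sketch of how a column-at-a-time plan for select min(C) from R where A … could look. The predicate `A[i] < v`, the function names `select_lt`, `fetch` and `min_of`, and the `inter2` intermediate are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Selection: write the qualifying positions of A into inter1, return how many. */
size_t select_lt(const int32_t *A, size_t n, int32_t v, size_t *inter1) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++)
        if (A[i] < v) inter1[j++] = i;  /* the single if inside the loop */
    return j;
}

/* Positional fetch: inter2[i] = C[inter1[i]]. */
void fetch(const int32_t *C, const size_t *inter1, size_t m, int32_t *inter2) {
    for (size_t i = 0; i < m; i++)
        inter2[i] = C[inter1[i]];
}

/* Aggregate: minimum of a non-empty intermediate. */
int32_t min_of(const int32_t *vals, size_t m) {
    assert(m > 0);
    int32_t best = vals[0];
    for (size_t i = 1; i < m; i++)
        if (vals[i] < best) best = vals[i];
    return best;
}
```

Each operator is one tight loop over a dense array: sequential reads, at most one if per value, and no per-tuple function calls.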
disk to memory: columns A B C D
option1: early tuple reconstruction/materialization (ABC), row-store engine
option2: column-store engine (A)
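Option1 (early tuple reconstruction) amounts to stitching positionally aligned columns into rows before any operator runs. A minimal sketch, assuming three 32-bit integer columns; the `row_abc_t` layout and the `materialize_rows` name are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t a, b, c; } row_abc_t;  /* hypothetical row layout for A, B, C */

/* Early materialization: position i of every column becomes row i. */
void materialize_rows(const int32_t *A, const int32_t *B, const int32_t *C,
                      size_t n, row_abc_t *rows) {
    assert(rows != NULL);
    for (size_t i = 0; i < n; i++) {
        rows[i].a = A[i];
        rows[i].b = B[i];
        rows[i].c = C[i];
    }
}
```

After this step a row-store engine can run unchanged, at the price of copying every needed attribute of every tuple up front.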
possible data flow patterns
tuple at a time
block/vector at a time
column at a time
select min(C) from R where A
Marcin Zukowski, PhD
CEO/Co-founder of Vectorwise (now Actian)
now: “changing the world, one terabyte at a time”, co-founder of Snowflake
the beer analogy
registers
on chip cache
on board cache
memory
disk
CPU
cheaper
faster
query plan: op1 op2 op3
columns: A B
size of vector
tuple at a time - good for minimizing memory footprint
bulk processing - good for minimizing function call overhead
vectorized processing - somewhere in between
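One way to picture the in-between option: run each operator over a vector of values at a time, so intermediates never exceed one vector while each operator still runs as a tight loop. A minimal sketch, assuming a predicate `A[i] < v`, a `VECTOR_SIZE` of 1024 and the hypothetical name `vectorized_min`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VECTOR_SIZE 1024  /* assumed; small enough for intermediates to stay in cache */

/* min(C) where A < v, one vector at a time. Returns 1 and sets *out if any tuple qualifies. */
int vectorized_min(const int32_t *A, const int32_t *C, size_t n, int32_t v, int32_t *out) {
    assert(out != NULL);
    size_t sel[VECTOR_SIZE];  /* intermediate: at most one vector of positions */
    int found = 0;
    int32_t best = 0;
    for (size_t start = 0; start < n; start += VECTOR_SIZE) {
        size_t len = n - start < VECTOR_SIZE ? n - start : VECTOR_SIZE;
        size_t m = 0;
        /* op1: selection over one vector of A */
        for (size_t i = 0; i < len; i++)
            if (A[start + i] < v) sel[m++] = start + i;
        /* op2: positional fetch from C and running min for this vector */
        for (size_t k = 0; k < m; k++) {
            int32_t c = C[sel[k]];
            if (!found || c < best) { best = c; found = 1; }
        }
    }
    if (found) *out = best;
    return found;
}
```

Setting the vector size to 1 gives tuple-at-a-time behavior and setting it to n gives column-at-a-time, which is why the size of the vector is the tuning knob.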
history/timeline
~1960s: tuple at a time
1980s: ideas about block processing
2005: Vectorwise
>2010: industry adoption
project: column-at-a-time
bonus: vectorized processing
update row7=(A=a,B=b,C=c,D=d)
row-store: ABCD
vs
column-store: A B C D
which is better to update and why?
how much does it cost to update a single row? (think about pages, data movement)
how to update in column-stores? (query plan + algorithms)
base data: A B C D
pending updates: A B C D
update goes to pending updates; query reads base data and pending updates
periodically: merge pending updates into base data
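The base data plus pending updates design can be sketched for a single column as below. This is an illustration under assumptions, not the course's reference design: only inserts are handled, the merge simply appends, the tiny `PENDING_CAP` exists just to make the periodic merge easy to trigger, and the names `upd_column_t`, `insert_value`, `merge_pending` and `count_lt` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PENDING_CAP 4  /* assumed tiny capacity so the merge triggers quickly */

typedef struct {
    int32_t *base;                 /* base data: dense, read-optimized */
    size_t n_base, cap_base;
    int32_t pending[PENDING_CAP];  /* pending updates (inserts only in this sketch) */
    size_t n_pending;
} upd_column_t;

/* Periodic step: move all pending values into the base data. */
void merge_pending(upd_column_t *col) {
    assert(col->n_base + col->n_pending <= col->cap_base);
    for (size_t i = 0; i < col->n_pending; i++)
        col->base[col->n_base++] = col->pending[i];
    col->n_pending = 0;
}

/* Update path: new values land in the pending area; merge first when it is full. */
void insert_value(upd_column_t *col, int32_t v) {
    if (col->n_pending == PENDING_CAP) merge_pending(col);
    col->pending[col->n_pending++] = v;
}

/* Query path: count values below v over base data and pending updates. */
size_t count_lt(const upd_column_t *col, int32_t v) {
    size_t hits = 0;
    for (size_t i = 0; i < col->n_base; i++) hits += col->base[i] < v;
    for (size_t i = 0; i < col->n_pending; i++) hits += col->pending[i] < v;
    return hits;
}
```

Updates never touch the dense base array directly, so scans over base data stay sequential; the cost is that every query also has to consult the pending area.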
fractured mirrors
columns copy: A B C D
rows copy: ABCD
the optimizer routes each query to the columns copy or the rows copy
A case for fractured mirrors. Ravishankar Ramamurthy, David J. DeWitt, Qi Su. Very Large Databases Journal, 12(2): 89-101, 2003
Notes to remember
column-stores great for analytics
row-stores great for transactions
still basic concepts are the same
hybrids possible
keep access patterns sequential and simple (min ifs)
reading
The Design and Implementation of Modern Column-Oriented Database Systems (Sections: all -4.6 & 4.8) by D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, S. Madden
IEEE Data Engineering Bulletin, 35(1), March 2012 Special Issue on Column-stores (9 short overview papers)
research papers
Database Architecture Optimized for the New Bottleneck: Memory Access. Peter Boncz, Stefan Manegold, Martin Kersten. In Proc. of the Very Large Databases Conference (VLDB), 1999
MonetDB/X100: Hyper-Pipelining Query Execution. Peter A. Boncz, Marcin Zukowski, Niels Nes. In Proc. of the Inter. Conference on Innovative Data Systems Research (CIDR), 2005
Materialization Strategies in a Column-Oriented DBMS. Daniel Abadi, Daniel Myers, David DeWitt, Samuel Madden. In Proc. of the Inter. Conference on Data Engineering (ICDE), 2007
Self-organizing tuple reconstruction in column-stores. Stratos Idreos, Martin Kersten, Stefan Manegold. In Proc. of the ACM SIGMOD Inter. Conference on Management of Data, 2009
DATA SYSTEMS
prof. Stratos Idreos
class 5
column-stores 2.0