C-Store: Column-Oriented Data Warehousing
Jianlin Feng
School of Software
SUN YAT-SEN UNIVERSITY
May 17, 2010
C-Store’s Father: Michael Stonebraker
• A former Professor at Berkeley, an Adjunct Professor at M.I.T.
• ACM Software System Award, 1988
• INGRES, developed by undergraduates
• POSTGRES, Mariposa, C-Store
• ACM SIGMOD Innovation Award, 1994
• National Academy of Engineering, 1998
C-Store: The Home Page
http://db.lcs.mit.edu/projects/cstore/
C-Store: A Column-Oriented DBMS
• download: Source code
• overview: Project description
• papers: Publications
• people: Who are we?
The C-Store project is a collaboration between MIT, Yale, Brandeis University, Brown University, and UMass Boston.
Commercialized C-Store: Vertica
The Starting Point
C-Store: A Column Oriented DBMS
Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik.
VLDB, pages 553-564, 2005.
C-Store: the Column Store Project
Row Store or Column Store?
[Figure: a table whose rows are Record 1, Record 2, Record 3 and whose columns are Column 1, Column 2, Column 3]
Relation or Tables
Example of a Relation
The History: Relational Model
• Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.
• Physical Data Independence
• Row Store vs. Column Store on the same Conceptual Model: Relation
Row Store: Why?
• OLTP (On-Line Transaction Processing): ATM, POS in supermarkets
• Characteristics of OLTP applications:
  - Transactions that involve small numbers of records (or tuples)
  - Frequent updates (including queries)
  - Many users
  - Fast response times
• OLTP needs a write-optimized row store: insert and delete a record in one physical write.
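The "one physical write" point can be made concrete with a toy model. The sketch below is only an illustration: `RowStore` and `ColumnStore` are hypothetical classes, and a Python list append stands in for a physical write. It counts how many separate storage locations one insert touches in each layout.

```python
class RowStore:
    """Toy row store: each record is kept as one contiguous tuple."""

    def __init__(self):
        self.records = []

    def insert(self, record):
        # The whole record goes to a single location.
        self.records.append(tuple(record))
        return 1  # number of storage locations touched


class ColumnStore:
    """Toy column store: one separate list per column."""

    def __init__(self, n_columns):
        self.columns = [[] for _ in range(n_columns)]

    def insert(self, record):
        if len(record) != len(self.columns):
            raise ValueError("record width does not match the number of columns")
        # Every column file receives one value, so every column is touched.
        for column, value in zip(self.columns, record):
            column.append(value)
        return len(self.columns)


# A made-up 4-field OLTP record.
record = (1001, "withdrawal", 50.0, "ATM-17")
row_writes = RowStore().insert(record)
col_writes = ColumnStore(n_columns=4).insert(record)
print(row_writes, col_writes)
```

In this model the row layout touches one location per insert, while the column layout touches one per column, which is why update-heavy workloads favor row stores.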
Row Store: Columns Stored Together
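• Record id = <page id, slot #>
[Figure: slotted page i. The data area holds the records with Rid = (i,1), Rid = (i,2), ..., Rid = (i,N). The slot directory at the end of the page holds the slot array (entries for slots N ... 2 1, with example values 20, 16, 24), the number of slots (N), and a pointer to the start of free space.]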
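A minimal sketch of the slotted-page idea, assuming a simplified in-memory layout: the class name `SlottedPage`, its fields, and the (offset, length) form of slot entries are illustrative choices, not the on-disk format of any particular DBMS. It shows how a record id of the form <page id, slot #> resolves to record bytes through the slot directory.

```python
class SlottedPage:
    """Toy slotted page: records packed from the front, located via a slot directory."""

    def __init__(self, page_id, size=4096):
        self.page_id = page_id
        self.size = size
        self.data = bytearray()  # data area
        self.slots = []          # slot directory: (offset, length) per record

    @property
    def free_space_start(self):
        # Pointer to the start of free space: the first byte after the last record.
        return len(self.data)

    def insert(self, record: bytes):
        offset = self.free_space_start
        if offset + len(record) > self.size:
            raise ValueError("page full")
        self.data += record
        self.slots.append((offset, len(record)))
        # Slot numbers are 1-based, matching Rid = (i,1), (i,2), ... in the figure.
        return (self.page_id, len(self.slots))

    def read(self, rid):
        page_id, slot_no = rid
        if page_id != self.page_id:
            raise KeyError("record id belongs to another page")
        offset, length = self.slots[slot_no - 1]
        return bytes(self.data[offset:offset + length])


page = SlottedPage(page_id=7)
rid1 = page.insert(b"alice")
rid2 = page.insert(b"bob")
print(rid2, page.read(rid2))
```

Because callers hold only the record id, a record can be moved within the page by updating its slot entry without changing the id.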
Current DBMS Gold Standard
• Store columns in one record contiguously on disk
• Use B-tree indexing
• Use small (e.g. 4K) disk blocks
• Align fields on byte or word boundaries
• Conventional (row-oriented) query optimizer and executor (technology from 1979)
• ARIES-style transactions
From OLTP to OLAP and Data Warehouse
• OLAP (On-Line Analytical Processing, Codd, 1993): flexible reporting for Business Intelligence
• Characteristics of OLAP applications:
  - Transactions that involve large numbers of records
  - Frequent ad-hoc queries and infrequent updates
  - A few decision-making users
  - Fast response times
• Data warehouses are designed to facilitate reporting and analysis: Read-Mostly
Other Read-Mostly Applications
• CRM (Customer Relationship Management)
  - Siebel (Oracle)
• Catalog Search in Electronic Commerce
  - Amazon.com
  - Shopping.com
Column Store: Why?
• The Intuition: only read relevant columns
  - Say, ad-hoc queries read 2 columns out of 20
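The 2-out-of-20 intuition can be worked through as back-of-the-envelope arithmetic. The sketch below assumes fixed-width values, no compression, and a full scan. The function `bytes_read` and the figures of one million rows and 4 bytes per value are illustrative assumptions, not measurements.

```python
def bytes_read(n_rows, n_columns, columns_needed, bytes_per_value, layout):
    """Estimate bytes scanned by a query that needs only some of the columns."""
    if layout == "row":
        # A row store reads whole records, including the columns the query ignores.
        return n_rows * n_columns * bytes_per_value
    if layout == "column":
        # A column store reads only the columns the query references.
        return n_rows * columns_needed * bytes_per_value
    raise ValueError("layout must be 'row' or 'column'")


row_bytes = bytes_read(1_000_000, 20, 2, 4, "row")
col_bytes = bytes_read(1_000_000, 20, 2, 4, "column")
print(row_bytes, col_bytes, row_bytes // col_bytes)
```

Under these assumptions the column layout scans one tenth of the bytes, simply because 2 of 20 columns are read.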
• Column Store is not a new idea
  - Sybase IQ (early ’90s, bitmap index)
  - Addamark (i.e., SenSage, for Event Log data warehouse)
  - MonetDB (Hyper-Pipelining Query Execution, CIDR’05)
C-Store Technical Ideas
• Logical Data Model: Relational Model
• Column Store Only
• Materialized Views on Each Relation (perhaps many)
• Active Data Compression
• Column-Oriented Query Executor and Optimizer
• Shared Nothing Architecture
• Replication-Based Concurrency Control and Recovery
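The list above names data compression without specifying a scheme. As one illustration of why columns compress well, the sketch below uses run-length encoding, a scheme commonly applied to sorted columns with few distinct values. Treating it as the scheme in question is an assumption made here, and `rle_encode` and `rle_decode` are hypothetical helper names.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of equal adjacent values into a (value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# A made-up sorted column: equal values are adjacent, so the runs are long.
state_column = ["CA", "CA", "CA", "NY", "NY", "TX"]
encoded = rle_encode(state_column)
print(encoded)
```

Storing a column in sorted order makes equal values adjacent, which is what lets a run-based encoding shrink it.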
How to Evaluate The C-Store Paper
• None of the ideas in isolation merit publication
• Judge the complete system by its (hopefully intelligent) choice of
  - Small collection of inter-related powerful ideas
  - That together put performance in a new sandbox
Architecture of C-Store (Vertica) On a Single Node
C-Store code base version 0.2
• http://db.lcs.mit.edu/projects/cstore/
• cstore0.2.tar.gz runs on Linux x86 computers
  - Tested on RedHat Linux
• This code compiles with old versions of BerkeleyDB and gcc
  - BerkeleyDB.4.2
• LZO version 1 (http://www.oberhumer.com/opensource/lzo/)
References
Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. C-Store: A Column Oriented DBMS. VLDB, pages 553-564, 2005.
VERTICA DATABASE TECHNICAL OVERVIEW WHITE PAPER. http://www.vertica.com/php/pdfgateway?file=VerticaArchitectureWhitePaper.pdf
http://www.sensage.com/English/Products/Event_Data_Warehouse.html