BIGTABLE
Nirmesh Malviya
nirmesh@csail.mit.edu
April 30, 2012
Content and Figure Credits
● http://research.yahoo.com/files/6DeanGoogle.pdf
● http://research.cs.wisc.edu/areas/os/Seminar/schedules/archive/bigtable.ppt
● http://www-scf.usc.edu/~csci572/2011Spring/presentations/Taheriyan.pptx
4/30/2012 6.830 - Spring 2012
What is Bigtable?
● Distributed storage system.
● Manages structured data via a simple data model.
● Scalable.
● Self-managing.
Why did Google not use a relational database?
● Google has LOTS of data.
● No commercial system is big enough.
● Too expensive even if there were one.
● Don’t have end-to-end control.
● Low-level storage optimizations are difficult.
Data model: sparse map
● Sparse multi-dimensional map.
● Indexed by <Row, Column, Timestamp> key.
● Values are uninterpreted bytes.
●Distributed, persistent and sorted.
●Essentially a column-oriented physical store.
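A minimal sketch of this data model, assuming a single in-memory sorted list instead of Bigtable's distributed, persistent storage; `SparseMap` and its methods are hypothetical names, not Bigtable's API. The sample row follows the Webtable example from the Bigtable paper (row key `com.cnn.www`, a `contents:` column with three versions, and two `anchor:` columns), with small integers standing in for real timestamps.

```python
from bisect import bisect_left, insort


class SparseMap:
    """Toy, single-machine model of the Bigtable data model (hypothetical class).

    Cells are keyed by (row, column, timestamp) and hold uninterpreted bytes.
    Keys stay sorted by row, then column, then newest timestamp first.
    """

    def __init__(self):
        self._keys = []   # sorted list of (row, column, -timestamp)
        self._cells = {}  # (row, column, -timestamp) -> bytes

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        key = (row, column, -ts)  # negate so newer versions sort first
        if key not in self._cells:
            insort(self._keys, key)
        self._cells[key] = value

    def versions(self, row: str, column: str) -> list:
        """All (timestamp, value) versions of one cell, newest first."""
        i = bisect_left(self._keys, (row, column, float("-inf")))
        out = []
        while i < len(self._keys) and self._keys[i][:2] == (row, column):
            key = self._keys[i]
            out.append((-key[2], self._cells[key]))
            i += 1
        return out

    def get(self, row: str, column: str):
        """Newest (timestamp, value) of a cell, or None if the cell is absent."""
        v = self.versions(row, column)
        return v[0] if v else None


# The Webtable example row; integers stand in for timestamps t3, t5, t6, t8, t9.
t = SparseMap()
for ts in (3, 5, 6):
    t.put("com.cnn.www", "contents:", ts, f"<html>v{ts}</html>".encode())
t.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
t.put("com.cnn.www", "anchor:my.look.ca", 8, b"CNN.com")
```

Sparse rows cost nothing here: a column that was never written simply has no key in the sorted list, and `t.get("com.cnn.www", "contents:")` returns the version at timestamp 6.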
Data model: column families
● Each row can have an arbitrary number of columns.
● Column families:
  ● Group columns of the same type.
  ● Unit of access control.
  ● Columns are named family:qualifier.
Data model: example
Data model is not relational
● Writes to a single row are atomic.
● No multi-row transactions.
● No table-wide integrity constraints.
API
● Writes (atomic per row):
  ● Set(): write cells in a row.
  ● DeleteCells(): delete cells in a row.
  ● DeleteRow(): delete all cells in a row.
● Reads.
● Metadata operations:
  ● Create/delete tables and column families, change metadata.
API Example: Write/Modify
atomic row modification
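The write example in the Bigtable paper opens the Webtable, builds a `RowMutation` for row `com.cnn.www` that sets `anchor:www.c-span.org` and deletes `anchor:www.abc.com`, and then applies it. Below is a hypothetical Python analogue (the `Table` and `RowMutation` classes and their method names are illustrative, not Google's API), assuming one in-process lock per row stands in for Bigtable's per-row atomicity.

```python
import threading
from collections import defaultdict


class Table:
    """Hypothetical in-memory table: row -> {column: value}, one lock per row."""

    def __init__(self):
        self.rows = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects lazy creation of row locks

    def row_lock(self, row: str) -> threading.Lock:
        with self._guard:
            return self._locks[row]

    def read_row(self, row: str) -> dict:
        with self.row_lock(row):  # never observes a half-applied mutation
            return dict(self.rows.get(row, {}))


class RowMutation:
    """Buffers writes to ONE row; apply() commits them all at once."""

    def __init__(self, table: Table, row: str):
        self.table, self.row, self.ops = table, row, []

    def set(self, column: str, value: bytes):
        self.ops.append(("set", column, value))

    def delete_cells(self, column: str):
        self.ops.append(("delete", column, None))

    def delete_row(self):
        self.ops.append(("delete_row", None, None))

    def apply(self):
        # Every buffered op runs under this row's lock. No lock spans two
        # rows, so a multi-row update cannot be made atomic this way.
        with self.table.row_lock(self.row):
            cells = self.table.rows[self.row]
            for op, column, value in self.ops:
                if op == "set":
                    cells[column] = value
                elif op == "delete":
                    cells.pop(column, None)
                else:
                    cells.clear()


T = Table()
T.rows["com.cnn.www"]["anchor:www.abc.com"] = b"ABC"  # hypothetical old anchor
r1 = RowMutation(T, "com.cnn.www")
r1.set("anchor:www.c-span.org", b"CNN")
r1.delete_cells("anchor:www.abc.com")
r1.apply()
```

A reader calling `T.read_row("com.cnn.www")` sees either both changes or neither, which is exactly the guarantee the data model promises and no more.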
API Example: Read
Result sets can be filtered using regular expressions, e.g.:
anchor:com.cnn.*
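A sketch of such a column filter using Python's `re` module; `read_columns` is a hypothetical helper, the row contents are made up for illustration, and matching the whole column key with `re.fullmatch` is an assumption about the filter's semantics.

```python
import re


def read_columns(row_cells: dict, column_regex: str) -> dict:
    """Return the cells of one row whose full column key matches column_regex."""
    pattern = re.compile(column_regex)
    return {col: val for col, val in sorted(row_cells.items()) if pattern.fullmatch(col)}


# Hypothetical row contents, for illustration only.
row = {
    "contents:": b"<html></html>",
    "anchor:com.cnn.money": b"Money",
    "anchor:com.cnn.sports": b"Sports",
    "anchor:cnnsi.com": b"CNN",
}
matches = read_columns(row, "anchor:com.cnn.*")
```

Here `matches` keeps only the two `anchor:com.cnn.` columns. Note that an unescaped `.` matches any character; `anchor:com\.cnn\..*` would match the dots literally.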
GFS (now Colossus)
● Large-scale distributed filesystem.
● Master: responsible for metadata.
● Chunk servers: responsible for reading and writing large chunks of data.
  ● Each chunk is replicated on 3 machines.
Google File System (GFS)
Data transfers happen directly between clients/chunkservers.
(Figure: GFS master, clients, and chunkservers; each chunk, C0 to C5, is stored as replicas on several chunkservers.)
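A minimal sketch of the client read path, assuming 64 MB chunks as in the GFS paper; `Master`, `read`, and the file, chunk, and server names are hypothetical. The client turns a byte offset into a chunk index, asks the master only for the chunk handle and replica locations, and then fetches the bytes from a chunkserver directly.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size described in the GFS paper


class Master:
    """Metadata only: (filename, chunk index) -> (chunk handle, replica servers)."""

    def __init__(self, chunk_table: dict):
        self.chunk_table = chunk_table

    def lookup(self, filename: str, index: int):
        return self.chunk_table[(filename, index)]


def read(master: Master, chunkservers: dict, filename: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset`; this sketch only handles reads within one chunk."""
    index, start = divmod(offset, CHUNK_SIZE)
    if start + length > CHUNK_SIZE:
        raise ValueError("read spans a chunk boundary")
    handle, replicas = master.lookup(filename, index)  # small metadata request
    server = replicas[0]  # a real client would pick a nearby replica
    return chunkservers[server][handle][start:start + length]  # bulk data skips the master


master = Master({("/logs/web", 1): ("chunk-42", ["cs3", "cs4", "cs5"])})
chunkservers = {name: {"chunk-42": b"hello world"} for name in ("cs3", "cs4", "cs5")}
data = read(master, chunkservers, "/logs/web", CHUNK_SIZE + 6, 5)
```

Keeping the master off the data path is what lets a single metadata server front a very large cluster.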
SSTable
● Immutable, sorted file of key-value pairs.
● Chunks of data plus an index.
  ● Index is of block ranges, not values.
(Figure: an SSTable laid out as 64K blocks plus an index.)
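A sketch of the lookup a block index allows, assuming an in-memory model where a block holds a fixed number of entries rather than 64K bytes; `SSTable` here is a hypothetical class. The index stores only the first key of each block, so a read binary-searches the index and then scans a single block.

```python
from bisect import bisect_right


class SSTable:
    """Hypothetical in-memory model of an immutable, sorted key/value file.

    Entries are packed into blocks of `block_size` entries (real SSTables use
    64K-byte blocks). The index keeps only the first key of each block.
    """

    def __init__(self, items, block_size=4):
        entries = sorted(items)
        self._blocks = tuple(
            tuple(entries[i:i + block_size]) for i in range(0, len(entries), block_size)
        )
        self.index = [block[0][0] for block in self._blocks]  # first key per block

    def get(self, key):
        b = bisect_right(self.index, key) - 1  # last block whose first key <= key
        if b < 0:
            return None  # key sorts before the first block
        for k, v in self._blocks[b]:  # only this one block is read
            if k == key:
                return v
        return None


sst = SSTable([(f"row{i:03d}", i) for i in range(10)], block_size=4)
```

Because the index is small, it can stay in memory, and each lookup costs at most one block read.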
Tablet
● Contains some range of rows of the table.
● Built out of multiple SSTables.
(Figure: a tablet with Start:aardvark and End:apple, built from two SSTables, each an index plus 64K blocks.)
Table
● Multiple tablets make up a table.
● SSTables can be shared between tablets.
● Tablets do not overlap; SSTables can.
(Figure: two tablets, aardvark to apple and apple_two_E to boat, built from four SSTables.)
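Because tablets do not overlap, a sorted list of tablet end rows is enough to route any row to its tablet. A sketch, assuming three hypothetical tablets that mirror the figure's boundaries (up to `apple`, up to `boat`, then everything after `boat`); `find_tablet` is an illustrative helper.

```python
from bisect import bisect_left

# Hypothetical partition of one table's row space into non-overlapping tablets,
# each identified by the last row key it may contain.
TABLETS = [("apple", "tablet-1"), ("boat", "tablet-2")]
LAST_TABLET = "tablet-3"  # open-ended: every row sorting after "boat"
END_ROWS = [end for end, _ in TABLETS]


def find_tablet(row: str) -> str:
    i = bisect_left(END_ROWS, row)  # first tablet whose end row >= row
    return TABLETS[i][1] if i < len(TABLETS) else LAST_TABLET
```

For example, `find_tablet("apple_two_E")` lands in `tablet-2`, since it sorts after `apple` and before `boat`.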
Locality Groups
● Group column families together into an SSTable.
● Can keep some groups entirely in memory.
● Can compress locality groups.
● Bloom filters on locality groups avoid searching SSTables that cannot hold the requested data.
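In the Bigtable paper, a Bloom filter lets a read ask whether an SSTable might contain data for a given row/column pair before touching disk. A minimal Bloom filter, assuming salted SHA-256 hashes and a hypothetical `row/column` key encoding; `BloomFilter` and `should_read_sstable` are illustrative names.

```python
import hashlib


class BloomFilter:
    """Answers "definitely absent" or "maybe present"; never a false negative."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from k salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


# One filter per SSTable of a locality group: record every row/column pair it stores.
sstable_filter = BloomFilter()
for key in ("com.cnn.www/anchor:cnnsi.com", "com.cnn.www/contents:"):
    sstable_filter.add(key)


def should_read_sstable(row: str, column: str) -> bool:
    return sstable_filter.might_contain(f"{row}/{column}")
```

A `False` answer lets the tablet server skip that SSTable entirely; a `True` answer may occasionally be a false positive, which only costs an unnecessary read.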
Bigtable: Building blocks
● Scheduler.
● GFS.
● Chubby lock service.
● MapReduce: helpful but not required.
Typical Cluster
(Figure: a cluster scheduling master, a lock service, and a GFS master above Machines 1 to 3; each machine runs Linux, a scheduler slave, and a GFS chunkserver, plus user tasks, BigTable servers, or the BigTable master.)
Chubby
● {lock/file/name} service.
● Coarse-grained locks; can store a small amount of data in a lock.
● 5 replicas; the service is live only while a majority of them are running and can communicate.
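The majority rule is simple arithmetic; a short sketch (function names hypothetical) of why five replicas let Chubby stay live through two failures.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be down while a majority is still up."""
    return replicas - majority(replicas)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

Four replicas tolerate only one failure, the same as three, so odd sizes such as 5 give the most fault tolerance per replica.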
Finding a tablet
● Tablets move around from server to server.
● Given a row, how do clients find the right machine?
  ● Each tablet is defined by its start row and end row (startrowindex, endrowindex).
● Instead of asking a central server, store special tables containing tablet location info in the Bigtable cell itself.
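The Bigtable paper describes a three-level hierarchy: a Chubby file holds the location of the root tablet, the root tablet indexes the METADATA tablets, and each METADATA tablet indexes user tablets by end row; the client library caches what it learns. A sketch under simplifying assumptions: row keys only (the paper's METADATA keys also encode the table ID), a sentinel `"\uffff"` end row for the last range, and hypothetical tablet and server names.

```python
from bisect import bisect_left


def first_covering(index, row):
    """index: sorted list of (end_row, location); return the first entry with end_row >= row."""
    i = bisect_left([end for end, _ in index], row)
    if i == len(index):
        raise KeyError(row)
    return index[i][1]


# Hypothetical cluster state; "\uffff" is a sentinel end row meaning "end of table",
# assuming every real row key sorts below it.
CHUBBY_ROOT_FILE = "root-tablet@ts1"  # level 1: which server holds the root tablet
TABLETS = {
    # Level 2: the root tablet indexes the METADATA tablets.
    "root-tablet@ts1": [("m", "meta-1@ts2"), ("\uffff", "meta-2@ts3")],
    # Level 3: each METADATA tablet indexes user tablets by end row.
    "meta-1@ts2": [("apple", "user-A@ts4"), ("m", "user-B@ts5")],
    "meta-2@ts3": [("\uffff", "user-C@ts6")],
}


def locate(row, cache):
    """Find the user tablet serving `row`: read Chubby, then root, then METADATA."""
    if row in cache:  # real clients cache tablet ranges; per-row here for brevity
        return cache[row]
    root = CHUBBY_ROOT_FILE
    meta = first_covering(TABLETS[root], row)
    user = first_covering(TABLETS[meta], row)
    cache[row] = user
    return user
```

With a warm cache, most lookups never leave the client, and none of them involve the Bigtable master.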
Finding a tablet
Tablet Server
● Manages tablets, multiple tablets per server.
● Each tablet is 100-200 MB.
● A tablet lives on only one server at a time.
● Tablet server splits tablets that get too big.
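A minimal sketch of the split decision, assuming a 200 MB threshold (the top of the 100-200 MB range above) and a size-midpoint split point; `maybe_split` is a hypothetical helper, not Bigtable's actual policy.

```python
MB = 1024 * 1024


def maybe_split(rows, max_bytes=200 * MB):
    """rows: sorted list of (row_key, size_in_bytes) held by one tablet.

    Returns [rows] if the tablet is within max_bytes; otherwise two tablets
    split at a row boundary near the size midpoint. Rows are never split,
    so each row stays in exactly one tablet.
    """
    total = sum(size for _, size in rows)
    if total <= max_bytes or len(rows) < 2:
        return [rows]
    running = 0
    for i, (_, size) in enumerate(rows):
        running += size
        if running >= total / 2:
            cut = min(i + 1, len(rows) - 1)  # keep both halves non-empty
            return [rows[:cut], rows[cut:]]
    return [rows]  # unreachable: running reaches total, which is >= total / 2
```

Splitting on a row boundary keeps every row inside a single tablet, which is what makes single-row atomicity cheap to enforce.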
Tablet Server startup
● On startup, creates and acquires an exclusive lock on a uniquely named file in a Chubby directory.
● A tablet server stops serving its tablets if it loses its exclusive lock.
Bigtable Master
● Responsible for load balancing and fault tolerance.
● Uses Chubby to monitor the health of tablet servers, and restarts failed servers.
  ● If its Chubby session expires, the master kills itself.
● Preferably starts a tablet server on the same machine where its data already is.
Master Startup
● Grabs a unique master lock in Chubby.
  ● Prevents multiple masters from running at once.
● Scans the servers directory in Chubby for live tablet servers, then communicates with every live tablet server to learn which tablets it already serves.
● Scans the METADATA table to learn the set of tablets.
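The startup sequence can be sketched as follows, assuming a hypothetical `FakeChubby` class in place of the real lock service and a plain dict in place of the RPCs to tablet servers. Steps 3 and 4 together give the master the set of tablets that still need a server.

```python
class FakeChubby:
    """Hypothetical stand-in for Chubby: exclusive locks plus a servers directory."""

    def __init__(self, server_files):
        self.held = set()
        self.server_files = list(server_files)

    def try_acquire(self, name: str) -> bool:
        if name in self.held:
            return False
        self.held.add(name)
        return True


def master_startup(chubby, tablets_served, metadata_tablets):
    """Return the tablets that no live tablet server is serving."""
    # 1. A unique master lock prevents two masters running at once.
    if not chubby.try_acquire("master-lock"):
        raise RuntimeError("another master already holds the lock")
    # 2. Live tablet servers are the files in the Chubby servers directory.
    live = chubby.server_files
    # 3. Ask each live server which tablets it already serves (an RPC in reality).
    assigned = set()
    for server in live:
        assigned |= tablets_served[server]
    # 4. Every tablet listed in METADATA but not assigned needs a server.
    return sorted(set(metadata_tablets) - assigned)


chubby = FakeChubby(["ts1", "ts2"])
served = {"ts1": {"T1"}, "ts2": {"T2", "T3"}}
unassigned = master_startup(chubby, served, ["T1", "T2", "T3", "T4", "T5"])
```

Here `unassigned` is `["T4", "T5"]`, and a second master calling `master_startup` against the same `chubby` fails at step 1.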
Bigtable Master
● Master monitors the Chubby directory to discover tablet servers.
● Master is responsible for detecting when a tablet server is no longer serving its tablets.
  ● It detects this by periodically checking the status of each tablet server's lock.
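The Bigtable paper spells out this check: if a tablet server reports that it lost its lock, or the master cannot reach it, the master tries to acquire that server's lock file itself. Success means Chubby is up and the server is the problem, so the master deletes the file and reassigns the server's tablets. A sketch with hypothetical `ServerReport` and `FakeChubby` types, simplified to a single polling round (the paper waits for several failed attempts).

```python
from dataclasses import dataclass, field


@dataclass
class ServerReport:
    """What the master learned when it last polled one tablet server (hypothetical)."""
    name: str
    reachable: bool
    holds_lock: bool
    tablets: set = field(default_factory=set)


class FakeChubby:
    """Hypothetical stand-in for Chubby: the set of server lock files currently held."""

    def __init__(self, held):
        self.held = set(held)

    def try_acquire(self, path: str) -> bool:
        if path in self.held:
            return False
        self.held.add(path)
        return True

    def delete(self, path: str) -> None:
        self.held.discard(path)


def check_server(report: ServerReport, chubby: FakeChubby, unassigned: set) -> str:
    if report.reachable and report.holds_lock:
        return "healthy"
    path = f"servers/{report.name}"
    if chubby.try_acquire(path):
        # Chubby answered and the lock was free, so the server is dead or cut
        # off from Chubby. Deleting its file guarantees it never serves again.
        chubby.delete(path)
        unassigned.update(report.tablets)
        return "reassigned"
    # The server still holds its lock (e.g. only the master cannot reach it);
    # this sketch simply retries on the next round.
    return "retry later"
```

Deleting the server file before moving tablets is what prevents two servers from ever serving the same tablet.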
Writing to a table
● Mutations are logged, then applied to an in-memory version.
● The log file is stored in GFS.
(Figure: a stream of Insert and Delete mutations is applied to the in-memory memtable of tablet apple_two_E to boat, whose persistent data lives in two SSTables.)
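A sketch of this write path and its recovery, assuming dicts stand in for the memtable and SSTables and a list stands in for the commit log in GFS; `Tablet`, `mutate`, and `recover` are hypothetical names. Deletes are recorded as markers so they can hide older values still sitting in SSTables.

```python
DELETED = object()  # deletion marker ("tombstone")


class Tablet:
    """Hypothetical write/read path for one tablet."""

    def __init__(self, sstables=()):
        self.sstables = list(sstables)  # immutable dicts, newest first
        self.commit_log = []            # stands in for the log file in GFS
        self.memtable = {}              # recent mutations, in memory

    def mutate(self, op: str, key: str, value=None) -> None:
        self.commit_log.append((op, key, value))  # 1. log the mutation
        self._apply(op, key, value)               # 2. then apply it in memory

    def _apply(self, op, key, value):
        self.memtable[key] = value if op == "insert" else DELETED

    def read(self, key: str):
        # Newest layer wins: memtable first, then SSTables newest to oldest.
        for layer in (self.memtable, *self.sstables):
            if key in layer:
                value = layer[key]
                return None if value is DELETED else value
        return None


def recover(sstables, commit_log) -> Tablet:
    """Rebuild a tablet after a crash by replaying its log into a new memtable."""
    tablet = Tablet(sstables)
    for op, key, value in commit_log:
        tablet._apply(op, key, value)
    tablet.commit_log = list(commit_log)
    return tablet
```

Logging before applying is the design point: if the server crashes, replaying the log rebuilds exactly the memtable that was lost.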
Tablet Compactions
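● Minor compaction: write the memtable out as a new SSTable.
  ● Reduces memory usage.
  ● Reduces the log that must be replayed on restart.
● Merging compaction: merge a few SSTables and the memtable into one new SSTable.
● Major compaction: rewrite all SSTables into exactly one SSTable.
  ● No deletion records, only live data.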
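The three compaction kinds can be sketched over a toy tablet state (a memtable dict plus a newest-first list of SSTable dicts); all function names are hypothetical, and real compactions stream sorted files rather than copying dicts.

```python
DELETED = object()  # deletion record (tombstone)

# Tablet state here is (memtable, sstables); sstables are dicts, newest first.


def minor_compaction(memtable, sstables):
    """Freeze the memtable into a new SSTable and start an empty memtable.
    Tombstones are kept: older SSTables may still hold the deleted data."""
    return {}, [dict(sorted(memtable.items()))] + sstables


def merging_compaction(memtable, sstables, n):
    """Merge the memtable and the n newest SSTables into a single SSTable."""
    merged = {}
    for layer in reversed([memtable] + sstables[:n]):  # oldest first; newer values overwrite
        merged.update(layer)
    # Tombstones are kept: SSTables outside the merge may still hold deleted data.
    return {}, [dict(sorted(merged.items()))] + sstables[n:]


def major_compaction(memtable, sstables):
    """Rewrite everything into exactly one SSTable containing only live data."""
    _, [merged] = merging_compaction(memtable, sstables, len(sstables))
    return {}, [{k: v for k, v in merged.items() if v is not DELETED}]
```

Only the major compaction can drop tombstones safely, because it is the only one that sees every older value a tombstone might be hiding.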
BigTable System Architecture
(Figure: a Bigtable cell. The Bigtable client library issues Open(), sends metadata ops to the Bigtable master, which performs metadata ops and load balancing, and sends reads/writes directly to Bigtable tablet servers, which serve data. Beneath them, the cluster scheduling master handles failover and monitoring, GFS holds tablet data and logs, and the lock service holds metadata and handles master election.)
Bigtable in real-world
● Bigtable is closed-source and owned by Google.
● Apache HBase is an open-source implementation of the Bigtable design.
  ● Famous user: Facebook messaging platform.