BIGTABLE: A DISTRIBUTED
STORAGE SYSTEM FOR
STRUCTURED DATA
Presenter: Qiping Wei
Introduction
Bigtable is a distributed storage system for managing structured data.
– Designed to scale to petabytes of data
– Many projects at Google store data in it
• Web indexing, Google Earth, Google Finance, ...
• Different data sizes: URLs, web pages, satellite imagery, ...
• Different latency requirements: from backend bulk processing to real-time data serving
– Provides a flexible, high-performance solution
– Relies on the Chubby lock service
– Uses the Google File System (GFS) to store data items
Each value is indexed by three keys:
• Row key
• Column key
• Timestamp
Property: reads of short row ranges are efficient.
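A minimal sketch of why short row-range reads are cheap, assuming the table can be modelled as one sorted list of cells. The row keys, the "color" column, and the helper scan_rows are hypothetical names used only for illustration; timestamps are negated so that the newest version of a cell sorts first.

```python
from bisect import bisect_left

# Hypothetical stand-in for a table: ((row_key, column_key, -timestamp), value)
# pairs kept in sorted order. Negating the timestamp puts newer versions first.
cells = sorted([
    (("fruit.apple", "color", -2), "a2"),
    (("fruit.apple", "color", -1), "a1"),
    (("fruit.banana", "color", -1), "b1"),
    (("veg.carrot", "color", -1), "c1"),
])


def scan_rows(cells, start_row, end_row):
    """Return all cells whose row key lies in [start_row, end_row)."""
    keys = [key for key, _ in cells]
    # A 1-tuple sorts before every longer tuple that starts with the same row.
    lo = bisect_left(keys, (start_row,))
    hi = bisect_left(keys, (end_row,))
    return cells[lo:hi]  # one contiguous slice, no full scan needed


fruit_cells = scan_rows(cells, "fruit.", "fruit/")
```

Because the cells are sorted, the whole row range is one contiguous slice found with two binary searches.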
Row key   …   Value
AA        …   Value
AZ        …   Value
BA        …   Value
BU        …   Value
CB        …   Value
CM        …   Value
…         …   …
Tablet: a row range; the unit of distribution
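A rough sketch of how a row key maps to a tablet when each tablet covers a contiguous row range. The split points, server names, and locate_tablet are assumptions made up for this example; they are not taken from Bigtable itself.

```python
from bisect import bisect_right

# Hypothetical split points: tablet i holds the rows from tablet_start_keys[i]
# up to (but not including) tablet_start_keys[i + 1].
tablet_start_keys = ["", "BA", "CB"]            # must stay sorted
tablet_servers = ["server-1", "server-2", "server-3"]


def locate_tablet(row_key):
    """Return the (hypothetical) server holding the tablet for row_key."""
    # The owning tablet is the last one whose start key is <= row_key.
    idx = bisect_right(tablet_start_keys, row_key) - 1
    return tablet_servers[idx]


owner = locate_tablet("BU")
```

With these split points, rows AA and AZ fall in the first tablet, BA and BU in the second, and CB and CM in the third.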
Good locality: obtained by placing data that is accessed together close together.
How to get it: choose row keys so that related keys sort next to each other.
E.g., two row keys: apple, banana
Revised row keys: fruit.apple, fruit.banana
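A small illustration of the row-key revision above, assuming rows are stored in lexicographic key order. The extra keys (asparagus, broccoli), the "veg." prefix, and the helper is_contiguous are hypothetical additions for the example.

```python
def is_contiguous(sorted_keys, wanted):
    """True if all keys in `wanted` sit next to each other in sorted_keys."""
    positions = [i for i, key in enumerate(sorted_keys) if key in wanted]
    return positions == list(range(positions[0], positions[0] + len(positions)))


# Plain keys: fruits and vegetables interleave once sorted.
plain = sorted(["apple", "banana", "asparagus", "broccoli"])

# Prefixed keys: everything about fruit sorts into one adjacent run.
prefixed = sorted(["fruit.apple", "fruit.banana", "veg.asparagus", "veg.broccoli"])

fruit_plain_adjacent = is_contiguous(plain, {"apple", "banana"})
fruit_prefixed_adjacent = is_contiguous(prefixed, {"fruit.apple", "fruit.banana"})
```

With the plain keys, asparagus sorts between apple and banana; with the prefix, the two fruit rows become neighbours and can be fetched with one short range read.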
Benefits of good locality:
• Get a lot of useful data with only one read request.
• Reduce the number of read requests.
• Get faster read operations.
Architecture
• Bigtable has three major components:
– One master server
• Assigns tablets to tablet servers
• Detects the addition and expiration of tablet servers
• Balances the load of tablet servers
• Collects garbage in GFS
• Handles schema changes
– Many tablet servers
• Each manages a set of tablets
• Handle read/write requests from clients
• Store data in GFS
– A library linked into every client
• Communicates directly with tablet servers for reads and writes
[Figure: architecture. The Bigtable client library sends read/write requests to the tablet servers. The one master assigns tablets, detects the addition and expiration of tablet servers, balances load, handles schema changes, and collects garbage. Each tablet server manages a set of tablets, handles read/write requests, and stores data in GFS.]
Data items are stored either in log files or in SSTables.
GFS provides data reliability by keeping multiple replicas of the data.
[Figure: the memtable is held in memory; the SSTables and log files are stored in GFS.]
SSTables are immutable: their data cannot be modified.
This feature has many benefits:
• Simplifies various parts of Bigtable,
e.g., cache maintenance is easy and concurrency control can be implemented efficiently
• Tablets can be split quickly
• Data can be restored
Assuming the key-value (KV) item exists, search the memtable first, then the SSTables from the lowest level to the highest, until the item is found.
The steps for each SSTable:
• Check the in-memory index first.
• Find the appropriate block.
• Check the Bloom filter to see whether the KV item may be there.
• If yes, read the block from disk and get the value.
• If no, repeat the steps above on the next SSTable until the block is found and the value is read.
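The steps above can be sketched roughly as follows. This is a simplified model, not Bigtable's implementation: the classes BloomFilter and SSTable, the block size, the hash choice, and the assumption that the SSTable list is ordered in search order are all made up for illustration.

```python
import hashlib
from bisect import bisect_right


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    """Immutable sorted table split into blocks, with an in-memory index."""

    def __init__(self, items, block_size=2):
        pairs = sorted(items.items())
        self.blocks = [pairs[i:i + block_size] for i in range(0, len(pairs), block_size)]
        self.index = [block[0][0] for block in self.blocks]  # first key of each block
        self.bloom = BloomFilter()
        for key, _ in pairs:
            self.bloom.add(key)

    def get(self, key):
        i = bisect_right(self.index, key) - 1      # index -> candidate block
        if i < 0 or not self.bloom.might_contain(key):
            return None                            # skip the block read
        return dict(self.blocks[i]).get(key)       # stands in for a disk read


def read(key, memtable, sstables):
    """Search the memtable first, then each SSTable in the given order."""
    if key in memtable:
        return memtable[key]
    for table in sstables:
        value = table.get(key)
        if value is not None:
            return value
    return None


memtable = {"k3": "m"}
sstables = [SSTable({"k1": "new", "k2": "x"}), SSTable({"k1": "old", "k4": "y"})]
```

In this model the Bloom filter only saves work: a negative answer skips the block read, while a positive answer still has to be confirmed by reading the block.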
Benefits of keeping the keys sorted:
• Supports range search
• Reduces index size
• "These updates" are the updates committed to the commit log.
• How is an update done?
– It is performed as a write operation.
– It depends on how a key is searched for: from top to bottom.
– The search chooses the latest version of the key.
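One way to picture an update, as a hedged sketch: the write is appended to a commit log and added to the memtable as a new version, and a read picks the version with the latest timestamp. The names commit_log, write, and read_latest, and the list-based memtable, are assumptions for this example only.

```python
commit_log = []   # append-only record of every update
memtable = {}     # (row, column) -> list of (timestamp, value) versions


def write(row, column, value, timestamp):
    """An update is just another write: log it, then add a new version."""
    commit_log.append((row, column, timestamp, value))
    memtable.setdefault((row, column), []).append((timestamp, value))


def read_latest(row, column):
    """Return the value with the largest timestamp, or None if absent."""
    versions = memtable.get((row, column), [])
    return max(versions)[1] if versions else None


write("fruit.apple", "color", "green", timestamp=1)
write("fruit.apple", "color", "red", timestamp=2)   # the "update"
latest = read_latest("fruit.apple", "color")
```

The older version is not overwritten in place; it simply loses to the newer timestamp at read time.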
• Why can a key exist more than once?
– Limited memtable size
– Immutable SSTables
– Multiple versions allowed
• Why can a key exist in multiple SSTables?
– SSTables from different levels can have overlapping key ranges
• Minor compaction: converts the memtable into a new SSTable.
• Major compaction:
– Merges multiple SSTables into one new, large SSTable.
– The result contains no deletion information and no deleted data.
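A minimal sketch of the two compactions, assuming an SSTable can be modelled as a sorted tuple of (key, value) pairs and a deletion as a special marker value. TOMBSTONE, minor_compaction, and major_compaction are hypothetical names, and the SSTable list is assumed to be ordered newest first.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a deletion record


def minor_compaction(memtable):
    """Freeze the memtable into a new immutable, sorted SSTable."""
    return tuple(sorted(memtable.items()))


def major_compaction(sstables):
    """Merge SSTables (newest first) into one; drop deletions and deleted data."""
    merged = {}
    for table in reversed(sstables):   # oldest first, so newer entries overwrite
        merged.update(dict(table))
    live = [(key, value) for key, value in merged.items() if value is not TOMBSTONE]
    return tuple(sorted(live))


old_table = minor_compaction({"a": 1, "b": 2})
new_table = minor_compaction({"b": TOMBSTONE, "c": 3})   # "b" was deleted later
compacted = major_compaction([new_table, old_table])
```

After the major compaction only one SSTable remains, and neither the deleted key "b" nor its deletion marker appears in it.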
[Figure: a minor compaction turns the memtable (in memory) into an SSTable in GFS; a major compaction merges the SSTables in GFS.]
Contributions of major compaction:
• Bounds the number of SSTables
• Reclaims resources used by deleted data
• Removes overlapping ranges, which supports range search
[Figure: read operation, involving the memtable (in memory) and the SSTables (in GFS).]
[Figure: delete operation, involving the commit log, the memtable (in memory), and the SSTables (in GFS).]
1. Write a special deletion record
2. Insert the record
3. Minor compaction
4. Major compaction
Conclusion
• Bigtable is a distributed, multi-dimensional sorted map indexed by a row key, a column key, and a timestamp.
• The sorted feature has many benefits:
– Supports range search
– Reduces index size
– Supports read/write operations
– Allows row keys to be chosen for good locality
• Bigtable provides high data reliability through GFS.
• The immutability and compaction of SSTables simplify Bigtable and improve its performance.
Reference
• F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. OSDI, 2006.
• Lecture: the Google Bigtable. https://www.slideshare.net/romain_jacotin/undestand-google-bigtable-is-as-easy-as-playing-lego-bricks-lecture-by-romain-jacotinhe. October 2014.