Neptune is a distributed, large-scale structured data store and an open source project implementing Google's Bigtable.
Neptune
hjkim
http://www.openneptune.com
http://dev.naver.com/projects/neptune (Korean)
Neptune
- Distributed data storage: a semi-structured data store (not a file system)
- Uses a distributed file system for its data files
- Supports real-time and batch processing
- Google Bigtable clone: data model, architecture, features
- Open source: http://dev.naver.com/projects/neptune (Korean), http://www.openneptune.com
Goal
- 500 nodes
- 200 GB or more per node, scaling to petabytes
Features
- Schema management: create, drop, and modify table schemas
- Real-time transactions: single-row operations (no join, group by, order by); multi-row operations: like, between
- Batch transactions: Scanner, Direct Uploader, MapReduce adapter
- Scalability: automatic tablet split & re-assignment
- Reliability: data files stored in a distributed file system; commit log stored in the ChangeLog cluster
- Failover: tablet takeover time of at most 1 minute
- Utilities: Web Console, Shell (simple queries), Data Verifier
Architecture
[Architecture diagram: user applications and MapReduce jobs talk to the Neptune Master and TabletServers #1..#n; tables are physically stored in a distributed file system (Hadoop or other).]
Components
[Deployment diagram: each node #1..#n co-locates a DFS DataNode, a Map&Reduce computing slot, and a Neptune TabletServer over local disk. A Neptune Master coordinates the TabletServers, with failover/events handled by a lock server (NChubby/Pleidas or ZooKeeper). Neptune Clients and the Shell access tables through NTable and Scanner over data/control channels; LogServers #1..#n form the ChangeLog cluster.]
Data Model
[Data model diagram: a table is a sequence of rows sorted by rowKey and is split into tablets (TabletA-1..TabletA-n), each covering a contiguous row range. A row holds one or more columns; within a column, cells are sorted by columnKey; each cell has a key (Cell.Key) and multiple timestamped values (Cell.Value(t1)..Cell.Value(tn)).]
- Sorted by rowKey
- Sorted by columnKey
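The nested, sorted layout above can be sketched as plain sorted maps. This is a minimal illustration, not Neptune's actual classes; all names here are assumptions.

```java
import java.util.*;

// Sketch of the data model: table -> rows (sorted by rowKey) ->
// columns -> cells (sorted by cellKey) -> timestamped values.
public class DataModelSketch {
    // cell key -> (timestamp -> value), newest timestamp first
    static class Column extends TreeMap<String, NavigableMap<Long, byte[]>> {}
    // column name -> column
    static class Row extends TreeMap<String, Column> {}
    // row key -> row, sorted by rowKey as in the diagram
    static final NavigableMap<String, Row> table = new TreeMap<>();

    public static void main(String[] args) {
        Row row = new Row();
        Column col1 = new Column();
        NavigableMap<Long, byte[]> versions =
            new TreeMap<>(Collections.reverseOrder());
        versions.put(1L, "v1".getBytes());   // older value at t1
        versions.put(2L, "v2".getBytes());   // newer value at t2
        col1.put("CK1", versions);
        row.put("col1", col1);
        table.put("RK1", row);

        // the newest value for (RK1, col1, CK1) is the first timestamp entry
        byte[] latest = table.get("RK1").get("col1").get("CK1")
                             .firstEntry().getValue();
        System.out.println(new String(latest)); // prints "v2"
    }
}
```

Because every level is a sorted map, range scans over row keys or column keys fall out naturally.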
Index
[Index diagram: a three-level lookup. The Root Index points to the Meta Tablet index; Meta Tablet records (e.g. M.T1.1000:M1, M.T1.2000:M2, up to Max:mn) point to user tablets (e.g. T1.100:U1, T1.200:U2, T1.1000:UN, T1.2000:UN, T1.1100:U1, T1.1200:U2). Each tablet's TableMapFile (a physical file, sorted by rowKey and columnKey) carries a per-block index of (max-key, file-offset) entries, so a lookup scans at most one 64 KB block. Index record format: Key = TableName.MaxRowKey; Value = tablet name and assigned host. Each tablet's column data/index files live in HDFS.]
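Given the record format above (Key = TableName.MaxRowKey, Value = tablet name and assigned host), locating a row's tablet is a ceiling lookup in a sorted index. The sketch below is illustrative only; the class and field names are assumptions, not Neptune's real API.

```java
import java.util.*;

// Find the tablet covering a row key: records are keyed by the tablet's
// MaxRowKey, so the owning tablet is the smallest index key >= the row key.
public class TabletLookupSketch {
    static class IndexRecord {
        final String tabletName, assignedHost;
        IndexRecord(String t, String h) { tabletName = t; assignedHost = h; }
    }

    public static void main(String[] args) {
        // meta index for table T1: "TableName.MaxRowKey" -> tablet record
        NavigableMap<String, IndexRecord> metaIndex = new TreeMap<>();
        metaIndex.put("T1.1100", new IndexRecord("U1", "host-a"));
        metaIndex.put("T1.1200", new IndexRecord("U2", "host-b"));
        metaIndex.put("T1.2000", new IndexRecord("UN", "host-c"));

        // row key "T1.1150" exceeds U1's max key, so it lands in U2
        Map.Entry<String, IndexRecord> e = metaIndex.ceilingEntry("T1.1150");
        System.out.println(e.getValue().tabletName); // prints "U2"
    }
}
```

The same ceiling-lookup step repeats at each level: root index to meta tablet, meta tablet to user tablet, then the TableMapFile's block index to a file offset.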
Data Operation
[Write/read path diagram: put(key, value) is first appended to the ChangeLog (on the ChangeLogServer) and then applied to the TabletServer's MemoryTable. A minor compaction flushes the MemoryTable to a new MapFile in HDFS (MapFile #1..#n); a major compaction merges the MapFiles into a single merged MapFile in HDFS. get(key) goes through the Searcher, which consults the MemoryTable and the MapFiles.]
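The write/read path above can be sketched in a few lines. This is a hedged, in-memory illustration of the mechanism, not Neptune's implementation; all names are made up for the example.

```java
import java.util.*;

// put() logs then updates the memory table; minor compaction flushes the
// memory table to an immutable MapFile; major compaction merges MapFiles;
// get() checks the memory table first, then MapFiles newest-first.
public class WritePathSketch {
    final List<String> changeLog = new ArrayList<>();           // commit log entries
    final NavigableMap<String, String> memoryTable = new TreeMap<>();
    final Deque<NavigableMap<String, String>> mapFiles = new ArrayDeque<>(); // newest first

    void put(String key, String value) {
        changeLog.add("PUT " + key);   // durability first (ChangeLog cluster)
        memoryTable.put(key, value);
    }

    void minorCompaction() {           // flush memory table to a new MapFile
        mapFiles.addFirst(new TreeMap<>(memoryTable));
        memoryTable.clear();
    }

    void majorCompaction() {           // merge all MapFiles into one
        NavigableMap<String, String> merged = new TreeMap<>();
        for (Iterator<NavigableMap<String, String>> it =
                 mapFiles.descendingIterator(); it.hasNext(); )
            merged.putAll(it.next());  // oldest first, so newer values win
        mapFiles.clear();
        mapFiles.addFirst(merged);
    }

    String get(String key) {           // the "Searcher" role
        if (memoryTable.containsKey(key)) return memoryTable.get(key);
        for (NavigableMap<String, String> f : mapFiles)
            if (f.containsKey(key)) return f.get(key);
        return null;
    }

    public static void main(String[] args) {
        WritePathSketch ts = new WritePathSketch();
        ts.put("RK1", "v1");
        ts.minorCompaction();
        ts.put("RK1", "v2");               // newer value in the memory table
        System.out.println(ts.get("RK1")); // prints "v2"
        ts.minorCompaction();
        ts.majorCompaction();
        System.out.println(ts.get("RK1")); // still "v2" after the merge
    }
}
```

Logging before applying the write is what makes the memory table recoverable after a TabletServer crash.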
Failover
- Master failure: only table schema management and tablet split are disabled; multiple masters can run, so another master takes over
- TabletServer failure: its tablets are reassigned to other TabletServers by the master within 2 minutes
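The TabletServer-failure rule above amounts to the master rebinding the dead server's tablets to live servers. A minimal sketch, with an assumed round-robin policy and illustrative names only:

```java
import java.util.*;

// When the lock server reports a dead TabletServer, the master walks the
// assignment table and moves that server's tablets to the remaining
// live servers.
public class FailoverSketch {
    public static void main(String[] args) {
        // tablet -> assigned TabletServer
        Map<String, String> assignment = new TreeMap<>();
        assignment.put("TabletA-1", "ts1");
        assignment.put("TabletA-2", "ts2");
        assignment.put("TabletA-3", "ts1");

        List<String> live = List.of("ts2", "ts3");
        String dead = "ts1";

        // reassign the dead server's tablets round-robin across live servers
        int i = 0;
        for (Map.Entry<String, String> e : assignment.entrySet())
            if (e.getValue().equals(dead))
                e.setValue(live.get(i++ % live.size()));

        System.out.println(assignment);
        // prints {TabletA-1=ts2, TabletA-2=ts2, TabletA-3=ts3}
    }
}
```

Since tablet data and commit logs live in shared storage (DFS and the ChangeLog cluster), only the assignment record changes; no data is copied during takeover.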
MapReduce with Neptune
[Dataflow diagram: the tablets of TableA (TabletA-1..A-N, located via the META Table) feed Map tasks on TaskTrackers through TabletInputFormat on the Hadoop client; Reduce tasks write their output, partitioned by key, to TableB's tablets (Tablet B-1, B-2) or to a DBMS/HDFS table.]
Client API
- Single-row operations: put/get
- Multi-row operations: like, between
- Batch operations: scanner/uploader
- MapReduce: TabletInputFormat
- Command-line Shell: NQL (Neptune Query Language), JDBC support
- Web Console
Client API Example

TableSchema tableSchema =
    new TableSchema("T_TEST", new String[]{"col1", "col2"});
NTable.createTable(tableSchema);

NTable ntable = NTable.openTable("T_TEST");

// single-row put
Row row = new Row(new Row.Key("RK1"));
row.addCell("col1", new Cell(new Cell.Key("CK1"), "test_value".getBytes()));
ntable.put(row);

// single-row get
Row selectedRow = ntable.get(new Row.Key("RK1"));
System.out.println(selectedRow.getCellList("col1").get(0));

// batch scan
TableScanner scanner = ntable.openScanner(ntable, new String[]{"col1"});
Row scanRow = null;
while ((scanRow = scanner.next()) != null) {
    System.out.println(scanRow.getCellList("col1").get(0));
}
scanner.close();
Neptune Shell
- Data definition: CREATE TABLE, DROP TABLE, SHOW TABLES, DESC
- Data manipulation: SELECT, INSERT, DELETE, TRUNCATE COLUMN, TRUNCATE TABLE
- Cluster monitoring: PING TABLETSERVER, REPORT TABLE
Web Console
Performance

Experiment          Neptune
Random read             495
Random write          1,223
Sequential read         498
Sequential write      1,327
Scan                 40,329

Number of 1000-byte values read/written per second
HBase/Bigtable Comparison

                  Neptune                  Bigtable      HBase
File system       Hadoop DFS or other DFS  GFS           Hadoop DFS
Computing         Hadoop or others         MapReduce     Hadoop
Master failover   Yes (ZooKeeper)          Yes (Chubby)  0.20 (ZooKeeper)
Script language   No (NQL)                 Sawzall       No
Change log        ChangeLog cluster        GFS           HDFS + Memory
API               Java, Thrift, REST       C++           Java, Thrift, REST
ACL               Yes                      Yes           No
Memory table      No                       Yes           No
Scanner           Yes                      Yes           Yes
Uploader          Yes                      Unknown       No