Apache Accumulo Overview
Bill Havanki, Solutions Architect, Cloudera Government Solutions

DESCRIPTION

An overview of the Apache Accumulo high-performance, scalable, distributed key/value store.

Citation preview

Page 1: Apache Accumulo Overview

Apache Accumulo Overview
Bill Havanki
Solutions Architect, Cloudera Government Solutions

Page 2: Apache Accumulo Overview

©2014 Cloudera, Inc. All rights reserved.

Agenda

• Quick History
• Storage Model
• Loading and Querying
• Daemons
• Getting Started, a.k.a. the Pitch

Page 3: Apache Accumulo Overview

A Quick History

Page 4: Apache Accumulo Overview

Google BigTable

Compressed, high-performance, scalable, distributed sorted map

Page 5: Apache Accumulo Overview

Google BigTable

• Began development in 2004
• Built on Google File System
• Non-relational
• Byte-oriented and schemaless
• Stores data in the petabyte range
• Research paper published in 2006

Page 6: Apache Accumulo Overview

Child(ren) of BigTable

• Apache HBase (begun 2006, top-level 2010)
• Apache Cassandra (begun 2008-ish, top-level 2010)
• Apache Accumulo ...

Page 7: Apache Accumulo Overview

From Cloudbase to Accumulo

• Started in 2008 as a National Security Agency project
• Submitted to the Apache Incubator in 2011 (and renamed)
• Top-level project in 2012

Page 8: Apache Accumulo Overview

Storage Model

Page 9: Apache Accumulo Overview

Key / Value Store

Accumulo stores tables of key / value pairs

Page 10: Apache Accumulo Overview

Key / Value Store

A row is a sorted sequence of key / value pairs

Each pair is a cell

Page 11: Apache Accumulo Overview

The Key

row
column: family, qualifier, visibility
timestamp

Page 12: Apache Accumulo Overview

An example key

row: bhavanki
column: family personal, qualifier middle, visibility PII
timestamp: 1401041295

Page 13: Apache Accumulo Overview

Another example key

row: brees
column: family employment, qualifier salary, visibility FIN
timestamp: 1401041296
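
The key layout also determines sort order. The following is a minimal Java sketch of that ordering, not Accumulo's own Key class: `CellKey`, `KeyOrderSketch`, the placeholder values, and the extra timestamp 1401041299 are all assumptions made for illustration. Components compare in the order row, family, qualifier, visibility, and timestamps sort newest first, so the latest version of a cell is encountered first.

```java
import java.util.Map;
import java.util.TreeMap;

class KeyOrderSketch {
    // Hypothetical stand-in for a key; NOT Accumulo's own Key class.
    static final class CellKey implements Comparable<CellKey> {
        final String row, family, qualifier, visibility;
        final long timestamp;

        CellKey(String row, String family, String qualifier, String visibility, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.timestamp = timestamp;
        }

        // Row, family, qualifier, visibility ascending; timestamp descending (newest first).
        // String comparison stands in for byte comparison, which matches for ASCII text.
        @Override
        public int compareTo(CellKey o) {
            int c = row.compareTo(o.row);
            if (c == 0) c = family.compareTo(o.family);
            if (c == 0) c = qualifier.compareTo(o.qualifier);
            if (c == 0) c = visibility.compareTo(o.visibility);
            if (c == 0) c = Long.compare(o.timestamp, timestamp);
            return c;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " [" + visibility + "] " + timestamp;
        }
    }

    // Two keys from the slides plus one made-up newer version; values are placeholders.
    static TreeMap<CellKey, String> sampleTable() {
        TreeMap<CellKey, String> table = new TreeMap<>();
        table.put(new CellKey("brees", "employment", "salary", "FIN", 1401041296L), "value-a");
        table.put(new CellKey("bhavanki", "personal", "middle", "PII", 1401041295L), "value-b");
        table.put(new CellKey("bhavanki", "personal", "middle", "PII", 1401041299L), "value-c");
        return table;
    }

    public static void main(String[] args) {
        // Iteration follows key order regardless of insertion order.
        for (Map.Entry<CellKey, String> e : sampleTable().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Both bhavanki entries come before the brees entry, and of the two bhavanki versions the one with the larger timestamp comes first.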

Page 14: Apache Accumulo Overview

It’s all bytes

All key and value data are stored as bytes
• except timestamp, which is a long

There are no built-in data types
• but lexicoders help with common types

Key components are usually UTF-8 strings
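
Because everything sorts as raw bytes, a number has to be encoded so that byte order matches numeric order. The sketch below illustrates that idea for a signed long. It is only an illustration of the kind of problem lexicoders address, not Accumulo's lexicoder code, and `LongEncodingSketch`, `encode`, `decode`, and `compareBytes` are hypothetical names.

```java
import java.nio.ByteBuffer;

class LongEncodingSketch {
    // Big-endian bytes with the sign bit flipped, so that unsigned
    // lexicographic byte order matches signed numeric order.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Reverses encode().
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // Lexicographic comparison that treats every byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        long[] samples = {-5L, 0L, 7L, 1401041295L};
        for (int i = 0; i + 1 < samples.length; i++) {
            boolean ordered = compareBytes(encode(samples[i]), encode(samples[i + 1])) < 0;
            System.out.println(samples[i] + " sorts before " + samples[i + 1] + ": " + ordered);
        }
    }
}
```

Without the sign-bit flip, negative numbers would sort after positive ones, since their first byte has the high bit set.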

Page 15: Apache Accumulo Overview

Some rows for you

row | cf | cq | cv | ts | value
bhavanki | job | employer | | 2013-09-01 | Cloudera
bhavanki | personal | beer | | 2013-09-15 | Omission
bhavanki | personal | house | NOMUGGL | 2014-01-25 | Ravenclaw
brees | job | employer | | 2013-10-01 | White Cliffs
brees | personal | house | NOMUGGL | 2014-01-01 | Hufflepuff

Page 16: Apache Accumulo Overview

Visibility Labels

Boolean expression

Specialist | (Management & SpecTraining)

Authorizations are provided in each scan
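
To make the check concrete, here is a minimal Java sketch that evaluates a visibility expression against the set of authorizations supplied with a scan. It is a toy evaluator, not Accumulo's implementation: `VisibilitySketch` and `visible` are hypothetical names, labels are assumed to be plain letters and digits, and operators are applied left to right, which relies on the assumption that mixed `&` and `|` are always parenthesized as in the slide's example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr.replace(" ", "");
        this.auths = auths;
    }

    // True if the given authorizations satisfy the visibility expression.
    static boolean visible(String expression, Set<String> auths) {
        if (expression.trim().isEmpty()) return true; // no label: visible to every scan
        VisibilitySketch parser = new VisibilitySketch(expression, auths);
        boolean result = parser.parseExpr();
        if (parser.pos != parser.expr.length()) {
            throw new IllegalArgumentException("unexpected input at position " + parser.pos);
        }
        return result;
    }

    // expr := term (('&' | '|') term)*
    private boolean parseExpr() {
        boolean value = parseTerm();
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char op = expr.charAt(pos++);
            boolean rhs = parseTerm();
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // term := label | '(' expr ')'
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpr();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing ')'");
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a label at position " + pos);
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        String label = "Specialist | (Management & SpecTraining)";
        Set<String> auths = new HashSet<>(Arrays.asList("Management", "SpecTraining"));
        System.out.println(visible(label, auths));
    }
}
```

A scan holding only Management would not see the cell, while one holding Specialist alone would.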

Page 17: Apache Accumulo Overview

Locality Groups

You can identify sets of one or more column families as locality groups

Data in a locality group is stored together for improved read performance

Page 18: Apache Accumulo Overview

Tablets

A table consists of one or more tablets

table: employees
tablets: employees;S, employees;H, employees;~
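
A short Java sketch of how a row could be mapped to one of these tablets. It assumes that the text after the semicolon is the tablet's last row, so `employees;H` holds rows up to and including "H", and that `~` marks the final tablet; `TabletLookupSketch`, `tabletFor`, and the sample row names are hypothetical.

```java
import java.util.Arrays;
import java.util.TreeSet;

class TabletLookupSketch {
    // Returns the slide-style name of the tablet that would hold the given row.
    static String tabletFor(String table, TreeSet<String> splitPoints, String row) {
        String endRow = splitPoints.ceiling(row); // smallest split point >= row
        return table + ";" + (endRow != null ? endRow : "~");
    }

    public static void main(String[] args) {
        TreeSet<String> splits = new TreeSet<>(Arrays.asList("H", "S"));
        for (String row : new String[] {"Adams", "Miller", "Turner", "bhavanki"}) {
            System.out.println(row + " -> " + tabletFor("employees", splits, row));
        }
    }
}
```

Lowercase row IDs such as bhavanki land in the last tablet here, because lowercase letters sort after the uppercase split points.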

Page 19: Apache Accumulo Overview

Tablets

Tablets map to data files in HDFS

tablets: employees;S, employees;H, employees;~
data files, in the same order: rfile 2, rfile 1, rfile 3

Page 20: Apache Accumulo Overview

Tablets

Data is also kept in write-ahead logs and the memtable

tablet: employees;H
data: rfile 1, walogs, memtable

Page 21: Apache Accumulo Overview

Loading and Querying

Page 22: Apache Accumulo Overview

Java Client API

Page 23: Apache Accumulo Overview

Java Client API

Read using scanners

// conn is a Connector; employeeIds is a collection of Text row IDs
Scanner s = conn.createScanner("employees", new Authorizations());
s.setRange(new Range("alice", "eve"));
s.fetchColumnFamily(new Text("personal"));
for (Entry<Key, Value> e : s)
    employeeIds.add(e.getKey().getRow());

Page 24: Apache Accumulo Overview

Java Client API

Read access via iterator pattern
• server-side system iterators handle timestamps, authorization checks, and lots more
• iterators almost always wrap other iterators, forming a chain
• you can define your own, client-side or server-side
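
The wrapping idea can be sketched with ordinary Java iterators. This is only a model of the pattern and does not use Accumulo's iterator interfaces: `IteratorChainSketch`, `Cell`, `FilteringIterator`, `newestVersionOnly`, and `rowPrefix` are hypothetical, and the older "Elsewhere" version is made-up sample data. One wrapper keeps the newest version of each cell, and a second wrapper filters by row on top of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

class IteratorChainSketch {
    // Simplified entry: "cell" stands for row plus column; not an Accumulo type.
    static final class Cell {
        final String cell;
        final long timestamp;
        final String value;

        Cell(String cell, long timestamp, String value) {
            this.cell = cell;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Wraps a source iterator and passes along only the entries the predicate accepts.
    static final class FilteringIterator implements Iterator<Cell> {
        private final Iterator<Cell> source;
        private final Predicate<Cell> keep;
        private Cell next;

        FilteringIterator(Iterator<Cell> source, Predicate<Cell> keep) {
            this.source = source;
            this.keep = keep;
            advance();
        }

        // Moves to the next accepted entry, or leaves next == null at the end.
        private void advance() {
            next = null;
            while (source.hasNext()) {
                Cell candidate = source.next();
                if (keep.test(candidate)) {
                    next = candidate;
                    return;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public Cell next() {
            if (next == null) throw new NoSuchElementException();
            Cell result = next;
            advance();
            return result;
        }
    }

    // Keeps the first entry seen for each cell; with newest-first input that is the latest version.
    static Iterator<Cell> newestVersionOnly(Iterator<Cell> source) {
        final String[] previous = {null};
        return new FilteringIterator(source, c -> {
            boolean firstForCell = !c.cell.equals(previous[0]);
            previous[0] = c.cell;
            return firstForCell;
        });
    }

    // Keeps entries whose cell name starts with the given row prefix.
    static Iterator<Cell> rowPrefix(Iterator<Cell> source, String prefix) {
        return new FilteringIterator(source, c -> c.cell.startsWith(prefix));
    }

    // Drains an iterator and collects the values it yields.
    static List<String> values(Iterator<Cell> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next().value);
        return out;
    }

    // Sorted input: cells ascending, versions newest first.
    static List<Cell> sampleData() {
        return Arrays.asList(
            new Cell("bhavanki job:employer", 2L, "Cloudera"),
            new Cell("bhavanki job:employer", 1L, "Elsewhere"),
            new Cell("brees job:employer", 1L, "White Cliffs"));
    }

    public static void main(String[] args) {
        Iterator<Cell> chain = rowPrefix(newestVersionOnly(sampleData().iterator()), "bhavanki");
        System.out.println(values(chain));
    }
}
```

Each wrapper only sees what the iterator beneath it lets through, which is why the order of the chain matters.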

Page 25: Apache Accumulo Overview

Java Client API

Scanners fetch sorted rows from one range

Batch scanners fetch unsorted rows from multiple ranges in parallel

Isolated scanners ensure that you do not see a row mid-change
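
The difference between the first two can be modeled in plain Java. This sketch does not use Accumulo's scanner classes: `ScanSketch`, `scan`, `batchScan`, and the sample row names are hypothetical, and a `TreeMap` stands in for a table. A single-range read returns rows in sorted order, while a multi-range read runs one task per range and collects rows in whatever order they arrive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ScanSketch {
    // Scanner-like read: one inclusive range, rows come back in sorted order.
    static List<String> scan(TreeMap<String, String> table, String startRow, String endRow) {
        return new ArrayList<>(table.subMap(startRow, true, endRow, true).keySet());
    }

    // Batch-scanner-like read: ranges are fetched in parallel, arrival order is not guaranteed.
    static List<String> batchScan(TreeMap<String, String> table, String[][] ranges) {
        Queue<String> arrived = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, ranges.length));
        for (String[] range : ranges) {
            pool.execute(() -> arrived.addAll(scan(table, range[0], range[1])));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("range reads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for range reads", e);
        }
        return new ArrayList<>(arrived);
    }

    // Made-up rows; values are placeholders.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        for (String row : new String[] {"trent", "alice", "eve", "bob", "mallory", "carol"}) {
            table.put(row, "value");
        }
        return table;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = sampleTable();
        System.out.println("scan: " + scan(table, "alice", "eve"));
        System.out.println("batch scan: "
            + batchScan(table, new String[][] {{"alice", "bob"}, {"eve", "zed"}}));
    }
}
```

Callers of the batch variant that need sorted output have to sort the results themselves.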

Page 26: Apache Accumulo Overview

MapReduce

AccumuloInputFormat
AccumuloOutputFormat

Page 27: Apache Accumulo Overview

MapReduce

AccumuloRowInputFormat
AccumuloRowOutputFormat

Page 28: Apache Accumulo Overview

Shell

Command-line / manual access to Accumulo data
• scan, insert, delete
• iterator management
• table management (creation, deletion, cloning)
• user and authorization management
• table splitting and merging
• ... more

Page 29: Apache Accumulo Overview

Bulk Import

Got lots of data to import quickly?
• Use an MR job to format data using AccumuloFileOutputFormat
• Import files using the shell

Trade off latency / availability for throughput

Page 30: Apache Accumulo Overview

Daemons

Page 31: Apache Accumulo Overview

Tablet Server

Serves tablets (table data)
• writes data to walog, memtable; deals with compaction
• serves data for reads from files, memtable
• handles recovery from walogs in case of server failure

Most client calls go to tablet servers
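
The write, read, and recovery responsibilities above can be modeled in a few lines of Java. This is a heavily simplified sketch rather than Accumulo's implementation: `TabletServerSketch` and `DurableStore` are hypothetical, in-memory collections stand in for files and logs in HDFS, and reads check the memtable before the files instead of merging them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class TabletServerSketch {
    // Durable state; in Accumulo this lives in HDFS, here it is plain in-memory collections.
    static final class DurableStore {
        final List<String[]> walog = new ArrayList<>();
        final List<TreeMap<String, String>> files = new ArrayList<>();
    }

    private final DurableStore store;
    private final TreeMap<String, String> memtable = new TreeMap<>(); // lost on a crash

    // Starting on an existing store replays the write-ahead log, modeling recovery.
    TabletServerSketch(DurableStore store) {
        this.store = store;
        for (String[] entry : store.walog) {
            memtable.put(entry[0], entry[1]);
        }
    }

    // Writes go to the log first for durability, then to the sorted in-memory table.
    void write(String key, String value) {
        store.walog.add(new String[] {key, value});
        memtable.put(key, value);
    }

    // Flushes the memtable to a new sorted file; its log entries are then obsolete.
    void flush() {
        if (memtable.isEmpty()) return;
        store.files.add(new TreeMap<>(memtable));
        memtable.clear();
        store.walog.clear();
    }

    // Reads consult the memtable first, then files from newest to oldest.
    String read(String key) {
        if (memtable.containsKey(key)) return memtable.get(key);
        for (int i = store.files.size() - 1; i >= 0; i--) {
            String value = store.files.get(i).get(key);
            if (value != null) return value;
        }
        return null;
    }

    public static void main(String[] args) {
        DurableStore store = new DurableStore();
        TabletServerSketch first = new TabletServerSketch(store);
        first.write("bhavanki", "Cloudera");
        first.flush();                        // now in a file
        first.write("brees", "White Cliffs"); // only in walog and memtable

        // Simulate a server failure: the memtable is gone, the store survives.
        TabletServerSketch recovered = new TabletServerSketch(store);
        System.out.println(recovered.read("bhavanki"));
        System.out.println(recovered.read("brees"));
    }
}
```

The second value is readable after the simulated failure only because the write-ahead log was replayed.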

Page 32: Apache Accumulo Overview

Master

• assigns tablets to tablet servers
• detects tablet server failures and reassigns tablets
• balances tablet assignments over time
• coordinates table operations

Multiple supported for failover, only one active
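
A toy Java model of assignment and reassignment. It makes no claim about how Accumulo's balancer actually works: `MasterSketch`, the least-loaded policy, and the server names ts1 and ts2 are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MasterSketch {
    private final List<String> servers = new ArrayList<>();
    private final Map<String, String> assignment = new TreeMap<>(); // tablet -> server

    void addServer(String server) {
        servers.add(server);
    }

    // Picks the live server that currently hosts the fewest tablets.
    private String leastLoaded() {
        String best = null;
        int bestCount = Integer.MAX_VALUE;
        for (String server : servers) {
            int count = 0;
            for (String owner : assignment.values()) {
                if (owner.equals(server)) count++;
            }
            if (count < bestCount) {
                best = server;
                bestCount = count;
            }
        }
        if (best == null) throw new IllegalStateException("no live tablet servers");
        return best;
    }

    void assign(String tablet) {
        assignment.put(tablet, leastLoaded());
    }

    // On a detected failure, drop the server and reassign the tablets it hosted.
    void serverFailed(String server) {
        servers.remove(server);
        List<String> orphaned = new ArrayList<>();
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            if (e.getValue().equals(server)) orphaned.add(e.getKey());
        }
        for (String tablet : orphaned) {
            assignment.remove(tablet);
            assign(tablet);
        }
    }

    String ownerOf(String tablet) {
        return assignment.get(tablet);
    }

    public static void main(String[] args) {
        MasterSketch master = new MasterSketch();
        master.addServer("ts1");
        master.addServer("ts2");
        for (String tablet : new String[] {"employees;H", "employees;S", "employees;~"}) {
            master.assign(tablet);
            System.out.println(tablet + " -> " + master.ownerOf(tablet));
        }
        master.serverFailed("ts1");
        System.out.println("after ts1 fails, employees;H -> " + master.ownerOf("employees;H"));
    }
}
```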

Page 33: Apache Accumulo Overview

Everybody Else in Accumulo

• Garbage Collector (GC) - identifies and deletes files in HDFS that are no longer needed
• Tracer - listens for and stores distributed trace messages using a special table

Page 34: Apache Accumulo Overview

Everybody Else in Accumulo

• Monitor - collects and serves status information
  • server status
  • log inspection
  • performance data
  • table inspection

Page 35: Apache Accumulo Overview

Everybody Else outside Accumulo

• HDFS (as part of Apache Hadoop)
  • stores tablet files
  • stores write-ahead logs (1.5+)
• MapReduce (Hadoop)
  • bulk import
  • batch processing
• Apache ZooKeeper

Page 36: Apache Accumulo Overview

Getting Started
a.k.a. the Pitch

Page 37: Apache Accumulo Overview

Easy as 1-2-3?

1. Install Hadoop (HDFS and MapReduce)
2. Install ZooKeeper
3. Install Accumulo!

Page 38: Apache Accumulo Overview

Making Steps 1 and 2 Easier

Use a complete, pre-packaged Hadoop distribution ... like CDH!

a leading commercial distribution centered on Apache Hadoop
• many ecosystem components
• configured / updated to work together

Page 39: Apache Accumulo Overview

Making Steps 1 and 2 Easier

Cloudera Manager
• deployment
• configuration
• operation
• security

Page 40: Apache Accumulo Overview

Making Step 3 Easier

Standard Apache Accumulo installation is via tarball
• no longer shipping RPM / DEB / ...

Using CDH/CM you can use:
• a tarball, RPM or DEB with Accumulo packaged for CDH
• a parcel (like RPM / ZIP) for easier upgrades

• 1.4.4 and 1.4.5 available now
• 1.6.0 soon

Page 41: Apache Accumulo Overview

Where to Go for More

• http://accumulo.apache.org/
• http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html
• http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-enterprise/cloudera-manager.html
• http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/accumulo.html

Page 42: Apache Accumulo Overview

Accumulo Summit

Join us on June 12

Page 43: Apache Accumulo Overview

Quick Thanks

• My slide reviewers
  • Sean Busbey
  • Mike Drob
• Accumulo community
• You all for listening

Page 44: Apache Accumulo Overview

Thank you!
Bill Havanki
[email protected]