Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Accumulo]


Building Scalable Aggregation Systems

Accumulo Summit

April 28th, 2015

Gadalia O’Bryan and Bill Slacum

gadaliaobryan@koverse.com

billslacum@koverse.com

Outline

• The Value of Aggregations

• Abstractions

• Systems

• Details

• Demo

• References/Additional Information

Aggregation provides a means of turning billions of pieces of raw data into condensed, human-consumable information.

• Aggregation of Aggregations

• Time Series

• Set Size/Cardinality (e.g., "16.3k Unique Users")

• Top-K

• Quantiles

• Density/Heatmap

[Example graphics: histogram (G1), heatmap (G2)]

Abstractions

Sequential reduction:

  1 + 2 + 3 + 4 = 10

Parallel reduction (concept from P1): combine pairs independently, then combine the partial results:

  (1 + 2) = 3,  (3 + 4) = 7,  3 + 7 = 10

We can parallelize integer addition.

Associative + Commutative Operations

• Associative: 1 + (2 + 3) = (1 + 2) + 3

• Commutative: 1 + 2 = 2 + 1

• Allows us to parallelize our reduce (for instance, locally in combiners); see the sketch after this list

• Applies to many operations, not just integer addition

• Spoiler: key to incremental aggregations
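As a quick illustration (a minimal Scala sketch, assuming Scala 2.12 or earlier, where parallel collections ship in the standard library), the grouping and order of an associative, commutative reduce does not change the result:

  object ParallelReduce extends App {
    val xs = List(1, 2, 3, 4)
    // Sequential: ((1 + 2) + 3) + 4 = 10
    val sequential = xs.reduce(_ + _)
    // Parallel: the runtime picks an arbitrary grouping, still 10
    val parallel = xs.par.reduce(_ + _)
    assert(sequential == parallel)
  }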

  ({a, b} + {b, c}) = {a, b, c},  ({a, c} + {a}) = {a, c},  {a, b, c} + {a, c} = {a, b, c}

We can also parallelize the "addition" of other types, like Sets, as set union is associative.

Monoid Interface

• Abstract algebra provides a formal foundation for what we can casually observe

• Don't be thrown off by the name; just think of it as another trait/interface

• Monoids provide a critical abstraction for treating aggregations of different types in the same way (a sketch follows)
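A minimal sketch of the interface in Scala (our naming, illustrative only; Algebird's actual Monoid trait differs in detail):

  // zero is the identity element; plus must be associative.
  trait Monoid[T] {
    def zero: T
    def plus(a: T, b: T): T
  }

  // Integer addition as a Monoid.
  object LongSum extends Monoid[Long] {
    def zero: Long = 0L
    def plus(a: Long, b: Long): Long = a + b
  }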

Many Monoid Implementations Already Exist

• https://github.com/twitter/algebird/ [C1]

• Long, String, Set, Seq, Map, etc.

• HyperLogLog: cardinality estimates

• QTree: quantile estimates

• SpaceSaver/HeavyHitters: approximate Top-K

• Also easy to add your own with libraries like stream-lib [C3]
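For example, Algebird [C1] derives a Monoid for maps whose value type has one, which is handy for keyed counts (a sketch; implicit resolution details may vary across Algebird versions):

  import com.twitter.algebird.Monoid

  object MapMonoidExample extends App {
    val m = implicitly[Monoid[Map[String, Long]]]
    val a = Map("BWI" -> 3L, "DIA" -> 1L)
    val b = Map("BWI" -> 2L)
    println(m.plus(a, b)) // Map(BWI -> 5, DIA -> 1)
  }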

Serialization

• One additional trait we need our "aggregatable" types to have is that we can serialize/deserialize them

Example: reducing 1, 2, 3, 4 to 10, with the partial sums 3 and 7 crossing a serialization boundary:

  1) zero()   2) plus()   3) plus()   4) serialize()
  5) zero()   6) deserialize()   7) plus()   8) deserialize()   9) plus()
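One way to capture that requirement is to extend the Monoid sketch above with a byte-level codec (our naming, not a real library's interface):

  trait SerializableMonoid[T] extends Monoid[T] {
    def serialize(t: T): Array[Byte]       // e.g., step 4 above
    def deserialize(bytes: Array[Byte]): T // e.g., steps 6 and 8 above
  }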

These abstractions enable a small library of reusable code to aggregate data in many parts of your system.

Systems

SQL on Hadoop

• Impala, Hive, SparkSQL

Where systems fall on the spectrum:

  Query Latency:  milliseconds | seconds  | minutes
  # of Users:     large        | many     | few
  Freshness:      seconds      | minutes  | hours
  Data Size:      billions     | millions | thousands

Online Incremental Systems

• Twitter’s Summingbird [PA1, C4], Google’s Mesa [PA2], Koverse’s Aggregation Framework

Where systems fall on the spectrum:

  Query Latency:  milliseconds | seconds  | minutes
  # of Users:     large        | many     | few
  Freshness:      seconds      | minutes  | hours
  Data Size:      billions     | millions | thousands

Online Incremental Systems: Common Components

• Aggregations are computed/reduced incrementally via associative operations

• Results are mostly pre-computed, so queries are inexpensive

• Aggregations, keyed by dimensions, are stored in a low-latency, scalable key-value store

Summingbird Program

[Diagram: incoming data is written both to HDFS and to queues. The Summingbird program compiles to a Hadoop job that reduces batch data into a batch KV store, and to a Storm topology that reduces streaming data into an online KV store. A client library merges results from both stores for the client.]

Mesa

[Diagram: data arrives in batches and is stored on Colossus. Updates are versioned: singleton deltas (e.g., 61, 62, ..., 91, 92) are reduced by a compaction worker into cumulative deltas (61-70, 61-80, 61-90) on top of a base (0-60). Query servers reduce across base, cumulatives, and singletons at read time to answer client queries.]

Koverse

[Diagram: data is ingested via the Koverse Server into Records stored in Apache Accumulo. A Hadoop job reduces Records into Aggregates and writes them back. Minor/major compaction iterators and a scan-time iterator each apply the same reduce, and clients query through the Koverse Server.]

Details

Ingest (1/2)

• We bulk import RFiles rather than writing via a BatchWriter (see the sketch after this list)

• The failure case is simpler, as we can retry the whole batch if an aggregation job or a bulk import fails

• BatchWriters can be used, but code needs to be written to handle Mutations that are uncommitted, and there is no rollback for successful commits
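A sketch of the bulk-import call against the Accumulo 1.x client API (table name and directories are hypothetical):

  import org.apache.accumulo.core.client.Connector

  object BulkImportSketch {
    // setTime = false keeps the timestamps already written into the RFiles.
    def bulkImport(conn: Connector, table: String): Unit =
      conn.tableOperations().importDirectory(table, "/data/rfiles", "/data/failures", false)
  }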

Ingest (2/2)

• As a consequence of importing (usually small) RFiles, we will be compacting more

• In testing (20 nodes, 200+ jobs/day), we have not had to tweak compaction thresholds or strategies

• This can possibly be attributed to the relatively small amount of data held at any given time due to reduction

Accumulo Iterator

• Combiner iterator: a SortedKeyValueIterator that combines the Values for different versions (timestamps) of a Key within a row into a single Value. The Combiner will replace one or more versions of a Key and their Values with the most recent Key and a Value which is the result of the reduce method.

Our Combiner

• We can re-use Accumulo's Combiner type here:

import scala.collection.JavaConverters._

override def reduce(key: Key, values: java.util.Iterator[Value]): Value = {
  // Deserialize each partial aggregate, reduce them all, then re-serialize
  // the result back into a single Value.
  val sum = agg.reduceAll(values.asScala.map(v => agg.deserialize(v)))
  new Value(agg.serialize(sum))
}

• Our function has to be commutative because major compactions will often pick smaller files to combine, which means we only see discrete subsets of the data in any one iterator invocation.
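Attaching such a combiner to a table looks roughly like this with the Accumulo 1.x client API (class and table names are hypothetical):

  import org.apache.accumulo.core.client.{Connector, IteratorSetting}

  object AttachCombinerSketch {
    def attach(conn: Connector, table: String): Unit = {
      // Priority 10; applies at scan, minc, and majc scopes by default.
      val setting = new IteratorSetting(10, "agg", "com.example.AggregationCombiner")
      conn.tableOperations().attachIterator(table, setting)
    }
  }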

Accumulo Table Structure

  row:        field1Name\x1Ffield1Value\x1Ffield2Name\x1Ffield2Value...
  colf:       aggregation type
  colq:       relation
  visibility: visibility
  timestamp:  timestamp
  value:      serialized aggregation results

Examples:

  origin\x1FBWI                       count:             [U]  6074
  origin\x1FBWI                       topk:destination   [U]  {“DIA”: 1}
  origin\x1FBWI\x1Fdate\x1F20150427   count:             [U]  104
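A small Scala sketch of building the \x1F-delimited row id described above:

  object RowIdSketch extends App {
    val Sep = "\u001F" // the \x1F unit separator between field names and values
    def rowId(fields: (String, String)*): String =
      fields.map { case (name, value) => s"$name$Sep$value" }.mkString(Sep)
    // Prints origin\x1FBWI\x1Fdate\x1F20150427
    println(rowId("origin" -> "BWI", "date" -> "20150427"))
  }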

Visibilities (1/2)

• Easy to store, a bit tougher to query

• Data can be stored at separate visibilities

• Combiner logic has no concept of visibility; it only loops over a given PartialKey.ROW_COLFAM_COLQUAL

• We know how to combine values (Longs, CountMinSketches), but how do we combine visibilities?

Visibilities (2/2)

• Say we have some data on Facebook photo albums:
  – facebook\x1falbum_size count: [public] 800
  – facebook\x1falbum_size count: [private] 100

• The combined value would be 900

• But what should we return for the visibility of public + private? We need more context to properly interpret this value.

• Alternatively, we can just drop it

Queries

• This schema is geared towards point queries (see the sketch after this list)

• Order of fields matters

• GOOD: “What are the top-k destinations from BWI?”

• NOT GOOD: “What are all the dimensions and aggregations I have for BWI?”
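A point query against this schema might look like the following sketch (Accumulo 1.x client API; the table name and authorizations are hypothetical):

  import org.apache.accumulo.core.client.Connector
  import org.apache.accumulo.core.data.Range
  import org.apache.accumulo.core.security.Authorizations
  import org.apache.hadoop.io.Text
  import scala.collection.JavaConverters._

  object PointQuerySketch {
    def topKFromBwi(conn: Connector): Unit = {
      val scanner = conn.createScanner("aggregates", new Authorizations("U"))
      scanner.setRange(Range.exact("origin\u001FBWI")) // single-row point lookup
      scanner.fetchColumn(new Text("topk:"), new Text("destination"))
      scanner.iterator().asScala.foreach(println)
    }
  }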

Demo

References

Presentations

P1. Algebra for Analytics - https://speakerdeck.com/johnynek/algebra-for-analytics

Code

C1. Algebird - https://github.com/twitter/algebird

C2. Simmer - https://github.com/avibryant/simmer

C3. stream-lib - https://github.com/addthis/stream-lib

C4. Summingbird - https://github.com/twitter/summingbird

Papers

PA1. Summingbird: A Framework for Integrating Batch and Online MapReduce Computations - http://www.vldb.org/pvldb/vol7/p1441-boykin.pdf

PA2. Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing - http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42851.pdf

PA3. Monoidify! Monoids as a Design Principle for Efficient MapReduce Algorithms - http://arxiv.org/abs/1304.7544

Video

V1. Intro To Summingbird - https://engineering.twitter.com/university/videos/introduction-to-summingbird

Graphics

G1. Histogram Graphic - http://www.statmethods.net/graphs/density.html

G2. Heatmap Graphic - https://www.mapbox.com/blog/twitter-map-every-tweet/

G3. The Matrix Background - http://wall.alphacoders.com/by_sub_category.php?id=198802

Backup Slides

Monoid Examples

Aggregation Flow

New Records from import jobs (kv_records table):

  client: iPhone   timestamp: 1408935773 ...
  client: Android  timestamp: 1408935871 ...
  client: Web      timestamp: 1408935792 ...

Aggregate KVs (kv_aggregates table):

  RowId: hour:2014_08_24_09|client:Web   CF: Count   CQ:   Value: 3
  RowId: client:Android                  CF: Count   CQ:   Value: 1
  RowId: client:Android                  CF: Count   CQ:   Value: 5
  RowId: client:iPhone                   CF: Count   CQ:   Value: 6

Periodic, incremental MapReduce jobs (like the current Stats Job) read Records and emit Aggregate KVs based on the Aggregate configuration for the Collection:

  Aggregate(
    onKey("client", "hour", "client"),
    produce(Count),
    prepare(("timestamp", "hour", BinByHour()))
  )

The Aggregate configuration is a type-safe Scala object. Code is sent to the server as a String, where it is compiled (not executed). The serialized object is passed to the MR job to generate KVs from Records. It contains the dimensions (onKey), the aggregation operation (produce), and optional projections (prepare), which can be built-in functions or custom Scala closures. We envision a UI building these objects in the future.

Map → Combine → Reduce, each applying Aggregation Reduction

  Emit KVs: Key = dimension + operation; Value = serialized monoid aggregator

The job's output is written as RFiles:

  RowId: client:iPhone                       CF: Count   CQ:   Value: 3
  RowId: client:Android                      CF: Count   CQ:   Value: 5
  RowId: hour:2014_08_24_09|client:Android   CF: Count   CQ:   Value: 2

MinC and MajC apply Aggregation Reduction as RFiles are compacted.

User Query → Scan Iterator, applying Aggregation Reduction once more:

  Query:  { key: “client:iPhone”, produce: Count }
  Result: { key: “client:iPhone”, produce: Count, value: 9 }

Aggregation Reduction is the same common code in 5 places. For Aggregates with the same Key, the Values are reduced based on the operation (Sum, Set, Cardinality Est., etc.). The Values are always serialized objects that implement the MonoidAggregator interface. Adding a new aggregation operation will impact only a single class, with no new Iterators or MR code; a sketch of this shared reduction follows.
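A sketch of what that shared code can look like, reusing the SerializableMonoid sketch from earlier (names illustrative, not Koverse's actual classes):

  object SharedReduction {
    // One generic reduction over serialized values, reusable at map, combine,
    // reduce, compaction, and scan time.
    def reduceSerialized[T](agg: SerializableMonoid[T],
                            values: Iterator[Array[Byte]]): Array[Byte] =
      agg.serialize(values.map(agg.deserialize).foldLeft(agg.zero)(agg.plus))
  }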

  RowId: hour:2014_08_24_09|client:Web   CF: Count   CQ:   Value: 8

Aggregate queries are simple point queries for a single KV. If the user wants something like an enumeration of “client” values, they will use a Set or Top-K operation, and the single value will contain the answer with no range scans required. The API may support batching multiple keys per request to efficiently support queries that build timeseries (e.g., counts for each hour in the day).