Accumulo Summit - 4/28/2015
Event-Driven Big Data with Accumulo: Leveraging Big Data in Motion…
John Hebeler, Lockheed Martin Inc. [email protected]
“It is a capital mistake to theorize before one has data.” Sherlock Holmes

Accumulo Summit 2015: Event-Driven Big Data with Accumulo - Leveraging Big Data in Motion [Leveraging Accumulo]



Plan…

✴ Brief Event-Driven Overview
✴ Accumulo Event Management
✴ Demonstration/Access to EC2

Events

❖ Events drive our world; they are our context

❖ Data processing often reflects these events, but with batch latency, poor resolution, longitudinal conflicts, and pull-type architectures

❖ If you don’t ask, no one hears…

❖ Event consequences are delayed and possibly lost

❖ Especially true “in context” with related events

❖ Time plays a critical role: before, after, simultaneous…

❖ Focus: Accumulo’s role and implementation

Event-Driven Architecture

❖ Events drive consequences

❖ Multiple levels/iterations

❖ Clients (or downstream events) analyze the consequences in near real time

❖ Stateless except for big data storage (Accumulo), which is what makes this possible!

❖ Resolution, Fidelity, Query, …
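The “stateless except for Accumulo” idea can be made concrete with a short sketch. This is an illustration under stated assumptions, not the talk's implementation. The handler keeps no fields between events and sends all state through a store. `ContextStore`, `InMemoryContextStore`, and `CountingHandler` are hypothetical names, and the in-memory map only stands in for an Accumulo table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an Accumulo-backed context table (row ID -> value).
interface ContextStore {
    String get(String rowId);
    void put(String rowId, String value);
}

// In-memory implementation, used only for this sketch.
class InMemoryContextStore implements ContextStore {
    private final Map<String, String> rows = new HashMap<>();
    @Override public String get(String rowId) { return rows.get(rowId); }
    @Override public void put(String rowId, String value) { rows.put(rowId, value); }
}

// A stateless handler: it keeps no fields between events. Every piece of state
// it needs is read from the store, and every consequence is written back to it.
final class CountingHandler {
    private CountingHandler() {}

    // Hypothetical consequence of an event: bump a per-topic counter.
    static void onEvent(ContextStore store, String topic) {
        String rowId = "count:" + topic;
        String current = store.get(rowId);
        long next = (current == null ? 0L : Long.parseLong(current)) + 1;
        store.put(rowId, Long.toString(next));
    }
}
```

The handler holds nothing of its own, so any number of copies can process events. Its read-then-write step is not atomic, however, which is the concern raised under Avoid Transactions.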

Accumulo Data Model

❖ Decomposable, flexible key

❖ Only index: lexicographic sort order, starting from the Row ID

❖ Family and Qualifier can serve as “columns” or as row/key “enrichment”

❖ Visibility provides flexible cell-level “security”

❖ Timestamp is usually set automatically and allows “versions”

❖ Value

❖ Anything, but not really “searchable”

❖ Any of the above can be quite huge

❖ Atomic only at the row level

[Figure: Key = Row ID | Column (Family, Qualifier, Visibility) | Timestamp, followed by the Value]
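The key layout above determines sort order. The sketch below is a rough model, not Accumulo's own `Key` class. It orders cells by row, family, qualifier, and visibility ascending, then by timestamp descending, which is how Accumulo places the newest version of a cell first. Accumulo compares these fields as bytes; for plain ASCII strings that matches `String.compareTo`. `CellKey` is a hypothetical name.

```java
import java.util.Comparator;

// Simplified, illustrative model of an Accumulo key (not the real Key class).
final class CellKey {
    final String row, family, qualifier, visibility;
    final long timestamp;

    CellKey(String row, String family, String qualifier, String visibility, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.timestamp = timestamp;
    }

    // Row, family, qualifier, visibility ascending; timestamp descending,
    // so the newest version of the same cell sorts first.
    static final Comparator<CellKey> ORDER = Comparator
            .comparing((CellKey k) -> k.row)
            .thenComparing((CellKey k) -> k.family)
            .thenComparing((CellKey k) -> k.qualifier)
            .thenComparing((CellKey k) -> k.visibility)
            .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());

    @Override public String toString() {
        return row + " " + family + ":" + qualifier + " [" + visibility + "] " + timestamp;
    }
}
```

Sorting a list with `keys.sort(CellKey.ORDER)` puts a row's newest version ahead of its older ones, and all of them ahead of any later row ID.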

Events and Context

❖ Store events for easy retrieval

❖ Events continue to grow; context reaches a steady state

❖ Interpret each event properly within its context

❖ Idempotence
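One common way to get idempotence is to use the event ID as the row ID. This is an assumption for illustration, not necessarily the talk's design. Storing the same event twice then writes the same key, so reprocessing does not add a duplicate. Below, the `TreeMap` stands in for a table sorted by row ID, and `IdempotentEventTable` is a hypothetical name.

```java
import java.util.TreeMap;

// Event ID as row ID: a repeated write lands on the same key, not a new row.
// The TreeMap is an in-memory stand-in for an Accumulo table.
final class IdempotentEventTable {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void store(String eventId, String eventJson) {
        rows.put(eventId, eventJson);   // same key -> same cell, no duplicate
    }

    int size() { return rows.size(); }

    String get(String eventId) { return rows.get(eventId); }
}
```

In Accumulo itself, a rewrite carries a new timestamp and so becomes a new version of the same cell. With the table's default of one retained version, scans still see a single event.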


Categories

1. Direct Accumulo Operations

2. Event Programming

3. Event Management with Accumulo


Direct Accumulo Operations

Query

❖ Key construction: packed fields vs. column-based; your choice

❖ The lexicographic index is the only index (another index means building a new table)

❖ A prefix scan for “a” finds “a.a.a.b”

❖ Searching in the Value is not usually practical

❖ Query for past values (versions)

❖ Time

// Populate ranges, then scan them in parallel with a BatchScanner
ArrayList<Range> ranges = new ArrayList<Range>();
// ... add one Range per key span of interest
BatchScanner bs = conn.createBatchScanner(table, … );
bs.setRanges(ranges);

// Keep N versions of each cell at scan time and during major/minor compaction
TableOperations to = conn.tableOperations();
to.setProperty(tableName, "table.iterator.scan.vers.opt.maxVersions", String.valueOf(N));
to.setProperty(tableName, "table.iterator.majc.vers.opt.maxVersions", String.valueOf(N));
to.setProperty(tableName, "table.iterator.minc.vers.opt.maxVersions", String.valueOf(N));

[Table: RowID | Family | Qualifier | Value]
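The “a finds a.a.a.b” point follows from the sorted index. Every row ID that starts with a given prefix sits in one contiguous block, so a single range scan covers it. Below is a stdlib-only sketch of that range computation over a `TreeMap` standing in for the table. `PrefixScan` is a hypothetical helper; in Accumulo, the `Range.prefix(...)` factory serves the same purpose.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Prefix scan over sorted row IDs: [prefix, prefixEnd) holds exactly the
// row IDs that start with prefix.
final class PrefixScan {
    private PrefixScan() {}

    // Smallest string greater than every string starting with prefix:
    // bump the last character (assumes it is not Character.MAX_VALUE).
    static String prefixEnd(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    static SortedMap<String, String> scan(TreeMap<String, String> table, String prefix) {
        if (prefix.isEmpty()) {
            return table;                               // empty prefix matches everything
        }
        return table.subMap(prefix, prefixEnd(prefix)); // start inclusive, end exclusive
    }
}
```

With row IDs `a.a.a.b`, `a.b`, `ab`, and `b.a`, scanning prefix `a` returns the first three, and prefix `a.a` returns only `a.a.a.b`.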

Event Update

❖ Store events for easy retrieval

❖ Maintain context surrounding the event

❖ Writing with the same key updates the value

[Table: RowID | Family | Qualifier | Value; rows EventID1, EventID2, EventID3 with family Event** and a value holding JSON or a serialized object]
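Writing with the same key and keeping N versions can be modeled in memory. The sketch below treats one cell (fixed row, family, qualifier, visibility) as a set of timestamped values. Each write adds a version, the newest comes first, and only the newest `maxVersions` survive, loosely mirroring Accumulo's versioning iterator. `VersionedCell` is a hypothetical name. Accumulo hides extra versions at scan time and drops them at compaction; this model drops them immediately.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// In-memory model of one cell under a maxVersions policy (not Accumulo code).
final class VersionedCell {
    private final int maxVersions;
    // Timestamp -> value, ordered newest first.
    private final TreeMap<Long, String> versions = new TreeMap<Long, String>(Comparator.<Long>reverseOrder());

    VersionedCell(int maxVersions) { this.maxVersions = maxVersions; }

    // Writing the same key with a newer timestamp "updates" the value.
    void write(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();   // drop the oldest version
        }
    }

    // Newest visible value, or null if nothing was written.
    String latest() { return versions.isEmpty() ? null : versions.firstEntry().getValue(); }

    // All retained values, newest first.
    List<String> history() { return new ArrayList<>(versions.values()); }
}
```

With `maxVersions` of 1, a write behaves as a plain update. Larger values keep a short history that later queries can read.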

Event Cursor

❖ The Accumulo Scanner’s iterator automatically fetches entries in batches to conserve memory

❖ Events constructed directly from an Accumulo row do not

❖ If you are not careful, the result is out-of-memory exceptions (especially in big data)

[Table: RowID | Family | Qualifier | Value]

class EventCursor {
    // RowIterator groups the scanner's entries into rows lazily, one row at a time
    private final RowIterator rowIterator;
    public EventCursor(Scanner s) { rowIterator = new RowIterator(s); }
    public boolean hasNext() { return rowIterator.hasNext(); }
    // row2Event is the application's own row-to-Event converter
    public Event next() { return row2Event(rowIterator.next()); }
}
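The memory point can be shown without Accumulo. Below is a sketch under stated assumptions: a cursor over a sorted stream of (row ID, cell value) entries that holds at most one row at a time and builds each event only when `next()` is called. The stream stands in for a Scanner's iterator. `LazyRowCursor` is a hypothetical name, and representing an “event” as the list of a row's cell values is an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Lazily groups row-sorted entries into one List per row, so only the
// current row is ever held in memory.
final class LazyRowCursor implements Iterator<List<String>> {
    private final Iterator<Map.Entry<String, String>> entries;
    private Map.Entry<String, String> pending;   // first entry of the next row

    LazyRowCursor(Iterator<Map.Entry<String, String>> entries) {
        this.entries = entries;
        this.pending = entries.hasNext() ? entries.next() : null;
    }

    @Override public boolean hasNext() { return pending != null; }

    @Override public List<String> next() {
        if (pending == null) throw new NoSuchElementException();
        String row = pending.getKey();
        List<String> cells = new ArrayList<>();
        // Consume entries until the row ID changes (entries arrive sorted by row).
        while (pending != null && pending.getKey().equals(row)) {
            cells.add(pending.getValue());
            pending = entries.hasNext() ? entries.next() : null;
        }
        return cells;
    }
}
```

Collecting every event into one list up front would defeat the scanner's batching. Iterating this cursor keeps memory bounded by the largest single row.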

A Word About Accumulo Visibility…

❖ Different: the visibility is part of the key, so the same row, family, and qualifier with different visibilities are distinct cells


Event Programming

Exception-Based Programming

❖ Don’t ask for permission, but plan for exceptions…

❖ Faster and more efficient

❖ Program on the expectation that exceptions won’t happen, and handle them if they do

❖ Watch out for thread contention; a Lock can help

[Table: RowID | Family | Qualifier | Value]

// Optional: openLock.lock();
BatchWriter wr;
while (true) {
    try {
        wr = aClient.createBatchWriter(EVENT_CONTEXT_TABLE, new BatchWriterConfig());
        break;
    } catch (TableNotFoundException e) {
        // Table missing: create it and retry. create() can also throw
        // AccumuloException/AccumuloSecurityException, left to the caller here
        try {
            aClient.tableOperations().create(EVENT_CONTEXT_TABLE);
        } catch (TableExistsException ignored) {
            // Another thread created it first; loop back and open the writer
        }
    }
}
// Optional: openLock.unlock();

Avoid Transactions

❖ Big data transactions are expensive (and difficult)

❖ Make the need rare and the solution lazy

❖ Distributed partial state dilemma

Appending to and updating a single row does not require formal transactions

Race conditions: lazy recognition and repair

Accumulo only guarantees row-level atomicity (but that can still be valuable, since each field can hold a lot of data)

Event conclusions too close together in time are simply reprocessed or properly bundled onto one thread

[Table: RowID | Family | Qualifier | Value]
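Accumulo applies all updates in a single row's mutation atomically. The sketch below models that guarantee in memory as an illustration only. `ConcurrentHashMap.compute` runs atomically per key, so it plays the part of applying a whole-row mutation, and readers see either the old row or the complete new one. `RowAtomicTable` is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory model of row-level atomicity (not Accumulo code).
final class RowAtomicTable {
    private final ConcurrentHashMap<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    // Apply several column updates to one row as a single atomic step.
    void applyMutation(String rowId, Map<String, String> columnUpdates) {
        rows.compute(rowId, (id, existing) -> {
            Map<String, String> next = existing == null ? new HashMap<>() : new HashMap<>(existing);
            next.putAll(columnUpdates);
            return Collections.unmodifiableMap(next);   // publish a complete new row
        });
    }

    // Snapshot of a row; empty if the row does not exist.
    Map<String, String> readRow(String rowId) {
        return rows.getOrDefault(rowId, Collections.emptyMap());
    }
}
```

Keeping everything that must change together in one row is what lets the design skip formal transactions. Anything spanning rows falls back to the lazy detection and repair described above.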

Progressive Provenance

❖ Retrieve the origin of event combinations

❖ Maintain context surrounding the event

❖ Use the same key in different tables for rapid traversal

[Table: RowID | Family | Qualifier | Value]
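The sketch below shows one possible provenance traversal under stated assumptions. An events table and a hypothetical provenance table share the same row ID (the event ID), and each provenance row lists the event IDs a derived event was built from. Walking back to the original events is then a series of point lookups on one key. `ProvenanceWalker` and both maps are hypothetical stand-ins for Accumulo tables.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ProvenanceWalker {
    final Map<String, String> events = new HashMap<>();            // eventId -> payload
    final Map<String, List<String>> provenance = new HashMap<>();  // same eventId -> source eventIds

    // Return the original (source-less) event IDs behind eventId, sorted.
    List<String> origins(String eventId) {
        List<String> roots = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(eventId);
        while (!todo.isEmpty()) {
            String id = todo.pop();
            if (!seen.add(id)) continue;   // skip shared parents and cycles
            List<String> sources = provenance.getOrDefault(id, Collections.emptyList());
            if (sources.isEmpty()) {
                roots.add(id);             // no recorded sources: an origin event
            } else {
                for (String s : sources) todo.push(s);
            }
        }
        Collections.sort(roots);
        return roots;
    }
}
```

Because both tables use the event ID as the row ID, each origin's payload is one more lookup, `events.get(id)`, on the same key.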


Test Events

❖ Test Flag allows In-Stream Test and Validation

❖ Availability

❖ Performance

❖ Quality

❖ What Ifs

❖ Flag indicates different storage table, queues, …
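The test flag above can be reduced to a routing rule. In the minimal sketch below, test events follow the same code path but land in a separate table. The `_test` suffix, the `TestAwareEvent` type, and `EventRouting.tableFor` are hypothetical choices made for illustration.

```java
// Hypothetical event carrying the in-stream test flag.
final class TestAwareEvent {
    final String id;
    final boolean test;

    TestAwareEvent(String id, boolean test) {
        this.id = id;
        this.test = test;
    }
}

final class EventRouting {
    private EventRouting() {}

    // Same processing path; only the destination changes when the flag is set.
    static String tableFor(TestAwareEvent e, String baseTable) {
        return e.test ? baseTable + "_test" : baseTable;
    }
}
```

The same rule could select queues as well as tables, so availability, performance, and what-if runs never touch production data.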


Event Management with Accumulo


Turning an Event Off

❖ Event assertion no longer supported (but was)

[Table: RowID | Family | Qualifier | Value]

Forgetting an Event (Error)

❖ Store events for easy retrieval

❖ Maintain context surrounding the event

[Table: RowID | Family | Qualifier | Value]

Time Travel

❖ Rerun (time) events due to corrupted data, out-of-order events, event errors, event corrections, or “what if” scenarios

❖ Develop context surrounding the event

❖ Remixing the cake

** Need to run Topic X again since last October due to an error back then

// Collect all events for Topic X since October (already in time order)

// Clear Topic X context

// Rerun the collected events in order (all corrected now!)

[Table: RowID | Family | Qualifier | Value]
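The three replay steps above can be sketched in memory under stated assumptions. Events are keyed by `topic|zero-padded timestamp`, so a sorted map (standing in for an Accumulo table) returns them already in time order. The row-ID layout, the sum-based context, and `TopicReplay` itself are hypothetical. The sketch also assumes the topic's context is rebuilt entirely from the replayed events; otherwise, a snapshot from the start date would be restored first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TopicReplay {
    final TreeMap<String, Long> events = new TreeMap<>();   // "topic|ts" -> event amount
    final Map<String, Long> context = new TreeMap<>();       // topic -> derived state

    // Fixed-width timestamps keep lexicographic order equal to time order
    // (assumes non-negative timestamps and no '|' inside topic names).
    static String rowId(String topic, long ts) {
        return topic + "|" + String.format("%019d", ts);
    }

    void apply(String topic, long amount) {
        context.merge(topic, amount, Long::sum);
    }

    void replay(String topic, long fromTs) {
        // 1) Collect the topic's events since fromTs (already in time order).
        List<Long> collected = new ArrayList<>(
                events.subMap(rowId(topic, fromTs), true, rowId(topic, Long.MAX_VALUE), true).values());
        // 2) Clear the topic's context.
        context.remove(topic);
        // 3) Rerun the collected events in order.
        for (long amount : collected) {
            apply(topic, amount);
        }
    }
}
```

Replaying topic X from its first event replaces a corrupted context value with the sum of its events, and other topics are left untouched.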

Future Events

❖ Future events (expiring state, travel plans, …)

❖ May not happen, or may change…

[Table: RowID | Family | Qualifier | Value]

❖ Store the event as always

❖ Schedule a timer (or interval timer) to ignite future events

❖ Events are easily removed by an update; the timer then finds nothing

❖ Requires careful design of the index (Row ID)
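The index point above can be sketched under stated assumptions. Each future event is stored under a row ID that starts with its zero-padded due time, so an interval timer only scans the range up to “now”. A cancelled or moved event is simply absent when the scan runs. The `dueTime|eventId` layout and `FutureEventTable` are hypothetical, and the `TreeMap` stands in for an Accumulo table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FutureEventTable {
    private final TreeMap<String, String> rows = new TreeMap<>();   // row ID -> event payload

    // Fixed-width due time first, so row order equals due-time order.
    static String rowId(long dueMillis, String eventId) {
        return String.format("%019d", dueMillis) + "|" + eventId;
    }

    void schedule(long dueMillis, String eventId, String payload) {
        rows.put(rowId(dueMillis, eventId), payload);
    }

    // An update that cancels the event: the timer will find nothing.
    void cancel(long dueMillis, String eventId) {
        rows.remove(rowId(dueMillis, eventId));
    }

    // Called by an interval timer: remove and return every event due by now.
    List<String> pollDue(long now) {
        Map<String, String> due = rows.headMap(rowId(now, "\uffff"), true);
        List<String> fired = new ArrayList<>(due.values());
        due.clear();   // headMap is a live view, so this deletes the fired rows
        return fired;
    }
}
```

A `ScheduledExecutorService` running `scheduleAtFixedRate` could call `pollDue(System.currentTimeMillis())` as the interval timer.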

Extra Extra

❖ Analytics

❖ Events create a rich foundation for longitudinal analytics, but the data model must be designed for efficient queries (proper indexing)

❖ Backup/Recovery

❖ Take advantage of Accumulo table cloning and pause processing

❖ Hybrid Systems

❖ Semantic Web

❖ Related NoSQL: MongoDB and Neo4j

❖ MapReduce

❖ Gotcha

❖ Accumulo is built upon Hadoop, ZooKeeper…

Follow Up

❖ Email for the EC2 Accumulo and event-driven prototype

[email protected]

❖ Questions any time

❖ Play: a free micro instance for one year