Introduction to Apache NiFi & Storm Jungtaek Lim

WHO AM I?

• Staff Software Engineer @ Hortonworks

• Remote worker

• Open source prosumer

• Committer of Jedis

• PMC member of Apache Storm

• Contributor to Apache projects (Spark, Zeppelin, Ambari, Calcite), Redis, and others

• Contact

• Twitter / LinkedIn / GitHub / Facebook

• @heartsavior

DATA IN MOTION IN HORTONWORKS DATAFLOW (HDF)

[Figure: data moves from Sources (constrained, high-latency, localized context) through Regional Infrastructure to Core Infrastructure (hybrid cloud/on-premises, low-latency, global context).]

Source: http://ko.hortonworks.com/products/data-center/hdf/

What is Apache NiFi?

An easy-to-use, powerful, and reliable system to process and distribute data.

History of Apache NiFi

• Created by the United States National Security Agency (NSA)

• Originally named "Niagarafiles"

• In 2014, the NSA submitted the source code to the Apache Software Foundation through its Technology Transfer Program; the project entered incubation in December 2014

• Development of Apache NiFi continued at Onyara, Inc., a startup company

• Became an Apache Top-Level Project in July 2015

• Hortonworks acquired Onyara, Inc. in August 2015

Role of Apache NiFi

• Data acquisition and delivery

• Simple transformation and data routing

• Simple event processing

• End to end provenance

• Edge intelligence and bi-directional communication
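The following is a minimal Python sketch of "simple transformation and data routing" plus "end to end provenance". It illustrates the idea only and is not NiFi's API. It uses four hypothetical pieces:

- a `FlowFile` holding attributes plus content;
- a function that copies top-level JSON fields into attributes (roughly what EvaluateJsonPath does, without real JsonPath support);
- a first-match router in the spirit of RouteOnAttribute;
- a `ProvenanceLog` that records one event per step.

The event names `ATTRIBUTES_MODIFIED` and `ROUTE` are borrowed from NiFi's provenance event types. Everything else is illustrative.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi FlowFile: string attributes plus raw content."""
    attributes: dict
    content: bytes


@dataclass
class ProvenanceLog:
    """Hypothetical provenance recorder: one (event_type, flowfile_uuid, detail) per step."""
    events: list = field(default_factory=list)

    def record(self, event_type, ff, detail):
        self.events.append((event_type, ff.attributes.get("uuid"), detail))


def extract_attributes(ff, keys, log):
    """Copy selected top-level JSON fields into attributes (a tiny subset of EvaluateJsonPath)."""
    doc = json.loads(ff.content)
    for key in keys:
        if key in doc:
            ff.attributes[key] = str(doc[key])
    log.record("ATTRIBUTES_MODIFIED", ff, sorted(keys))
    return ff


def route(ff, rules, log):
    """Send the FlowFile to the first relationship whose predicate matches its attributes."""
    for name, predicate in rules.items():
        if predicate(ff.attributes):
            log.record("ROUTE", ff, name)
            return name
    log.record("ROUTE", ff, "unmatched")
    return "unmatched"


# Usage: route error events away from everything else.
log = ProvenanceLog()
rules = {"errors": lambda attrs: attrs.get("level") == "ERROR"}
ff = FlowFile({"uuid": "ff-1"}, b'{"level": "ERROR", "msg": "disk full"}')
relationship = route(extract_attributes(ff, ["level"], log), rules, log)
print(relationship)  # errors
```

Routing reads only attributes, so the content is parsed once and later decisions stay cheap. In NiFi itself these steps are configured on processors in the UI rather than written as code.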

NOT intended to REPLACE ‘distributed computation engines’

(a.k.a. stream processing frameworks)

Features of Apache NiFi

Highly configurable

• Loss tolerant vs guaranteed delivery

• Low latency vs high throughput

• Dynamic prioritization

• Flow can be modified at runtime

• Back pressure
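In NiFi, back pressure and prioritization both live on connections, the queues between processors. The sketch below is a hypothetical Python model, not NiFi code:

- A `Connection` refuses new FlowFiles once either an object-count threshold or a byte-size threshold is reached.
- It dequeues in oldest-first or newest-first order, loosely mirroring NiFi's OldestFlowFileFirst and NewestFlowFileFirst prioritizers.
- FlowFiles are plain dicts with assumed `id`, `created`, and `size` keys.

```python
import heapq
import itertools


class Connection:
    """Hypothetical model of a NiFi connection: a queue with back pressure and a prioritizer."""

    def __init__(self, max_objects, max_bytes, prioritizer="oldest_first"):
        if prioritizer not in ("oldest_first", "newest_first"):
            raise ValueError(f"unknown prioritizer: {prioritizer}")
        self.max_objects = max_objects
        self.max_bytes = max_bytes
        self.prioritizer = prioritizer
        self.queued_bytes = 0
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: equal timestamps keep arrival order

    def is_back_pressured(self):
        """True once either threshold is reached; upstream should stop sending."""
        return len(self._heap) >= self.max_objects or self.queued_bytes >= self.max_bytes

    def offer(self, flowfile):
        """Enqueue unless back pressure is engaged; return whether it was accepted."""
        if self.is_back_pressured():
            return False
        age_key = flowfile["created"]
        if self.prioritizer == "newest_first":
            age_key = -age_key  # min-heap: negate so the newest pops first
        heapq.heappush(self._heap, (age_key, next(self._seq), flowfile))
        self.queued_bytes += flowfile["size"]
        return True

    def poll(self):
        """Dequeue the highest-priority FlowFile according to the prioritizer."""
        _, _, flowfile = heapq.heappop(self._heap)
        self.queued_bytes -= flowfile["size"]
        return flowfile


# Usage: once three FlowFiles are queued, back pressure rejects further offers.
conn = Connection(max_objects=3, max_bytes=1_000, prioritizer="newest_first")
accepted = [conn.offer({"id": i, "created": i, "size": 100}) for i in range(5)]
order = [conn.poll()["id"] for _ in range(3)]
print(accepted)  # [True, True, True, False, False]
print(order)     # [2, 1, 0]
```

The thresholds behave as soft limits. A FlowFile is accepted whenever the queue is still below both thresholds, so the byte total can overshoot by one FlowFile.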

More…

• Designed for extension

• Build your own processors and more

• Secure

• SSL, SSH, HTTPS, encrypted content, etc.

• Multi-tenant authorization and internal authorization/policy management

• MiNiFi subproject

• Reduces footprint to ~40 MB

What is Apache Storm?

A free and open source distributed realtime computation system.

History of Apache Storm

Source: http://hortonworks.com/blog/brief-history-apache-storm/

Concepts of Apache Storm

• Spout: a source of streams in a topology

• Bolt: a processing component; sinks are implemented as bolts too

• Stream: an unbounded sequence of tuples with a declared schema

• Stream grouping: defines how a stream is partitioned among the consuming bolt's tasks

• Topology: the logic of a realtime application, represented as a DAG
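The concepts above can be wired together in a toy, single-process word count. This is an illustrative Python sketch, not the Storm API; real topologies are built with `TopologyBuilder` in Java or with Flux. In the sketch:

- generators and functions stand in for the spout and bolts;
- two hypothetical grouping helpers decide which task receives each tuple.

The round-robin shuffle and the CRC32-based fields hash are simplifications. Storm's own implementations differ in detail.

```python
import itertools
import zlib
from collections import Counter


def shuffle_grouping(num_tasks):
    """Round-robin stand-in for shuffle grouping: spreads load evenly, ignores tuple contents."""
    cycle = itertools.cycle(range(num_tasks))
    return lambda tup: next(cycle)


def fields_grouping(field_index, num_tasks):
    """Hash the grouping field so equal values always reach the same task."""
    return lambda tup: zlib.crc32(str(tup[field_index]).encode()) % num_tasks


def sentence_spout(sentences):
    """Spout: the source of the stream, emitting one-field tuples (sentence,)."""
    for s in sentences:
        yield (s,)


def split_bolt(tup):
    """Bolt: split a sentence tuple into (word,) tuples."""
    return [(w,) for w in tup[0].split()]


def run_word_count(sentences, split_tasks=2, count_tasks=3):
    """Wire spout -> split bolt (shuffle) -> count bolt (fields on 'word') and run in-process."""
    to_split = shuffle_grouping(split_tasks)
    to_count = fields_grouping(0, count_tasks)
    split_inputs = [[] for _ in range(split_tasks)]
    for tup in sentence_spout(sentences):
        split_inputs[to_split(tup)].append(tup)
    counters = [Counter() for _ in range(count_tasks)]  # one counter per count-bolt task
    for task_inputs in split_inputs:
        for tup in task_inputs:
            for word_tup in split_bolt(tup):
                counters[to_count(word_tup)][word_tup[0]] += 1
    return counters


# Usage: merge the per-task counters to get the global counts.
counters = run_word_count(["the cow jumped", "the moon", "the cow"])
total = sum(counters, Counter())
print(total["the"], total["cow"])  # 3 2
```

Fields grouping on `word` is what keeps the per-task counts correct. Every occurrence of a word lands in the same counter, so no two tasks ever hold partial counts for the same word.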

Core vs Trident

                   | Core                   | Trident
Computation unit   | Record (tuple)         | Micro-batch
Latency            | Very low (sub-second)  | High (up to batch size); similar to Spark Streaming
Delivery guarantee | At-least-once          | Exactly-once
API                | Compositional          | Declarative
Stateful operator  | Supported from v1.0.0  | Core feature (exactly-once)
Windowing          | Time (processing time, event time), count; tumbling window, sliding window
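The delivery-guarantee row is the heart of the difference. Trident's exactly-once state relies on two things:

- each micro-batch carries a transaction id (txid);
- the state stores the last txid it applied next to the value, so a replayed batch can be recognized and skipped.

The Python sketch below is a simplified model of that idea with hypothetical class names. It assumes, as Trident's transactional spouts do, that a replayed txid always carries the same tuples and that batches are applied in txid order.

```python
class AtLeastOnceCounter:
    """Core-style state: every delivery is applied, so a replay double-counts."""

    def __init__(self):
        self.count = 0

    def apply_batch(self, txid, values):
        # txid is ignored: this state has no way to recognize a replay
        self.count += len(values)


class TransactionalCounter:
    """Trident-style transactional state: the last applied txid is stored with the value."""

    def __init__(self):
        self.count = 0
        self.last_txid = None

    def apply_batch(self, txid, values):
        if txid == self.last_txid:
            return  # this batch was already applied; a replay must not change state
        self.count += len(values)
        self.last_txid = txid


# Usage: batch 2 is replayed after a simulated failure.
batches = [(1, ["a", "b"]), (2, ["c"]), (2, ["c"]), (3, ["d", "e"])]
core, trident = AtLeastOnceCounter(), TransactionalCounter()
for txid, values in batches:
    core.apply_batch(txid, values)
    trident.apply_batch(txid, values)
print(core.count, trident.count)  # 6 5
```

The core-style counter ends at 6 because batch 2 was applied twice. The transactional counter ends at 5. The price is that Trident must group tuples into batches, which is where its higher latency comes from.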

Features of Apache Storm

• Supports a number of connectors (17 connectors in the master branch)

• Automatic back-pressure

• Distributed Cache

• Flux (defining topologies in YAML)

• Distributed Log Search

• Dynamic Worker Profiling

• Dynamic Log Levels

• Topology Event Inspector

• Resource Aware Scheduler

• SQL (Experimental)
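As we understand it, automatic back-pressure in Storm 1.x is a watermark scheme:

- when an executor's receive queue fills past a high watermark, spouts are throttled;
- they resume only after the queue drains below a low watermark.

The gap between the two thresholds avoids flapping on and off around a single limit. The hypothetical `BackPressureMonitor` below models only that hysteresis in Python. The 0.9 and 0.4 defaults are illustrative, and the sketch leaves out how Storm propagates the throttle signal across workers.

```python
class BackPressureMonitor:
    """Hysteresis throttle: engage at the high watermark, release at the low one."""

    def __init__(self, capacity, high=0.9, low=0.4):
        if not 0 <= low < high <= 1:
            raise ValueError("need 0 <= low < high <= 1")
        self.capacity = capacity
        self.high = high
        self.low = low
        self.throttled = False

    def update(self, queue_len):
        """Return True if spouts should pause emitting, given the current queue length."""
        fill = queue_len / self.capacity
        if not self.throttled and fill >= self.high:
            self.throttled = True
        elif self.throttled and fill <= self.low:
            self.throttled = False
        return self.throttled


# Usage: the throttle stays on while the queue drains from 90% to 45%.
monitor = BackPressureMonitor(capacity=100)
trace = [monitor.update(n) for n in (50, 90, 70, 45, 40, 60)]
print(trace)  # [False, True, True, True, False, False]
```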

Future of Apache Storm

Apache Storm 2.0 and beyond

• Clojure to Java translation

• Unified Stream API with exactly-once support

• Reworked metrics feature

• Apache Beam runner

• Streaming SQL with Apache Calcite

• And more…

• Performance

• Usability

THANKS!

Any questions?

Appendix A. Apache NiFi

NiFi EvaluateJsonPath / RouteOnAttribute configuration

NiFi PutHDFS / PublishKafka configuration

NiFi Queue options – Status History

NiFi Queue options – List queue

NiFi Data Provenance

Appendix B. Apache Storm

Distributed Log Search

Dynamic Worker Profiling

Dynamic Log Levels

Topology Event Inspector

Resource Aware Scheduler

Source: Resource Aware Scheduling in Apache Storm, Hadoop Summit San Jose 2016