
Nifi workshop


Page 1: Nifi workshop

Introducing Apache Nifi

Yifeng Jiang Solutions Engineer, Hortonworks

© Hortonworks Inc. 2011 – 2015. All Rights Reserved

Page 2: Nifi workshop

About Me

Yifeng Jiang
•  Solutions Engineer, Hortonworks
•  Apache HBase book author
•  I like hiking
•  Twitter: @uprush

Page 3: Nifi workshop


Agenda

•  Introduction to NiFi
•  NiFi Demo
•  NiFi Use Case

Page 4: Nifi workshop


Introduction to Apache NiFi

Page 5: Nifi workshop


NiFi Overview

NiFi is an easy-to-use, powerful, and reliable system to process and distribute data.

Page 6: Nifi workshop


NiFi Terminology

FlowFile
•  Unit of data moving through the system
•  Content + Attributes (key/value pairs)

Processor
•  Performs the work, can access FlowFiles

Connection
•  Links between processors
•  Queues that can be dynamically prioritized

Process Group
•  Set of processors and their connections
•  Receive data via input ports, send data via output ports
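The terms above can be pictured with a small model. This is only a sketch in plain Python, not the NiFi API: the class names mirror the slide's vocabulary, and `uppercase_processor` and the `processed` attribute are hypothetical names invented for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: content bytes plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Queue that links one processor to the next."""

    def __init__(self):
        self._queue = deque()

    def put(self, flowfile):
        self._queue.append(flowfile)

    def poll(self):
        # Return the next FlowFile, or None when the queue is empty.
        return self._queue.popleft() if self._queue else None


def uppercase_processor(source, destination):
    """Hypothetical processor: drains `source`, writes results to `destination`."""
    while (flowfile := source.poll()) is not None:
        destination.put(FlowFile(
            flowfile.content.upper(),
            {**flowfile.attributes, "processed": "true"},
        ))


inbound, outbound = Connection(), Connection()
inbound.put(FlowFile(b"hello", {"filename": "a.txt"}))
uppercase_processor(inbound, outbound)
result = outbound.poll()
```

In this toy model a process group would simply be a set of such processors and connections with designated input and output queues.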

Page 7: Nifi workshop


NiFi - User Interface

•  Drag and drop processors to build a flow
•  Start, stop, and configure components in real time
•  View errors and corresponding error messages
•  View statistics and health of data flow
•  Create templates of common processors and connections

Page 8: Nifi workshop


NiFi - Provenance

•  Tracks data at each point as it flows through the system

•  Records, indexes, and makes events available for display

•  Handles fan-in/fan-out, i.e. merging and splitting data

•  View attributes and content at given points in time
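One way to picture how provenance can follow data through splits and merges is an event log that records parent and child identifiers. The sketch below is a simplified stand-in, not NiFi's provenance repository; the `ProvenanceLog` class, the integer ids, and the event type labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str   # illustrative label, e.g. "RECEIVE", "SPLIT", "MERGE"
    parents: tuple    # ids of the FlowFiles consumed by the event
    children: tuple   # ids of the FlowFiles produced by the event


class ProvenanceLog:
    def __init__(self):
        self._events = []

    def record(self, event_type, parents=(), children=()):
        self._events.append(
            ProvenanceEvent(event_type, tuple(parents), tuple(children)))

    def lineage(self, flowfile_id):
        """Return the ids of every ancestor of `flowfile_id`."""
        ancestors, frontier = set(), [flowfile_id]
        while frontier:
            current = frontier.pop()
            for event in self._events:
                if current in event.children:
                    for parent in event.parents:
                        if parent not in ancestors:
                            ancestors.add(parent)
                            frontier.append(parent)
        return ancestors


log = ProvenanceLog()
log.record("RECEIVE", children=[1])
log.record("SPLIT", parents=[1], children=[2, 3])   # fan-out
log.record("MERGE", parents=[2, 3], children=[4])   # fan-in
ancestors_of_merged = log.lineage(4)
```

Walking the log backwards from the merged item recovers both split halves and the original input, which is the kind of question a lineage view answers.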

Page 9: Nifi workshop


NiFi - Queue Prioritization

•  Configure a prioritizer per connection

•  Determine what is important for your data – time based, arrival order, importance of a data set

•  Funnel many connections down to a single connection to prioritize across data sets

•  Develop your own prioritizer if needed
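A per-connection prioritizer can be thought of as a function that assigns each queued item a sort key. The sketch below shows that idea with a heap in plain Python. It is not NiFi's prioritizer interface, and the `priority` attribute, its default of 100, and the class name are assumptions for illustration.

```python
import heapq
import itertools


class PrioritizedQueue:
    """Connection queue ordered by a pluggable prioritizer function."""

    def __init__(self, prioritizer):
        self._prioritizer = prioritizer    # attributes -> key, lowest first
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: arrival order

    def put(self, attributes):
        key = self._prioritizer(attributes)
        heapq.heappush(self._heap, (key, next(self._arrival), attributes))

    def poll(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


def by_priority_attribute(attributes):
    # Hypothetical rule: a lower "priority" value is served first;
    # items without the attribute go last.
    return int(attributes.get("priority", 100))


queue = PrioritizedQueue(by_priority_attribute)
queue.put({"filename": "bulk.csv", "priority": "9"})
queue.put({"filename": "alert.json", "priority": "1"})
queue.put({"filename": "report.csv"})
order = [queue.poll()["filename"] for _ in range(3)]
```

Swapping `by_priority_attribute` for a function keyed on a timestamp would give time-based ordering, and funnelling several sources into one such queue prioritizes across data sets.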

Page 10: Nifi workshop


NiFi - Architecture

Page 11: Nifi workshop


NiFi Cluster

•  NiFi Cluster Manager
•  NiFi Cluster Nodes
•  Primary Node
•  Isolated Processor

[Figure: cluster architecture. Master: NiFi Cluster Manager (NCM), an OS/host running a JVM with a web server and a request replicator. Slaves: NiFi nodes, each an OS/host running a JVM with a web server and a flow controller (Processor 1 ... Extension N), plus FlowFile, content, and provenance repositories on local storage.]

Page 12: Nifi workshop


NiFi Demo

Page 13: Nifi workshop


NiFi Demo

•  The demo cluster: Ambari deployment, NCM
•  Real-time indexing in Solr & Banana
•  NiFi UI
   •  Flow statistics
   •  Data provenance, event details, replay
•  Add a Processor to push data to Kafka
•  NiFi data on the node
   •  FlowFile repository
   •  Content repository
   •  Provenance repository

Page 14: Nifi workshop


Site to Site -- Flow

[Figure: NiFi Cluster A (source) and NiFi Cluster B (destination) linked by site-to-site through a Remote Process Group; FlowFile attributes are transferred.]

Page 15: Nifi workshop


Site to Site – Data Provenance

[Figure: provenance views for NiFi Cluster A (source) and NiFi Cluster B (destination), with event details shown at cluster B.]

Page 16: Nifi workshop


NiFi Use Cases

Page 17: Nifi workshop


Use Cases – Index JSON

1.  Pull in Tweets using Twitter API
2.  Extract language and text into FlowFile attributes
3.  Get non-empty English tweets:
    ${twitter.text:isEmpty():not():and(${twitter.lang:equals("en")})}
4.  Merge together JSON documents based on quantity or time
5.  Use dynamic field mappings to select fields for indexing
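For readers less familiar with the expression language, steps 2 to 4 can be approximated in plain Python. This is a rough sketch rather than the flow from the demo: the `text` and `lang` JSON field names, the helper names, and the batch size are assumptions, and whitespace-only text is treated as empty.

```python
import json


def extract_attributes(tweet_json):
    """Step 2: copy language and text out of the tweet into attributes."""
    tweet = json.loads(tweet_json)
    return {
        "twitter.text": tweet.get("text", ""),
        "twitter.lang": tweet.get("lang", ""),
    }


def is_non_empty_english(attributes):
    """Step 3: text is not empty AND language equals "en"."""
    has_text = bool(attributes["twitter.text"].strip())
    return has_text and attributes["twitter.lang"] == "en"


def merge_by_count(documents, batch_size):
    """Step 4: group documents into batches of at most `batch_size`."""
    return [documents[i:i + batch_size]
            for i in range(0, len(documents), batch_size)]


raw = [
    '{"text": "hello NiFi", "lang": "en"}',
    '{"text": "", "lang": "en"}',
    '{"text": "bonjour", "lang": "fr"}',
    '{"text": "data flows", "lang": "en"}',
    '{"text": "third tweet", "lang": "en"}',
]
kept = [doc for doc in raw if is_non_empty_english(extract_attributes(doc))]
batches = merge_by_count(kept, 2)
```

Merging by time instead of quantity would close a batch when a deadline passes rather than when a count is reached.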

Page 18: Nifi workshop


Use Cases – Index a Relational Database

1.  GenerateFlowFile acts as a timer to trigger ExecuteSQL (future plans would remove the need for an incoming FlowFile to ExecuteSQL: NIFI-932)
2.  ExecuteSQL performs a SQL query and streams the results as an Avro datafile. Use expression language to construct a dynamic date range:
    ${now():toNumber():minus(60000):format('yyyy-MM-dd')}
3.  Convert Avro to JSON using the built-in ConvertAvroToJSON processor
4.  Stream JSON update to Solr
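The date-range expression in step 2 takes the current time, subtracts 60000 milliseconds, and formats the result as a date. A plain Python equivalent might look like the sketch below; the `events` table, the `updated_at` column, and both function names are hypothetical and only show where such a value could be used.

```python
from datetime import datetime, timedelta, timezone


def window_start(now, lookback_ms=60000):
    """Current time minus `lookback_ms` milliseconds, as a calendar date."""
    return (now - timedelta(milliseconds=lookback_ms)).strftime("%Y-%m-%d")


def build_query(now):
    # "events" and "updated_at" are hypothetical names. The date is passed
    # as a bind parameter rather than spliced into the SQL text.
    sql = "SELECT * FROM events WHERE updated_at >= ?"
    return sql, (window_start(now),)


sql, params = build_query(datetime(2015, 11, 1, 0, 0, 30, tzinfo=timezone.utc))
```

With the sample time of 30 seconds past midnight, the 60-second lookback crosses into the previous day, so the computed date changes accordingly.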

Page 19: Nifi workshop


Built-in Processors

•  90 built-in processors
•  Well-defined API
•  Easy to implement

Page 20: Nifi workshop


Page 21: Nifi workshop


Thank You