Mission to NARs with Apache NiFi
Aldrin Piri - @aldrinpiri
ApacheCon Big Data 2016
12 May 2016
© Hortonworks Inc. 2011 – 2016. All Rights Reserved

ApacheCon Big Data 2016: Mission to NARs with Apache NiFi



Agenda

• Start with a dataflow… but we can do better!

• Do better with the NiFi Framework and custom processor
  – Extension Points: Processors, Controller Services, Reporting Tasks
  – Process Session & Process Context
  – How the API ties to the NiFi repositories

• Testing isn’t that bad!

• Share with templates!

Adding new functionality and development approach

Extending the platform is about leveraging the expansive Java ecosystem and existing code

– Make use of open source projects and provided libraries for targeted systems and services

– Reuse existing proprietary or closed-source libraries and wrap their functionality in the framework

The test framework provides a powerful means of testing extensions in isolation, exercising them as they would run in a live instance

Deployment is as simple as copying the created NAR to the lib directory of your instance(s)

Minimal Dependencies Needed

– Java Development Kit, version 1.7 or later

– Maven, version 3.1.0+

Boilerplate Code is provided via Maven Archetype

Support for creating bundles for the major extension points, Processors and Controller Services:

– Processor Bundle

– Controller Service Bundle

What is a NAR?

NAR == NiFi ARchive

– Bundles the developed code that provides extensions together with its dependencies

– Provides classloader isolation between extensions, which helps avoid the dependency version conflicts that are pervasive when interacting with a wide variety of systems, services, and formats

Consider it to be an OSGi-lite package

NAR Bundle Structure

How long does it take to create an extension?

Incorporating functionality from an existing library

– Create a bundle

– Include a dependency to the library

– Design User Experience
  • Properties – How can this extension be configured? What are valid values for user input?
  • Relationships – How will data move to the next stage of its processing?

– Wrap the core classes of the library in the framework and implement onTrigger
  • ProcessSession abstracts interactions with backing repositories and handles unit-of-work sessions
  • ProcessContext allows accessing defined properties which the framework has validated

– Test

– Deploy

For the majority of cases, development time is measured in hours*
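The design steps above (properties, relationships, onTrigger) can be sketched with plain Java stand-ins. This is a toy model, not the NiFi API: `ToyProperty`, `ToyPrefixProcessor`, and the "Prefix" property are hypothetical names invented for illustration, and in a real extension the framework supplies the property, relationship, session, and context types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-in for a configurable property with a validator (not a NiFi class). */
final class ToyProperty {
    final String name;
    final Predicate<String> validator;

    ToyProperty(String name, Predicate<String> validator) {
        this.name = name;
        this.validator = validator;
    }
}

/** Hypothetical stand-in for a processor that prefixes text and routes the result. */
final class ToyPrefixProcessor {
    // Property: how can this extension be configured, and which values are valid?
    static final ToyProperty PREFIX =
            new ToyProperty("Prefix", value -> value != null && !value.isEmpty());

    // Relationships: where does data go after this processing step?
    static final String REL_SUCCESS = "success";
    static final String REL_FAILURE = "failure";

    private final Map<String, String> config = new HashMap<>();
    final Map<String, List<String>> routed = new HashMap<>();

    /** Plays the framework's role here: invalid values are rejected before onTrigger runs. */
    void configure(ToyProperty property, String value) {
        if (!property.validator.test(value)) {
            throw new IllegalArgumentException("Invalid value for " + property.name);
        }
        config.put(property.name, value);
    }

    /** One unit of work: read the validated property, transform, choose a relationship. */
    void onTrigger(String content) {
        String prefix = config.get(PREFIX.name);
        if (prefix == null) {
            throw new IllegalStateException("Prefix has not been configured");
        }
        if (content.isEmpty()) {
            route(REL_FAILURE, content);
        } else {
            route(REL_SUCCESS, prefix + content);
        }
    }

    private void route(String relationship, String content) {
        routed.computeIfAbsent(relationship, key -> new ArrayList<>()).add(content);
    }
}
```

The point of the sketch is the division of labour: validation happens at configuration time, so onTrigger can assume its properties are usable and concentrate on wrapping the library call and picking a relationship.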

How long does it really take to create an extension?

Increased development effort may be needed for handling specific protocols

– Typically this means managing sessions manually, because some resources have lifecycles that extend beyond a single onTrigger invocation

– Common for protocol “Listeners”

For the majority of cases, development time is still measured in hours

Behind the Scenes

Architecture

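Diagram: within the OS/Host, a single JVM runs the Web Server and the Flow Controller. The Flow Controller hosts the extensions (Processor 1 … Extension N) and works with the FlowFile Repository, Content Repository, and Provenance Repository, all of which sit on Local Storage.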

NiFi Architecture – Repositories - Pass by reference

          FlowFile     Content    Provenance

BEFORE    F1 → C1      C1         P1: F1 – Create

AFTER     F1 → C1      C1         P1: F1 – Create
          F2 → C1                 P2: F1 – Route
                                  P3: F2 – Clone (F1)
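A minimal sketch of the pass-by-reference idea, assuming three in-memory maps and lists stand in for the repositories. `PassByReferenceSketch` and its method names are hypothetical and are not NiFi classes; the sketch only shows that routing and cloning touch FlowFile records and provenance events, never the content bytes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the three repositories (not NiFi code). */
final class PassByReferenceSketch {
    /** Content repository stand-in: claim id -> bytes. */
    final Map<String, byte[]> content = new HashMap<>();
    /** FlowFile repository stand-in: FlowFile id -> claim id (a pointer, not the bytes). */
    final Map<String, String> flowFiles = new HashMap<>();
    /** Provenance repository stand-in: append-only event log. */
    final List<String> provenance = new ArrayList<>();

    void create(String flowFileId, String claimId, byte[] bytes) {
        content.put(claimId, bytes.clone());
        flowFiles.put(flowFileId, claimId);
        provenance.add("CREATE " + flowFileId);
    }

    /** Routing moves the FlowFile record along; the content is not touched. */
    void route(String flowFileId) {
        requireKnown(flowFileId);
        provenance.add("ROUTE " + flowFileId);
    }

    /** Cloning copies the pointer to the existing claim; no bytes are duplicated. */
    void cloneFlowFile(String parentId, String childId) {
        requireKnown(parentId);
        flowFiles.put(childId, flowFiles.get(parentId));
        provenance.add("CLONE " + childId + " (" + parentId + ")");
    }

    private void requireKnown(String flowFileId) {
        if (!flowFiles.containsKey(flowFileId)) {
            throw new IllegalArgumentException("Unknown FlowFile: " + flowFileId);
        }
    }
}
```

Creating F1, routing it, and cloning it to F2 leaves a single claim C1 in the content map, with both FlowFile records pointing at it and three provenance events recorded.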

NiFi Architecture – Repositories – Copy on Write

          FlowFile      Content           Provenance

BEFORE    F1 → C1       C1                P1: F1 – CREATE

AFTER     F1 → C1       C1 (plaintext)    P1: F1 – CREATE
          F1.1 → C2     C2 (encrypted)    P2: F1.1 – MODIFY
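Copy on write can be sketched the same way, again assuming in-memory stand-ins for the repositories. `CopyOnWriteSketch` is a hypothetical name, and `reverse` is only a placeholder transformation standing in for the encryption step in the picture; it is not encryption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory model of copy-on-write content handling (not NiFi code). */
final class CopyOnWriteSketch {
    final Map<String, byte[]> content = new HashMap<>();   // claim id -> bytes
    final Map<String, String> flowFiles = new HashMap<>(); // FlowFile id -> claim id
    final List<String> provenance = new ArrayList<>();     // append-only event log
    private int nextClaim = 1;

    String create(String flowFileId, byte[] bytes) {
        String claimId = "C" + nextClaim++;
        content.put(claimId, bytes.clone());
        flowFiles.put(flowFileId, claimId);
        provenance.add("CREATE " + flowFileId);
        return claimId;
    }

    /** Writes the transformed bytes to a NEW claim; the original claim is left as it was. */
    String modify(String flowFileId, String newVersionId, UnaryOperator<byte[]> transform) {
        String oldClaim = flowFiles.get(flowFileId);
        if (oldClaim == null) {
            throw new IllegalArgumentException("Unknown FlowFile: " + flowFileId);
        }
        String newClaim = "C" + nextClaim++;
        // The transform gets a copy, so the stored original cannot be changed by accident.
        content.put(newClaim, transform.apply(content.get(oldClaim).clone()));
        flowFiles.put(newVersionId, newClaim);
        provenance.add("MODIFY " + newVersionId);
        return newClaim;
    }

    /** Placeholder transformation used instead of real encryption. */
    static byte[] reverse(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[in.length - 1 - i];
        }
        return out;
    }
}
```

Because the original claim survives the modification, the earlier content stays available for as long as the repository retains it.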

Quick (and dirty?) Prototyping

Prototype Dataflows Using Existing Binaries/Applications

ExecuteProcess – Acts as a source processor, creating FlowFiles containing data written to STDOUT by the target application

ExecuteStreamCommand – Provides content of FlowFiles to an external application via STDIN and creates FlowFiles containing data written to STDOUT

Processors allow making external calls to applications and programs outside of the JVM
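The STDIN/STDOUT handoff described for ExecuteStreamCommand can be illustrated with the JDK's own ProcessBuilder. This is a rough sketch of the general technique, not the processor's implementation; `StreamCommandSketch` and `pipeThrough` are hypothetical names, and the usage note assumes a Unix-like host where `cat` is on the PATH.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.List;

/** Hypothetical helper: pipe bytes through an external command (not NiFi code). */
final class StreamCommandSketch {

    /** Sends input to the command's STDIN and returns everything it wrote to STDOUT. */
    static byte[] pipeThrough(List<String> command, byte[] input) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectError(ProcessBuilder.Redirect.DISCARD) // ignore STDERR here
                    .start();

            // Write the whole payload, then close STDIN so the command sees end-of-input.
            // Fine for small payloads; large ones need concurrent reading to avoid
            // blocking once the OS pipe buffers fill up.
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(input);
            }

            byte[] output;
            try (InputStream stdout = process.getInputStream()) {
                output = stdout.readAllBytes();
            }

            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("Command exited with code " + exitCode);
            }
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for command", e);
        }
    }
}
```

For example, `StreamCommandSketch.pipeThrough(List.of("cat"), bytes)` hands the bytes to `cat` and returns what it echoes back. An ExecuteProcess-style source is the same pattern with nothing written to STDIN.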

Increased Flexibility of Prototyping via Scripting Languages

ExecuteScript – Acts as a source processor, creating FlowFiles containing data from a referenced script

InvokeScriptedProcessor – Provides access to the core framework API for interacting with NiFi like a native Java processor

Processors allow using JVM-friendly interpreted languages

Thanks for hanging out!