HDF Powered by Apache NiFi Intro
Milind Pandit, Solutions Engineer
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
HDF 2.0: Flow Management
– NiFi basics
– NiFi use cases
– NiFi demos
Simplistic View of Enterprise Data Flow
[Diagram: Data Flow, with stages Acquire Data, Process and Analyze Data, Store Data]
Interacting with different business partners and customers
Realistic View of Enterprise Data Flow
Connected Data Platforms
Stream Processing
Flow Management
Enterprise Services
Security
Visualization
At the edge, On premises, In the cloud
Registries/Catalogs, Governance (Security/Compliance), Operations
HDF 2.0 – Data in Motion Platform
Hortonworks DataFlow (HDF)
Constrained, High-latency, Localized context
Hybrid – cloud/on-premises, Low-latency, Global context
SOURCES
REGIONAL INFRASTRUCTURE
CORE INFRASTRUCTURE
Apache NiFi: Designed for 8 challenges of global enterprise dataflow
• Visual Command and Control: For agile and immediate creation, configuration, and control of dataflows
• Data Lineage (Provenance): Ensures trust of your data
• Data Prioritization: Because not all data is of equal importance
• Data Buffering/Back-Pressure: Since not all senders/receivers/connections work perfectly all the time
• Control Latency vs Throughput: Adapt to different situations with different requirements
• Secure Control Plane/Data Plane: Security of data and data access
• Scale-out Clustering: Scalability
• Extensibility: Ecosystem flexibility and growth
Apache NiFi: Three key concepts
• Manage the flow of information
• Data Provenance
• Secure the control plane and data plane
Apache NiFi – Key Features
• Guaranteed delivery
• Data buffering
– Backpressure
– Pressure release
• Prioritized queuing
• Flow specific QoS
– Latency vs. throughput
– Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
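To make the data buffering and backpressure item concrete, here is a minimal Python sketch of a bounded queue that refuses new items once a threshold is reached. It is an illustration only, not NiFi's API: the class name `BoundedConnection`, the `backpressure_threshold` parameter, and the `offer`/`poll` methods are all assumptions made for this example.

```python
from collections import deque


class BoundedConnection:
    """Toy queue that engages back-pressure at a fixed item count (hypothetical)."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._items = deque()

    def is_full(self):
        # Back-pressure is engaged once the queue holds `threshold` items.
        return len(self._items) >= self.backpressure_threshold

    def offer(self, item):
        """Enqueue the item unless back-pressure is engaged; return True on success."""
        if self.is_full():
            return False
        self._items.append(item)
        return True

    def poll(self):
        """Remove and return the oldest item, or None if the queue is empty."""
        return self._items.popleft() if self._items else None


def run_demo():
    conn = BoundedConnection(backpressure_threshold=2)
    accepted = [conn.offer(name) for name in ("a", "b", "c")]  # third offer is refused
    first = conn.poll()          # the consumer drains one item
    retried = conn.offer("c")    # the producer can now retry
    return accepted, first, retried
```

In this sketch the producer is expected to hold on to a refused item and retry later, which is one simple way a slow receiver can slow down a fast sender without losing data.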
Common Apache NiFi Use Cases
Predictive Analytics: Ensure the highest value data is captured and available for analysis
Compliance: Gain full transparency into provenance and flow of data
IoT Optimization: Secure, Prioritize, Enrich and Trace data at the edge
Fraud Detection: Move sales transaction data in real time to analyze on demand
Big Data Ingest: Easily and efficiently ingest data into Hadoop
Value Resources: Gain visibility into how data sources are used to determine value
Use Cases for Apache NiFi
What is Apache NiFi used for?
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
– Conversion between formats
– Extraction/Parsing
– Routing decisions
What is Apache NiFi NOT used for?
• Distributed Computation
• Complex Event Processing
• Joins / Complex Rolling Window Operations
Terminology
FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
Processor
• Performs the work, can access FlowFiles
Connection
• Links between processors
• Queues that can be dynamically prioritized
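The three terms above can be modeled in a few lines of Python. This is a rough sketch for intuition, not NiFi's actual classes: `FlowFile`, `Connection`, the `priority_of` callback, and the `add_size_attribute` processor are hypothetical names chosen for this example, and the `priority` attribute is an assumption.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FlowFile:
    """Unit of data: opaque content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Queue between processors, ordered by a pluggable priority function."""

    def __init__(self, priority_of):
        self._priority_of = priority_of  # lower value is dequeued first
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order among equal priorities

    def put(self, flowfile):
        entry = (self._priority_of(flowfile), next(self._seq), flowfile)
        heapq.heappush(self._heap, entry)

    def get(self):
        return heapq.heappop(self._heap)[2]


def add_size_attribute(flowfile):
    """Toy processor: returns a new FlowFile with a fileSize attribute added."""
    attributes = dict(flowfile.attributes, fileSize=str(len(flowfile.content)))
    return FlowFile(flowfile.content, attributes)
```

A short usage example: with `conn = Connection(priority_of=lambda ff: int(ff.attributes.get("priority", "9")))`, a FlowFile put with `priority` `"1"` is returned by `conn.get()` before one put earlier with `priority` `"5"`. Swapping the `priority_of` function changes the ordering policy without touching the processors.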
Analogy: FlowFiles are like HTTP Data
HTTP Data
Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
Content-Type: text/html
Content:
Hello world XXXXXXXXXXXXXXXXXXXXXXXXXXXX
FlowFile
Header:
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content:
0101010101110101010101010101 (Binary)
Create, Run, View, Start, Stop, Change, and Fix Dataflows in Real-Time
1. Drag and drop processors to build a flow
2. Start, stop, and configure components in real time
3. View errors and corresponding error messages
4. View statistics and health of data flow
5. Create templates of common processors & connections
Apache NiFi Demo: Tail Logs, Route on Content, Buffer in Kafka, Deliver to HDFS
What is Data Provenance and Why is it Important?
[Diagram: lineage traced from BEGIN to END]
IT and Cloud Operators
• Understand traceability, lineage
• Enable recovery and replay
Compliance Regulations
• Provide an audit trail
• Remediation capabilities
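One way to picture an audit trail is as an append-only log of events keyed by the data item they describe. The sketch below is a toy model under that assumption, not NiFi's provenance repository: `ProvenanceEvent`, `ProvenanceLog`, and the field names are hypothetical, and the event type strings are used here only as illustrative labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    flowfile_id: str
    event_type: str   # illustrative labels such as "RECEIVE", "ROUTE", "SEND"
    component: str    # name of the component that produced the event


class ProvenanceLog:
    """Append-only event log that can be queried per data item."""

    def __init__(self):
        self._events = []

    def record(self, flowfile_id, event_type, component):
        self._events.append(ProvenanceEvent(flowfile_id, event_type, component))

    def lineage(self, flowfile_id):
        """Return all events for one item, in the order they were recorded."""
        return [e for e in self._events if e.flowfile_id == flowfile_id]
```

Calling `lineage("ff-1")` after recording a few events returns the full history of that item from beginning to end, which is the kind of trace an operator or auditor would ask for.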
Provenance Enables Easy Access and Traceability of Changes
Need Fine-Grained Security and Compliance?
Security
• Secured authentication
• Enterprise authorization services – entitlements change often
• Encrypted content, encrypted communications
• People and systems with different roles require different access levels
• Tagged/classified data
Repositories - Pass by reference
Repositories – Copy on Write
NiFi Architecture
Edge Intelligence with Apache MiNiFi
Key Features
• Guaranteed delivery
• Data buffering
– Backpressure
– Pressure release
• Prioritized queuing
• Flow specific QoS
– Latency vs. throughput
– Loss tolerance
• Data provenance
• Recovery / recording a rolling log of fine-grained history
• Designed for extension
Different from Apache NiFi
• Design and Deploy
• Warm re-deploys
NiFi vs. MiNiFi Java Agent
MiNiFi: NiFi Framework, Components
NiFi: NiFi Framework, User Interface, Components
Example: Company X provides alerting services when a user's resting heart rate is higher than a threshold
Real-Time Insights Require DataFlow Mgmt and Stream Processing
Acquire Data: Company X Cloud Instance 1
Acquire Data: Company X Cloud Instance 2
Acquire Data: Company X Cloud Instance 3
Acquire Data Across Cloud Instances
Parse, Filter, Validate, Enrich and Route
Core Data Center
Analytics/Pattern Match
Data Store
Alerts
Dashboards/Visualization
Legend: Flow Management, Stream Processing
Data in Motion Needs Dataflow Management and Stream Processing
Acquire data from various wearable devices' cloud instances
Move data from customer cloud instances to the on-premises instance
Perform intelligent routing and filtering of data. The routing and filtering rules will often change at run time.
Deliver the data to various downstream systems. New downstream apps will keep appearing, and data should be fed to each one as it comes online.
Parse the device data into a standardized format that downstream systems can understand
Enrich the data with contextual information, including patient/customer info (age, sex, etc.)
Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.
Run an outlier detection model on the streaming heart rate data as it comes in. If the score is above a certain threshold, alert on the heart rate.
Flow Management (NiFi, MiNiFi)
Stream Processing (Storm, Kafka)
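The parse, enrich, route, and threshold-alert steps listed above can be sketched as plain Python functions. This collapses into one process what the slide splits across flow management and stream processing, and it is meant only to show the shape of the logic. Everything specific here is an assumption: the JSON payload with `user` and `hr` fields, the `CUSTOMERS` lookup table, the threshold of 100, and the destination names `alerts` and `data_store`.

```python
import json

RESTING_HR_THRESHOLD = 100  # hypothetical threshold, in beats per minute

# Hypothetical lookup table standing in for a patient/customer database.
CUSTOMERS = {"u1": {"age": 54, "sex": "F"}}


def parse(raw):
    """Parse a device payload (assumed to be JSON) into a standardized record."""
    data = json.loads(raw)
    return {"user_id": data["user"], "resting_hr": int(data["hr"])}


def enrich(record):
    """Add contextual customer info when the user is known."""
    return {**record, **CUSTOMERS.get(record["user_id"], {})}


def route(record):
    """Pick a destination: alert when the threshold is exceeded, else store."""
    if record["resting_hr"] > RESTING_HR_THRESHOLD:
        return "alerts"
    return "data_store"


def process(raw):
    """Run one payload through parse, enrich, and route."""
    record = enrich(parse(raw))
    return route(record), record
```

Because the routing rule lives in one small function and one constant, it can be changed without touching parsing or enrichment, which mirrors the requirement that routing and filtering rules change often at run time.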
The Future of Data: Architectural Transformation Enabled By Connected Data Platforms
[Diagram labels: Cloud; On Prem; Edge Data; Edge Analytics; Data in Motion (Cloud); Data in Motion (on-premises); Data at Rest (Cloud); Data at Rest (on-premises); Closed Loop Analytics; Machine Learning; Deep Historical Analysis]
Use Cases for Data in Motion
When Only DataFlow Management is Required
Use Cases for Data-in-Motion Using DataFlow Mgmt
• Data Ingestion
• Edge Intelligence
• First Mile Problem
• Physical Data Movement
• Simple event processing such as Route, Filter, Enrich, Transform, etc.
When Both DataFlow Management and Stream Processing are Required
Use Cases for Data-in-Motion Using DataFlow Mgmt and Stream Processing
• Flow Management to deliver data for Stream Processing
• PLUS: Complex pattern matching on unbounded streams of data.
HDF 2.0: Data-in-Motion Platform
DATA IN MOTION, DATA AT REST
[Diagram labels: IoT Data Sources; MiNiFi; NiFi; Kafka; Storm; AWS; Azure; Google Cloud; Hadoop; Others…]
Flow management
Flow management + Stream Processing
Enterprise Services: Ambari, Ranger, Other services
New Stream Processing Features HDF 2.0
Categories: Developer Productivity, Enterprise Readiness, Operational Simplicity
• New Storm Connectors
• Storm-Kafka Spout using new client APIs
• Storm Distributed Log Search
• Storm Dynamic Worker Profiling
• Kafka Grafana Integration
• Storm Grafana Integration
• Improved Nimbus HA
• Storm Automatic Back Pressure
• Storm Distributed cache
• Storm Windowing and State Management
• Storm Performance improvements
• Improved Kafka SASL
• Storm Topology Event inspector
• Storm Resource Aware Scheduling
• Storm Dynamic Log Levels
• Pacemaker Storm Daemon
• Kafka Rack Awareness
More Information, Resources
Hortonworks Community Connection: Data Ingestion and Streaming https://community.hortonworks.com
Partnerworks: http://hortonworks.com/partners/
HDF Certification: http://hortonworks.com/partners/product-integration-certification/
Webinars: http://hortonworks.com/events-webcasts/
Sandbox: http://hortonworks.com/events-webcasts/
HDF: http://hortonworks.com/hdf/
HDP: http://hortonworks.com/hdp/