Whither the Hadoop Developer Experience?
Nitin Motgi (@nmotgi)
PROPRIETARY & CONFIDENTIAL
Agenda
• Introduction to data applications
• Challenges with building operational data applications on Hadoop
• Motivation and Goals for CDAP
• Use cases
• Introduction to CDAP and Architecture Overview
• Demo
What are Data Applications?
Applications that use data insights to enhance the customer/user experience, achieve a business objective, or improve a business process.
Examples
• 360-Degree Customer View
• Recommendation Engine
• Predictive Modeling
• Fraud Analysis
• Network Threat Detection
• Telemetry Analysis
• Time Series Analysis
• Data Processing (ETL)
• And many more
Challenges
Technology Explosion
(Chart: growth of the Hadoop ecosystem; each stage adds to the components of the previous one)
• 2006: Core Hadoop (HDFS, MR)
• 2008: + HBase, ZooKeeper
• 2009: + Hive, Pig, Mahout
• 2010: + Sqoop, Whirr, Avro
• 2011: + Flume, Bigtop, Oozie, MRUnit, HCatalog
• 2012: + Spark, Impala, Solr, Kafka
• Present: + Sentry, Tez, Parquet, YARN, Knox
Challenges
Application Complexity
• Many domains to bridge
• Lots of boilerplate
• Inconsistent APIs
• No reusability
• Lack of developer productivity
Motivation
• Simple yet powerful platform for developers to build applications on Hadoop
• Expose capabilities rather than features
• Make Hadoop accessible to developers with no Hadoop knowledge
Goals
• Unified platform for building solutions on Hadoop
• Simpler application development lifecycle
• Reusable data and processing patterns with abstractions
• Framework-level correctness and consistency
• Easy-to-use developer APIs
Typical Customer Use Cases
• Reliable and scalable real-time business-critical analytics
• Closed-loop recommendation and analytics
• Data Ingestion as a Service
• Extensible and reusable use-case blueprints
• ETL automation: real-time and batch
• Data as a Service
• Reduced development and operational complexity of Hadoop
Which of these apply to you?
Introduction to Cask Data Application Platform
Cask Data Application Platform (CDAP) is an open source, integrated, distributed and extensible platform for building data applications on Hadoop.
It supports developers, operations, and organizations through the entire enterprise data application lifecycle.
CASK DATA APP PLATFORM
Supports three lifecycles:
• Data Lifecycle: Ingest, Explore, Transform, Serve
• Application Lifecycle: Develop, Test, Deploy, Scale
• Enterprise Lifecycle: Secure, Manage, Monitor, Operate
Unification
Ingest · Explore · Transform · Serve
Components shown: Streams, Realtime (Tigon), MapReduce, Spark, ad-hoc query over JDBC, RPC and query serving, all operating on ACID Datasets
Foundation: Dataset API, SPI & Management Services
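The Unification slide's central idea is that ingestion, batch and real-time processing, and serving all work against the same Datasets rather than separate copies of the data. A minimal plain-Java sketch of that idea follows, assuming a single in-memory counter table; `CounterTable`, `onEvent`, `batchCount`, and `query` are hypothetical names chosen for illustration, not CDAP APIs, and the real platform adds distributed storage and transactions.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one Dataset shared by real-time, batch, and serving code.
// None of these names are CDAP APIs.
public class UnifiedDatasetSketch {

    // A tiny "dataset": per-key counters backed by an in-memory map.
    static final class CounterTable {
        private final Map<String, Long> counts = new ConcurrentHashMap<>();

        void increment(String key, long delta) {
            counts.merge(key, delta, Long::sum);
        }

        long get(String key) {
            return counts.getOrDefault(key, 0L);
        }
    }

    // Real-time path: invoked once per ingested stream event.
    static void onEvent(CounterTable table, String event) {
        table.increment(event, 1L);
    }

    // Batch path: processes a whole partition of events in one pass.
    static void batchCount(CounterTable table, List<String> events) {
        for (String event : events) {
            table.increment(event, 1L);
        }
    }

    // Serving path: answers a point query against the same table.
    static long query(CounterTable table, String key) {
        return table.get(key);
    }

    public static void main(String[] args) {
        CounterTable pageViews = new CounterTable();
        onEvent(pageViews, "/home");
        batchCount(pageViews, List.of("/home", "/cart", "/home"));
        System.out.println(query(pageViews, "/home")); // prints 3
        System.out.println(query(pageViews, "/cart")); // prints 1
    }
}
```

Because all three paths go through one `CounterTable` interface, moving a computation from batch to real-time does not change how its results are read.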
Application Structure: Building Blocks
• Dataset: encapsulated data access patterns and data model in a reusable, domain-specific API
• Program: standardized containers for processing paradigms
• Application: programmatic abstraction for composing multiple Datasets and Programs that integrates ingestion, exploration, transformation and serving
(Diagram: an Application composed of Datasets and Programs)
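To make the three building blocks concrete, here is a hypothetical plain-Java sketch under the assumption that a Dataset is a typed wrapper over a key-value map, a Program is anything with a `run()` entry point, and an Application is a named bundle of both. `UserProfiles`, `Program`, and `Application` are illustrative names only; CDAP's actual Java API differs and should be taken from its documentation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Dataset / Program / Application building blocks.
// These types are illustrative only; they are not the CDAP API.
public class BuildingBlocksSketch {

    // Dataset: a domain-specific API that hides the underlying row-key layout.
    static final class UserProfiles {
        private final Map<String, String> rows = new HashMap<>(); // stand-in for a table

        void setEmail(String userId, String email) {
            rows.put("email:" + userId, email);
        }

        String getEmail(String userId) {
            return rows.get("email:" + userId);
        }
    }

    // Program: a standardized container with a name and a single entry point.
    interface Program {
        String name();
        void run();
    }

    // Application: composes named Datasets and Programs into one deployable unit.
    static final class Application {
        private final String name;
        private final Map<String, Object> datasets = new HashMap<>();
        private final List<Program> programs = new ArrayList<>();

        Application(String name) {
            this.name = name;
        }

        Application addDataset(String id, Object dataset) {
            datasets.put(id, dataset);
            return this;
        }

        Application addProgram(Program program) {
            programs.add(program);
            return this;
        }

        // Runs every program in registration order (a stand-in for deployment).
        void runAll() {
            for (Program program : programs) {
                program.run();
            }
        }

        String describe() {
            return name + ": " + datasets.size() + " dataset(s), " + programs.size() + " program(s)";
        }
    }

    public static void main(String[] args) {
        UserProfiles profiles = new UserProfiles();

        Program importer = new Program() {
            public String name() { return "ProfileImporter"; }
            public void run() { profiles.setEmail("u1", "u1@example.com"); }
        };

        Application app = new Application("ProfileApp")
                .addDataset("profiles", profiles)
                .addProgram(importer);

        app.runAll();
        System.out.println(app.describe());          // prints ProfileApp: 1 dataset(s), 1 program(s)
        System.out.println(profiles.getEmail("u1")); // prints u1@example.com
    }
}
```

The `UserProfiles` wrapper plays the "domain-specific API" role: callers use `setEmail`/`getEmail` and never see the `email:` row-key convention underneath, so the storage layout can change without touching the Programs.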
Deployment Architecture
CDAP Server
• Highly Available (HA)
• Installed on edge node(s)
• Supports Kerberos: impersonation and perimeter security
• Manages system services in YARN
• Services: Master, Router, Auth Server
System Services (Twill Containers)
• Transactions (Tephra)
• Metrics Aggregation
• Log Aggregation
• Dataset Services
• Metadata Management Service
• Explore Service
• Stream Management Service, and more
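Tephra, listed above as the transaction service, is built on optimistic concurrency control: transactions run without locks and conflicts are detected when they commit. The following is a simplified sketch of commit-time write-write conflict detection in plain Java; `TxManager`, `Tx`, and their methods are hypothetical, and Tephra's real protocol additionally tracks in-progress and invalid transactions and uses HBase cell versions for snapshot reads, none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified optimistic concurrency control with commit-time conflict detection.
// Class and method names are hypothetical; this is not Tephra's API.
public class OptimisticTxSketch {

    // A running transaction: the version it started at plus the keys it wrote.
    static final class Tx {
        final long startVersion;
        final Set<String> writeSet = new HashSet<>();

        Tx(long startVersion) {
            this.startVersion = startVersion;
        }
    }

    // The change set recorded for each successfully committed transaction.
    static final class Committed {
        final long commitVersion;
        final Set<String> changes;

        Committed(long commitVersion, Set<String> changes) {
            this.commitVersion = commitVersion;
            this.changes = changes;
        }
    }

    static final class TxManager {
        private long version = 0;
        private final List<Committed> committed = new ArrayList<>();

        synchronized Tx start() {
            return new Tx(version);
        }

        // Commit succeeds only if no transaction that committed after `tx`
        // started has written any key in tx's write set.
        synchronized boolean commit(Tx tx) {
            for (Committed c : committed) {
                if (c.commitVersion > tx.startVersion) {
                    for (String key : tx.writeSet) {
                        if (c.changes.contains(key)) {
                            return false; // conflict: caller should roll back and retry
                        }
                    }
                }
            }
            version++;
            committed.add(new Committed(version, new HashSet<>(tx.writeSet)));
            return true;
        }
    }

    public static void main(String[] args) {
        TxManager manager = new TxManager();
        Tx a = manager.start();
        Tx b = manager.start();
        a.writeSet.add("row1");
        b.writeSet.add("row1");
        System.out.println(manager.commit(a)); // prints true
        System.out.println(manager.commit(b)); // prints false: a wrote row1 after b started
    }
}
```

A transaction that loses the check (here `b`) is expected to roll back its writes and retry; this trade-off favors workloads where concurrent writes to the same rows are rare, since no locks are held while a transaction runs.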
Want to Learn More?
Open-source (Apache License v2)
Website: http://cdap.io
Mailing List: [email protected] [email protected]
IRC: #cdap on freenode.net
QUESTIONS?