DESCRIPTION
Frameworks and technologies in the Hadoop ecosystem are undergoing rapid innovation, but the open source tooling around usability has lagged behind. We will present a suite of tools, deployable on top of the Hadoop ecosystem, that enables even non-technical users to develop, tune, and maintain efficient Pig workflows and easily interact with and visualize datasets. Netflix's big data teams have worked for the past year implementing this framework in the AWS cloud. During that time, we have seen a massive influx of data and a corresponding increase in new development on our platform. This toolset has been a critical enabler in minimizing development time and effort. Using the development of a recommendation algorithm as an example, we'll walk through use cases for this stack of tools, showing how they interact to facilitate development. The presentation will include demos, implementation details, and our roadmap to open source various key services in the framework, including RESTful services that: provide comprehensive metadata management across data sources; enable visualization and caching of results of Hadoop jobs; visualize the execution plans produced by languages such as Pig and Hive; and provide detailed analytics on the currently executing workload and trends in historical performance.
Watching Pigs Fly with the Netflix Hadoop Toolkit
Hadoop Summit 2013, San Jose, CA
Data should be accessible, easy to discover, and easy to process for everyone.
Our Motivation
Our Users
Analysts Engineers
Hadoop Platform as a Service
S3
Hadoop Platform as a Service
Data Platform
Data Platform as a Service
Franklin (Metadata API)
Sting (Adhoc Visualization)
Forklift (Data Movement)
Looper (Backloading)
Ignite (A/B Test Analytics)
Spock (Data Auditing)
Genie (Hadoop PaaS)
Lipstick (Pig Workflow Visualization)
Event Service (Orchestration)
Hadoop
S3
Other Processing
Let’s solve a problem using the data!
Build a recommender.
But what makes good recommendations?
Similarity
Personalization
COLORS!
Box art is colorful…
We’re Sorry
Where can I find the data?
Hadoop Platform as a Service
S3
Hadoop Platform as a Service
S3, Cassandra, Teradata, Redshift, RDS
Data Platform as a Service
Franklin (Metadata API)
Create a dataset for box art and color.
Whether your dataset is large or small, being able to visualize it makes it easier to explain.
Data Platform as a Service
Franklin (Metadata API)
Sting (Adhoc Visualization)
Sting
• Allows users to cache the results of a Genie job in memory
• Sub-second response to OLAP-style operations (slicing, dicing, aggregations)
• Adhoc or recurring schedule
• Easy to use!
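The OLAP-style operations named above can be illustrated with a minimal Python sketch. This is not Sting's implementation or API; the row layout, column names, values, and the helper functions `dice` and `aggregate` are all hypothetical. It only shows why a result set held in memory can answer slice, dice, and aggregation requests without rerunning the Hadoop job.

```python
from collections import defaultdict

# Hypothetical rows standing in for the cached output of a job.
# Column names and values are illustrative only, not real data.
CACHED_ROWS = [
    {"title": "A", "hour": 20, "region": "US", "views": 5},
    {"title": "A", "hour": 21, "region": "US", "views": 7},
    {"title": "B", "hour": 20, "region": "CA", "views": 3},
    {"title": "B", "hour": 21, "region": "US", "views": 4},
]


def dice(rows, **filters):
    """Keep rows matching every dimension=value filter.

    A single filter is a slice; several filters together are a dice.
    """
    return [r for r in rows if all(r[k] == v for k, v in filters.items())]


def aggregate(rows, group_by, measure):
    """Sum a measure for each distinct value of the group_by dimension."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[group_by]] += r[measure]
    return dict(totals)


# Slice on region, then aggregate views per title, all in memory.
us_rows = dice(CACHED_ROWS, region="US")
views_by_title = aggregate(us_rows, "title", "views")
print(views_by_title)
```

Because every operation is a pass over rows already in memory, the cost depends on the size of the cached result and not on the size of the source data the job scanned.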
Hive Query
Schema
% Content Consumed / Hour
Hemlock Grove
House of Cards
Arrested Development
Similarity
House of Cards
Macbeth
Toddlers & Tiaras
Star Trek: Voyager
Personalization
# of subscribers X # of titles = ???,000,…,000 (big data)
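The scale argument on this slide is a simple product, sketched below in Python. The counts passed in are placeholders chosen for illustration, not Netflix figures; the slide itself leaves the real numbers elided.

```python
def pair_count(num_subscribers, num_titles):
    """Number of (subscriber, title) pairs a personalized model could score."""
    return num_subscribers * num_titles


# Placeholder inputs, assumed purely for illustration.
example_pairs = pair_count(1_000_000, 10_000)
print(f"{example_pairs:,} subscriber-title pairs")
```

Even with these modest placeholder inputs the product reaches the tens of billions, which is why the deck turns to Pig on Hadoop for the personalization step.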
Big Data
Netflix Apache Pig
Lipstick
Data Platform as a Service
Franklin (Metadata API)
Sting (Adhoc Visualization)
Lipstick
• Allows users to visualize their data flow
• Allows users to see common errors
• Allows users to easily monitor their jobs
• Empowers users to support themselves
• Facilitates communication between the infrastructure team and users
Lipstick
Overall Job Progress
Logical Plan
Overall Job Progress
Logical Operator (reduce side)
Logical Operator (map side)
Map/Reduce Job
Intermediate Row Count
Records Loaded
Hadoop Counters
My Job has stalled.
Common Problem #1
Unoptimized/Optimized Logical Plan Toggle
Dangling Operator
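The slides do not define "dangling operator". The Python sketch below assumes it means an operator in the logical plan whose output never reaches a store, so the work it does is wasted. The plan representation, the operator names, and `find_dangling` are hypothetical and are not taken from Lipstick or Pig.

```python
def find_dangling(plan, stores):
    """Return operators that have no path to any store operator.

    plan maps each operator name to the list of operators consuming its output.
    """
    # Build reverse edges so we can walk upstream from each store.
    upstream = {op: [] for op in plan}
    for op, outputs in plan.items():
        for out in outputs:
            upstream.setdefault(out, []).append(op)

    # Everything reachable upstream of a store contributes to stored output.
    reachable = set()
    stack = list(stores)
    while stack:
        op = stack.pop()
        if op in reachable:
            continue
        reachable.add(op)
        stack.extend(upstream.get(op, []))

    return sorted(op for op in plan if op not in reachable)


# Hypothetical plan: group_colors is computed but never stored.
PLAN = {
    "load_titles": ["filter_titles"],
    "filter_titles": ["join"],
    "load_colors": ["join", "group_colors"],
    "join": ["store_result"],
    "group_colors": [],
    "store_result": [],
}

dangling = find_dangling(PLAN, stores=["store_result"])
print(dangling)
```

Under this assumption, comparing the unoptimized and optimized plans would show such an operator present in the first and absent from the second.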
I didn’t get the data I was expecting
Common Problem #2
I don’t understand why my job failed.
Common Problem #3
Failed Job (light red background)
Successful Job (light blue background)
Wrapping up
• Demos at the Netflix booth in the exhibit hall (see more Lipstick, Sting, and Genie).
• Lipstick is part of Netflix OSS.
• Clone it on GitHub at http://github.com/Netflix/Lipstick
• We welcome feedback and contributions!
Charles Smith: [email protected]
Jeff Magnusson: [email protected]
Thank you!
Jobs: http://jobs.netflix.com
Netflix OSS: http://netflix.github.io
Tech Blog: http://techblog.netflix.com/