"DevOps for Big Data" by Supaket (ศุภเกศ วงศ์คำภู)
Solution Architect at Enersys.co.th
Big Data & Analytics seminar by Data Cube (facebook.com/datacube.th)
DevOps for Big Data by @Supaket 4 April 2015
DevOps for Big Data
• FICO (Thailand) (Past)
• DST (Thailand) (Past)
• Thomson Reuters (Thailand) (Past)
• Meta Genesis Development (Past)
@Supaket
http://facebook.com/supaket
https://www.linkedin.com/in/supaket
Software Everything @ Enersys
Software Engineering practice
http://newrelic.com/devops/lifecycle
Time to market
Dev
Ops
deploy often
deploy faster
build faster
test in a production-like environment
reduce time to test
increase coverage
virtualization dev & test
What is DevOps? - In Simple English
http://www.youtube.com/watch?v=_I94-tJlovg
DevOps
(a portmanteau of "development" and "operations")
is a concept dealing with, among other things: software development, operations, and services. It emphasises communication, collaboration, and integration between software developers and information technology (IT) operations personnel.
en.wikipedia.org/wiki/DevOps
DevOps
Culture
Process
Tools
A mindset of adopting culture, process, and tools to improve software quality and to develop, test, and release faster, shortening time to market.
supaket
2014 State of DevOps report
Strong IT performance is a competitive advantage. Firms with high-performing IT organisations were twice as likely to exceed their profitability, market share and productivity goals.
DevOps practices improve IT performance. IT performance strongly correlates with well-known DevOps practices such as use of version control and continuous delivery. The longer an organization has implemented, and continues to improve upon, DevOps practices, the better it performs. And better IT performance correlates to higher performance for the entire organization.
Organizational culture matters. Organizational culture is one of the strongest predictors of both IT performance and overall performance of the organization. High-trust organizations encourage good information flow, cross-functional collaboration, shared responsibilities, learning from failures and new ideas; they are also the most likely to perform at a high level. These cultural practices and norms found in high-trust organizations are also at the heart of DevOps, which helps explain why DevOps practices correlate so strongly with high organizational performance.
Job satisfaction is the No. 1 predictor of organisational performance. We all know how job satisfaction feels: It’s about doing work that’s challenging and meaningful, and being empowered to exercise our skills and judgment. We also know that where there’s job satisfaction, employees bring the best of themselves to work: their engagement, their creativity and their strongest thinking. That makes for more innovation in any area of the business, including IT.
Production vs Development environment
What is the problem?
Common Problems
• "It works on my machine"
• Reproducibility
• 20 people join the team: how do they start developing on day one?
• Production-like environment
Sources: http://newrelic.com/devops/lifecycle, http://www.blue-agility.com/important-lesson-getting-code-production/
Introduction to Virtualization
http://newrelic.com/devops/lifecycle
Production Environment
Developer Machine
Production-like environment
What about virtualization?
Hypervisor vs. container
What are Vagrant and Docker?
What is Vagrant?
• A VM management tool
• Automates the setup of your environment (Dev & QA)
Vagrant is a tool for building complete development environments. With an easy-to-use workflow and focus on automation, Vagrant lowers development environment setup time, increases development/production parity, and makes the "works on my machine" excuse a relic of the past.
vagrantup.com
Vagrant.
http://newrelic.com/devops/lifecycle
Vagrant commands
- init
- up
- halt
- reload
- suspend
- resume
- destroy
- package
Vagrant - Big Picture
Vagrant - Network Mode
Vagrant for Developer Machine
New joiner
• Someone joins your project…
• They pick up their laptop…
• Then spend the next 1-2 days following instructions on setting up their environment, tools, etc.
What is Docker?
Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. As a result, IT can ship faster and run the same app, unchanged, on laptops, data center VMs, and any cloud.
Solomon Hykes, Docker’s Founder & CTO, gives an overview of Docker in this short video (7:16).
Docker for shipping an immutable environment
Apache Spark
An introduction to Spark and Spark Streaming
What is Apache Spark?
• Cluster computing engine designed to be fast and general-purpose
• Good for processing streaming data
• Good for machine learning tasks
• Unified platform
Spark Components
Spark Core
• Basic functionality of Spark, including components for task scheduling, memory management, fault recovery, interacting with storage systems, and more
• Provides the API for resilient distributed datasets (RDDs)
Concept: Resilient distributed datasets (RDDs)
• Immutable collections of objects spread across a cluster
• Built through parallel transformations (map, filter, etc.)
• Controllable persistence (e.g. caching in RAM)
• Automatically rebuilt on failure
• Contain any type of Python, Java, or Scala objects, including user-defined classes.
Key idea: Write programs in terms of transformations on distributed datasets
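The key idea above can be illustrated without a cluster. The following is a minimal single-process sketch in plain Python, not the Spark API: `MiniRDD` and its methods are hypothetical names made up for illustration. It shows an immutable dataset split into partitions, transformations that return new datasets instead of modifying the old one, and a recorded lineage that lets any single partition be recomputed on demand (the property that allows rebuilding after a failure).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class MiniRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark): immutable
    source partitions plus a lineage of recorded transformations."""
    partitions: Tuple[Tuple, ...]
    lineage: Tuple[Callable[[Iterable], Iterable], ...] = ()

    def map(self, fn: Callable) -> "MiniRDD":
        # Returns a new dataset; the original is never modified.
        step = lambda part: (fn(x) for x in part)
        return MiniRDD(self.partitions, self.lineage + (step,))

    def filter(self, pred: Callable) -> "MiniRDD":
        step = lambda part: (x for x in part if pred(x))
        return MiniRDD(self.partitions, self.lineage + (step,))

    def compute_partition(self, index: int) -> List:
        # Replays the lineage on one source partition. A lost partition
        # can be rebuilt this way without touching the others.
        data: Iterable = self.partitions[index]
        for step in self.lineage:
            data = step(data)
        return list(data)

    def collect(self) -> List:
        out: List = []
        for i in range(len(self.partitions)):
            out.extend(self.compute_partition(i))
        return out


# Two partitions of made-up log lines: "METHOD PATH STATUS".
lines = MiniRDD((("GET /a 200", "GET /b 500"), ("POST /c 200", "GET /d 404")))
errors = (lines.map(lambda l: l.split())
               .filter(lambda f: int(f[2]) >= 400)
               .map(lambda f: f[1]))
result = errors.collect()
print(result)
```

Nothing is computed until `collect()` or `compute_partition()` is called, and `lines` is still usable unchanged after `errors` has been derived from it.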
Spark Streaming (1)
• Spark component that enables processing of live streams of data, e.g. production log files and queues
• Provides an API for manipulating data streams (DStream)
• Same fault tolerance, throughput, and scalability as Spark Core
• Spark's built-in machine learning algorithms and graph processing algorithms can be applied to data streams
Spark Streaming (2)
• Chop up the live stream into batches of X seconds
• Spark treats each batch of data as RDDs and processes them using RDD operations
• Finally, the processed results of the RDD operations are returned in batches
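The micro-batch model described above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not Spark Streaming code: the function names, the `(timestamp, record)` event shape, and the sample data are all assumptions. As a simplification, windows that receive no events are skipped.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple


def chop_into_batches(events: Iterable[Tuple[float, str]],
                      batch_seconds: float) -> List[List[str]]:
    """Group (timestamp, record) pairs into consecutive windows of
    batch_seconds. Hypothetical helper, not a Spark function."""
    batches = defaultdict(list)
    for ts, record in events:
        batches[int(ts // batch_seconds)].append(record)
    # Return the batches in time order.
    return [batches[key] for key in sorted(batches)]


def process_stream(events: Iterable[Tuple[float, str]],
                   batch_seconds: float,
                   batch_fn: Callable[[List[str]], int]) -> List[int]:
    # Each batch is processed independently; results come back per batch.
    return [batch_fn(batch) for batch in chop_into_batches(events, batch_seconds)]


def count_errors(batch: List[str]) -> int:
    # Example per-batch operation: count HTTP status codes >= 400.
    return sum(1 for status in batch if int(status) >= 400)


# Made-up stream of (seconds since start, HTTP status) events.
events = [(0.5, "200"), (1.2, "500"), (2.1, "200"), (3.9, "404"), (4.0, "200")]
error_counts = process_stream(events, batch_seconds=2, batch_fn=count_errors)
print(error_counts)
```

With a 2-second interval, the five sample events fall into three batches, and the per-batch function runs once on each.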
Log anomaly detection in production
[Diagram labels: production environment, Vagrant, Apache Spark, YARN, Apache log reader (input reader), JsonMessage, DStream, RDD, PredictionModel, FileOutput (result output)]
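The stages named in the diagram (log reader, JSON message, prediction model, result output) can be sketched end to end in plain Python. The talk does not show its actual reader or model, so everything here is an assumption: the regular expression targets the Apache common log format, the function names are hypothetical, and `predict_anomaly` is a placeholder rule that flags server errors, standing in for whatever model the demo used.

```python
import json
import re
from typing import List, Optional

# Apache common log format, e.g.
# 127.0.0.1 - - [04/Apr/2015:10:00:00 +0700] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def to_json_message(line: str) -> Optional[str]:
    """Input reader: parse one log line into a JSON message, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Apache writes "-" when no response body was sent.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return json.dumps(fields)


def predict_anomaly(message: str) -> bool:
    # Placeholder rule, NOT the talk's model: flag 5xx responses.
    return json.loads(message)["status"] >= 500


def detect(lines: List[str]) -> List[str]:
    """Run reader and model over a batch; return the flagged messages."""
    flagged = []
    for line in lines:
        message = to_json_message(line)
        if message is not None and predict_anomaly(message):
            flagged.append(message)
    return flagged


sample = [
    '127.0.0.1 - - [04/Apr/2015:10:00:00 +0700] "GET /index.html HTTP/1.1" 200 2326',
    '127.0.0.1 - - [04/Apr/2015:10:00:01 +0700] "GET /api HTTP/1.1" 503 -',
    'not a log line',
]
flagged = detect(sample)
for message in flagged:
    print(message)
```

In a streaming setup, `detect` would be the per-batch operation, and the flagged messages would be written to the result output instead of printed.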
Log anomaly detection in development
[Diagram labels: developer machine, Vagrant, Docker, Apache Spark, YARN, Apache log reader (input reader), JsonMessage, DStream, RDD, PredictionModel, FileOutput (result output)]
Showcase
Running Demo
Q&A
Thank you