“DevOps for Big Data” by Supaket Wongkampoo (ศุภเกศ วงศ์คำภู), Solution Architect at Enersys.co.th. Big Data & Analytics seminar by Data Cube (facebook.com/datacube.th)


DevOps for Big Data by @Supaket 4 April 2015

DevOps for Big Data

• FICO (Thailand) (Past)

• DST (Thailand) (Past)

• Thomson Reuters (Thailand) (Past)

• Meta Genesis Development (Past)

@Supaket

http://facebook.com/supaket

https://www.linkedin.com/in/supaket

Software Every Thing @ Enersys

Software Engineering practice

http://newrelic.com/devops/lifecycle

Time to market

Dev

Ops

deploy often

deploy faster

build faster

test in a production-like environment

reduce time to test

increase coverage

virtualization dev & test

What is DevOps? - In Simple English

http://www.youtube.com/watch?v=_I94-tJlovg

DevOps

DevOps (a portmanteau of "development" and "operations") is a concept dealing with, among other things: software development, operations, and services. It emphasises communication, collaboration, and integration between software developers and information technology (IT) operations personnel.

en.wikipedia.org/wiki/DevOps

DevOps

Culture

Process

Tools

DevOps is a mindset: adopting culture, process, and tools together to build higher-quality software, to develop, test, and release it faster, and to shorten time to market.

supaket

2014 State of DevOps report

Strong IT performance is a competitive advantage. Firms with high-performing IT organisations were twice as likely to exceed their profitability, market share and productivity goals.

2014 State of DevOps report

DevOps practices improve IT performance. IT performance strongly correlates with well-known DevOps practices such as use of version control and continuous delivery. The longer an organization has implemented, and continues to improve upon, DevOps practices, the better it performs. And better IT performance correlates to higher performance for the entire organization.

2014 State of DevOps report

Organizational culture matters. Organizational culture is one of the strongest predictors of both IT performance and overall performance of the organization. High-trust organizations encourage good information flow, cross-functional collaboration, shared responsibilities, learning from failures and new ideas; they are also the most likely to perform at a high level. These cultural practices and norms found in high-trust organizations are also at the heart of DevOps, which helps explain why DevOps practices correlate so strongly with high organizational performance.

2014 State of DevOps report

Job satisfaction is the No. 1 predictor of organisational performance. We all know how job satisfaction feels: It’s about doing work that’s challenging and meaningful, and being empowered to exercise our skills and judgment. We also know that where there’s job satisfaction, employees bring the best of themselves to work: their engagement, their creativity and their strongest thinking. That makes for more innovation in any area of the business, including IT.

Production vs Development environment

What is the problem?

Common Problems

http://newrelic.com/devops/lifecycle

It works on my machine

Common Problems

http://newrelic.com/devops/lifecycle

Reproducible

Common Problems

http://newrelic.com/devops/lifecycle

Twenty people join the team. How do they start developing on day one?

Common Problems

http://www.blue-agility.com/important-lesson-getting-code-production/

Production-like environment

Introduction to Virtualization

http://newrelic.com/devops/lifecycle

Production Environment

Developer Machine

Production-like environment

What about virtualization?

Hypervisor

Container

What are Vagrant and Docker?

What is Vagrant?

• A VM management tool

• Automates the setup of your environment (Dev & QA)

Vagrant is a tool for building complete development environments. With an easy-to-use workflow and focus on automation, Vagrant lowers development environment setup time, increases development/production parity, and makes the "works on my machine" excuse a relic of the past.

vagrantup.com

Vagrant.

http://newrelic.com/devops/lifecycle

Vagrant Commands

- init
- up
- halt
- reload
- suspend
- resume
- destroy
- package
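The commands listed above can be scripted. The following is only a rough sketch: the helper names `vagrant_argv` and `run_vagrant` and the `dry_run` flag are made up for illustration and are not part of Vagrant. Note that Vagrant's CLI names the pause step `suspend`. With `dry_run` switched off, the sketch assumes the `vagrant` executable is installed and on the PATH.

```python
import subprocess

# Subcommands from the slide. Vagrant's CLI calls the "pause" step `suspend`.
LIFECYCLE = ["init", "up", "halt", "reload", "suspend", "resume", "destroy", "package"]


def vagrant_argv(subcommand, *args):
    """Build the argument vector for one `vagrant` call (hypothetical helper)."""
    if subcommand not in LIFECYCLE:
        raise ValueError(f"unsupported vagrant subcommand: {subcommand}")
    return ["vagrant", subcommand, *args]


def run_vagrant(subcommand, *args, dry_run=True):
    """Run the command, or only return it when dry_run is True."""
    argv = vagrant_argv(subcommand, *args)
    if not dry_run:
        # Assumes the `vagrant` executable is installed and on the PATH.
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    # A new joiner's first session: start the project's VM, then stop it.
    for step in ("up", "halt"):
        print(" ".join(run_vagrant(step)))
```

Keeping `dry_run=True` as the default means the script can be read and tested on a machine without Vagrant; pass `dry_run=False` to actually invoke the tool.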

Vagrant - Big Picture

Vagrant - Network Mode

Vagrant for Developer Machine

New Joiner

• Someone joins your project…

• They pick up their laptop…

• Then spend the next 1-2 days following instructions on setting up their environment, tools, etc.

What is Docker?

Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. As a result, IT can ship faster and run the same app, unchanged, on laptops, data center VMs, and any cloud.

Solomon Hykes, Docker’s Founder & CTO, gives an overview of Docker in this short video (7:16).

Docker for shipping an immutable environment

Apache Spark

An introduction to Spark and Spark Streaming

What is Apache Spark?

• Cluster computing engine designed to be fast and general-purpose

• Good for processing streaming data

• Good for machine learning tasks

• Unified platform

Spark Components

Spark Core

• Basic functionality of Spark, including components for task scheduling, memory management, fault recovery, interacting with storage systems, and more

• Provides an API for resilient distributed datasets (RDDs)

Concept: Resilient distributed datasets (RDDs)

• Immutable collections of objects spread across a cluster

• Built through parallel transformations (map, filter, etc.)

• Controllable persistence (e.g. caching in RAM)

• Automatically rebuilt on failure

• Contain any type of Python, Java, or Scala objects, including user-defined classes

Key idea: Write programs in terms of transformations on distributed datasets
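The RDD properties above (immutability, transformations, controllable persistence, rebuild on failure) can be illustrated with a toy, single-process class. This is not Spark code: `ToyRDD` and its methods are invented for this sketch and only mimic the shape of the idea, with nothing distributed or parallel about them.

```python
class ToyRDD:
    """Toy, single-process stand-in for an RDD. This is NOT the Spark API."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)    # immutable input data
        self._lineage = tuple(lineage)  # recorded transformations
        self._persist = False
        self._cache = None              # filled only after persist()

    def map(self, fn):
        # A transformation returns a new dataset; nothing is computed yet.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def persist(self):
        """Ask for the computed result to be kept in memory."""
        self._persist = True
        return self

    def _compute(self):
        # Replay the lineage from the source; lost data is rebuilt the same way.
        items = iter(self._source)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def collect(self):
        if not self._persist:
            return self._compute()
        if self._cache is None:
            self._cache = self._compute()
        return list(self._cache)

    def lose_cache(self):
        """Simulate a failure that drops the cached copy."""
        self._cache = None


numbers = ToyRDD(range(10))
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0).persist()
print(even_squares.collect())  # [0, 4, 16, 36, 64]
even_squares.lose_cache()
print(even_squares.collect())  # rebuilt from the lineage, same result
```

The point of the sketch is that each dataset stores how it was derived rather than only its contents, so a lost copy can be recomputed from the source.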

Spark Streaming (1)

• Spark component that enables processing of live streams of data, e.g. production log files or queues

• Provides an API for manipulating data streams (DStream)

• Same fault tolerance, throughput, and scalability as Spark Core

• Spark’s built-in machine learning algorithms and graph processing algorithms can be applied to data streams

Spark Streaming (2)

• Chop up the live stream into batches of X seconds

• Spark treats each batch of data as RDDs and processes them using RDD operations

• Finally, the processed results of the RDD operations are returned in batches
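The batching idea can be sketched without Spark. In the plain-Python illustration below, `chop_into_batches` and `process_stream` are hypothetical helper names and the events are made-up sample data; in Spark itself the batch interval is set on the streaming context and each batch arrives as an RDD.

```python
from collections import defaultdict


def chop_into_batches(events, batch_seconds):
    """Group (timestamp, value) events into windows of batch_seconds."""
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // batch_seconds)].append(value)
    # Return the non-empty batches in time order, as a list of lists.
    return [batches[key] for key in sorted(batches)]


def process_stream(events, batch_seconds, batch_fn):
    """Apply batch_fn to every batch and return one result per batch."""
    return [batch_fn(batch) for batch in chop_into_batches(events, batch_seconds)]


def count_errors(statuses):
    """Count HTTP status codes that signal an error (400 and above)."""
    return sum(1 for status in statuses if status >= 400)


if __name__ == "__main__":
    # Made-up events: (seconds since start, HTTP status code).
    events = [(0.5, 200), (1.2, 500), (2.1, 200), (3.9, 404), (4.0, 200)]
    print(process_stream(events, 2, count_errors))  # [1, 1, 0]
```

Windows that receive no events are simply skipped in this sketch, which is a simplification.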

Log anomaly detection in production

[Diagram: production environment. Labels: Vagrant, Apache Spark, YARN, Apache log reader, JSON message, input reader, DStream, RDD, prediction model, file output, result output.]

Log anomaly detection in development

[Diagram: developer machine. Labels: Vagrant, Docker, Apache Spark, YARN, Apache log reader, JSON message, input reader, DStream, RDD, prediction model, file output, result output.]
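The reader, JSON message, and prediction model stages named in the diagrams can be sketched in plain Python. The slides do not describe the actual model, so `predict_anomaly` below is a placeholder rule (server errors or very large responses), and the function names, the `max_size` threshold, and the assumption that the logs use Apache's Common Log Format are all illustrative. The sketch also runs in a single process rather than on Spark.

```python
import json
import re

# Apache Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def log_line_to_json(line):
    """Parse one access-log line into a JSON message, or return None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return json.dumps(record)


def predict_anomaly(message, max_size=1_000_000):
    """Placeholder model: flag server errors and unusually large responses."""
    record = json.loads(message)
    return record["status"] >= 500 or record["size"] > max_size


def detect(lines):
    """Run reader -> JSON message -> model and return the flagged messages."""
    messages = (log_line_to_json(line) for line in lines)
    return [m for m in messages if m is not None and predict_anomaly(m)]


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
        '10.0.0.7 - - [04/Apr/2015:10:00:01 +0700] "GET /report HTTP/1.1" 500 512',
    ]
    for flagged in detect(sample):
        print(flagged)
```

Because every stage is a plain function, the same code can be exercised on a developer machine and in a production-like environment, which is the parity the two diagrams aim for.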

Showcase

Running Demo

Q&A

Thank you

Reference

http://www.devopsdays.in.th

http://www.devopsdays.org

http://devopscafe.org

http://vimeo.com/devopsdays

http://newrelic.com/devops/lifecycle

http://www.slideshare.net/search/slideshow?searchfrom=header&q=devops