Sebastien Goasguen, January 29th @sebgoa Cloud and Big Data

Cloud and Big Data trends


DESCRIPTION

A look at Cloud and Big Data trends and history. While Big Data arrived first on the scene (Google File System, Hadoop, Dynamo), Cloud was first in the hype cycle, as Google Trends shows clearly. Amazon AWS, however, has already deployed analytics services on its cloud, while open source IaaS solutions are still struggling to deliver an EC2 clone. Cloud and Big Data meet in three ways: (1) use an EC2 clone and an S3 clone (RiakCS, GlusterFS, etc.) to build a cloud; (2) use Big Data solutions as a backend to your cloud to provide EBS or a large-scale image catalogue; (3) deploy Big Data solutions on your cloud with tools like Apache Whirr, Pallet, and newer DevOps toolchains such as Vagrant and co.


Page 1: Cloud and Big Data trends

Sebastien Goasguen, January 29th

@sebgoa

Cloud and Big Data

Page 2: Cloud and Big Data trends


A view on Big Data

Page 3: Cloud and Big Data trends

http://www.economist.com/node/15557443?story_id=15557443

Page 4: Cloud and Big Data trends

SKA

Page 5: Cloud and Big Data trends
Page 6: Cloud and Big Data trends
Page 7: Cloud and Big Data trends
Page 8: Cloud and Big Data trends


How did we get there?

Page 9: Cloud and Big Data trends

A natural evolution

Page 10: Cloud and Big Data trends

New Distributed systems for:

Large-scale datasets
• From scientific instruments
• From web app logs

Complex datasets
• Not necessarily large

Object stores
• S3 clones

Page 11: Cloud and Big Data trends

BigData and map-reduce

• While Big Data is often associated with HDFS, Map-Reduce is the programming model used to parallelize the data processing.

• Big Data ≠ Map-Reduce ≠ HDFS
• Map-Reduce is a way to express embarrassingly parallel work easily.
• You can do Map-Reduce without HDFS.
• e.g. Basho's Map-Reduce on RiakCS
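The point that Map-Reduce is a programming model, independent of HDFS, can be illustrated with a classic word count in plain Python. This is a minimal sketch, not Hadoop's or Riak's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, the input is an in-memory list rather than a distributed file system, and a thread pool stands in for the cluster workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in a single input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as a framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(records: List[str]) -> Dict[str, int]:
    # Each map call reads only its own record, so the map phase is
    # embarrassingly parallel; the thread pool stands in for cluster workers.
    with ThreadPoolExecutor(max_workers=4) as pool:
        mapped = list(pool.map(map_phase, records))
    # Flatten the per-record outputs into one stream of (key, value) pairs.
    pairs = (pair for record_pairs in mapped for pair in record_pairs)
    # Reduce calls are also independent of each other: one per key.
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # The input is an in-memory list, not an HDFS file.
    print(word_count(["the cloud", "the data", "big data"]))
```

Nothing in the model cares where the records come from: swapping the list for keys read out of an object store or a key-value store changes only the input side, which is why Map-Reduce can run on top of systems other than HDFS.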

Page 12: Cloud and Big Data trends


A really quick view on Clouds

Page 13: Cloud and Big Data trends
Page 14: Cloud and Big Data trends
Page 15: Cloud and Big Data trends

Open Source IaaS

Page 16: Cloud and Big Data trends

Today

Page 17: Cloud and Big Data trends

BigData at peak

Page 18: Cloud and Big Data trends

History

2003 – Google File System
2005 – Hadoop
2006 – Hadoop enters ASF incubator (Feb)
2006 – S3 launched
2007 – Paper on Amazon Dynamo
2009 – EMR launched
2013 – CloudStack becomes an ASF TLP (March)
2013 – Spark/Mesos enter ASF incubator

Page 19: Cloud and Big Data trends


The Apache Software Foundation

Page 20: Cloud and Big Data trends

Apache Software Foundation

Page 21: Cloud and Big Data trends

35 projects in incubation:
• 12 Hadoop related
• ~30% Big Data related
• Spark

117 top-level projects:
• ~16 cloud or Big Data related (more than 10%)
• Deltacloud, Libcloud, Whirr, jclouds
• Hadoop, CouchDB, Cassandra, Mesos
• Bigtop, Accumulo, Lucene, UIMA
• CloudStack

Page 22: Cloud and Big Data trends

Hadoop Ecosystem

+ Upcoming next-generation Big Data systems

Page 23: Cloud and Big Data trends


Big Data and Cloud (Stack)s

Page 24: Cloud and Big Data trends

Clouds and BigData

• Object store + compute IaaS to build an EC2 + S3 clone

• Big Data solutions as storage backends for image catalogues and large-scale instance storage.

• BigData solutions as workloads to CloudStack based clouds.

Page 25: Cloud and Big Data trends

EC2, S3 clone
• An open source IaaS with an EC2 wrapper, e.g. OpenNebula
• Deploy an S3-compatible object store separately, e.g. RiakCS
• Two independent distributed systems deployed

Cloud = EC2 + S3

Page 26: Cloud and Big Data trends

Big Data as IaaS backend

“Big Data” solutions can be used as secondary storage.

Page 27: Cloud and Big Data trends

Example
• Open source IaaS + EC2 wrapper, e.g. CloudStack
• Deploy an S3-compatible object store, e.g. RiakCS, Ceph or GlusterFS
• Use S3 as the image store
• Your EC2 service is a customer of your S3 service
• Logstash + Elasticsearch for logs/monitoring
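To make "your EC2 service is a customer of your S3 service" concrete, here is a toy Python sketch. The classes `ObjectStore` and `ComputeService`, their methods, and the `images` bucket name are all hypothetical: they only mimic the shape of S3-style and EC2-style calls and are not the API of CloudStack, RiakCS, Ceph or any AWS SDK. The one design point it shows is that the compute side never reads disk images directly; it fetches them through the object-store interface, so the backing store can be swapped without touching the compute code.

```python
from dataclasses import dataclass, field
from typing import Dict


class ObjectStore:
    """Toy in-memory stand-in for an S3-compatible store (hypothetical API)."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, bytes]] = {}

    def create_bucket(self, bucket: str) -> None:
        self._buckets.setdefault(bucket, {})

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        # Raises KeyError if the bucket was never created.
        self._buckets[bucket][key] = body

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]


@dataclass
class ComputeService:
    """Toy stand-in for the EC2-style side; it reaches images only via the store."""

    store: ObjectStore
    image_bucket: str = "images"  # hypothetical bucket holding the image catalogue
    instances: Dict[str, str] = field(default_factory=dict)

    def run_instance(self, instance_id: str, image_key: str) -> int:
        # Fetch the template through the object-store API, as an S3 client would.
        image = self.store.get_object(self.image_bucket, image_key)
        self.instances[instance_id] = image_key
        # Return the number of bytes that would be copied to the hypervisor.
        return len(image)


if __name__ == "__main__":
    store = ObjectStore()
    store.create_bucket("images")
    store.put_object("images", "centos-6.4.qcow2", b"\x00" * 1024)
    compute = ComputeService(store)
    print(compute.run_instance("i-0001", "centos-6.4.qcow2"))
```

In a real deployment the `ObjectStore` role would be played by the S3 endpoint of RiakCS, Ceph or GlusterFS, and the two services would remain separate distributed systems that talk over the S3 protocol.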

Page 28: Cloud and Big Data trends

Even use Bare Metal

Page 29: Cloud and Big Data trends


Big Data as a Workload to the Cloud

Page 30: Cloud and Big Data trends

Mesos and Spark are EC2-native

• ec2_deploy.py
• ec2_deploy.sh
• …

Page 31: Cloud and Big Data trends

Tools

Page 32: Cloud and Big Data trends

“PaaS”

Page 33: Cloud and Big Data trends

Dev Pipeline

Page 34: Cloud and Big Data trends

Conclusions

• Big Data is “catching up”
• Tackle the big three head on:
  • Big Data, Cloud and DevOps
• Add a Big Data backend to your cloud from the start
• Provide Big Data services on your cloud

Page 35: Cloud and Big Data trends

Still behind!

Page 36: Cloud and Big Data trends

Final Thoughts

Who manages my data transfers?

Page 37: Cloud and Big Data trends

Event

ApacheCON + CloudStack Collaboration Conference

Denver, April 7–11.


Page 38: Cloud and Big Data trends

Get Involved with Apache CloudStack

Web: http://cloudstack.apache.org/

Mailing Lists: cloudstack.apache.org/mailing-lists.html

IRC: irc.freenode.net:6667 #cloudstack #cloudstack-dev

Twitter: @cloudstack

LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859

If it didn’t happen on the mailing list, it didn’t happen.