sebastien-goasguen
DESCRIPTION
A look at cloud and big data trends and history. While Big Data arrived first on the scene (Google File System, Hadoop, Dynamo), Cloud was first in the hype cycle. Google Trends shows this clearly. Amazon AWS, however, has already deployed analytics services on its cloud while open source IaaS solutions are still struggling to deliver an EC2 clone. Cloud and Big Data have three points in common: 1- use an EC2 clone and an S3 clone (riakCS, glusterfs, etc.) to build a cloud; 2- use a big data solution as a backend to your cloud to provide EBS or a large scale image catalogue; 3- deploy big data solutions on your cloud with tools like Apache Whirr, Pallet, and newer devops tool chains with Vagrant and co.
Sebastien Goasguen, January 29th
@sebgoa
Cloud and Big Data
A view on Big Data
http://www.economist.com/node/15557443?story_id=15557443
SKA
How did we get there?
A natural evolution
New Distributed systems for:
Large scale datasets
• From scientific instruments
• From Web app logs
Complex datasets
• Not necessarily large
Object stores
• S3 clones
BigData and map-reduce
• While BigData is often associated with HDFS, Map-Reduce is the algorithm used to parallelize data processing.
• BigData ≠ Map-Reduce ≠ HDFS
• Map-Reduce is a way to express embarrassingly parallel work easily.
• You can do Map-Reduce without HDFS.
• e.g. Basho map-reduce on riakCS
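To make the "Map-Reduce without HDFS" point concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps over an in-memory list of log lines. It is not the Hadoop or Riak API; the function names (map_phase, shuffle, reduce_phase, map_reduce) and the sample log lines are assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit one (word, 1) pair per token in a record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce over any iterable of strings.

    Each record is mapped independently of the others, which is what
    makes the work embarrassingly parallel; no file system is involved.
    """
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical web-app log lines, held in memory rather than in HDFS.
    logs = ["GET /index", "GET /about", "POST /index"]
    print(map_reduce(logs))
```

The records could just as well come from an object store or a database; only the input iterable changes, not the map and reduce functions.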
A really quick view on Clouds
Open Source IaaS
Today
BigData at peak
History
2003 – Google File System
2005 – Hadoop
2006 – Hadoop enters ASF incubator (Feb)
2006 – S3 launched
2007 – Paper on Amazon Dynamo
2009 – EMR launched
2013 – CloudStack as an ASF TLP (March)
2013 – Spark/Mesos enters ASF incubator
The Apache Software Foundation
Apache Software Foundation
35 projects in incubation:
• 12 Hadoop related
• ~30% Big Data related
• Spark
117 top level projects:
• ~16 cloud or bigdata +10%
• Deltacloud, Libcloud, Whirr, jclouds
• Hadoop, couchdb, cassandra, mesos
• Bigtop, accumulo, lucene, UIMA
• CloudStack
Hadoop Ecosystem
+ Up-coming next generation BD systems
Big Data and Cloud (Stack)s
Clouds and BigData
• Object store + compute IaaS to build EC2+S3 clone
• BigData solutions as storage backends for image catalogue and large scale instance storage.
• BigData solutions as workloads to CloudStack based clouds.
EC2, S3 clone
• An open source IaaS with an EC2 wrapper, e.g. OpenNebula
• Deploy an S3 compatible object store separately, e.g. riakCS
• Two independent distributed systems deployed
Cloud = EC2 + S3
Big Data as IaaS backend
“Big Data” solutions can be used as secondary storage.
Example
• Open source IaaS + EC2 wrapper, e.g. CloudStack
• Deploy an S3 compatible object store, e.g. riakCS or Ceph or glusterFS
• Use S3 as image store
• Your EC2 service is a customer of your S3 service
• Logstash + elasticsearch for logs/monitoring
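The "EC2 service is a customer of your S3 service" idea can be sketched as a toy model in which the compute layer keeps no images of its own and fetches them from the object store whenever it starts an instance. This is an in-memory illustration only, not CloudStack, riakCS or any real S3 client; the class names ObjectStore and ComputeService, the "images" bucket and the instance id format are all hypothetical.

```python
import hashlib


class ObjectStore:
    """Toy in-memory stand-in for an S3 compatible object store (hypothetical)."""

    def __init__(self):
        self._buckets = {}

    def put(self, bucket, key, data):
        """Store bytes under bucket/key and return a SHA-256 checksum."""
        self._buckets.setdefault(bucket, {})[key] = data
        return hashlib.sha256(data).hexdigest()

    def get(self, bucket, key):
        """Return the stored bytes; raises KeyError if the object is missing."""
        return self._buckets[bucket][key]


class ComputeService:
    """Toy stand-in for the EC2-like side, which is a client of the store."""

    def __init__(self, store, image_bucket="images"):
        self.store = store
        self.image_bucket = image_bucket
        self.instances = []

    def register_image(self, name, data):
        """Upload an image to the object store (the image catalogue)."""
        return self.store.put(self.image_bucket, name, data)

    def run_instance(self, image_name):
        """Fetch the image from the object store, then record a new instance."""
        image = self.store.get(self.image_bucket, image_name)
        instance = {
            "id": f"i-{len(self.instances):04d}",
            "image": image_name,
            "size": len(image),
        }
        self.instances.append(instance)
        return instance


if __name__ == "__main__":
    cloud = ComputeService(ObjectStore())
    cloud.register_image("ubuntu-12.04", b"fake-image-bytes")
    print(cloud.run_instance("ubuntu-12.04"))
```

Because the compute side only talks to the store through put and get, the store could be swapped for any S3 compatible backend without touching the instance logic, which is the point of running two independent systems.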
Even use Bare Metal
Big Data as a Workload to the Cloud
Mesos, Spark are EC2 native
• ec2_deploy.py
• ec2_deploy.sh
• …
Tools
“PaaS”
Dev Pipeline
Conclusions
• Big Data is “catching up”
• Tackle the big three head on:
  • BigData, Cloud and DevOps
• Add a big data backend to your cloud from the start
• Provide Big Data services on your cloud
Still behind!
Final Thoughts
Who manages my data transfers?
Event
ApacheCON + CloudStack Collaboration Conference
Denver April 7-11th.
Cloud and Big Data
Get Involved with Apache CloudStack
Web: http://cloudstack.apache.org/
Mailing Lists: cloudstack.apache.org/mailing-lists.html
IRC: irc.freenode.net:6667, #cloudstack, #cloudstack-dev
Twitter: @cloudstack
LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859
If it didn’t happen on the mailing list, it didn’t happen.