SuperVessel: Enabling Spark as a Service with OpenStack and Docker
Guan Cheng (G.C.) Chen IBM Research - China weibo.com/parallellabs
SuperVessel Cloud
4/18/15 IBM Research - China
• Public cloud built on POWER7/POWER8 servers with OpenStack
• It provides free access for students, researchers and developers across the world, and helps grow the OpenPOWER ecosystem (used in 30+ universities now)
• It provides advanced technology services such as Spark as a Service, Docker Services, Cognitive Computing Service, IoT Service, and Accelerator as a Service (FPGA and GPU)
Spark as a Service
Step 1: Login Step 2: Create Step 3: Ready!
3 steps to launch a Spark cluster, easy!
Try it at: www.ptopenlab.com
Why OpenStack?
• Most popular IaaS software
• Supports Docker*
• Heat can orchestrate Docker containers easily, which is good for provisioning a Spark cluster
Picture source: http://www.qyjohn.net/?p=3801
Why Docker?
• Less resource consumption than KVM
  – We can provision more Spark clusters!
• Boots faster than KVM
  – Users like fast provisioning!
• Incrementally build, revert and reuse your container
  – We love Git and AUFS!
• However, Docker is not officially supported in OpenStack yet
  – Nova Docker is an external component of OpenStack
• Port Docker to POWER Architecture (ppc64 and ppc64le)
  – Ubuntu 15.04 includes Docker for POWER8 ppc64le
Picture source: http://www.zdnet.com/article/what-is-docker-and-why-is-it-so-darn-popular/
Why Spark?
• Fast
• Unified
• Ecosystem
• Porting to POWER
  – Bugfix submitted to the community
  – Spark 1.3 works smoothly!
Why not Sahara?
Sahara is a component for Hadoop/Spark as a Service in OpenStack
• We started from OpenStack Icehouse …
• Dockerization
  – Better service deployment and isolation for the Big Data dashboard server
• Customization
  – Waiting for Sahara's improvements is somehow *SLOW*
  – Docker, user analytics, Spark 1.4, Spark IDE, scheduling, data visualization etc.
Architecture Design
[Figure: architecture diagram. Labeled components: Big Data Dashboard, Billing & Auth, Keystone, Glance, Neutron, Heat, Nova, Nova Docker, Cinder/Manila, Docker image, and a Spark cluster spread over Containers 1 to 3 (Spark Master, two Spark Workers, Spark Driver, NameNode, two DataNodes). Steps 1 to 3 are numbered on the diagram.]
Dockerize everything!
• We use containers to
  – run applications
  – run other containers
  – run OpenStack Python daemons
  – run OpenStack services inside OpenStack (with a special trunk link in Neutron/OVS)
  – run OpenStack inside OpenStack
• Good for multi-site expansion
Heat template design
Heat is a component for orchestration in OpenStack
• Parameters
  – Cinder/Manila/Neutron UUID
  – Size of Cinder/Manila resources
• Resources
  – Master/slave node
  – Neutron
  – Cinder/Manila
• Need to modify nova-docker to mount the Cinder/Manila resources when booting the Docker container
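The parameter/resource split above can be sketched in Python as a function that builds a HOT-style template for one master and N workers. This is only an illustration, not the SuperVessel template: the parameter names (`image`, `flavor`, `network_id`, `share_id`, `share_size_gb`), the resource names and the `spark_role` metadata key are all hypothetical. It assumes that, with nova-docker, each container is booted as an `OS::Nova::Server`.

```python
import json


def build_spark_cluster_template(num_workers):
    """Build a HOT-style template dict: one Spark master plus N workers."""
    if num_workers < 1:
        raise ValueError("a Spark cluster needs at least one worker")

    def node(role):
        # Assumption: with nova-docker, a container is an ordinary Nova server.
        return {
            "type": "OS::Nova::Server",
            "properties": {
                "image": {"get_param": "image"},
                "flavor": {"get_param": "flavor"},
                "networks": [{"network": {"get_param": "network_id"}}],
                # Hypothetical key an init script could read to pick its role.
                "metadata": {"spark_role": role},
            },
        }

    resources = {"spark_master": node("master")}
    for i in range(num_workers):
        resources["spark_worker_%d" % i] = node("worker")

    return {
        "heat_template_version": "2013-05-23",
        "description": "Spark cluster sketch (hypothetical parameter names)",
        "parameters": {
            "image": {"type": "string"},
            "flavor": {"type": "string"},
            "network_id": {"type": "string"},
            # Storage parameters are declared but not wired to a resource here:
            # the slides say the mount is done by a modified nova-docker.
            "share_id": {"type": "string"},
            "share_size_gb": {"type": "number", "default": 10},
        },
        "resources": resources,
    }


if __name__ == "__main__":
    print(json.dumps(build_spark_cluster_template(2), indent=2, sort_keys=True))
```

Generating the template from code keeps the cluster size a plain function argument, which fits a dashboard that lets the user choose it.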
Spark Docker Image
• Built from Ubuntu 14.04.1
• All Spark nodes use the same image, with different initialization scripts run via cloud-init
• Initialization scripts will
  – Sync /etc/hosts across all nodes
  – Set HDFS and Spark configurations accordingly
  – Format HDFS and launch HDFS
  – Launch Spark
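The first two initialization steps can be sketched as pure functions that render the files every node would receive. This is a minimal sketch rather than the actual SuperVessel scripts: the host names and IP addresses are made up, and it assumes a Spark standalone cluster where worker hosts are listed one per line in `conf/slaves`.

```python
def render_hosts(nodes):
    """Render /etc/hosts content from a mapping of hostname -> IP address."""
    lines = ["127.0.0.1 localhost"]
    for hostname, ip in sorted(nodes.items()):
        lines.append("%s %s" % (ip, hostname))
    return "\n".join(lines) + "\n"


def render_slaves(nodes, master):
    """Render a slaves file: every host except the master, one per line."""
    workers = sorted(h for h in nodes if h != master)
    return "\n".join(workers) + "\n"


if __name__ == "__main__":
    # Hypothetical cluster; real addresses would come from the orchestrator.
    cluster = {
        "spark-master": "10.0.0.2",
        "spark-worker-1": "10.0.0.3",
        "spark-worker-2": "10.0.0.4",
    }
    print(render_hosts(cluster))
    print(render_slaves(cluster, "spark-master"))
```

Because every node renders the same content from the same node list, the single shared image only needs a different role at boot time.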
Big Data Dashboard Development
• 2 developers (front end + back end)
  – Online in 2 months
  – Separates Heat-related code from the Dashboard
  – RESTful API (for billing, authentication etc.)
  – Dockerize the big data dashboard server
  – Separates development and production environments
Where should I put the data? Shared File System for Cloud and Spark as a Service
[Figure: shared file system diagram. Layers: Big Data Service and Cloud Infrastructure Service. Labeled components: Horizon; OpenStack controller (Heat, Neutron, Glance, Manila, Nova, Keystone, Cinder); Docker instances (Symphony, Spark); KVM/Docker (Web app); Folder A and Folder B for User A and User B; GPFS FPO on POWER7/POWER8 servers.]

• Select big data computing framework (MapReduce, Spark)
• Select cluster size
• Select data folder size

HEAT template for big data cluster

• HEAT will orchestrate Docker instances, subnet and data folder based on the user's request
• Manila provides the NFS service using GPFS as backend, and the folder will be mounted via nova-docker (with -v support)
• Folders created by Manila can be accessed by the KVM/Docker instances created for big data and other purposes
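The "-v support" mentioned for nova-docker corresponds to Docker's bind-mount option, which maps a host directory into a container. As a rough illustration only, the helper below builds the equivalent `docker run` command line. The image name and both paths are hypothetical, and nova-docker itself drives Docker through its API rather than the CLI, so this shows the mount semantics, not the nova-docker code path.

```python
def docker_run_args(image, host_folder, container_folder, read_only=False):
    """Build a `docker run` argument list that bind-mounts a host folder."""
    if not host_folder.startswith("/") or not container_folder.startswith("/"):
        raise ValueError("bind-mount paths must be absolute")
    # -v takes host_path:container_path, optionally followed by :ro.
    volume = "%s:%s" % (host_folder, container_folder)
    if read_only:
        volume += ":ro"
    return ["docker", "run", "-d", "-v", volume, image]


if __name__ == "__main__":
    # Hypothetical: the user's shared folder is already NFS-mounted on the host.
    print(" ".join(docker_run_args("spark-node", "/mnt/manila/user_a", "/data")))
```

Mounting the same host folder into several containers is what would let a Spark cluster and a user's other instances see the same data.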
SuperVessel Services Roadmap
SuperVessel Cloud Infrastructure
• Storage, IBM POWER servers, OpenPOWER server, FPGA/GPU, Docker

SuperVessel Cloud Service
1. VM and container service
2. Storage service
3. Network service
4. Accelerator as service
5. Image service

SuperVessel Big Data and HPC Service
1. Big Data: MapReduce (Symphony), Spark
2. Performance tuning service

OpenPOWER Enablement Service
1. X-to-P migration: AutoPort tool
2. OpenPOWER new system test service

Super Class Service
1. On-line video courses
2. Teacher course management
3. User contribution management

Super Project Team Service
1. Project management service
2. DevOps automation

Super Marketplace

(Online) (Online) (Preparing) (Online)
Summary
• Spark + OpenStack + Docker works very well on OpenPOWER servers
• Dockerized services made DevOps easier
• Docker issues
  – Zombie processes
  – Can't dynamically attach volumes
  – Commit and attach operations are not supported on nova-docker
• Monitoring everything (API, resources etc.) is key for operating a cloud
• TODO: Spark 1.4, Spark IDE, Swift, data visualization
Join Us!
www.ptopenlab.com
QQ group: SuperVessel
SuperVessel WeChat group
@冠诚 [email protected]