Multi-Cell OpenStack: How to Evolve Your Cloud to Scale - November, 2014



  • Multi-Cell OpenStack: How to Evolve Your Cloud to Scale — Belmiro Moreira - CERN; Matt Van Winkle - Rackspace; Sam Morrison - NeCTAR, University of Melbourne


  • Cells: How we use them at NeCTAR


  • NeCTAR Research Cloud

    Started in 2011, funded by the Australian Government
    8 institutions around the country
    Production early 2012 - OpenStack Diablo
    All federated to appear as one cloud from the user's point of view
    Put the compute near the data and tools
    5000+ users

  • NeCTAR Sites: University of Melbourne; National Computational Infrastructure; Monash University; Queensland Cyber Infrastructure Foundation; eResearch SA; University of Tasmania; Intersect, NSW; iVEC, WA

  • Cells to build a Federation

    Use cells to federate geographically separated sites
    Different hardware/networks/people
    Parent cell run centrally at unimelb along with keystone/cinder/glance etc. (no neutron)
    Each site has 1 or more compute cells
    These roughly match up to availability zones from a user's perspective (cells are behind the scenes)
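The parent/compute split described above maps onto Nova's cells v1 configuration, where each nova.conf declares which role its cell plays. A minimal sketch, assuming Icehouse-era cells v1 option names; the cell names are placeholders:

```ini
# API (parent) cell -- nova.conf at the central site
[cells]
enable = True
name = api
cell_type = api

# Compute (child) cell -- nova.conf at each federated site
[cells]
enable = True
name = site-cell-01        # placeholder name for this site's cell
cell_type = compute
```

Each site's compute cell runs its own nova-cells, conductor, scheduler, and database, while the shared services (keystone, glance, cinder) live only with the parent.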

  • How big?

    Each site ~4000 cores, ~150 hypervisors
    6 sites in production, 4600+ instances
    Last 2 sites in production by end of year
    ~1000 hypervisors, 40k cores, ~10 compute cells
    Some sites have multiple datacenters, so have multiple cells

  • Pain points

    Cell scheduling isn't smart
    Broadcast calls rely on all cells being alive
    Not many people to share experiences with
    Upgrades, although Havana -> Icehouse could happen in stages. Much easier!
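The cell scheduler can at least be steered through filters and weighers in the parent cell's nova.conf. A sketch assuming the Icehouse-era cells v1 options; the multiplier values are illustrative, not recommendations:

```ini
[cells]
# Load the stock cells v1 filters and weighers
scheduler_filter_classes = nova.cells.filters.all_filters
scheduler_weight_classes = nova.cells.weights.all_weighers
# Prefer cells with more free RAM; penalise unresponsive ("mute") cells
ram_weight_multiplier = 10.0
mute_weight_multiplier = -10.0
```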

  • Things we've added, not in trunk (yet)

    Security group syncing
    EC2 id mappings (needed for metadata)
    Availability zone / aggregate support
    Flavour management

    *We assume a cell only has 1 parent

  • Cells: How we use them at CERN

    Belmiro Moreira, email: belmiro.moreira @

  • CERN

    Conseil Européen pour la Recherche Nucléaire, aka European Organization for Nuclear Research

    Founded in 1954 with an international treaty
    21 member states; other countries contribute to experiments
    Situated between Geneva and the Jura Mountains, straddling the Swiss-French border

    CERN's mission is to do fundamental research
    CERN provides particle accelerators and other infrastructure for high-energy physics research

  • CERN - Cloud Infrastructure

    In production since July 2013
    Performed two upgrades: Grizzly -> Havana -> Icehouse
    Currently running: nova; glance; keystone; horizon; cinder w/ Ceph; ceilometer
    RDO distribution on SLC6; pip with Windows Server 2012 R2
    2 geographically separated data centres: Geneva (Switzerland) and Budapest (Hungary)
    Numbers: ~3000 compute nodes (75k cores; 140TB RAM); ~2900 KVM; ~100 Hyper-V; ~8000 virtual machines

  • CERN - Cloud Infrastructure - Cells

    Why do we use cells?
    Scale transparently between different data centres
    Availability and resilience
    Isolate different use cases

    Today: 1 API cell and 8 compute cells
    2-level tree; sizes range between 100 and ~1600 compute nodes
    6 compute cells in Switzerland; 2 compute cells in Hungary
    Shared and private cells; 3 availability zones available in shared cells
  • CERN - Cells Limitations

    Missing functionality:
    Security groups
    Flavor propagation (api -> compute)
    Manage aggregates on API cell
    Server groups
    Cell scheduler
    Ceilometer integration

  • CERN - Cells Challenges

    ~74000 more cores by beginning of 2015
    How to organize and distribute nodes between different cells?
    Split current large cells into cells with a small number (~200) of compute nodes
    Expected to have 30+ cells by end of 2015
    How to manage a large number of cells?

  • Created by: Matt Van Winkle @mvanwink; modified 10/29/2014

    Cells at Rackspace

    Cells: How to Evolve Your Cloud to Scale

  • Managed Cloud company offering a suite of dedicated and cloud hosting products

    Founded in 1998 in San Antonio, TX

    Home of Fanatical Support

    More than 200,000 customers in 120 countries


  • Rackspace Cloud Infrastructure

    In production since August 2012
    Currently running: Nova; Glance; Neutron; Ironic; Swift; Cinder
    Regular upgrades from trunk; package built on trunk pull from 10/21 in testing now
    Compute nodes are Debian based; run as VMs on hypervisors and managed via XAPI
    6 geographic regions around the globe: DFW; ORD; IAD; LON; SYD; HKG
    Numbers: 10s of 1000s of hypervisors (over 330K cores, 1+ petabyte of RAM); all XenServer; over 150,000 virtual machines

  • Rackspace Cloud Infrastructure - Cells

    Why do we use cells?
    Manage multiple flavor classes
    Network resources (public IPs, private IPs, aggregation routers, etc.)
    Network constraints
    Continual supply chain

    1 global API cell per region with multiple compute cells (3 to 35+)
    2-level tree; sizes between ~100 and ~600 hosts per cell
    Control infrastructure exists as instances in a small OpenStack deployment
    All cells available to all tenants
    Tested dedicated cells for potential large customers

  • Rackspace - Cells Limitations

    Missing functionality:
    Security groups
    Host aggregates

    Scheduler:
    No disable
    Incomplete host statuses

    Other services are not cell aware; Neutron is a prime example

  • Rackspace - Cells Challenges

    Increasing number of flavor classes: different hardware specs per class; sizing varies by average VM density
    Multiple vendor sources: subtle hardware differences in the same specs across different vendors
    Scaling global services with cell growth; still don't have the perfect ratios

  • Cells Feature Completion - Recap from sessions

    Nova dev team met this morning to discuss cells in a few sessions:
    Cells - Wednesday, November 5, 09:00
    Cells continued - Wednesday, November 5, 09:50

    Areas of discussion:
    Feature completion
    No-op/single cell as default
    Cell awareness in APIs

  • Thank You!

    Belmiro Moreira - CERN
    Matt Van Winkle - Rackspace - @mvanwink
    Sam Morrison - NeCTAR, University of Melbourne - sam.morrison@unimelb.


