THE MANY COLORS OF CHAMELEON · 2019-02-06 · Powered by OpenStack with bare metal reconfiguration...


www.chameleoncloud.org

THE MANY COLORS OF CHAMELEON

Kate Keahey
Mathematics and CS Division, Argonne National Laboratory
CASE, University of Chicago
keahey@anl.gov

February 6, 2019
Chameleon User Meeting


CHAMELEON IN A NUTSHELL

- We like to change: testbed that adapts itself to your experimental needs
  - Deep reconfigurability (bare metal) and isolation (CHI), but also ease of use (KVM)
  - CHI: power on/off, reboot, custom kernel, serial console access, etc.
- We want to be all things to all people: balancing large-scale and diverse
  - Large-scale: large homogeneous partition (~15,000 cores), 5 PB of storage distributed over 2 sites (now +1!) connected with 100G network…
  - …and diverse: ARMs, Atoms, FPGAs, GPUs, Corsa switches, etc.
- We want to last: cost-effective to deploy, operate, and enhance
  - Powered by OpenStack with bare metal reconfiguration (Ironic)
  - Chameleon team contribution recognized as official OpenStack component
- We live to serve: open, production testbed for Computer Science Research
  - Started in 10/2014, testbed available since 07/2015, renewed in 10/2017
  - Currently ~3,000 users, ~500 projects, ~100 institutions


CHAMELEON HARDWARE

[Diagram: Chameleon hardware. Labels recovered from the figure:]

- Chameleon core network: 100 Gbps uplink to public network (each site)
- Core services: 3.5 PB storage system; 0.5 PB storage system
- Heterogeneous Cloud Units: GPUs (K80, M40, P100), FPGAs, NVMe, SSDs, IB, ARM, Atom, low-power Xeon
- Haswell Standard Cloud Unit (42 compute, 4 storage): x2 and x10
- Skylake Standard Cloud Unit (32 compute, Corsa switch): x2 and x1
- Sites: Chicago, Austin; Chameleon Associate Site at Northwestern
- GENI and other partners


EXPERIMENTAL WORKFLOW

discover resources
- Fine-grained
- Complete
- Up-to-date
- Versioned
- Verifiable

allocate resources
- Advance reservations
- On-demand
- Isolation
- Across resource types

configure and interact
- Deeply reconfigurable
- Appliance catalog
- Snapshotting
- Complex Appliances
- Network Isolation

monitor
- Hardware metrics
- Fine-grained data
- Aggregate
- Archive

CHI = 65%*OpenStack + 10%*G5K + 25%*“special sauce”
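
The advance reservations listed under the allocate stage amount to booking a node for a future time window. A minimal sketch of the conflict check this implies, assuming half-open [start, end) windows. The `Reservation` class and the `overlaps` and `can_reserve` helpers are hypothetical names used for illustration; they are not part of CHI or OpenStack.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence


@dataclass(frozen=True)
class Reservation:
    """Hypothetical record: one node booked for a half-open window [start, end)."""
    node: str
    start: datetime
    end: datetime


def overlaps(a: Reservation, b: Reservation) -> bool:
    """Two reservations conflict if they share a node and their windows intersect."""
    return a.node == b.node and a.start < b.end and b.start < a.end


def can_reserve(existing: Sequence[Reservation], request: Reservation) -> bool:
    """Accept the request only if it conflicts with no existing reservation."""
    return not any(overlaps(r, request) for r in existing)


# Invented example data.
booked = [Reservation("node-1", datetime(2019, 2, 6, 9), datetime(2019, 2, 6, 12))]
late = Reservation("node-1", datetime(2019, 2, 6, 12), datetime(2019, 2, 6, 14))
clash = Reservation("node-1", datetime(2019, 2, 6, 11), datetime(2019, 2, 6, 13))

print(can_reserve(booked, late))   # back-to-back windows do not conflict
print(can_reserve(booked, clash))  # overlapping window on the same node
```

Treating windows as half-open lets one reservation begin at the exact moment another ends, which is the usual convention for back-to-back bookings.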


IMPROVING THE PLATFORM: NETWORKING

- Multi-tenant networking allows users to provision isolated L2 VLANs and manage their own IP address space (since Fall 2017)
- Stitching dynamic VLANs from Chameleon to external partners (ExoGENI, ScienceDMZs) (since Fall 2017)
- VLANs + AL2S connection between UC and TACC for 100G experiments (since Spring 2018)
- BYOC (Bring Your Own Controller): isolated user-controlled virtual OpenFlow switches (since Summer 2018)
- Managing multiple stitches (since Fall 2018)
- VLAN reservations (since Winter 2019), floating IP reservations coming soon!


BRING-YOUR-OWN-CONTROLLER (BYOC)

- Software Defined Networking (SDN)
- Corsa Virtual Forwarding Context (VFC)
- OpenFlow 1.3
- User defined controller
  - Within Chameleon or anywhere on the Internet
- Available on Skylake nodes
- Supported capabilities
  - SDN experiments
  - Experiments requiring non-standard networking capabilities

[Diagram: Standard Cloud Unit with a Corsa switch; compute nodes for Tenant A and Tenant B; VFC (Tenant A) and VFC (Tenant B); OpenFlow controllers for Tenant A and Tenant B; Ryu]
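
With BYOC, a tenant's controller decides which OpenFlow rules its VFC holds. As a rough illustration of how an OpenFlow flow table resolves a packet (the highest-priority matching entry wins, and a priority-0 wildcard entry acts as the table-miss rule), here is a toy model in plain Python. It does not use Ryu or any OpenFlow library; `FlowEntry`, `lookup`, and the action strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FlowEntry:
    priority: int
    match: Dict[str, int]  # field -> required value; {} matches every packet
    action: str            # illustrative action label, not a real OpenFlow action


def lookup(table: List[FlowEntry], packet: Dict[str, int]) -> Optional[str]:
    """Return the action of the highest-priority entry whose match fields all agree."""
    for entry in sorted(table, key=lambda e: e.priority, reverse=True):
        if all(packet.get(field) == value for field, value in entry.match.items()):
            return entry.action
    return None  # no entry matched, not even a table-miss entry


table = [
    FlowEntry(0, {}, "send_to_controller"),                      # table-miss entry
    FlowEntry(10, {"in_port": 1}, "output:2"),                   # anything from port 1
    FlowEntry(20, {"in_port": 1, "eth_type": 0x0806}, "flood"),  # ARP from port 1
]

print(lookup(table, {"in_port": 1, "eth_type": 0x0806}))
print(lookup(table, {"in_port": 1, "eth_type": 0x0800}))
print(lookup(table, {"in_port": 3, "eth_type": 0x0800}))
```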


EXTERNAL STITCHING

- Layer 2 VLANs from Chameleon to external partners
  - ExoGENI, ScienceDMZs, ESnet, and AL2S
- VFCs with multiple L2 stitched links
  - Named VFCs

[Diagram: Standard Cloud Unit with compute nodes for Tenant A and Tenant B; VFC (Tenant A) and VFC (Tenant B); OpenFlow controllers for Tenant A and Tenant B; Ryu; Chameleon core network (100 Gbps uplink to public network) between Chicago and Austin; Internet2 AL2S, GENI, future partners]


NETWORKING PATTERNS MADE EASY

- Sharednet1
  - Pre-configured local shared network
- Sharedwan1
  - Stitched shared network
  - Pre-configured
  - Connects UC and TACC
  - Up to 100 Gbps
  - Ask how to add it to your project!

[Diagram: Standard Cloud Units in Chicago and Austin; compute nodes on sharednet1 at each site; sharedwan1 at both sites; Chameleon core network (100 Gbps uplink to public network)]


IMPROVING THE PLATFORM: OTHER FEATURES

- Lease management: adding/removing nodes to/from a lease, notifications of lease start and impending termination
- Advance reservation orchestration
- Power and temperature metrics
- Whole disk image boot for ARM nodes
- New appliances (Hadoop, ExoGENI, BYOC examples) and a richer set of appliance features: FUSE module and networking support
- Usability features: multi-region configuration, single login to all web interfaces, better access to information, better error handling, software self-updates, better appliance publishing, documentation overhaul, etc.
- Chameleon traces are now available at www.scienceclouds.org
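
The lease notifications in the first bullet can be pictured as trigger times derived from the lease window. A minimal sketch, assuming a periodic checker that reports every trigger falling between its previous and current run. The `Lease` class, the `due_notifications` helper, the 48-hour warning threshold, and the message wording are all assumptions made for illustration, not the behavior of Chameleon's reservation service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Lease:
    name: str
    start: datetime
    end: datetime


def due_notifications(lease: Lease, last_check: datetime, now: datetime,
                      warn_before: timedelta = timedelta(hours=48)) -> List[str]:
    """List the messages whose trigger time falls in the interval (last_check, now]."""
    triggers = [
        (lease.start, f"lease {lease.name} has started"),
        (lease.end - warn_before, f"lease {lease.name} ends at {lease.end:%Y-%m-%d %H:%M}"),
    ]
    return [message for when, message in triggers if last_check < when <= now]


# Invented example lease: one week starting on the day of the talk.
lease = Lease("demo", datetime(2019, 2, 6, 9), datetime(2019, 2, 13, 9))

print(due_notifications(lease, datetime(2019, 2, 6, 8), datetime(2019, 2, 6, 10)))
print(due_notifications(lease, datetime(2019, 2, 11, 8), datetime(2019, 2, 11, 10)))
print(due_notifications(lease, datetime(2019, 2, 8, 8), datetime(2019, 2, 8, 10)))
```

Using the interval between two checks, rather than a single point in time, means each notification is emitted exactly once even if the checker runs irregularly.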


BEYOND THE PLATFORM: BUILDING AN ECOSYSTEM

- Helping hardware providers interact
  - Bring Your Own Hardware (BYOH)
  - CHI-in-a-Box: deploy your own Chameleon site
- Helping our users interact, with us but primarily with each other
  - Facilitating contributions of appliances, tools, and other artifacts: appliance catalog, blog as a publishing platform, and eventually notebooks
  - Integrating tools for experiment management
  - Making reproducibility easier
  - Improving communication, not just with us but with our users as well


CHI-IN-A-BOX

- CHI-in-a-box: packaging a commodity-based testbed
  - First released in summer 2018, continuously improving
- CHI-in-a-box scenarios
  - Independent testbed: package assumes independent account/project management, portal, and support
  - Chameleon extension: join the Chameleon testbed (currently serving only selected users), and includes both user and operations support
  - Part-time extension: define and implement contribution models
  - Part-time Chameleon extension: like Chameleon extension but with the option to take the testbed offline for certain time periods (support is limited)
- Adoption
  - New Chameleon Associate Site at Northwestern since fall 2018 (new networking!)
  - Two organizations working on independent testbed configuration


REPRODUCIBILITY DILEMMA

- Reproducibility as side-effect: lowering the cost of repeatable research
  - Example: Linux “history” command
  - From a meandering scientific process to a recipe
- Reproducibility by default: documenting the process via interactive papers

Should I invest in making my experiments repeatable?

Should I invest in more new research instead?


REPEATABILITY MECHANISMS IN CHAMELEON

- Testbed versioning (collaboration with Grid’5000)
  - Based on representations and tools developed by G5K
  - >50 versions since public availability, and counting
  - Still working on: better firmware version management
- Appliance management
  - Configuration, versioning, publication
  - Appliance meta-data via the appliance catalog
  - Orchestration via OpenStack Heat
- Monitoring and logging
- However… the user still has to keep track of this information
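
Testbed versioning can be thought of as keeping successive descriptions of each resource so that an experimenter can tell whether anything changed between two runs. A minimal sketch of such a comparison, assuming node descriptions are nested dictionaries. The field names and values are invented, and `diff_descriptions` is a hypothetical helper rather than a G5K or Chameleon tool.

```python
from typing import Any, Dict, Tuple


def diff_descriptions(old: Dict[str, Any], new: Dict[str, Any],
                      prefix: str = "") -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted.path: (old_value, new_value)} for every entry that differs."""
    changes: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}{key}"
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            # Both sides are nested sections: compare them field by field.
            changes.update(diff_descriptions(a, b, prefix=f"{path}."))
        elif a != b:
            # A changed leaf, or a section present on only one side (other side is None).
            changes[path] = (a, b)
    return changes


# Invented descriptions of the same node in two testbed versions.
v1 = {"cpu": {"model": "example-cpu", "cores": 24}, "bios": {"version": "1.0"}}
v2 = {"cpu": {"model": "example-cpu", "cores": 24}, "bios": {"version": "1.1"},
      "gpu": {"count": 1}}

print(diff_descriptions(v1, v2))
```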


KEEPING TRACK OF EXPERIMENTS

- Everything in a testbed is a recorded event
  - The resources you used
  - The appliance/image you deployed
  - The monitoring information your experiment generated
  - Plus any information you choose to share with us: e.g., “start power_exp_23” and “stop power_exp_23”
- Experiment précis: information about your experiment made available in a “consumable” form
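
User-supplied markers such as “start power_exp_23” and “stop power_exp_23” make it possible to cut an experiment's time window out of the recorded monitoring data. A minimal sketch, assuming events and samples arrive as (timestamp, value) tuples. The data layout, the sample values, and the `experiment_window` and `samples_in_window` helpers are illustrative assumptions, not Chameleon's actual interface.

```python
from typing import List, Tuple

Event = Tuple[float, str]     # (timestamp in seconds, message)
Sample = Tuple[float, float]  # (timestamp in seconds, power in watts)


def experiment_window(events: List[Event], name: str) -> Tuple[float, float]:
    """Find the timestamps of the 'start <name>' and 'stop <name>' markers."""
    start = next(t for t, msg in events if msg == f"start {name}")
    stop = next(t for t, msg in events if msg == f"stop {name}")
    if stop < start:
        raise ValueError(f"stop marker precedes start marker for {name}")
    return start, stop


def samples_in_window(samples: List[Sample], window: Tuple[float, float]) -> List[Sample]:
    """Keep only the samples recorded between the two markers (inclusive)."""
    start, stop = window
    return [(t, v) for t, v in samples if start <= t <= stop]


# Made-up events and power readings.
events = [(100.0, "start power_exp_23"), (160.0, "stop power_exp_23")]
power = [(90.0, 95.0), (110.0, 180.0), (150.0, 200.0), (170.0, 97.0)]

window = experiment_window(events, "power_exp_23")
selected = samples_in_window(power, window)
mean_watts = sum(v for _, v in selected) / len(selected)
print(window, selected, mean_watts)
```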


REPEATABILITY: EXPERIMENT PRÉCIS

Experiment précis

OpenStack services

Instance monitoring

Infrastructure monitoring

User events

Store and share

Orchestrator (Heat)
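
One way to read these labels is that several independently recorded event streams are combined into a single timeline. The sketch below merges time-sorted streams with the standard-library `heapq.merge`. The source names loosely mirror the labels above, while the event contents and the `build_precis` helper are invented for illustration and do not describe how the précis is actually produced.

```python
import heapq
from typing import Iterable, List, Tuple

Event = Tuple[float, str, str]  # (timestamp, source, description)


def build_precis(*streams: Iterable[Event]) -> List[Event]:
    """Merge already time-sorted event streams into one chronological list."""
    return list(heapq.merge(*streams, key=lambda event: event[0]))


# Made-up example events; each stream is already sorted by timestamp.
openstack = [(1.0, "openstack", "lease started"), (5.0, "openstack", "instance launched")]
monitoring = [(6.0, "instance", "cpu 12%"), (9.0, "instance", "cpu 87%")]
user = [(7.0, "user", "start power_exp_23"), (10.0, "user", "stop power_exp_23")]

precis = build_precis(openstack, monitoring, user)
for timestamp, source, text in precis:
    print(f"{timestamp:5.1f}  {source:<10} {text}")
```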


EXPERIMENT PRÉCIS: A CASE STUDY

Based on Wang et al., Understanding and Auto-Adjusting Performance-Sensitive Configurations. ASPLOS, 2018



INTERACTIVE PAPERS

- What does it mean to document a process?
- Some requirements
  - Easy to work with: human readable/modifiable format
  - Integrates well with ALL aspects of experiment management
  - Bit by bit replay, which allows for bit by bit modification (and introspection) as well: element of interactivity
  - Support storytelling: allows you to explain your experiment design and methodology choices
  - Has a direct relationship to the actual paper that gets written
  - Can be version controlled
  - Sustainable, a popular open source choice
- Implementation options
  - Orchestrators: Heat, the dashboard, and OpenStack Flame
  - Notebooks: Jupyter, NextJournal


CHAMELEON JUPYTER INTEGRATION

- Combining the ease of notebooks and the power of a shared platform
  - Storytelling with Jupyter: ideas/text, process/code, results
  - Chameleon shared experimental platform
- JupyterLab server for our users
  - Just go to jupyter.chameleoncloud.org and log in with your Chameleon credentials
- Chameleon/Jupyter integration
  - Alternative interface
  - All the main testbed functions
  - “Hello World” template

Screencast of a complex experiment: https://vimeo.com/297210055


SHARING, EXPERIMENTING, LEVERAGING

- Sharing Jupyter notebooks in Chameleon
  - Today: from home directory to sharing via our Swift storage with your project members
  - Challenges ahead: more flexible sharing policy implementation, integrating with GitHub for better versioning and sharing support
- Automating experiments with Jupyter


PARTING THOUGHTS

- Physical environment: Chameleon is a rapidly evolving experimental platform
  - Originally: “Adapts to the needs of your experiment”
  - Now also: “Adapts to the needs of its community and the changing research frontier”
- Towards an Ecosystem: a meeting place of users and providers sharing resources and research
  - Testbeds are more than just experimental platforms
  - Common/shared platform is a “common denominator” that can eliminate much complexity that goes into systematic experimentation, sharing, and reproducibility
- Be part of the change: tell us what capabilities we should provide to help you share and leverage the contributions of others!
