Hitchhiker's Guide to Open Source Cloud Computing

  • Published on
    11-May-2015

DESCRIPTION

Imagine it's eight o'clock on a Thursday morning and you awake to see a bulldozer outside your window, ready to plow over your data center. Normally you might wish to consult the Encyclopedia Galactica to discern the best course of action, but your copy is likely out of date. And while The Hitchhiker's Guide to the Galaxy (HHGTTG) is a wholly remarkable book, it doesn't cover the nuances of cloud computing. That's why you need the Hitchhiker's Guide to Cloud Computing (HHGTCC), or at least to attend this talk to understand the state of open source cloud computing. Specifically, this talk will cover infrastructure-as-a-service, platform-as-a-service and developments in big data, and how to take advantage of these technologies more effectively using open source software. Technologies covered in this talk include Apache CloudStack, Chef, CloudFoundry, NoSQL, OpenStack, Puppet and many more.

Specific topics for discussion will include:

  • Infrastructure-as-a-Service - The Systems Cloud - Get a comparison of the open source cloud platforms, including OpenStack, Apache CloudStack, Eucalyptus and OpenNebula.
  • Platform-as-a-Service - The Developers Cloud - Find out what tools are available to build portable, auto-scaling applications, including CloudFoundry, OpenShift, Stackato and more.
  • Data-as-a-Service - The Analytics Cloud - Want to figure out the who, what, where, when and why of big data? You'll get an overview of open source NoSQL databases and technologies like MapReduce that help crunch massive data sets in the cloud.

Finally, you'll get an overview of the tools that can help you really take advantage of the cloud. Want to auto-scale virtual machines to serve millions of web pages, or automate the configuration of cloud computing environments? You'll learn how to combine these tools to provide continuous deployment systems that will help you earn DevOps cred in any data center.
[Finally, for those of you who are Douglas Adams fans, please accept my deepest apologies for the bad analogies to the HHGTTG.]

Transcript

1. Hitchhiker's Guide to Open Source Cloud Computing
  By Mark R. Hinkle
  Senior Director, Cloud Computing Community, Citrix Inc.
  Twitter: @mrhinkle
  Blog: www.socializedsoftware.com
  Email: mrhinkle@socializedsoftware.com

2. %whoami

3. Why Open Source and the Cloud?
  • User-Driven Context from Solving Real Problems
  • Lower Barrier to Participation
  • Larger user base, users helping users
  • Aggressive release cycles stay current with the state-of-the-art
  • Open Source innovating faster than commercial
  • Open data, Open standards, Open APIs

4. Quick Cloud Computing Overview, or the Obligatory "What is the Cloud" Explanation

5. Five Characteristics of Cloud
  1. On-Demand Self-Service
  2. Broad Network Access
  3. Resource Pooling
  4. Rapid Elasticity
  5. Measured Service

6. Cloud Computing Service Models
  • USER CLOUD a.k.a. SOFTWARE-AS-A-SERVICE: Single application, multi-tenancy, network-based, one-to-many delivery of applications, all users have same access to features. Examples: Salesforce.com, Google Docs, Red Hat Network/RHEL
  • DEVELOPMENT CLOUD a.k.a. PLATFORM-AS-A-SERVICE: Application developer model. Application deployed to an elastic service that autoscales, low administrative overhead. No concept of virtual machines or operating system. Code it and deploy it. Examples: VMware CloudFoundry, Google AppEngine, Windows Azure, Rackspace Sites, Red Hat OpenShift, ActiveState Stackato, AppFog
  • SYSTEMS CLOUD a.k.a. INFRASTRUCTURE-AS-A-SERVICE: Servers and storage are made available in a scalable way over a network. Examples: EC2, Rackspace CloudFiles, OpenStack, CloudStack, Eucalyptus, OpenNebula

7. Deployment Models: Public, Private & Hybrid

8. Building Open Source Clouds

9. Cloud Architecture

10. Hypervisors
  Open Source:
  • Xen, Xen Cloud Platform (XCP)
  • KVM - Kernel-based Virtualization
  • VirtualBox* - Oracle supported Virtualization Solutions
  • OpenVZ* - Container-based, similar to Solaris Containers or BSD Jails
  • LXC - User Space chrooted installs
  Proprietary:
  • VMware
  • Citrix XenServer (based on Xen)
  • Microsoft Hyper-V
  • OracleVM (based on open source Xen)

11. Open Virtual Machine Formats
  Open Virtualization Format (OVF) is an open standard for packaging and distributing virtual appliances or, more generally, software to be run in virtual machines.
  Formats for hypervisors/cloud technologies:
  • Amazon - AMI
  • KVM - QCOW2
  • VMware - VMDK
  • Xen - IMG
  • Hyper-V - VHD (Virtual Hard Disk)

12. Sourcing Cloud Appliances
  Tool/Project: What you can do with them
  • BitNami: Provides free, ready to run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, WordPress, PHP, Rails, Django and many more.
  • BoxGrinder: A set of projects that help you grind out appliances for multiple virtualization and cloud providers.
  • Oz: Command-line tool that can create images for common Linux distributions to run on KVM.
  • SUSE Studio: Supports building and deploying directly to cloud services such as Amazon EC2.

13. Scale-Up or Scale-Out
  • Vertical Scaling (Scale-Up): Allocate additional resources to VMs; requires a reboot; no need for distributed app logic; single point of OS failure.
  • Horizontal Scaling (Scale-Out): Application needs logic to work in distributed fashion (e.g. HA-Proxy and Apache, Hadoop).

14. Compute Clouds (IaaS)
  Project: Year Started; License; Virtualization Technologies
  • Apache CloudStack: 2008; Apache; XenServer, Xen Cloud Platform, KVM, VMware (Hyper-V developing)
  • Eucalyptus: 2006; GPL; Xen, KVM, VMware (commercial version)
  • OpenNebula: 2005; Apache; Xen, KVM, VMware
  • OpenStack: 2010 (previously developed at NASA by Anso Labs); Apache; VMware ESX and ESXi, Xen, Xen Cloud Platform, KVM, LXC, QEMU and VirtualBox

15. OpenStack Ecosystem of Projects
  [Diagram; labels recovered from the slide]
  • Dashboard: Horizon
  • Identity Services: Keystone
  • Compute: Nova
  • Object Storage: Swift
  • Image Service: Glance
  • Networking: Quantum (REST API, Quantum plug-ins, networking fabric; advanced cloud and networking services such as Firewall Service and Gateway Service access the Quantum API)
  • Plugins: KVM, VMware, Xen Cloud Platform, Open vSwitch, Ceph, Gluster
  • Enterprise Message Queue based on RabbitMQ (ESB)

16. Cloud APIs
  • jclouds
  • libcloud
  • deltacloud
  • fog

17. Cloud Computing Storage
  Project: Description
  • GlusterFS: Scale-out NAS system aggregating storage over Ethernet or InfiniBand
  • Ceph: Distributed file storage system developed by DreamHost
  • OpenStack Storage: Long-term object storage system
  • Sheepdog: Distributed storage for KVM hypervisors

18. Platform-as-a-Service (PaaS)
  Project: Year Started; Sponsors; Languages/Frameworks
  • CloudFoundry: 2011; VMware; Spring for Java, Ruby for Rails and Sinatra, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)
  • Cloudify: 2012; GigaSpaces
  • OpenShift**: 2011; Red Hat; Java, Ruby, PHP, Perl and Python
  • Stackato*: 2012; ActiveState; Java, Python, PHP, Ruby, Perl, Node.js, others
  • WSO2 Stratos: 2010; WSO2; JBoss, Java EE 6

19. Software Defined Networking (SDN)

20. Overview of Software Defined Networking
  [Diagram: three layers labeled Business Applications, Network Services and Network Devices]

21. Why SDN?
  Cloud Promise: Cloud Reality
  • Centralized Configuration and Automation: Without true virtualization, network devices must still be manually configured.
  • Instant Self-Service Provisioning: In a physical network, it could take a long time for a network engineer to provision new services.
  • Elasticity and Scalability: By horizontally scaling up the physical network, elasticity is lost.
  • Designed for Failure: Failover can be automated and physical network limitations can be alleviated.

22. OpenFlow
  OpenFlow enables networks to evolve by giving a remote controller the power to modify the behavior of network devices through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.

23. Software Defined Networking
  Project: Description
  • Floodlight: The Floodlight controller is an enterprise-class, Apache-licensed, Java-based OpenFlow controller.
  • Indigo: Indigo is an open source project to support OpenFlow on a range of physical switches. By leveraging hardware features of Ethernet switch ASICs, Indigo supports high rates for high port counts, up to 48 10-gigabit ports. Multiple gigabit platforms with 10-gigabit uplinks are also supported.
  • OpenStack Networking (Quantum): Pluggable, scalable, API-driven network and IP management.
  • Open vSwitch: Open vSwitch is an open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).

24. Big Data

25. [Chart: Facebook users over time, Dec-04 to Sep-12. Annotation: 1 Billion Facebook Users - October 2012]

26. [Chart: tweets per day over time, Jan-07 to May-12. Annotation: Twitter at 400M Tweets Per Day - June 2012]

27. Big Data and Storage Infrastructure
  "Data is growing faster than storage capacity and computing power. Legacy systems hold organizations back; storage software must include multi-petabyte capacity, support potentially billions of objects, and provide application performance awareness and agile provisioning."
  - Gartner, Big Data Challenges for the IT Infrastructure Team

28. Big Data Landscape

29. Open Source NoSQL Databases
  Name: Type; Description
  • Apache Cassandra: Wide Column Store/Families; API: many, Query Method: MapReduce, Written in: Java, Concurrency: eventually consistent, Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook
  • CouchDB: Document Store; API: Memcached API + protocol (binary and ASCII), most languages, Protocol: Memcached REST interface for cluster conf + management, Written in: C/C++ + Erlang (clustering), Replication: Peer to Peer, fully consistent, Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets
  • HBase: Wide Column Store/Families; API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java
  • Hypertable: Wide Column Store/Families; API: Thrift (Java, PHP, Perl, Python, Ruby, etc.), Protocol: Thrift, Query Method: HQL, native Thrift API, Replication: HDFS Replication, Concurrency: MVCC, Consistency Model: Fully consistent, Misc: High performance C++ implementation of Google's Bigtable
  • MongoDB: Document Store; API: BSON, Protocol: C, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding, Written in: C++
  • Redis: Key Value / Tuple Store; API: Tons of languages, Written in: C, Concurrency: in memory, saves asynchronously to disk after a defined time. Append only mode available. Different kinds of fsync policies. Replication: Master / Slave, Misc: also lists, sets, sorted sets, hashes, queues
  • Riak: Key Value / Tuple Store; API: JSON, Protocol: REST, Query Method: MapReduce term matching, Scaling: Multiple Masters, Written in: Erlang, Concurrency: eventually consistent (stronger than MVCC via Vector Clocks)

30. MapReduce
  [Diagram: Problem Data goes to a Master Node, which distributes work to Worker Node 1, Worker Node 2 and Worker Node 3; the results are combined into Solution Data]

31. Apache Hadoop
  Overview:
  • Handles large amounts of data
  • Stores data in native format
  • Delivers linear scalability at low cost
  • Resilient in case of infrastructure failures
  • Transparent application scalability
  Facts:
  • Apache top-level open source project
  • One framework for storage and compute
  • HDFS - Scalable storage in Hadoop Distributed File System (HDFS)
  • Compute via the MapReduce distributed processing platform
  • Domain Specific Language (DSL) - Java

32. Hadoop Architecture
  • Hive: Data warehouse that provides SQL interface. Ad hoc projection of data structure to unstructured data.
  • HBase: Column-oriented, schema-less distributed DB modeled after Google's BigTable. Random real time read/write.
  • Pig: Platform for manipulating and analyzing large data sets. Scripting language for analysts.
  • Mahout: Machine learning libraries for recommendations, clustering, classifications and item sets.
  • Zookeeper
  • Hadoop Common
  • Chukwa
  • HDFS: Distributes & replicates data across machines
  • MapReduce: Distributes & monitors tasks

33. Big Data Summary
  • Quantity of Machine Created Data Increasing Drastically (examples: networked sensor data from mobile phones and GPS devices)
  • Data manipulation moving from batched to real-time
  • Cloud services giving everyone Big Data tools
  • Consumer company speed and scale requirements driving efficiencies in Big Data storage and analytics
  • New and broader number of data sources being meshed together
  • Big Data Apps means using Big Data is faster and easier

34. Cloud Management Tools

35. Automation in the Cloud

36. 4 Types of Management Tools
  • Provisioning: Installation of operating systems and other software
  • Configuration Management: Sets the parameters for servers, can specify installation parameters
  • Orchestration/Automation: Automate tasks across systems
  • Monitoring: Records errors and health of IT infrastructure

37. Management Toolchains
  [Diagram labels: Provisioning; Patching and Configuration; Monitoring]

38. Provisioning
  Project: Installation Targets
  • Axemblr Provisionr: Can provision 10s to 1000s of machines on various clouds.
  • Cobbler: Distributed virtual infrastructure using koan (kickstart over a network) to PXE boot VMs, for Red Hat, OpenSUSE, Fedora, Debian and Ubuntu VMs
  • Juju: Public clouds (Amazon Web Services, HP Cloud), private OpenStack clouds, bare metal via MAAS
  • Salt Cloud: Tool to provision salted VMs that can then be updated by a central server via ZeroMQ
  • Crowbar: Bare metal provisioning

39. Configuration Management Tools
  Project: Year Started; Language; License; Client/Server
  • Cfengine: 1993; C; Apache; Yes
  • Chef: 2009; Ruby; Apache; Chef Solo - No, Chef Server - Yes
  • Puppet: 2004; Ruby; GPL; Yes & standalone
  • Salt: 2011; Python; Apache; Yes

40. Automation/Orchestration Tools
  Project: Description
  • Capistrano: Utility and framework for executing commands in parallel on multiple remote machines via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles.
  • RunDeck: Rundeck is an open-source process automation and command orchestration tool with a web console.
  • Func: Func provides a two-way authenticated system for generically executing tasks, with integrations for Puppet and Cobbler.
  • MCollective: The Marionette Collective, a.k.a. MCollective, is a framework to build server orchestration or parallel job execution systems.
  • Salt: Execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.
  • Scalr: Provides scaling across multiple cloud computing platforms, integrates with Chef.

41. Conceptual Automated Toolchain
  • Generate Images: SUSE Studio, BoxGrinder
  • Bootstrapped Image: CloudStack, OpenStack
  • Provision: Cobbler, SUSE Studio
  • Configuration: Puppet, Chef
  • Monitoring: Nagios, Zenoss, Cacti
  • Start/Stop Services: RunDeck, Capistrano, MCollective

42. NetFlix Open Source - ToolBag for AWS
  http://netflix.github.com

43. Goodbye and thanks for all the fish!

44. Questions?
  Slides can be viewed and downloaded at: http://www.slideshare.net/socializedsoftware/

45. Contact Me

46. Appendix

47. Additional Resources
  • Devops Toolchains Group
  • Software Defined Networking: The New Norm for Networks (Whitepaper)
  • DevOps Wikipedia Page
  • NoSQL-Database.org - Ultimate Guide to the Non-Relational Universe
  • Open Cloud Initiative
  • NIST Cloud Computing Platform
  • Open Virtualization Format Specs
  • Clouderati Twitter Account
  • Planet DevOps
  • Nicira Whitepaper - It's Time to Virtualize the Network
  • Why Open vSwitch FAQ

48. Monitoring Tools
  Project: License; Type of Monitoring; Collection Methods
  • Cacti / RRDTool: GPL; Performance; SNMP, syslog
  • Graphite: Apache 2.0; Performance; Agent
  • Nagios: GPL; Availability; SNMP, TCP, ICMP, IPMI, syslog
  • Zabbix: GPL; Availability/Performance and more; SNMP, TCP/ICMP, IPMI, Synthetic Transactions
  • Zenoss: GPL; Availability, Performance, Event Management; SNMP, ICMP, SSH, syslog, WMI
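
Slide 13 notes that scaling out requires the application to work in a distributed fashion, typically behind a load balancer such as HA-Proxy. The following is only a toy sketch of that idea, not how HA-Proxy itself is implemented: the class name `RoundRobinBalancer`, its methods, and the backend names are hypothetical, chosen for illustration.

```python
class RoundRobinBalancer:
    """Toy round-robin dispatcher (hypothetical; for illustration only)."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self.backends = list(backends)
        self._counter = 0  # number of requests routed so far

    def add_backend(self, name):
        """Scale out: add one more server to the pool."""
        self.backends.append(name)

    def route(self):
        """Return the backend that should handle the next request."""
        backend = self.backends[self._counter % len(self.backends)]
        self._counter += 1
        return backend
```

Calling `route()` repeatedly on a pool of `["web1", "web2"]` alternates between the two servers; after `add_backend("web3")` the new server starts receiving a share of the requests without any existing server being resized or rebooted, which is the contrast with scale-up drawn on the slide.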
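
The MapReduce flow on slide 30 (problem data split by a master across worker nodes, then combined into solution data) can be sketched in a few lines. This is a single-process illustration under stated assumptions, not Hadoop's API: the function names are hypothetical, word counting is picked as the example problem, and a thread pool stands in for the three worker nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split(lines, n_workers):
    """Master step: deal the problem data out into one chunk per worker."""
    return [lines[i::n_workers] for i in range(n_workers)]


def map_words(lines):
    """Worker step (map): emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def reduce_counts(mapped_chunks):
    """Combine step (reduce): sum the counts emitted by all workers."""
    totals = defaultdict(int)
    for pairs in mapped_chunks:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)


def word_count(lines, n_workers=3):
    """Run the whole flow: split, map in parallel, then reduce."""
    chunks = split(lines, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        mapped = list(pool.map(map_words, chunks))
    return reduce_counts(mapped)
```

For example, `word_count(["the cloud", "the open cloud", "open source"])` counts "the", "cloud" and "open" twice each and "source" once. In a real cluster the chunks would live on different machines and the framework would also handle scheduling and failures, which this sketch leaves out.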
