Apr 2010 Introduction to Cloud Computing Marin Dimitrov (technology watch #3)

Introduction to Cloud Computing


Page 1: Introduction to Cloud Computing

Apr 2010

Introduction to Cloud Computing

Marin Dimitrov

(technology watch #3)


Contents

• Introduction

• Cloud Computing platforms

• Programming for the Cloud

• Semantic Web on the Cloud

Cloud Computing #2


Contents

Part I

Introduction


Cloud Computing - NIST definition

• “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”

• Delivery models

– IaaS (Infrastructure as a Service) - the consumer uses "fundamental resources" such as processing power, storage, networking components or middleware. The consumer can control the operating system, storage, applications and possibly networking

– PaaS (Platform as a Service) - the consumer uses a hosting environment for their applications and has control over the applications (and some control over the hosting environment), but does not control the infrastructure on which they are running

– SaaS (Software as a Service) - the consumer uses an application, but does not control the infrastructure on which it's running (OS, hardware)


XaaS spectrum – Google, Amazon, Microsoft

• SaaS

– Amazon: Elastic MapReduce

– Google: Gmail, Google Apps

• PaaS

– Amazon: SimpleDB, Relational Database Service, Flexible Payments Service

– Google: App Engine, BigTable / MegaStore

– Microsoft: SQL Azure

• IaaS

– Amazon: EC2, Simple Queue Service, Simple Notification Service, Elastic Block Storage, S3 / RRS, CloudWatch / Auto Scaling, Elastic Load Balancer

– Google: Google Storage

– Microsoft: Blob storage, Azure Computing, Queues, Load Balancer


Cloud Computing - Essential characteristics (NIST)

• Rapid elasticity – the ability to scale resources both up and down as needed. To the consumer, the Cloud appears to be infinite, and the consumer can purchase as much / little computing power as they need

• Measured service – aspects of the Cloud service are controlled and monitored by the Cloud provider. This is crucial for billing, access control, resource optimization & capacity planning

• On-demand self service – a consumer can use cloud services as needed without any human interaction with the cloud provider

• Ubiquitous network access – the Cloud provider’s capabilities are available over the network and can be accessed through standard mechanisms

• Resource Pooling – allows a Cloud provider to serve its consumers via a multi-tenant model - resources are (re)assigned according to consumer demand.


Cloud Computing - deployment models (NIST)

• Public cloud

– Infrastructure owned by one organisation and sold to third parties

– E.g. Amazon Web Services, Google AppEngine, Windows Azure

• Private cloud

– Internal infrastructure for a single organisation (on or off-premise)

– E.g. VMware vCloud, IBM Cloudburst, Microsoft Hyper-V

• Community cloud

– Infrastructure shared by several organisations, targeting a specific community

– E.g. OpenCirrus (HP, Intel, Yahoo, KIT, CMU, …)

• Hybrid cloud

– Composition of the above

– E.g. AWS Virtual Private Cloud


Cloud computing – business drivers

1. Business agility

– Faster time to market

• No major upfront commitment & investment in infrastructure

– Scalability & elasticity

• Instant on-demand provisioning

• Shifting the risk of over-/under-provisioning to the cloud provider

2. Focus

– Outsource non-core tasks to the cloud provider

3. Pay-as-you-go

– Speed up new project launching & rollout (start small, add resources when needed)

– No need for complex planning ahead

– Turn fixed costs (CapEx) into variable costs (OpEx)


Some cloud use cases

• Overflow buffer

– Provision for the average load and let the cloud absorb the peaks, instead of over-provisioning for peak load

• Seasonal business

– E.g. Walmart has 4:1 peak-to-average ratio (source?)

• Small startups time-to-market

– Less upfront investment, more focus on core competencies

• Experimental playground

– Roll out experimental projects without major equipment purchases

• Speedup of large scale batch operations

– 1000 servers for 1 hour cost the same as 1 server for 1000 hours

– More cost-efficient computing (off-peak tariffs & time zones)

• Unforeseeable events

– E.g. sudden traffic spikes to web sites (volcanoes, anyone?)
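The overflow-buffer idea can be made concrete by counting server-hours. The sketch below is only an illustration: the hourly load profile is hypothetical (a 4:1 peak over the baseline, loosely echoing the ratio quoted above), and it assumes an owned server-hour and a rented server-hour cost the same.

```python
def fixed_capacity_hours(servers, hours):
    """Server-hours paid for when a fixed fleet runs the whole period."""
    return servers * hours


def overflow_hours(load_by_hour, base_servers):
    """Server-hours paid for with a fixed base fleet plus rented overflow.

    Overflow servers are only counted in the hours where load exceeds the base.
    """
    base = base_servers * len(load_by_hour)
    burst = sum(max(0, load - base_servers) for load in load_by_hour)
    return base + burst


# Hypothetical day: 21 quiet hours needing 10 servers, 3 peak hours needing 40.
load = [10] * 21 + [40] * 3

peak_sized = fixed_capacity_hours(max(load), len(load))  # fleet sized for the peak
buffered = overflow_hours(load, base_servers=10)         # base fleet + cloud overflow

print(peak_sized, buffered)
```

With this made-up profile the peak-sized fleet pays for almost three times as many server-hours as the buffered one; real numbers depend on the actual load curve and price difference.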


Cloud-able applications

• Typical characteristics

– Non mission critical

– Need >99% uptime

– Low bandwidth / higher latency tolerance

– Relaxed security requirements

– Few integration points

– E.g.

• Batch operations (speedup at the same price!)

• One-time large scale processing

• Barriers to cloud migration

– Security & trust

– Lack of SLA

– Lack of standardization (vendor lock-in)


Cloud Computing – pros & cons

[Figure: pros and cons of cloud computing. (C) Dion Hinchcliffe]

Contents

Part II

Cloud Computing Platforms

AWS, Google AppEngine, Windows Azure


XaaS spectrum – Google, Amazon, Microsoft (again)

• SaaS

– Amazon: Elastic MapReduce

– Google: Gmail, Google Apps

• PaaS

– Amazon: SimpleDB, Relational Database Service, Flexible Payments Service

– Google: App Engine, BigTable / MegaStore

– Microsoft: SQL Azure

• IaaS

– Amazon: EC2, Simple Queue Service, Simple Notification Service, Elastic Block Storage, S3 / RRS, CloudWatch / Auto Scaling, Elastic Load Balancer, Virtual Private Cloud

– Google: Google Storage

– Microsoft: Blob storage, Azure Computing, Queues, Load Balancer


Amazon Web Services

• http://aws.amazon.com/

• Xen VMs, 1 ECU = 1.2GHz AMD Opteron, US/EU prices

EC2 instance   RAM (GB)   CU* (cores)   HDD (GB)   bit   $/h on-demand   $/h spot   $/h reserved
S              1.7        1 (1)         160        32    0.085           0.03       0.03
L              7.5        4 (2)         850        64    0.34            0.13       0.12
XL             15         8 (4)         1690       64    0.68            0.24       0.24
High-mem XL    17.1       6.5 (2)       420        64    0.50            0.18       0.17
High-mem 2XL   34.2       13 (4)        850        64    1.20            0.43       0.42
High-mem 4XL   68.4       26 (8)        1690       64    2.40            0.82       0.84
High-CPU M     1.7        5 (2)         350        32    0.17            0.06       0.06
High-CPU XL    7          20 (8)        1690       64    0.68            0.24       0.24
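The prices in the table make the "wide and short vs. narrow and long" trade-off easy to check. The sketch below copies three rows of on-demand and spot prices from the table; the `job_cost` helper and its parameters are illustrative names, and the calculation ignores per-hour rounding, data transfer and storage charges.

```python
from decimal import Decimal

# $/hour, copied from the EC2 table above (on-demand and spot columns only).
PRICES = {
    "S": {"on_demand": Decimal("0.085"), "spot": Decimal("0.03")},
    "L": {"on_demand": Decimal("0.34"), "spot": Decimal("0.13")},
    "XL": {"on_demand": Decimal("0.68"), "spot": Decimal("0.24")},
}


def job_cost(instance, instances, hours, pricing="on_demand"):
    """Total cost of running `instances` machines of one type for `hours` each."""
    return PRICES[instance][pricing] * instances * hours


wide = job_cost("S", instances=1000, hours=1)    # 1000 small instances for 1 hour
narrow = job_cost("S", instances=1, hours=1000)  # 1 small instance for 1000 hours
spot = job_cost("S", instances=1000, hours=1, pricing="spot")

print(wide, narrow, spot)
```

Both on-demand variants cost the same, so a batch job that parallelises well gets its speedup for free; the spot column lowers the bill further if the job tolerates interruption.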


Amazon Web Services (2)

• Simple Storage Service (S3)

– Eventually consistent blob storage (SLA available)

– Max 5GB per object, REST+SOAP API

– Storage $0.15/GB/mo, transfer $0.15/GB, $0.10 per 100K API calls

• Elastic Compute Cloud (EC2)

– Xen VM, Amazon Machine Image (AMI), no SLA

• Elastic Block Storage (EBS)

– Up to 1TB storage to be used by EC2 instances (attached devices)

– Raw/unformatted block devices (create your own filesystem on top)

– Replicated

– $0.10/GB/mo, $0.10 per 1 million I/O ops (iostat)


Amazon Web Services (3)

• Simple Queue Service

– Persistent, reliable, secure, distributed queue (no SLA)

– Max message size 8KB, messages are auto-deleted after 4 days

– Duplicate and out-of-order delivery may occur

– Price: $0.15/GB transfer, $0.10 per 100K API calls

• Simple Notification Service

– Reliable, secure & scalable pub/sub service (no SLA)

– Protocols: HTTP, e-mail, SQS

– Price: $0.15/GB transfer, $0.06 per 100K API calls, price per 100K notifications: $0.06 (HTTP), $2.00 (e-mail), free (SQS)

• SimpleDB

– Distributed column store (written in Erlang)

– Consistent or eventually consistent reads, flexible schema

– $0.14/hour consumed, $0.15/GB transfer, $0.25/GB/mo storage
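Because the queue may deliver a message twice or out of order, consumers usually have to be idempotent. The sketch below shows one common approach, not anything prescribed by SQS: the message layout (`id`, `seq`, `body`) is hypothetical, and the sequence number is assumed to be assigned by the producer.

```python
def process_once(messages, handler, seen_ids=None):
    """Pass each distinct message to `handler` once, in sequence order.

    `messages` is an iterable of dicts with the hypothetical keys "id", "seq"
    and "body". `seen_ids` can be a set shared across calls so that duplicates
    arriving in a later batch are skipped as well.
    """
    seen_ids = set() if seen_ids is None else seen_ids
    results = []
    # Sorting restores producer order within this batch only.
    for msg in sorted(messages, key=lambda m: m["seq"]):
        if msg["id"] in seen_ids:
            continue  # duplicate delivery, already handled
        seen_ids.add(msg["id"])
        results.append(handler(msg["body"]))
    return results


batch = [
    {"id": "b", "seq": 2, "body": "second"},
    {"id": "a", "seq": 1, "body": "first"},
    {"id": "b", "seq": 2, "body": "second"},  # delivered twice
]
print(process_once(batch, str.upper))
```

In a real deployment the set of seen ids would need to live in shared, durable storage, since several consumers may read from the same queue.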


Amazon Web Services (4)

• Relational Database Service

– MySQL (no SLA)

– Automated backup and scaling

– $0.11 to $3.10 per hour (depending on instance type), $0.10/GB/mo storage, $0.10 per million I/O ops, $0.15/GB transfer

• Elastic MapReduce

– Based on Hadoop

– Price: EC2 instance price + premium ($0.01 - $0.42/hour)

• CloudWatch, Auto Scaling, Elastic Load Balancer

– Monitoring, auto scaling & load balancing for EC2

• Virtual Private Cloud
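The monitoring-plus-auto-scaling loop mentioned above boils down to turning a measured metric into a desired fleet size. The sketch below is a generic proportional rule and is not a description of how Auto Scaling itself decides; the function name, the 60% target and the size limits are all assumptions chosen for illustration.

```python
def desired_instances(current, cpu_percent, target_percent=60, minimum=1, maximum=20):
    """Pick a fleet size that moves average CPU utilisation towards the target.

    `cpu_percent` is the measured average utilisation (0-100) across the
    `current` instances. Integer arithmetic is used to round up exactly.
    """
    if current < 1:
        return minimum
    wanted = -(-(current * cpu_percent) // target_percent)  # ceiling division
    return max(minimum, min(maximum, wanted))


print(desired_instances(4, 90))    # overloaded: scale out
print(desired_instances(4, 30))    # mostly idle: scale in
print(desired_instances(20, 100))  # capped at the maximum
```

A production policy would normally add a cooldown period and some hysteresis so that the fleet does not oscillate around the target.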


Google AppEngine

• http://code.google.com/appengine/

• Features

– Custom JVM (lots of limitations)

– Servlet container, JSP

– Datastore based on BigTable (column store, consistent, C+P in CAP terms)

– JDO/JPA

– Google infrastructure services: URL fetch, mail

– Memcache (in-memory distributed key/value cache)

– Task queues & scheduler

– Development: local dev server, Eclipse plugins, administration

• Pricing

– Traffic $0.10/GB in ($0.12/GB out); CPU $0.10/h; storage $0.15/GB/mo; e-mail $1 per 10K


Google AppEngine (2)

[Figure, (C) Dan Sanderson / O’Reilly]

Google AppEngine (3)

• Restrictions

– Applications run in a restricted JVM sandbox

• No threads, no System calls, limited reflection

– No sub-process forking

– Connections

• Outbound – only URL fetch & mail

• Inbound – only HTTP(S)

– No filesystem writes (limited read access), use datastore instead

– Limits

• Request duration – 30 sec

• Request/response size – 10 MB (datastore request/response – 1 MB)

• File size – 10 MB, number of files – 3,000

• Datastore: entity size – 1 MB, property values – 1000, entities per batch – 500
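The per-batch limit means bulk writes have to be split on the client side. The helper below is a minimal, library-free sketch of that splitting step; the name `batches` is made up here, and the actual write call for each chunk is deliberately left out because it depends on the datastore API in use.

```python
def batches(items, size=500):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


entities = list(range(1200))  # stand-ins for 1,200 entities to store
chunks = list(batches(entities))
print([len(chunk) for chunk in chunks])
```

Each chunk would then be written in its own datastore call, keeping every request under the 500-entity limit (the 1 MB request size limit still has to be checked separately).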


Google AppEngine (4)

• Datastore

– Based on BigTable, distributed column-store

• Entities and multi-valued properties

• Entities have unique key & a type (kind)

• Flexible schema

– Transactional, consistent

– JDO/JPA interface

• Queries

– JDOQL: entity kind + property value restrictions + sort order

– Cursors can be specified (query range)

– Query resultset is materialised in a predefined index

• Query execution only fetches data from the existing index

• Queries with the same kind + same property restriction operators (but different filter values) + same sort order share the same index

Select from Person
where lastName = … && height < …
order by height desc
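The index-sharing rule can be pictured as a "query shape" that ignores the filter values. The sketch below only illustrates the rule as stated on the slide and is not how App Engine computes its indexes; `index_signature` and the tuple layout are hypothetical.

```python
def index_signature(kind, filters, sort_orders):
    """Describe the index a query needs, ignoring the concrete filter values.

    `filters` is a list of (property, operator, value) tuples and
    `sort_orders` a list of (property, direction) tuples.
    """
    return (
        kind,
        tuple((prop, op) for prop, op, _value in filters),
        tuple(sort_orders),
    )


q1 = index_signature("Person", [("lastName", "=", "Smith"), ("height", "<", 180)],
                     [("height", "desc")])
q2 = index_signature("Person", [("lastName", "=", "Jones"), ("height", "<", 165)],
                     [("height", "desc")])
q3 = index_signature("Person", [("lastName", "=", "Smith"), ("height", "<", 180)],
                     [("height", "asc")])

print(q1 == q2)  # same kind, operators and sort order: same index
print(q1 == q3)  # different sort order: a different index
```

Under this view the number of indexes an application needs grows with the number of distinct query shapes, not with the number of distinct values queried.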


Windows Azure

• http://www.microsoft.com/windowsazure/

• Components

– Windows Azure

• Fabric – management & monitoring of cloud services (Hyper-V)

• Compute – hosted applications (.NET, C++, Java, …)

• Storage – blob storage, tables, queues (REST interface)

– SQL Azure

• Cloud based MS SQL Server

– AppFabric

• Infrastructure services, Service registry

• Access control

• Pricing

– CPU $0.12/h; storage $0.15/GB/mo; transfer $0.10/GB in ($0.15/GB out); storage transactions $1 per 1 million


Windows Azure (2)

[Figure, (C) David Chappell]

Contents

Part III

Programming for the Cloud

Tools & APIs


Programming for the Cloud

• Amazon

– REST API

– AWS Java SDK (http://aws.amazon.com/sdkforjava/)

– AWS Toolkit for Eclipse (http://aws.amazon.com/eclipse)

– Typica (http://code.google.com/p/typica/)

– JetS3t (S3 only) http://jets3t.s3.amazonaws.com/index.html

• Google AppEngine

– AppEngine SDK (dev server, admin tools, Eclipse plugins)

– Datastore: JDO, JPA, low-level Java API

– Memcache: JCache + low-level Java API

– URL fetch: java.net + low-level Java API

– Mail: javax.mail + low-level Java API

– Task queue, blob store, accounts: low-level APIs


Programming for the Cloud (2)

• jClouds

– http://code.google.com/p/jclouds/

– Cloud interoperability framework (AWS, Google AppEngine*, Windows Azure, GoGrid)

– Mostly storage oriented functionality

• Eucalyptus

– http://www.eucalyptus.com/

– Open source private cloud infrastructure

– AWS compatible (EC2, EBS, S3)

– Cross-hypervisor support

[Figure, (C) Eucalyptus Inc.]

Don’t forget…

• Deploying on EC2 requires minimal to no modifications of existing software

• EC2 has some big machines: 70GB RAM / 8 CPU cores

• 1,000 servers for 1hr cost the same as 1 server for 1,000hrs

• Data traffic (in/out) of the Cloud can be expensive

• Storage relatively cheap

• Internal cloud traffic is free (AWS), e.g. accessing other applications/datasets on the Cloud

• CPU price: uptime (EC2) vs. computing cycles (AppEngine)

• EC2 spot instances (off-peak hours) are very, very cheap!
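The "uptime vs. computing cycles" point is easiest to see with a month of numbers. The sketch below reuses the small-instance on-demand price and the AppEngine CPU price quoted earlier in the deck; treating an EC2 instance-hour and an AppEngine CPU-hour as equal amounts of work is a simplifying assumption, and free quotas are ignored.

```python
from decimal import Decimal

EC2_SMALL_PER_HOUR = Decimal("0.085")     # on-demand small instance, $/hour of uptime
APPENGINE_PER_CPU_HOUR = Decimal("0.10")  # $/hour of CPU time actually consumed


def uptime_cost(hours_up):
    """EC2-style billing: pay for every hour the instance is running."""
    return EC2_SMALL_PER_HOUR * hours_up


def cycles_cost(cpu_hours_used):
    """AppEngine-style billing: pay only for CPU time the application consumed."""
    return APPENGINE_PER_CPU_HOUR * cpu_hours_used


month = 720  # hours in a 30-day month
print(uptime_cost(month))   # instance kept running all month
print(cycles_cost(72))      # application busy 10% of the time
print(cycles_cost(month))   # application busy 100% of the time
```

Under these assumptions a mostly idle application is far cheaper when billed per cycle, while a constantly busy one is cheaper when billed for uptime.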


Contents

Part IV

Semantic Web on the Cloud


Semantic Web on the Cloud

• Public Data Sets on AWS

– A lot of datasets hosted for free by Amazon

• Freebase, UniGene, US Census, …

– New data sets can be submitted too (after approval)

– Full LOD cloud still not available (due to licensing issues)

• SaaS

– Virtuoso (AWS hosted), OpenCalais, …

• “Semantic Cloud” initiatives (cloud interoperability & data integration)

– E.g. fluidOps - Management & provisioning of semantic applications (SaaS) and datasources (DaaS) on the Cloud

• Semantic Web apps as virtual appliances on the Cloud

• LOD data sources as virtual resources on the Cloud (“Self-service” paradigm)


Unified Cloud Computing

• http://code.google.com/p/unifiedcloud/

• Uses RDF for cloud data interoperability


Useful and useless links

• http://groups.google.com/group/cloud-computing

• “An Essential Guide to Possibilities and Risks of Cloud Computing”

• “Talking To Your CFO About Cloud Computing”

• Nick Carr @ Atmosphere’2009

• Introducing the Windows Azure platform


Q & A

Questions?
