Apr 2010
Introduction to Cloud Computing
Marin Dimitrov
(technology watch #3)
Contents
• Introduction
• Cloud Computing platforms
• Programming for the Cloud
• Semantic Web on the Cloud
Cloud Computing - NIST definition
• “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
• Delivery models
  – IaaS (Infrastructure as a Service) - the consumer uses "fundamental resources" such as processing power, storage, networking components or middleware. The consumer can control the operating system, storage, applications and possibly networking
  – PaaS (Platform as a Service) - the consumer uses a hosting environment for their applications and has control over the applications (and some control over the hosting environment), but does not control the infrastructure on which they are running
  – SaaS (Software as a Service) - the consumer uses an application, but does not control the infrastructure on which it's running (OS, hardware)
XaaS spectrum – Google, Amazon, Microsoft
• SaaS
  – Amazon: Elastic MapReduce
  – Google: Gmail, Google Apps
• PaaS
  – Amazon: SimpleDB, Relational Database Service, Flexible Payment Service
  – Google: App Engine, BigTable / MegaStore
  – Microsoft: SQL Azure
• IaaS
  – Amazon: EC2, Simple Queue Service, Simple Notification Service, Elastic Block Storage, S3 / RRS, CloudWatch / Auto Scaling, Elastic Load Balancer
  – Google: Google Storage
  – Microsoft: Blob storage, Azure Computing, Queues, Load Balancer
Cloud Computing - Essential characteristics (NIST)
• Rapid elasticity – the ability to scale resources both up and down as needed. To the consumer, the Cloud appears to be infinite, and the consumer can purchase as much / little computing power as they need
• Measured service – aspects of the Cloud service are controlled and monitored by the Cloud provider. This is crucial for billing, access control, resource optimization & capacity planning
• On-demand self service – a consumer can use cloud services as needed without any human interaction with the cloud provider
• Ubiquitous network access – the Cloud provider’s capabilities are available over the network and can be accessed through standard mechanisms
• Resource pooling – allows a Cloud provider to serve its consumers via a multi-tenant model - resources are (re)assigned according to consumer demand
Cloud Computing - deployment models (NIST)
• Public cloud
  – Infrastructure owned by some organisation but sold to 3rd parties
  – E.g. Amazon Web Services, Google AppEngine, Windows Azure
• Private cloud
  – Internal infrastructure for a single organisation (on or off-premise)
  – E.g. VMware vCloud, IBM Cloudburst, Microsoft Hyper-V
• Community cloud
  – Infrastructure shared by several organisations, targeting a specific community
  – E.g. OpenCirrus (HP, Intel, Yahoo, KIT, CMU, …)
• Hybrid cloud
  – Composition of the above
  – E.g. AWS Virtual Private Cloud
Cloud computing – business drivers
1. Business agility
  – Faster time to market
    • No major upfront commitment & investment in infrastructure
  – Scalability & elasticity
    • Instant on-demand provisioning
    • Shifting the risk of over-/under-provisioning to the cloud provider
2. Focus
  – Outsource non-core tasks to the cloud provider
3. Pay-as-you-go
  – Speed up new project launching & rollout (start small, add resources when needed)
  – No need for complex planning ahead
  – Turn fixed costs (CapEx) into variable costs (OpEx)
Some cloud use cases
• Overflow buffer
  – Provision only for the average load and let the cloud absorb the peaks, instead of over-provisioning for peak loads
• Seasonal business
  – E.g. Walmart has 4:1 peak-to-average ratio (source?)
• Time-to-market for small startups
  – Less upfront investment, more focus on core competencies
• Experimental playground
  – Roll out experimental projects without major equipment purchases
• Speedup of large scale batch operations
  – 1000 servers for 1 hour cost the same as 1 server for 1000 hours
  – More cost-efficient computing (off-peak tariffs & time zones)
• Unforeseeable events
  – E.g. sudden traffic spikes to web sites (volcanoes, anyone?)
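The batch-speedup point above is plain arithmetic under per-instance-hour billing. A minimal sketch, assuming a perfectly parallel job and a flat hourly price; the function name and the $0.085 figure are illustrative assumptions, and real jobs rarely parallelise this cleanly:

```python
import math

def batch_cost(total_server_hours, servers, price_per_hour):
    """Return (wall-clock hours, total cost) for a perfectly parallel job.

    Assumes billing per started instance-hour, so partial hours round up.
    """
    hours_each = math.ceil(total_server_hours / servers)
    return hours_each, servers * hours_each * price_per_hour

# The same 1000 server-hours of work, with very different wall-clock times.
slow = batch_cost(1000, servers=1, price_per_hour=0.085)
fast = batch_cost(1000, servers=1000, price_per_hour=0.085)
print(slow, fast)
```

Both runs cost the same; only the elapsed time differs. Because partial hours are rounded up, a job that does not divide evenly across the servers costs slightly more when spread wide.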
Cloud-able applications
• Typical characteristics
  – Not mission critical
  – Need >99% uptime
  – Low bandwidth / higher latency tolerance
  – Relaxed security requirements
  – Few integration points
  – E.g.
    • Batch operations (speedup at the same price!)
    • One-time large scale processing
• Barriers to cloud migration
  – Security & trust
  – Lack of SLA
  – Lack of standardization (vendor lock-in)
Cloud Computing – pros & cons
(C) Dion Hinchcliffe
Contents
Part II
Cloud Computing Platforms
AWS, Google AppEngine, Windows Azure
XaaS spectrum – Google, Amazon, Microsoft (again)
• SaaS
  – Amazon: Elastic MapReduce
  – Google: Gmail, Google Apps
• PaaS
  – Amazon: SimpleDB, Relational Database Service, Flexible Payment Service
  – Google: App Engine, BigTable / MegaStore
  – Microsoft: SQL Azure
• IaaS
  – Amazon: EC2, Simple Queue Service, Simple Notification Service, Elastic Block Storage, S3 / RRS, CloudWatch / Auto Scaling, Elastic Load Balancer, Virtual Private Cloud
  – Google: Google Storage
  – Microsoft: Blob storage, Azure Computing, Queues, Load Balancer
Amazon Web Services
• http://aws.amazon.com/
• Xen VMs, 1 ECU = 1.2GHz AMD Opteron, US/EU prices
EC2 instance  RAM (GB)  CU* (cores)  HDD (GB)  Bit  $/h on-demand  $/h spot  $/h reserved
S             1.7       1 (1)        160       32   0.085          0.03      0.03
L             7.5       4 (2)        850       64   0.34           0.13      0.12
XL            15        8 (4)        1690      64   0.68           0.24      0.24
High-mem XL   17.1      6.5 (2)      420       64   0.50           0.18      0.17
High-mem 2XL  34.2      13 (4)       850       64   1.20           0.43      0.42
High-mem 4XL  68.4      26 (8)       1690      64   2.40           0.82      0.84
High-CPU M    1.7       5 (2)        350       32   0.17           0.06      0.06
High-CPU XL   7         20 (8)       1690      64   0.68           0.24      0.24
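To make the hourly figures easier to compare, the table can be turned into a rough monthly estimate. This is only a sketch: the 730-hour month and the helper name are assumptions, spot prices fluctuate, and reserved instances typically also carry a one-time fee that the table does not show.

```python
# Hourly prices (USD) copied from the table above: (on-demand, spot, reserved).
PRICES = {
    "S": (0.085, 0.03, 0.03),
    "L": (0.34, 0.13, 0.12),
    "XL": (0.68, 0.24, 0.24),
    "High-mem XL": (0.50, 0.18, 0.17),
    "High-mem 2XL": (1.20, 0.43, 0.42),
    "High-mem 4XL": (2.40, 0.82, 0.84),
    "High-CPU M": (0.17, 0.06, 0.06),
    "High-CPU XL": (0.68, 0.24, 0.24),
}
HOURS_PER_MONTH = 730  # assumption: average month, instance running 24/7

def monthly_cost(instance, model="on-demand"):
    """Estimate one month of continuous uptime for a single instance."""
    column = {"on-demand": 0, "spot": 1, "reserved": 2}[model]
    return PRICES[instance][column] * HOURS_PER_MONTH

for name in PRICES:
    on_demand = monthly_cost(name)
    spot = monthly_cost(name, "spot")
    print(f"{name:13s} on-demand ${on_demand:8.2f}  spot ${spot:8.2f}")
```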
Amazon Web Services (2)
• Simple Storage Service (S3)
  – Eventually consistent blob storage (SLA available)
  – Max 5GB per object, REST+SOAP API
  – Storage $0.15/GB/mo, transfer $0.15/GB, $0.10 per 100K API calls
• Elastic Compute Cloud (EC2)
  – Xen VM, Amazon Machine Image (AMI), no SLA
• Elastic Block Storage (EBS)
  – Up to 1TB storage to be used by EC2 instances (attached devices)
  – Raw/unformatted block devices (create your own filesystem on top)
  – Replicated
  – $0.10/GB/mo, $0.10 per 1 million I/O ops (iostat)
Amazon Web Services (3)
• Simple Queue Service
  – Persistent, reliable, secure, distributed queue (no SLA)
  – Max message size 8KB, messages auto-deleted after 4 days
  – Duplicate and out-of-order delivery may occur
  – Price: $0.15/GB transfer, $0.10 per 100K API calls
• Simple Notification Service
  – Reliable, secure & scalable pub/sub service (no SLA)
  – Protocols: HTTP, e-mail, SQS
  – Price: $0.15/GB transfer, $0.06 per 100K API calls; per 100K notifications: $0.06 (HTTP), $2.00 (e-mail), free (SQS)
• SimpleDB
  – Distributed column store (built on Erlang)
  – Consistent or eventually consistent reads, flexible schema
  – $0.14/hour consumed, $0.15/GB transfer, $0.25/GB/mo storage
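Since the queue may deliver a message more than once and out of order, consumers are usually written to tolerate both. The sketch below simulates that with an in-memory list and makes no AWS calls; the function name, the message IDs and the `seq` field are hypothetical, and a real system would need durable storage for the set of processed IDs.

```python
def process_messages(deliveries, handler, seen_ids=None):
    """Apply handler once per message ID, ignoring redelivered duplicates.

    deliveries: iterable of (message_id, body) pairs, in arrival order.
    seen_ids:   set of already-processed IDs (in-memory here, an assumption).
    """
    seen_ids = set() if seen_ids is None else seen_ids
    for message_id, body in deliveries:
        if message_id in seen_ids:
            continue  # duplicate delivery: skip it
        handler(body)
        seen_ids.add(message_id)
    return seen_ids

# Simulated arrival: message m2 arrives before m1, and m1 is delivered twice.
arrivals = [
    ("m2", {"seq": 2, "op": "b"}),
    ("m1", {"seq": 1, "op": "a"}),
    ("m1", {"seq": 1, "op": "a"}),
]
handled = []
process_messages(arrivals, handled.append)

# Out-of-order delivery: restore order with an application-level sequence number.
ordered = sorted(handled, key=lambda body: body["seq"])
print([body["op"] for body in ordered])
```

Deduplicating by ID handles the repeated delivery; ordering has to come from the application itself, here from a sequence number carried in the message body.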
Amazon Web Services (4)
• Relational Database Service
  – MySQL (no SLA)
  – Automated backup and scaling
  – $0.11 to $3.10 per hour (depending on instance type), $0.10/GB/mo storage, $0.10 per million I/O ops, $0.15/GB transfer
• Elastic MapReduce
  – Based on Hadoop
  – Price: EC2 instance price + premium ($0.01 - $0.42/hour)
• CloudWatch, Auto Scaling, Elastic Load Balancer
  – Monitoring, auto scaling & load balancing for EC2
• Virtual Private Cloud
Google AppEngine
• http://code.google.com/appengine/
• Features
  – Custom JVM (lots of limitations)
  – Servlet container, JSP
  – Datastore based on BigTable (column store, consistent, C+P)
  – JDO/JPA
  – Google infrastructure services: URL fetch, mail
  – Memcache (in-memory distributed key/value cache)
  – Task queues & scheduler
  – Development: local dev server, Eclipse plugins, administration
• Pricing
  – traffic/GB $0.10 ($0.12); CPU/h $0.10; storage/GB/mo $0.15; e-mail $1 per 10K
Google AppEngine (2)
(C) Dan Sanderson / O’Reilly
Google AppEngine (3)
• Restrictions
  – Applications run in a restricted JVM sandbox
    • No threads, no System calls, limited reflection
  – No sub-process forking
  – Connections
    • Outbound – only URL fetch & mail
    • Inbound – only HTTP(S)
  – No filesystem writes (limited read access), use datastore instead
  – Limits
    • Request duration – 30 sec
    • Request/response size – 10 MB (datastore request/response – 1 MB)
    • File size – 10 MB, number of files – 3,000
    • Datastore: entity size – 1 MB, property values – 1000, entities per batch – 500
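One practical consequence of the 500-entities-per-batch limit is that bulk writes have to be split into chunks. A minimal sketch of that chunking, written in plain Python for illustration; `put_batch` is a hypothetical callable standing in for whatever datastore write call the application uses:

```python
MAX_ENTITIES_PER_BATCH = 500  # limit quoted on the slide

def chunked(items, size=MAX_ENTITIES_PER_BATCH):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]

def put_all(entities, put_batch):
    """Write entities through put_batch in batches that respect the limit.

    Returns the number of batch calls made.
    """
    calls = 0
    for batch in chunked(entities):
        put_batch(batch)
        calls += 1
    return calls

sizes = []
calls = put_all(range(1200), lambda batch: sizes.append(len(batch)))
print(calls, sizes)
```

The same pattern applies to any of the per-request limits listed above, as long as the individual pieces can be written independently.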
Google AppEngine (4)
• Datastore
  – Based on BigTable, distributed column-store
    • Entities and multi-valued properties
    • Entities have unique key & a type (kind)
    • Flexible schema
  – Transactional, consistent
  – JDO/JPA interface
• Queries
  – JDOQL: entity kind + property value restrictions + sort order
  – Cursors can be specified (query range)
  – Query resultset is materialised in a predefined index
    • Query execution only fetches data from the existing index
    • Queries with the same kind + property restriction operators (but different filter values) + same sort order share the same index
  – Example:
      select from Person
      where lastName = … && height < …
      order by height desc
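The index-sharing rule can be illustrated by reducing a query to the parts that select an index: kind, filtered properties with their operators, and sort order, with the filter values left out. This is only an illustration of the rule as stated on the slide, not how App Engine represents indexes internally; the function name and sample values are assumptions.

```python
def index_signature(kind, filters, sort_order):
    """Build a hashable signature from the parts of a query that select an index.

    filters:    list of (property, operator, value); the value is ignored.
    sort_order: list of (property, direction).
    """
    shape = tuple((prop, op) for prop, op, _value in filters)
    return (kind, shape, tuple(sort_order))

q1 = index_signature(
    "Person", [("lastName", "=", "Smith"), ("height", "<", 180)], [("height", "desc")]
)
q2 = index_signature(
    "Person", [("lastName", "=", "Jones"), ("height", "<", 165)], [("height", "desc")]
)
q3 = index_signature(
    "Person", [("lastName", "=", "Smith"), ("height", "<", 180)], [("height", "asc")]
)
print(q1 == q2, q1 == q3)
```

The first two queries differ only in their filter values and so map to the same signature; changing the sort direction produces a different one.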
Windows Azure
• http://www.microsoft.com/windowsazure/
• Components
  – Windows Azure
    • Fabric – management & monitoring of cloud services (Hyper-V)
    • Compute – hosted applications (.NET, C++, Java, …)
    • Storage – blob storage, tables, queues (REST interface)
  – SQL Azure
    • Cloud based MS SQL Server
  – AppFabric
    • Infrastructure services, Service registry
    • Access control
• Pricing
  – CPU/h $0.12; storage $0.15/GB/mo; transfer $0.10 ($0.15); storage transactions – $1 per 1 million
Contents
Part III
Programming for the Cloud
Tools & APIs
Programming for the Cloud
• Amazon
  – REST API
  – AWS Java SDK (http://aws.amazon.com/sdkforjava/)
  – AWS Toolkit for Eclipse (http://aws.amazon.com/eclipse)
  – Typica (http://code.google.com/p/typica/)
  – JetS3t, S3 only (http://jets3t.s3.amazonaws.com/index.html)
• Google AppEngine
  – AppEngine SDK (dev server, admin tools, Eclipse plugins)
  – Datastore: JDO, JPA, low-level Java API
  – Memcache: JCache + low-level Java API
  – URL fetch: java.net + low-level Java API
  – Mail: javax.mail + low-level Java API
  – Task queue, blob store, accounts: low-level APIs
Programming for the Cloud (2)
• jClouds
  – http://code.google.com/p/jclouds/
  – Cloud interoperability framework (AWS, Google AppEngine*, Windows Azure, GoGrid)
  – Mostly storage oriented functionality
• Eucalyptus
  – http://www.eucalyptus.com/
  – Open source private cloud infrastructure
  – AWS compatible (EC2, EBS, S3)
  – Cross-hypervisor support
(C) Eucalyptus Inc.
Don’t forget…
• Deploying on EC2 requires minimal to no modifications of existing software
• EC2 has some big machines: 70GB RAM / 8 CPU cores
• 1,000 servers for 1hr cost the same as 1 server for 1,000hrs
• Data traffic (in/out) of the Cloud can be expensive
• Storage relatively cheap
• Internal cloud traffic is free (AWS), e.g. accessing other applications/datasets on the Cloud
• CPU price: uptime (EC2) vs. computing cycles (AppEngine)
• EC2 spot instances (off-peak hours) are very, very cheap!
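The uptime-versus-cycles point can be made concrete with a rough comparison. The sketch below uses the small-instance on-demand price and the AppEngine CPU/h price quoted on the earlier slides. It assumes, purely for illustration, that an EC2 instance-hour and an AppEngine CPU-hour represent comparable work, which is not strictly true, and it ignores free quotas, traffic and storage.

```python
def ec2_cost(uptime_hours, price_per_hour=0.085):
    """EC2-style billing: pay for every hour the instance is up."""
    return uptime_hours * price_per_hour

def appengine_cpu_cost(cpu_hours_used, price_per_cpu_hour=0.10):
    """AppEngine-style billing: pay only for CPU time actually consumed."""
    return cpu_hours_used * price_per_cpu_hour

def cheaper_model(uptime_hours, utilisation):
    """Compare both models for a service that is busy `utilisation` of the time."""
    by_uptime = ec2_cost(uptime_hours)
    by_cycles = appengine_cpu_cost(uptime_hours * utilisation)
    return "uptime" if by_uptime < by_cycles else "cycles"

# A mostly idle service versus one that is busy almost all month.
print(cheaper_model(730, 0.10), cheaper_model(730, 0.95))
```

Under these assumptions, paying per cycle favours lightly loaded applications, while paying for uptime favours services that keep the CPU busy most of the time.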
Semantic Web on the Cloud
• Public Data Sets on AWS
  – A lot of datasets hosted for free by Amazon
    • Freebase, UniGene, US Census, …
  – New data sets can be submitted too (after approval)
  – Full LOD cloud still not available (due to licensing issues)
• SaaS
  – Virtuoso (AWS hosted), OpenCalais, …
• “Semantic Cloud” initiatives (cloud interoperability & data integration)
  – E.g. fluidOps - Management & provisioning of semantic applications (SaaS) and datasources (DaaS) on the Cloud
    • Semantic Web apps as virtual appliances on the Cloud
    • LOD data sources as virtual resources on the Cloud (“Self-service” paradigm)
Unified Cloud Computing
• http://code.google.com/p/unifiedcloud/
• Uses RDF for cloud data interoperability
Useful and useless links
• http://groups.google.com/group/cloud-computing
• “An Essential Guide to Possibilities and Risks of Cloud Computing”
• “Talking To Your CFO About Cloud Computing”
• Nick Carr @ Atmosphere’2009
• Introducing the Windows Azure platform