
Page 1: Cloud computing

Cloud Computing Lecture #1

What is Cloud Computing?
(and an intro to parallel/distributed processing)

Jimmy Lin
The iSchool
University of Maryland

Wednesday, September 3, 2008

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States
See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details

Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License)

Page 2: Cloud computing

Source: http://www.free-pictures-photos.com/

Page 3: Cloud computing

What is Cloud Computing?

1. Web-scale problems

2. Large data centers

3. Different models of computing

4. Highly-interactive Web applications

Page 4: Cloud computing

1. Web-Scale Problems

Characteristics:
  • Definitely data-intensive
  • May also be processing-intensive

Examples:
  • Crawling, indexing, searching, mining the Web
  • “Post-genomics” life sciences research
  • Other scientific data (physics, astronomy, etc.)
  • Sensor networks
  • Web 2.0 applications
  • …

Page 5: Cloud computing

How much data?

Wayback Machine has 2 PB + 20 TB/month (2006)

Google processes 20 PB a day (2008)

“all words ever spoken by human beings” ~ 5 EB

NOAA has ~1 PB climate data (2007)

CERN’s LHC will generate 15 PB a year (2008)

640K ought to be enough for anybody.
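
To get a feel for these figures, the sketch below turns two of them into rates. It assumes decimal units (1 PB = 10^15 bytes) and a hypothetical sustained read rate of 100 MB/s for a single disk; neither assumption comes from the slides, and the helper names are made up for illustration.

```python
PB = 10**15  # decimal petabyte in bytes (assumption: the slides do not say decimal or binary)
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical sustained read rate of one disk, in bytes per second (not from the slides).
ASSUMED_DISK_RATE = 100 * 10**6


def bytes_per_second(bytes_per_day):
    """Average throughput needed to process a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


def days_to_read(total_bytes, disk_rate, n_disks=1):
    """Days needed to read total_bytes when n_disks read in parallel."""
    return total_bytes / (disk_rate * n_disks) / SECONDS_PER_DAY


def months_to_double(base_bytes, growth_bytes_per_month):
    """Months of linear growth needed to add another base_bytes."""
    return base_bytes / growth_bytes_per_month


daily = 20 * PB  # "Google processes 20 PB a day"
print(f"20 PB/day averages {bytes_per_second(daily) / 10**9:.0f} GB/s")
print(f"one disk would need {days_to_read(daily, ASSUMED_DISK_RATE):.0f} days to read it")
hours = days_to_read(daily, ASSUMED_DISK_RATE, n_disks=10_000) * 24
print(f"10,000 disks in parallel would need {hours:.1f} hours")
print(f"2 PB growing by 20 TB/month doubles in {months_to_double(2 * PB, 20 * TB):.0f} months")
```

Under these assumptions a single disk cannot keep up with one day of data in one day, which is the practical reason the work has to be spread over many machines.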

Page 6: Cloud computing

Maximilien Brice, © CERN

Page 7: Cloud computing

Maximilien Brice, © CERN

Page 8: Cloud computing

There’s nothing like more data!

s/inspiration/data/g;

(Banko and Brill, ACL 2001)
(Brants et al., EMNLP 2007)

Page 9: Cloud computing

What to do with more data?

Answering factoid questions
  • Pattern matching on the Web
  • Works amazingly well
  • Question: Who shot Abraham Lincoln?
  • Pattern: X shot Abraham Lincoln

Learning relations
  • Start with seed instances
  • Search for patterns on the Web
  • Using patterns to find more instances
  • Seeds: Birthday-of(Mozart, 1756), Birthday-of(Einstein, 1879)
  • Text: Wolfgang Amadeus Mozart (1756 - 1791); Einstein was born in 1879
  • Patterns: PERSON (DATE – ; PERSON was born in DATE

(Brill et al., TREC 2001; Lin, ACM TOIS 2007)
(Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )
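
As a rough illustration of the last step (using patterns to find more instances), the sketch below encodes the two surface patterns from this slide as regular expressions and applies them to a toy corpus. The regular expressions, the function name, and the last two sentences are assumptions made for this example; it does not reproduce the cited systems, and the step that induces patterns from seed instances is left out.

```python
import re

# Simplifying assumptions: PERSON is a run of capitalized words, DATE is a four-digit year.
PERSON = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
DATE = r"(\d{4})"

PATTERNS = [
    re.compile(PERSON + r" \(" + DATE + r" [-–]"),  # PERSON (DATE –
    re.compile(PERSON + r" was born in " + DATE),   # PERSON was born in DATE
]


def find_birth_years(sentences):
    """Return {surname: year} for every pattern match in the sentences."""
    found = {}
    for sentence in sentences:
        for pattern in PATTERNS:
            for person, year in pattern.findall(sentence):
                found[person.split()[-1]] = int(year)
    return found


# The first two sentences are from the slide; the other two are made up for illustration.
corpus = [
    "Wolfgang Amadeus Mozart (1756 - 1791)",
    "Einstein was born in 1879",
    "Johann Sebastian Bach (1685 - 1750)",
    "Charles Darwin was born in 1809",
]

for surname, year in find_birth_years(corpus).items():
    print(f"Birthday-of({surname}, {year})")
```

Patterns this crude only pay off when the corpus is very large: with enough text, some sentence is likely to state the fact in exactly the form the pattern expects.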

Page 10: Cloud computing

2. Large Data Centers

Web-scale problems? Throw more machines at it!

Clear trend: centralization of computing resources in large data centers
  • Necessary ingredients: fiber, juice, and space
  • What do Oregon, Iceland, and abandoned mines have in common?

Important Issues:
  • Redundancy
  • Efficiency
  • Utilization
  • Management

Page 11: Cloud computing

Source: Harper’s (Feb, 2008)

Page 12: Cloud computing

Maximilien Brice, © CERN

Page 13: Cloud computing

Key Technology: Virtualization

Traditional Stack (top to bottom):
  App | App | App
  Operating System
  Hardware

Virtualized Stack (top to bottom):
  App | App | App
  OS  | OS  | OS
  Hypervisor
  Hardware

Page 14: Cloud computing

3. Different Computing Models

Utility computing
  • Why buy machines when you can rent cycles?
  • Examples: Amazon’s EC2, GoGrid, AppNexus

Platform as a Service (PaaS)
  • Give me a nice API and take care of the implementation
  • Example: Google App Engine

Software as a Service (SaaS)
  • Just run it for me!
  • Example: Gmail

“Why do it yourself if you can pay someone to do it for you?”

Page 15: Cloud computing

4. Web Applications

A mistake on top of a hack built on sand held together by duct tape?

What is the nature of software applications?
  • From the desktop to the browser
  • SaaS == Web-based applications
  • Examples: Google Maps, Facebook

How do we deliver highly-interactive Web-based applications?
  • AJAX (asynchronous JavaScript and XML)
  • For better, or for worse…

Page 16: Cloud computing

What is the course about?

MapReduce: the “back-end” of cloud computing
  • Batch-oriented processing of large datasets

Ajax: the “front-end” of cloud computing
  • Highly-interactive Web-based applications

Computing “in the clouds”
  • Amazon’s EC2/S3 as an example of utility computing

Page 17: Cloud computing

Amazon Web Services

Elastic Compute Cloud (EC2)
  • Rent computing resources by the hour
  • Basic unit of accounting = instance-hour
  • Additional costs for bandwidth

Simple Storage Service (S3)
  • Persistent storage
  • Charge by the GB/month
  • Additional costs for bandwidth

You’ll be using EC2/S3 for course assignments!
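
The billing units above can be combined into a back-of-the-envelope cost estimate. In the sketch below every rate is a hypothetical placeholder rather than an actual AWS price, the function names are invented for illustration, and rounding a partial hour up to a full instance-hour is an assumption of the sketch.

```python
import math

# Hypothetical placeholder rates in dollars. These are NOT actual AWS prices.
RATE_PER_INSTANCE_HOUR = 0.10
RATE_PER_GB_MONTH = 0.15
RATE_PER_GB_TRANSFERRED = 0.10


def ec2_cost(n_instances, hours_each, rate=RATE_PER_INSTANCE_HOUR):
    """Compute cost in instance-hours (assumption: partial hours round up)."""
    instance_hours = n_instances * math.ceil(hours_each)
    return instance_hours * rate


def s3_cost(gb_stored, months, rate=RATE_PER_GB_MONTH):
    """Storage cost charged by the GB per month."""
    return gb_stored * months * rate


def bandwidth_cost(gb_transferred, rate=RATE_PER_GB_TRANSFERRED):
    """Additional cost for data moved in or out."""
    return gb_transferred * rate


def total_cost(n_instances, hours_each, gb_stored, months, gb_transferred):
    """Sum of compute, storage, and bandwidth charges."""
    return (ec2_cost(n_instances, hours_each)
            + s3_cost(gb_stored, months)
            + bandwidth_cost(gb_transferred))


# Example: 20 instances for 2.5 hours, 50 GB stored for one month, 10 GB transferred.
print(f"estimated cost: ${total_cost(20, 2.5, 50, 1, 10):.2f}")
```

The point of the instance-hour unit is that 20 machines for 3 hours and 3 machines for 20 hours cost the same, so wide, short jobs are not penalized.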

Page 18: Cloud computing

This course is not for you…

If you’re not genuinely interested in the topic

If you’re not ready to do a lot of programming

If you’re not open to thinking about computing in new ways

If you can’t cope with uncertainty, unpredictability, poor documentation, and immature software

If you can’t put in the time

Otherwise, this will be a richly rewarding course!

Page 19: Cloud computing

Source: http://davidzinger.wordpress.com/2007/05/page/2/

Page 20: Cloud computing

Cloud Computing Zen

Don’t get frustrated (take a deep breath)…
  • This is bleeding edge technology
  • Those W$*#T@F! moments

Be patient…
  • This is the second first time I’ve taught this course

Be flexible…
  • There will be unanticipated issues along the way

Be constructive…
  • Tell me how I can make everyone’s experience better

Page 21: Cloud computing

Source: Wikipedia

Page 22: Cloud computing

Source: Wikipedia

Page 23: Cloud computing

Source: Wikipedia

Page 24: Cloud computing

Source: Wikipedia

Page 25: Cloud computing

Things to go over…

Course schedule

Assignments and deliverables

Amazon EC2/S3

Page 26: Cloud computing

Web-Scale Problems?

Don’t hold your breath:
  • Biocomputing
  • Nanocomputing
  • Quantum computing
  • …

It all boils down to…
  • Divide-and-conquer
  • Throwing more hardware at the problem

Simple to understand… a lifetime to master…

Page 27: Cloud computing

Divide and Conquer

[Figure: “Work” is partitioned into w1, w2, w3; each piece goes to a “worker”, producing r1, r2, r3; these are combined into the “Result”.]
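
The partition / worker / combine structure can be sketched in a few lines. This is a minimal illustration, assuming the workers are threads in one process and the "work" is summing squares; the function names and the choice of task are not from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, n):
    """Split work into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(work), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(work[start:end])
        start = end
    return chunks


def worker(chunk):
    """Compute a partial result r_i from one piece of work w_i."""
    return sum(x * x for x in chunk)


def combine(partials):
    """Merge the partial results into the final result."""
    return sum(partials)


def divide_and_conquer(work, n_workers=3):
    chunks = partition(work, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, chunks))  # one chunk per worker
    return combine(partials)


print(divide_and_conquer(list(range(10))))  # sum of squares of 0..9
```

This only works so cleanly because the partial results are independent and the combine step is associative; most of the difficulties discussed later come from tasks where that is not true.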

Page 28: Cloud computing

Different Workers

Different threads in the same core

Different cores in the same CPU

Different CPUs in a multi-processor system

Different machines in a distributed system

Page 29: Cloud computing

Choices, Choices, Choices

Commodity vs. “exotic” hardware

Number of machines vs. processors vs. cores

Bandwidth of memory vs. disk vs. network

Different programming models

Page 30: Cloud computing

Flynn’s Taxonomy

                                   Instructions
                     Single (SI)                       Multiple (MI)
Data  Single (SD)    SISD: Single-threaded process     MISD: Pipeline architecture
      Multiple (MD)  SIMD: Vector processing           MIMD: Multi-threaded programming

Page 31: Cloud computing

SISD

[Figure: a single Processor with one stream of Instructions and one stream of data items (D D D D D D D).]

Page 32: Cloud computing

SIMD

[Figure: a single Processor with one stream of Instructions operating on several data vectors, each with elements D0, D1, D2, D3, D4, …, Dn.]

Page 33: Cloud computing

MIMD

[Figure: two Processors, each with its own stream of Instructions and its own stream of data items (D D D D D D D).]

Page 34: Cloud computing

Memory Typology: Shared

[Figure: four Processors attached to a single shared Memory.]

Page 35: Cloud computing

Memory Typology: Distributed

[Figure: four nodes, each with its own Memory and Processor, connected by a Network.]

Page 36: Cloud computing

Memory Typology: Hybrid

[Figure: four nodes connected by a Network; each node has one Memory and two Processors.]

Page 37: Cloud computing

Parallelization Problems

How do we assign work units to workers?

What if we have more work units than workers?

What if workers need to share partial results?

How do we aggregate partial results?

How do we know all the workers have finished?

What if workers die?

What is the common theme of all of these problems?

Page 38: Cloud computing

General Theme?

Parallelization problems arise from:
  • Communication between workers
  • Access to shared resources (e.g., data)

Thus, we need a synchronization system!

This is tricky:
  • Finding bugs is hard
  • Solving bugs is even harder
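
A small example of why access to shared data needs synchronization: several workers increment one counter. The sketch below guards the read-modify-write with a lock; the class and function names are illustrative, and the workers are assumed to be threads in a single process.

```python
import threading


class SharedCounter:
    """A counter that several workers update; the lock serializes each update."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two workers could read the same old value and
        # both write back old + 1, silently losing one update.
        with self._lock:
            self.value += 1


def count_with_workers(n_workers=4, n_increments=10_000):
    counter = SharedCounter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished
    return counter.value


print(count_with_workers())  # always n_workers * n_increments when the lock is used
```

The unlocked version of this bug is hard to find precisely because it may not show up on any given run: whether updates are lost depends on how the workers happen to interleave.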

Page 39: Cloud computing

Managing Multiple Workers

Difficult because
  • (Often) don’t know the order in which workers run
  • (Often) don’t know where the workers are running
  • (Often) don’t know when workers interrupt each other

Thus, we need:
  • Semaphores (lock, unlock)
  • Condition variables (wait, notify, broadcast)
  • Barriers

Still, lots of problems:
  • Deadlock, livelock, race conditions, ...

Moral of the story: be careful!
  • Even trickier if the workers are on different machines
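
Of the primitives listed, a barrier is the most direct answer to "how do we know all the workers have finished?". The sketch below is one possible illustration using Python's `threading.Barrier`: each worker writes a partial result, waits at the barrier, and only then reads everyone else's. The two-phase task and the function name are assumptions made for the example.

```python
import threading


def run_two_phases(inputs):
    """Phase 1: each worker squares its input. Phase 2: each worker sums all squares."""
    n = len(inputs)
    partial = [None] * n   # one slot per worker, written in phase 1
    totals = [None] * n    # one slot per worker, written in phase 2
    barrier = threading.Barrier(n)

    def worker(i):
        partial[i] = inputs[i] * inputs[i]  # phase 1: write only our own slot
        barrier.wait()                      # block until all n workers reach this point
        totals[i] = sum(partial)            # phase 2: every slot is now filled in

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


print(run_two_phases([1, 2, 3, 4]))
```

Removing the `barrier.wait()` call would let a fast worker start phase 2 while some slots are still `None`, which is exactly the kind of ordering problem the slide describes. If one worker died before reaching the barrier, the others would wait forever, which is one way such programs deadlock.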

Page 40: Cloud computing

Patterns for Parallelism

Parallel computing has been around for decades

Here are some “design patterns” …

Page 41: Cloud computing

Master/Slaves

[Figure: one master node connected to several slave nodes.]

Page 42: Cloud computing

Producer/Consumer Flow

[Figure: producers (P) pass items to consumers (C); some nodes are both consumer and producer (CP), forming a flow.]
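
A producer/consumer flow can be sketched as a chain of stages joined by queues, where the middle stage consumes from one queue and produces into the next. The sketch below assumes threads as workers and uses Python's `queue.Queue`; the stage functions, the `DONE` sentinel, and the queue size are illustrative choices.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the stream


def producer(items, out_q):
    """P: put every item on the outgoing queue, then signal the end."""
    for item in items:
        out_q.put(item)
    out_q.put(DONE)


def stage(in_q, out_q, fn):
    """CP: consume from in_q, apply fn, produce to out_q."""
    while True:
        item = in_q.get()
        if item is DONE:
            out_q.put(DONE)  # pass the end-of-stream signal downstream
            return
        out_q.put(fn(item))


def consumer(in_q, results):
    """C: collect items until the end-of-stream signal arrives."""
    while True:
        item = in_q.get()
        if item is DONE:
            return
        results.append(item)


def run_pipeline(items):
    # Bounded queues make a fast producer wait for a slow consumer.
    q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    results = []
    threads = [
        threading.Thread(target=producer, args=(items, q1)),
        threading.Thread(target=stage, args=(q1, q2, lambda x: x * x)),
        threading.Thread(target=consumer, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_pipeline([1, 2, 3, 4]))
```

With one thread per stage, items leave the pipeline in the order they entered; that guarantee is lost as soon as a stage is run by several workers at once.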

Page 43: Cloud computing

Work Queues

[Figure: producers (P) put work items (W W W W W) on a shared queue; consumers (C) take them off.]
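
A shared work queue also covers the case of more work units than workers: each worker simply takes the next item whenever it is free. The sketch below is a minimal version using `queue.Queue` and threads; the squaring task, the function name, and the use of `None` as a stop signal (so work items themselves must not be `None`) are assumptions of the example.

```python
import queue
import threading


def run_work_queue(work_items, n_workers=3):
    """Process every work item with a fixed pool of workers pulling from one queue."""
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()  # results is shared, so appends are guarded

    def worker():
        while True:
            item = tasks.get()
            if item is None:          # stop signal: one is queued per worker
                tasks.task_done()
                return
            value = item * item
            with results_lock:
                results.append(value)
            tasks.task_done()

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for item in work_items:
        tasks.put(item)
    for _ in workers:
        tasks.put(None)
    tasks.join()                      # returns once every queued item is marked done
    for w in workers:
        w.join()
    return sorted(results)            # completion order is not deterministic


print(run_work_queue(range(10)))
```

Compared with partitioning the work up front, the queue balances load automatically: a worker that draws cheap items just comes back for more.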

Page 44: Cloud computing

Rubber Meets Road

From patterns to implementation:
  • pthreads, OpenMP for multi-threaded programming
  • MPI for cluster computing
  • …

The reality:
  • Lots of one-off solutions, custom code
  • Write your own dedicated library, then program with it
  • Burden on the programmer to explicitly manage everything

MapReduce to the rescue! (for next time)