Hands On: Multimedia Methods for Large Scale Video Analysis (Project Meeting)

Dr. Gerald Friedland, fractor@icsi.berkeley.edu

Today

• Amazon EC2
  – What is it?
  – Concepts to understand before using it
  – Some tutorials
  – Issues: Data, cost, and other considerations
• More on Project Ideas

Amazon EC2

• EC2 = Elastic Compute Cloud
  – Configurable set of Virtual Machines (= Instances) running on real machines
  – Storage is virtualized as well
• Originally designed for scalable web shops (like Amazon.com) - Software as a Service (SaaS)
• Now: IaaS (Infrastructure as a Service)

Infrastructure on Demand

• Hardware On Demand
• Pay for what you use
• Full root access – you control the OS and Software Stack
• Ability to scale computing resources up and down
• No dealing with racks, networks, power, cooling, housing, etc.

Amazon EC2

• Resizable compute is controlled via Instances, either through
  – Web Interface (Amazon Web Services)
  – API
• Variety of Instance Sizes: CPU Power, Cores, RAM, Disk.
• Wide Variety of Pre-built AMIs (Amazon Machine Images)
• A keypair is required for SSH access to each running instance.

EC2: Instance Examples

EC2: Special Instances

• “Spot” Instances
  – Bid for unused AWS capacity
  – Prices controlled by AWS based on supply and demand
• AWS can terminate Spot Instances without notice
• Best approach to temporary requests for large numbers of servers
• Default maximum = 100 servers (instead of 20 on-demand)
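
The bidding rule can be illustrated with a toy simulation. This is a minimal sketch, not the AWS API: the function name `simulate_spot`, the hourly price trace, and the rules that the instance runs only while the market price is at or below the bid and is charged the market price are simplifying assumptions made for illustration.

```python
def simulate_spot(bid, hourly_prices):
    """Return (hours_run, total_cost) for a toy Spot Instance.

    Assumptions (not from the slides): the instance runs each hour the
    market price is at or below the bid, pays the market price for that
    hour, and is terminated for good the first time the price exceeds
    the bid.
    """
    hours_run = 0
    total_cost = 0.0
    for price in hourly_prices:
        if price > bid:
            break  # capacity is reclaimed without notice
        hours_run += 1
        total_cost += price
    return hours_run, total_cost


# Hypothetical price trace in dollars per hour
prices = [0.03, 0.04, 0.05, 0.12, 0.04]
hours, cost = simulate_spot(bid=0.10, hourly_prices=prices)
print(hours, round(cost, 2))
```

Because the loop can stop at any hour, work running on Spot Instances should be restartable, for example by writing intermediate results to persistent storage.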

EC2: More Concepts

• Regions: A region is a geographical area that contains one or more Availability Zones. Data transfer: Cheap(er)
• Availability Zone: Some services only available in the same AZ. Data transfer: Free

EC2: Even More Concepts

• Amazon Machine Image (AMI)
  – Contains an entire operating system and software stack that can be loaded onto one or more virtual machines
• Amazon Elastic Block Store (EBS)
  – Persistent storage: Volume lifetime is independent of any particular EC2 instance.
  – Raw, unformatted block device.
  – Performance equal to or better than local EC2 drive.
  – Built-in redundancy within availability zone. AFR (Annual Failure Rate) between 0.1% and 1%.
  – Sizes range from 1 GB to 1 TB.
  – Easy to create, attach, back up, restore, and delete volumes.
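
To make the AFR figure concrete, the sketch below estimates the chance that at least one of several volumes fails within a year. It assumes failures are independent, which the slides do not state; the function name `prob_any_failure` and the volume count of 20 are illustrative.

```python
def prob_any_failure(afr, n_volumes):
    """Probability that at least one of n_volumes fails in a year,
    assuming independent failures with annual failure rate afr."""
    return 1.0 - (1.0 - afr) ** n_volumes


# The two ends of the AFR range quoted on the slide
for afr in (0.001, 0.01):
    p = prob_any_failure(afr, n_volumes=20)
    print(f"AFR {afr:.1%}: P(at least one failure among 20 volumes) = {p:.1%}")
```

Even a small per-volume rate adds up across many volumes, which is one reason to keep snapshots of data that is expensive to recompute.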

EC2: More on Storage

• Amazon Simple Storage Service (S3)
  – “... a simple web service interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web”
  – Read, write, and delete binary objects containing from 1 byte to 5 TB of data each using the API.
  – Number of objects you can store is unlimited.
  – Each object is stored in a 'bucket' and retrieved via a unique, user-assigned key
  – Different levels of reliability.
  – Generally cheaper than EBS
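
The bucket-and-key model can be sketched as a small in-memory class. This is a toy model, not the S3 API: the class `ToyBucket`, its method names, and the example bucket name and key are hypothetical, and only the 1 byte to 5 TB object-size range is taken from the slide.

```python
MAX_OBJECT_BYTES = 5 * 10**12  # 5 TB upper bound from the slide (decimal TB assumed)


class ToyBucket:
    """In-memory stand-in for a bucket: maps user-assigned keys to bytes."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data):
        if not 1 <= len(data) <= MAX_OBJECT_BYTES:
            raise ValueError("object size must be between 1 byte and 5 TB")
        self._objects[key] = bytes(data)  # writing an existing key overwrites it

    def get(self, key):
        return self._objects[key]  # raises KeyError if the key is absent

    def delete(self, key):
        self._objects.pop(key, None)  # deleting a missing key is a no-op here


bucket = ToyBucket("video-features")
bucket.put("clips/0001.mfcc", b"\x00\x01\x02")
print(bucket.get("clips/0001.mfcc"))
bucket.delete("clips/0001.mfcc")
```

Note that the key is just a string; the slash in it does not create a directory, it is part of the name.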

EBS vs S3

• EBS only mountable to one instance
• EBS can only be used with instances in same AZ
• EBS <-> S3 can be converted, but $$$

Available Datasets

• Amazon makes Public Datasets available as EBS
• The 1M Song Dataset and its 10k subset are available.
• More info on: http://aws.amazon.com/publicdatasets/

Using EC2 for HPC: Intro

Watch video at: http://www.youtube.com/embed/YfCgK1bmCjw

Problem

EC2 provides raw compute power. There’s work to be done to create a usable cluster:

• Software installation
• AMI creation
• AWS / SSH key management and distribution
• Persistent Disk Storage and File Sharing
• Configuration management
• Higher-level management (cluster vs. instance)

Alternative: Elastic MapReduce

• Launch MapReduce jobs on EC2 using Hadoop
• Workflows defined either on the console or using the web interface
• Map/Reduce code in S3
• Input/Output data stored in S3
• MapReduce covered in future lecture
• More info on Amazon: http://aws.amazon.com/elasticmapreduce/
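
For orientation, the shape of a MapReduce job can be sketched in a single process. This is an illustrative word count, not Hadoop or Elastic MapReduce code: the function names `map_phase`, `reduce_phase`, and `run_job` are hypothetical, and a real job would read its input from S3 and run the map and reduce steps on many instances.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for each word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def reduce_phase(word, counts):
    """Combine all values emitted for one key."""
    return word, sum(counts)


def run_job(lines):
    # Shuffle step: group mapper output by key before reducing
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(run_job(["audio video audio", "video analysis"]))
```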

Alternative: MIT Starcluster

Watch video at: http://www.youtube.com/watch?v=vC3lJcPq1FY

EC2 Cost: Free Services

• 750 hours of EC2 Micro instance usage
• 30 GB of Amazon EBS Standard volume storage plus 2 million IOs and 1 GB snapshot storage
• 15 GB of bandwidth out aggregated across all AWS services
• 1 GB of Regional Data Transfer
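
A quick check of what 750 instance-hours covers. This is a minimal sketch, assuming the allowance applies per calendar month (the slide does not state the period) and that instances run continuously; the function names are illustrative.

```python
import calendar

FREE_MICRO_HOURS = 750  # from the slide


def hours_in_month(year, month):
    """Number of hours in the given calendar month."""
    days = calendar.monthrange(year, month)[1]
    return days * 24


def free_hours_left(year, month, n_instances):
    """Free hours remaining if n_instances Micro instances run all month.

    A negative result is the number of instance-hours that would be billed.
    """
    return FREE_MICRO_HOURS - n_instances * hours_in_month(year, month)


print(free_hours_left(2012, 10, n_instances=1))
print(free_hours_left(2012, 10, n_instances=2))
```

Under these assumptions one always-on Micro instance fits inside the allowance, while a second one does not.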

Data Transfer Cost: Between Instances

• Inside AZ: Free
• Inside Region: $0.01/GB
• Public IP (even in AZ): $0.01/GB
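
These rates can be turned into a small estimator. This is a sketch using only the three per-GB figures from the slide; the path labels, the function name `transfer_cost`, and the 500 GB workload are hypothetical, and current AWS pricing may differ.

```python
# Rates in dollars per GB, taken from the slide
RATE_PER_GB = {
    "same_az": 0.00,
    "same_region": 0.01,
    "public_ip": 0.01,  # applies even when both instances share an AZ
}


def transfer_cost(gigabytes, path):
    """Cost in dollars of moving `gigabytes` between two instances."""
    if gigabytes < 0:
        raise ValueError("gigabytes must be non-negative")
    return gigabytes * RATE_PER_GB[path]


# Hypothetical workload: shuffling 500 GB of video features
for path in RATE_PER_GB:
    print(f"{path}: ${transfer_cost(500, path):.2f}")
```

The practical point is to address instances by their private addresses inside one AZ whenever the job allows it.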

Data Storage Cost

CPU Cost

The Tradeoffs

• More CPUs vs more power per CPU
• What data to transfer, what data to process locally
• Self-configured vs automatic (Starcluster) vs predefined (MapReduce)
• Console vs Web Interface vs API
• S3 vs EBS
• Amazon vs. stay local (ICSI)

More on Project Ideas

This Week (Lecture)

• More on Audio

Next Week (Project Meeting)

• Project Teams
