Hands On: Multimedia Methods for Large Scale Video Analysis (Project Meeting)
Dr. Gerald Friedland, [email protected]
Today
• Amazon EC2
  – What is it?
  – Concepts to understand before using it
  – Some tutorials
  – Issues: Data, cost, and other considerations
• More on Project Ideas
Amazon EC2
• EC2 = Elastic Compute Cloud
  – Configurable set of Virtual Machines (= Instances) running on real machines
  – Storage is virtualized as well
• Originally designed for scalable web shops (like Amazon.com): Software as a Service (SaaS)
• Now: IaaS (Infrastructure as a Service)
Infrastructure on Demand
• Hardware On Demand
• Pay for what you use
• Full root access: you control the OS and Software Stack
• Ability to scale computing resources up and down
• No dealing with racks, networks, power, cooling, housing, etc.
Amazon EC2
• Resizable compute is controlled via Instances, either through
  – Web Interface (Amazon Web Services)
  – API
• Variety of Instance Sizes: CPU Power, Cores, RAM, Disk.
• Wide Variety of Pre-built AMIs (Amazon Machine Images)
• Each running instance is accessed over SSH, which requires a keypair.
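The API route can be sketched in Python. This is a minimal sketch, not code from the lecture: it only assembles the parameters of a launch request. The key names (`ImageId`, `InstanceType`, `KeyName`, `MinCount`, `MaxCount`) follow the keyword arguments of the present-day boto3 `run_instances` call, which is an assumption because the slides name no client library. The helper `build_launch_request`, the AMI ID, and the keypair name are placeholders.

```python
def build_launch_request(ami_id, instance_type, key_name, count=1):
    """Collect the choices from the slide into one launch request.

    The keys mirror the keyword arguments of boto3's EC2 client method
    run_instances (an assumption; the slides name no client library).
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": ami_id,              # which pre-built AMI to boot
        "InstanceType": instance_type,  # instance size: CPU, RAM, disk
        "KeyName": key_name,            # keypair used later for SSH access
        "MinCount": count,
        "MaxCount": count,
    }


# Placeholder values, for illustration only.
request = build_launch_request("ami-00000000", "t1.micro", "my-keypair")
print(request)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("ec2").run_instances(**request)
```

Keeping the request as plain data makes it easy to inspect before anything is launched and billed.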
EC2: Instance Examples
EC2: Special Instances
• “Spot” Instances
  – Bid for unused AWS capacity
  – Prices controlled by AWS based on supply and demand
• AWS can terminate Spot Instances without notice
• Best approach to temporary requests for large numbers of servers
• Default maximum = 100 servers (instead of 20 on-demand)
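The bidding idea can be illustrated with a small simulation. This is a simplified model under stated assumptions, not AWS's actual pricing engine: the instance keeps running while the spot price is at or below the bid, it is terminated the first time the price exceeds the bid, and each hour is charged at that hour's spot price. The function name and the price series are made up for illustration.

```python
def run_spot_instance(bid, hourly_spot_prices):
    """Return (hours_run, cost) for one spot request.

    The instance runs while the spot price stays at or below the bid and is
    terminated the first time the price rises above it. Each hour is charged
    at the spot price for that hour, not at the bid (simplified model).
    """
    hours_run = 0
    cost = 0.0
    for price in hourly_spot_prices:
        if price > bid:
            break  # capacity is reclaimed; the job must cope with this
        hours_run += 1
        cost += price
    return hours_run, cost


# Hypothetical hourly spot prices in USD.
prices = [0.03, 0.04, 0.05, 0.12, 0.04]
hours, cost = run_spot_instance(bid=0.10, hourly_spot_prices=prices)
print(hours, round(cost, 2))
```

Because termination can arrive at any time, work run on spot instances should be checkpointed or split into small restartable units.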
EC2: More Concepts
• Regions: A region is a geographical area that contains one or more Availability Zones.
  – Data transfer: Cheap(er)
• Availability Zone: Some services are only available within the same AZ.
  – Data transfer: Free
EC2: Even More Concepts
• Amazon Machine Image (AMI)
  – Contains an entire operating system and software stack that can be loaded onto one or more virtual machines
• Amazon Elastic Block Store (EBS)
  – Persistent storage: Volume lifetime is independent of any particular EC2 instance.
  – Raw, unformatted block device.
  – Performance equal to or better than local EC2 drive.
  – Built-in redundancy within availability zone. AFR (Annual Failure Rate) between 0.1% and 1%.
  – Sizes range from 1 GB to 1 TB.
  – Easy to create, attach, back up, restore, and delete volumes.
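To put the quoted AFR range in perspective, the chance that at least one of several volumes fails within a year can be worked out directly. This is a back-of-the-envelope sketch that assumes failures are independent and that every volume has the same AFR; the volume count of 20 is an arbitrary example.

```python
def prob_any_volume_fails(afr, num_volumes):
    """Probability that at least one of num_volumes fails within a year,
    assuming independent failures at the same annual failure rate (AFR)."""
    return 1.0 - (1.0 - afr) ** num_volumes


# The two ends of the AFR range quoted on the slide.
for afr in (0.001, 0.01):
    p = prob_any_volume_fails(afr, num_volumes=20)
    print(f"AFR {afr:.1%}: {p:.1%} chance that at least one of 20 volumes fails")
```

Even with a low per-volume rate, the risk grows with the number of volumes, which is one reason to keep snapshots of data that cannot be regenerated.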
EC2: More on Storage
• Amazon Simple Storage Service (S3)
  – “... a simple web service interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web”
  – Read, write, and delete binary objects containing from 1 byte to 5 TB of data each using the API.
  – Number of objects you can store is unlimited.
  – Each object is stored in a 'bucket' and retrieved via a unique, user-assigned key.
  – Different levels of reliability.
  – Generally cheaper than EBS
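The bucket-and-key model can be pictured with a toy in-memory class. This is not the S3 API, only a sketch of the data model described above: a bucket maps user-assigned keys to binary objects that can be written, read, and deleted. The class name `ToyBucket`, the bucket name, and the key are hypothetical, and the 5 TB limit is taken here as 5 * 1024**4 bytes.

```python
MAX_OBJECT_BYTES = 5 * 1024**4  # 5 TB per object, as stated on the slide


class ToyBucket:
    """In-memory stand-in for a bucket: maps user-assigned keys to bytes."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data):
        if not 1 <= len(data) <= MAX_OBJECT_BYTES:
            raise ValueError("object size must be between 1 byte and 5 TB")
        self._objects[key] = bytes(data)  # an existing key is overwritten

    def get(self, key):
        return self._objects[key]  # raises KeyError if the key is unknown

    def delete(self, key):
        self._objects.pop(key, None)  # deleting a missing key is a no-op


bucket = ToyBucket("lecture-videos")
bucket.put("clips/0001.mp4", b"\x00\x01\x02")
print(bucket.get("clips/0001.mp4"))
bucket.delete("clips/0001.mp4")
```

Note that the key is a flat string; the slash in `clips/0001.mp4` is just part of the name, not a directory.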
EBS vs S3
• EBS only mountable to one instance
• EBS can only be used with instances in same AZ
• EBS <-> S3 can be converted, but $$$
Available Datasets
• Amazon makes Public Datasets available as EBS
• 1M Song Dataset and 10k subset are available.
• More info on: http://aws.amazon.com/publicdatasets/
Using EC2 for HPC: Intro
Watch video at: http://www.youtube.com/embed/YfCgK1bmCjw
Problem
EC2 provides raw compute power. There’s work to be done to create a usable cluster:
• Software installation
• AMI creation
• AWS / SSH key management and distribution
• Persistent Disk Storage and File Sharing
• Configuration management
• Higher-level management (cluster vs. instance)
Alternative: Elastic MapReduce
• Launch MapReduce jobs on EC2 using Hadoop
• Workflows defined either on console or using Web interface
• Map/Reduce code in S3
• Input/Output data stored in S3
• MapReduce covered in future lecture
• More info on Amazon: http://aws.amazon.com/elasticmapreduce/
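As a preview of the programming model, here is a single-process word count written in map/reduce style. It is a minimal sketch and does not use Hadoop or Elastic MapReduce; the function names and the two input lines are made up. On a real cluster the map and reduce steps would run on many machines, with input and output in S3.

```python
from collections import defaultdict


def map_phase(record):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def reduce_phase(key, values):
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())


counts = run_job(["audio video audio", "video analysis"])
print(counts)
```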
Alternative: MIT StarCluster
Watch video at: http://www.youtube.com/watch?v=vC3lJcPq1FY
EC2 Cost: Free Services
• 750 hours of EC2 running Micro instance usage
• 30 GB of Amazon EBS Standard volume storage plus 2 million IOs and 1 GB snapshot storage
• 15 GB of bandwidth out aggregated across all AWS services
• 1 GB of Regional Data Transfer
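A quick calculation shows what the 750 hours cover. The sketch below assumes the allowance is granted per month and that instances run around the clock; the function name is hypothetical.

```python
FREE_MICRO_HOURS = 750  # Micro instance allowance from the slide (assumed monthly)


def billable_micro_hours(instances, days_in_month):
    """Hours charged after the free allowance, for instances running 24/7."""
    used = instances * days_in_month * 24
    return max(0, used - FREE_MICRO_HOURS)


print(billable_micro_hours(instances=1, days_in_month=31))  # 744 hours used
print(billable_micro_hours(instances=2, days_in_month=30))  # 1440 hours used
```

One Micro instance running all month stays inside the allowance (31 × 24 = 744 hours); a second one does not.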
Data Transfer Cost: Between Instances
• Inside AZ: Free
• Inside Region: $0.01/GB
• Public IP (even in AZ): $0.01/GB
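The three rules can be turned into a small cost function. This is a sketch that uses only the rates listed above and covers only traffic between two instances in the same region; the function and parameter names are assumptions.

```python
RATE_SAME_REGION = 0.01  # USD per GB, from the slide
RATE_PUBLIC_IP = 0.01    # USD per GB, from the slide


def transfer_cost(gb, same_az, via_public_ip=False):
    """Cost in USD of moving gb gigabytes between two instances in one region."""
    if via_public_ip:
        return gb * RATE_PUBLIC_IP   # charged even inside one AZ
    if same_az:
        return 0.0                   # private traffic inside an AZ is free
    return gb * RATE_SAME_REGION     # crossing AZs within the region


print(transfer_cost(500, same_az=True))
print(transfer_cost(500, same_az=False))
print(transfer_cost(500, same_az=True, via_public_ip=True))
```

The last call is the easy mistake to make: two instances in the same AZ that address each other by public IP pay the per-GB rate.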
Data Storage Cost
CPU Cost
The Tradeoffs
• More CPUs vs more power per CPU
• What data to transfer, what data to process locally
• Self-configured vs automatic (StarCluster) vs predefined (MapReduce)
• Console vs Web Interface vs API
• S3 vs EBS
• Amazon vs. stay local (ICSI)
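For the question of what to transfer and what to process locally, a rough upload-time estimate is often the deciding number. The sketch below uses hypothetical figures (a 1 TB collection, a 100 Mbit/s uplink), takes 1 GB as 8000 megabits, and ignores protocol overhead.

```python
def transfer_hours(dataset_gb, link_mbit_per_s):
    """Hours needed to move dataset_gb gigabytes over a link of the given speed.

    Uses 1 GB = 8000 megabits and ignores protocol overhead.
    """
    seconds = dataset_gb * 8000 / link_mbit_per_s
    return seconds / 3600


# Hypothetical numbers: a 1 TB video collection over a 100 Mbit/s uplink.
print(round(transfer_hours(1000, 100), 1))
```

If the transfer takes longer than processing the data on local machines would, staying local may be the better choice.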
More on Project Ideas
• ACM Multimedia Grand Challenge: http://www.acmmm12.org/call-for-multimedia-grand-challenge-solutions/
• IEEE AASP Challenge: http://www.elec.qmul.ac.uk/digitalmusic/sceneseventschallenge/
• ACM MM 2011 Proceedings: http://dl.acm.org/citation.cfm?id=2072298
• IEEE ICME 2012 Proceedings: http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=5997811
This Week (Lecture)
• More on Audio
Next Week (Project Meeting)
• Project Teams