Expanding the Horizons of Cloud Computing Beyond the Data Center

Jon Weissman and Abhishek Chandra

(+ great UMn students)

Department of CS&E

University of Minnesota

Key Trends

• Client technology

– devices: smartphones, iPods, tablets, sensors

• Big data

– located at the network edge

• Privacy/trust

– local clouds

• Multiple DCs/clouds

– global services

Minnesota Cloud Research

• Eye towards cloud evolution

• Projects

– Nebula

– DMapReduce

– Mobile cloud

– Proxy cloud

– Active cloud storage

– Virtualization

Project Clusters

• Power at the edge

– big data, locality

• Cloud-2-Cloud

– multiple clouds/DCs

• Mobile user

– user-centric cloud

Nebula

Mobile cloud

DMapReduce

Big Data Trend

• Big data is distributed

– earth science: weather data, seismic data

– life science: GenBank, NCI BLAST, PubMed

– health science: Google Earth + CDC pandemic data

– web 2.0: user multimedia blogs

– “everyone is a sensor”

• Cost in moving data to the cloud

Big Data Trend: Nebulas

• Process data “close by”

– fully and/or en route to the central cloud

– cost: save time and money

– privacy (think: patient records)

• Close by

– network distance

– trusted peers
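One way to read "close by" is as a trust filter followed by a ranking on network distance. The sketch below assumes each candidate node has a measured round-trip time and that the set of trusted peers is already known. The function `pick_close_by`, the node names, and the RTT values are hypothetical illustrations, not part of Nebula.

```python
def pick_close_by(node_rtts_ms, trusted, k=2):
    """Return up to k trusted nodes, nearest first.

    node_rtts_ms: dict of node name -> round-trip time in ms (assumed metric).
    trusted: set of node names the data owner is willing to use.
    """
    candidates = [(rtt, name) for name, rtt in node_rtts_ms.items()
                  if name in trusted]
    return [name for rtt, name in sorted(candidates)[:k]]


# Hypothetical measurements for four edge nodes.
rtts = {"a": 12.0, "b": 80.0, "c": 5.0, "d": 30.0}
chosen = pick_close_by(rtts, trusted={"a", "b", "d"})
```

Node "c" is the nearest but is skipped because it is not trusted, which is the privacy case the slide alludes to.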

Example: Dispersed-Data-Intensive Services

Data is geographically distributed

Costly, inefficient to move to central location

Example Instance: Blog Analysis

[Figure: blog1, blog2, blog3]

Nebula: A New Cloud Model

• Make the cloud more “distributed”

– exploit the rich collection of edge computers

– volunteers (P2P, @home), commercial (CDNs)

– enormous computing potential, network diversity

– lower latency: “on demand”, native client sandboxing

Nebula Central

Example: Blog Analysis

[Figure: blog1, blog2, blog3]

Blog Results

[Figure: time taken (sec) vs. # blogs (40, 80, 120, 160, 240, 320); series: Amazon emulator, Nebula testbed]

Current Status

• Prototype running on PlanetLab

• Distributed Data-Store Service

• Network Dashboard Service

• Node software stack

– native client

Nebula Going Forward

• Organize Nebulas

– around trusted peers

– social groups

– communities of interest

– local resources

• Nebula + commercial cloud

“use the edge opportunistically”

Big Data Trend: DMapReduce

[Figure: input data, data push, map, reduce, output data]

Big Data Trend: Distribution

• Big data is distributed

– earth science: weather data, seismic data

– life science: GenBank, NCI BLAST, PubMed

– health science: Google Earth + CDC pandemic data

– web 2.0: user multimedia blogs

Wide-Area MapReduce

DFS push

• Data in different data-centers

• Run MapReduce across them

• Data-flow spanning wide-area networks


Option 1: Local MapReduce

[Figure: data sources (US) and (EU); data centers (US) and (EU); one data push is fast, the other slow; a single MapReduce job produces the final result]

Option 2: Global MapReduce

[Figure: data sources (US) and (EU) each make a fast and a slow data push; a single MapReduce job produces the final result]

Option 3: Distributed MapReduce

[Figure: data sources (US) and (EU) each make a fast data push to a MapReduce job; the two jobs' results are combined into the final result]
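The distributed option can be illustrated with WordCount: each data center counts words in its own data, and only the per-site counts travel to the combine step. This is a minimal single-process sketch under that reading. The function names and the sample documents are hypothetical, and plain Python stands in for the actual MapReduce jobs.

```python
from collections import Counter


def local_wordcount(documents):
    """Stand-in for one MapReduce job run inside a single data center."""
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return counts


def combine_results(partials):
    """Final combine step: merge the per-data-center counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical input held at two sites.
us_docs = ["the cloud is big", "big data at the edge"]
eu_docs = ["the edge is close", "data is distributed"]

us_counts = local_wordcount(us_docs)
eu_counts = local_wordcount(eu_docs)
result = combine_results([us_counts, eu_counts])
```

Only `us_counts` and `eu_counts` would need to cross the wide-area link, which is why this layout pays off when the job shrinks its input.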

PlanetLab: WordCount (text data)

• Distributed MR works best in presence of data aggregation

[Figure: time in seconds for Push US, Push EU, Map, Reduce, Result Combine, and Total; series: Local MR, Global MR, Distributed MR]

PlanetLab: WordCount (random data)

• Local MR works best in presence of data ballooning

[Figure: time in seconds (Total); series: Local MR, Global MR, Distributed MR]

EC-2 Results

[Figure: three panels, WordCount (Text), WordCount (Random), and Sort, each showing total time in seconds for Local MR and Distributed MR]

Intelligent Data Placement

• HDFS push

– local node, same rack, random rack

• Data placement and scheduling inputs

– resource topology: /DCi/rackA/nodeX

– application characteristics: data expansion factors a (input->intermediate) and b (intermediate->output)

• Select LMR, DMR, or GMR (local, distributed, or global MapReduce)

Generalize beyond HDFS/MapReduce!
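The choice among LMR, DMR, and GMR can be sketched as a comparison of how many bytes each option sends over slow inter-data-center links. The model below is a toy assumption, not the authors' selection method: it ignores compute time and intra-data-center traffic, it assumes LMR and DMR use the largest site as the home or combine site, and it assumes GMR spreads input and intermediate data uniformly over all sites. The names `wide_area_bytes` and `select_strategy` are hypothetical.

```python
def wide_area_bytes(input_sizes, a, b):
    """Estimate bytes crossing inter-data-center links per strategy.

    input_sizes: input bytes held at each data center (hypothetical values).
    a: expansion factor, input -> intermediate.
    b: expansion factor, intermediate -> output.
    """
    total = sum(input_sizes)
    k = len(input_sizes)
    remote = total - max(input_sizes)  # data outside the largest site
    cross = (k - 1) / k                # share of uniformly spread data that is remote
    return {
        "LMR": remote,                             # move raw input to one site
        "GMR": cross * total + cross * a * total,  # spread input, then shuffle
        "DMR": a * b * remote,                     # move only per-site outputs
    }


def select_strategy(input_sizes, a, b):
    """Pick the strategy with the fewest estimated wide-area bytes."""
    costs = wide_area_bytes(input_sizes, a, b)
    return min(costs, key=costs.get)


aggregating = select_strategy([100, 100], a=0.5, b=0.1)  # output smaller than input
ballooning = select_strategy([100, 100], a=2.0, b=1.5)   # output larger than input
```

Under these assumptions the model agrees with the trend reported on the WordCount slides: aggregation (a·b < 1) favors DMR and ballooning (a·b > 1) favors LMR. A usable selector would also need measured bandwidths and compute costs.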

Mobility Trend: Mobile Cloud

• Mobile users/applications: phones, tablets

– resource limited: power, CPU, memory

– applications are becoming sophisticated

• Improve mobile user experience

– performance, reliability, fidelity

– tap into the cloud dynamically based on current resource state, preferences, interests, privacy

Cloud Mobile Opportunity

• Dynamic outsourcing

– move computation, data to the cloud dynamically

• User context

– exploit user behavior to pre-fetch, pre-compute, cache

• Multi-user sharing

– discover implicit cloud sharing based on interests, social ties

Outsourcing

• Partitioning across (Mobile <->Server <-> Cloud)

– local data capture + cloud processing

– images/video, speech, digital design, aug. reality

[Figure: outsourcing architecture spanning a cloud end (EC-2) and a mobile end (Android); components: servers, proxy, code repository, application profiler, outsourcing client, outsourcing controller]
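A dynamic outsourcing decision can be sketched as a comparison between the profiled local run time and the transfer time plus cloud run time. The sketch below assumes response time is the only criterion (power is left out) and that a profiler supplies the timings. The `Task` fields, the function names, and all numbers are hypothetical rather than taken from the system shown here.

```python
from dataclasses import dataclass


@dataclass
class Task:
    input_bytes: int      # data uploaded to the cloud
    output_bytes: int     # data downloaded back to the device
    local_seconds: float  # profiled run time on the device
    cloud_seconds: float  # profiled run time on a cloud server


def remote_seconds(task, uplink_bps, downlink_bps, rtt_seconds=0.0):
    """Estimated response time if the task runs in the cloud."""
    transfer = (task.input_bytes * 8 / uplink_bps
                + task.output_bytes * 8 / downlink_bps)
    return rtt_seconds + transfer + task.cloud_seconds


def should_outsource(task, uplink_bps, downlink_bps, rtt_seconds=0.0):
    """Outsource only when the cloud path is estimated to be faster."""
    estimate = remote_seconds(task, uplink_bps, downlink_bps, rtt_seconds)
    return estimate < task.local_seconds


# Hypothetical image-processing task.
sharpen = Task(input_bytes=200_000, output_bytes=200_000,
               local_seconds=10.0, cloud_seconds=0.5)
on_fast_link = should_outsource(sharpen, 5_000_000, 5_000_000, rtt_seconds=0.05)
on_slow_link = should_outsource(sharpen, 20_000, 20_000, rtt_seconds=0.05)
```

The same task is outsourced on the fast link and kept local on the slow one, which is the reason the decision has to be made dynamically.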

Experimental Results: Image Sharpening

• Response time

– both WIFI & 3G

– up to 27× speedup

– 219K, WIFI

• Power consumption

– up to 9× savings

– 219K, WIFI

[Figure: two panels, Avg. Time and Avg. Power]

Cloud Speculation

• Dynamic user profile

– contains activities in time and space

– “reads nytimes.com between 9 and 9:30 am on the train; likes technology articles”

– associate with context

• Patterns are relationships between activities

– repetitive, sequential, concurrent, time-bounded

– “user always does X and then does Y”

• Exploiting patterns: pre-fetching, pre-computing, caching in the cloud
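A sequential pattern such as "user always does X and then does Y" can be approximated by counting which activity directly follows which and acting only on strong transitions. This is a minimal sketch assuming the profile is a time-ordered list of activity labels. The function names, the 0.8 confidence threshold, and the sample history are hypothetical.

```python
from collections import Counter, defaultdict


def learn_transitions(history):
    """Count how often each activity is directly followed by another."""
    follows = defaultdict(Counter)
    for current, nxt in zip(history, history[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, activity, min_confidence=0.8):
    """Return the likely next activity, or None if no pattern is strong enough."""
    counts = follows.get(activity)
    if not counts:
        return None
    candidate, hits = counts.most_common(1)[0]
    confidence = hits / sum(counts.values())
    return candidate if confidence >= min_confidence else None


# Hypothetical activity log for one user.
history = ["email", "news", "email", "news", "maps", "email", "news"]
follows = learn_transitions(history)
after_email = predict_next(follows, "email")
after_news = predict_next(follows, "news")
```

Here "email" is always followed by "news", so the cloud could pre-fetch news content when the user opens email. "news" has no dominant successor, so nothing is speculated.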

Smart cloud

Back-end speculation and optimization

Summary

• Cloud Evolution

– mobile users, big data, privacy/trust, global

• Our vision of the Cloud

– locality of users, data, other clouds/data centers, user-centric behavior
