Cluster Computing with DryadLINQ Mihai Budiu Microsoft Research, Silicon Valley Cloudera, February 12, 2010

Page 1: Cluster Computing with Dryad

Cluster Computing with DryadLINQ

Mihai Budiu Microsoft Research, Silicon Valley

Cloudera, February 12, 2010

Page 2: Cluster Computing with Dryad

Goal

Page 3: Cluster Computing with Dryad

Design Space

[Figure: design space with axes throughput vs. latency and Internet vs. private data center; regions labeled data-parallel, shared memory, Dryad, search, HPC, grid, and transaction.]

Page 4: Cluster Computing with Dryad

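Data-Parallel Computation

[Figure: layered comparison of data-parallel systems. Layers: Application, Language, Execution, Storage.]
• Execution: Parallel Databases; Map-Reduce; Hadoop; Dryad
• Storage: GFS, BigTable; HDFS, S3; Cosmos, Azure, SQL Server
• Language: Sawzall; Pig, Hive; DryadLINQ, Scope
• Remaining labels: SQL; ≈SQL; LINQ, SQL; Sawzall; Cosmos, HPC, Azure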

Page 5: Cluster Computing with Dryad

Software Stack

[Figure: layered software stack. Labels: Windows Server (×4); Cosmos, Azure XCompute, Windows HPC; Cosmos FS, Azure XStore, SQL Server, Tidy FS, NTFS; Dryad; Distributed Shell, PSQL, DryadLINQ, Scope, SSIS, SQL, SQL server; C++, C#, legacy code; .Net Distributed Data Structures; Machine Learning, Graphs, Data mining, Analytics, Optimization; Applications.]

Page 6: Cluster Computing with Dryad

Outline

• Introduction
• Dryad
• DryadLINQ
• Building on DryadLINQ
• Conclusions

Page 7: Cluster Computing with Dryad

Dryad

• Continuously deployed since 2006
• Running on >> 10^4 machines
• Sifting through > 10 PB of data daily
• Runs on clusters of > 3000 machines
• Handles jobs with > 10^5 processes each
• Platform for a rich software ecosystem
• Used by >> 100 developers
• Written at Microsoft Research, Silicon Valley

Page 8: Cluster Computing with Dryad

Dryad = Execution Layer

Job (application) ≈ Pipeline
Dryad ≈ Shell
Cluster ≈ Machine

Page 9: Cluster Computing with Dryad

2-D Piping

• Unix Pipes: 1-D
  grep | sed | sort | awk | perl
• Dryad: 2-D
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50

Page 10: Cluster Computing with Dryad

Virtualized 2-D Pipelines

Page 11: Cluster Computing with Dryad

Virtualized 2-D Pipelines

Page 12: Cluster Computing with Dryad

Virtualized 2-D Pipelines

Page 13: Cluster Computing with Dryad

Virtualized 2-D Pipelines

Page 14: Cluster Computing with Dryad

Virtualized 2-D Pipelines

• 2D DAG
• multi-machine
• virtualized

Page 15: Cluster Computing with Dryad

Dryad Job Structure

[Figure: job graph. Input files feed vertices (processes) running grep, sed, sort, awk, and perl, connected by channels and grouped into stages, producing output files.]

Page 16: Cluster Computing with Dryad

Channels

[Figure: vertex X sends items to vertex M over a channel.]

Finite streams of items

• distributed filesystem files (persistent)
• SMB/NTFS files (temporary)
• TCP pipes (inter-machine)
• memory FIFOs (intra-machine)

Page 17: Cluster Computing with Dryad

Dryad System Architecture

[Figure: the job manager (control plane) sends the job schedule to the cluster (NS, Sched, and PD nodes running vertices V); vertices exchange data over files, TCP, FIFOs, and the network (data plane).]

Page 18: Cluster Computing with Dryad

Fault Tolerance

Page 19: Cluster Computing with Dryad

Policy Managers

[Figure: inside the job manager, an R manager handles stage R (vertices R), an X manager handles stage X (vertices X), and an R-X manager handles the R-X connection.]

Page 20: Cluster Computing with Dryad

Dynamic Graph Rewriting

[Figure: vertices X[0], X[1], and X[3] have completed; X[2] is a slow vertex, so a duplicate vertex X'[2] is started.]

Duplication Policy = f(running times, data volumes)
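
The slide gives the duplication policy only as a function of running times and data volumes. One way such a function could look is sketched below. This is an illustrative assumption, not Dryad's actual policy: the class name, the median-rate heuristic, and the `slowdownFactor` parameter are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: decide whether a running vertex looks like a straggler.
public static class DuplicationPolicySketch
{
    // Seconds spent per input byte; guards against division by zero.
    private static double Rate(double seconds, long bytes) =>
        seconds / Math.Max(1L, bytes);

    // completed: (running time, data volume) of finished vertices in the same stage.
    // Returns true when the running vertex has taken more than slowdownFactor
    // times the duration predicted from the median completed rate.
    public static bool ShouldDuplicate(
        IReadOnlyList<(double seconds, long bytes)> completed,
        double runningSeconds,
        long runningBytes,
        double slowdownFactor = 2.0)
    {
        if (completed.Count == 0) return false; // nothing to compare against yet

        var rates = completed.Select(v => Rate(v.seconds, v.bytes))
                             .OrderBy(r => r)
                             .ToList();
        double medianRate = rates[rates.Count / 2];
        double expectedSeconds = medianRate * Math.Max(1L, runningBytes);
        return runningSeconds > slowdownFactor * expectedSeconds;
    }
}
```

Normalizing by data volume keeps a vertex with a large input from being mistaken for a slow one.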

Page 21: Cluster Computing with Dryad

Cluster network topology

rack

top-of-rack switch

top-level switch

Page 22: Cluster Computing with Dryad

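Dynamic Aggregation

[Figure: static plan in which source vertices S feed a single target T; dynamic plan in which sources are grouped by rack number (#1, #2, #3) under per-rack aggregation vertices A that feed T.]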

Page 23: Cluster Computing with Dryad

Policy vs. Mechanism

Policy
• Application-level
• Most complex in C++ code
• Invoked with upcalls
• Need good default implementations
• DryadLINQ provides a comprehensive set

Mechanism
• Built-in
• Scheduling
• Graph rewriting
• Fault tolerance
• Statistics and reporting

Page 24: Cluster Computing with Dryad

Outline

• Introduction
• Dryad
• DryadLINQ
• Building on DryadLINQ
• Conclusions

Page 25: Cluster Computing with Dryad

LINQ

Dryad

=> DryadLINQ

Page 26: Cluster Computing with Dryad

LINQ = .Net + Queries

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };
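
A self-contained version of this query, runnable with LINQ-to-objects, might look as follows. It is a minimal sketch: the `Entry` type, the use of `string` keys, and the bodies of `IsLegal` and `Hash` are stand-ins invented for illustration, since the slide only declares their signatures.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with the key/value members the query uses.
public sealed class Entry
{
    public string key;
    public int value;
    public Entry(string key, int value) { this.key = key; this.value = value; }
}

public static class LinqQuerySketch
{
    // Stand-in predicate: a key is "legal" when it is non-empty.
    public static bool IsLegal(string key) => key.Length > 0;

    // Stand-in hash: upper-cases the key (not a real hash function).
    public static string Hash(string key) => key.ToUpperInvariant();

    public static List<string> Run(IEnumerable<Entry> collection)
    {
        // Members of an anonymous type need names, so the hashed key is
        // bound to "hash"; "c.value" keeps its own name.
        var results = from c in collection
                      where IsLegal(c.key)
                      select new { hash = Hash(c.key), c.value };

        // Format each result so that it can leave the method.
        return results.Select(r => r.hash + ":" + r.value).ToList();
    }
}
```

Calling `LinqQuerySketch.Run` on entries `("a", 1)`, `("", 2)`, `("b", 3)` drops the entry with the empty key and yields two formatted results.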

Page 27: Cluster Computing with Dryad

Collections and Iterators

class Collection<T> : IEnumerable<T>;

public interface IEnumerable<T> {
    IEnumerator<T> GetEnumerator();
}

public interface IEnumerator<T> {
    T Current { get; }
    bool MoveNext();
    void Reset();
}

Page 28: Cluster Computing with Dryad

DryadLINQ Data Model

[Figure: a collection is divided into partitions, each holding .Net objects.]
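
The data model (a collection made of partitions of .Net objects) can be mimicked on one machine with an `IEnumerable<T>` that walks its partitions in order. The sketch below is only an in-memory analogy. `PartitionedCollection<T>` is a hypothetical name and not a DryadLINQ type.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical in-memory analogy of a partitioned collection.
public sealed class PartitionedCollection<T> : IEnumerable<T>
{
    private readonly List<List<T>> partitions = new List<List<T>>();

    // Adds one partition holding a copy of the given items.
    public void AddPartition(IEnumerable<T> items) =>
        partitions.Add(new List<T>(items));

    public int PartitionCount => partitions.Count;

    // Enumerates every object, partition by partition.
    public IEnumerator<T> GetEnumerator()
    {
        foreach (var partition in partitions)
            foreach (var item in partition)
                yield return item;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because the class implements `IEnumerable<T>`, any LINQ-to-objects operator can consume it without knowing about the partitioning.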

Page 29: Cluster Computing with Dryad

DryadLINQ = LINQ + Dryad

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

[Figure: the C# query is turned into a query plan (a Dryad job) whose vertices run C# vertex code over the data, reading collection and producing results.]

Page 30: Cluster Computing with Dryad

Demo

Page 31: Cluster Computing with Dryad

Example: Histogram

public static IQueryable<Pair> Histogram(
    IQueryable<LineRecord> input, int k)
{
    var words = input.SelectMany(x => x.line.Split(' '));
    var groups = words.GroupBy(x => x);
    var counts = groups.Select(x => new Pair(x.Key, x.Count()));
    var ordered = counts.OrderByDescending(x => x.count);
    var top = ordered.Take(k);
    return top;
}

Data at each step, for k = 3:

input:   “A line of words of wisdom”
words:   [“A”, “line”, “of”, “words”, “of”, “wisdom”]
groups:  [[“A”], [“line”], [“of”, “of”], [“words”], [“wisdom”]]
counts:  [{“A”, 1}, {“line”, 1}, {“of”, 2}, {“words”, 1}, {“wisdom”, 1}]
ordered: [{“of”, 2}, {“A”, 1}, {“line”, 1}, {“words”, 1}, {“wisdom”, 1}]
top:     [{“of”, 2}, {“A”, 1}, {“line”, 1}]
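
The same pipeline can be tried locally with LINQ-to-objects. The sketch below makes two assumptions: plain strings replace the slide's `LineRecord`, and the `Pair` class is a hypothetical definition, since the slide does not show one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical definition of the Pair type used on the slide.
public sealed class Pair
{
    public string word;
    public int count;
    public Pair(string word, int count) { this.word = word; this.count = count; }
}

public static class HistogramSketch
{
    // Returns the k most frequent words across all lines.
    public static List<Pair> Histogram(IEnumerable<string> lines, int k)
    {
        var words = lines.SelectMany(line => line.Split(' '));
        var groups = words.GroupBy(w => w);
        var counts = groups.Select(g => new Pair(g.Key, g.Count()));
        // LINQ-to-objects sorting is stable, so ties keep first-seen order.
        var ordered = counts.OrderByDescending(p => p.count);
        return ordered.Take(k).ToList();
    }
}
```

For the slide's input line and k = 3, this returns `of` (2), `A` (1), `line` (1), matching the trace above. The tie order relies on the stable sort of LINQ-to-objects. A distributed execution need not break ties the same way.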

Page 32: Cluster Computing with Dryad

Histogram Plan

[Figure: query plan stages, in order.]
• SelectMany, Sort, GroupBy+Select, HashDistribute
• MergeSort, GroupBy, Select, Sort, Take
• MergeSort, Take

Page 33: Cluster Computing with Dryad

Map-Reduce in DryadLINQ

public static IQueryable<S> MapReduce<T,M,K,S>(
    this IQueryable<T> input,
    Expression<Func<T, IEnumerable<M>>> mapper,
    Expression<Func<M,K>> keySelector,
    Expression<Func<IGrouping<K,M>,S>> reducer)
{
    var map = input.SelectMany(mapper);
    var group = map.GroupBy(keySelector);
    var result = group.Select(reducer);
    return result;
}
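
To show how this operator might be called, here is a word count built on it and run locally through `AsQueryable()`. The sketch declares the three function parameters as `Expression<Func<…>>`, which is what the `Queryable` operators accept. The class name and the `WordCount` helper are hypothetical, and no cluster is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class MapReduceSketch
{
    // Map-Reduce composed from three standard query operators.
    public static IQueryable<S> MapReduce<T, M, K, S>(
        this IQueryable<T> input,
        Expression<Func<T, IEnumerable<M>>> mapper,
        Expression<Func<M, K>> keySelector,
        Expression<Func<IGrouping<K, M>, S>> reducer)
    {
        return input.SelectMany(mapper).GroupBy(keySelector).Select(reducer);
    }

    // Hypothetical usage: count word occurrences across lines.
    public static Dictionary<string, int> WordCount(IEnumerable<string> lines)
    {
        return lines.AsQueryable()
            .MapReduce(
                // Passing an explicit char[] keeps the call free of optional
                // arguments, which expression trees do not allow.
                line => line.Split(new[] { ' ' }),
                word => word,
                group => new KeyValuePair<string, int>(group.Key, group.Count()))
            .ToDictionary(p => p.Key, p => p.Value);
    }
}
```

Here the mapper splits lines into words, the key selector groups equal words, and the reducer collapses each group into a (word, count) pair.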

Page 34: Cluster Computing with Dryad

Map-Reduce Plan

[Figure: map-reduce query plan, shown in static and dynamic forms. Vertex legend: M = map, Q = sort, G1 = groupby, R = reduce, D = distribute, MS = mergesort, G2 = groupby, X = consumer. The chain M, Q, G1, R, D forms the map side with partial aggregation; MS, G2, R forms the reduce side. The dynamic form adds an aggregation tree (sources S, aggregators A, target T).]

Page 35: Cluster Computing with Dryad

Distributed Sorting Plan

[Figure: distributed sorting plan built from vertices labeled O, DS, H, D, M, and S, shown in three versions: static, dynamic, dynamic.]

Page 36: Cluster Computing with Dryad

Expectation Maximization

• 160 lines
• 3 iterations shown

Page 37: Cluster Computing with Dryad

Probabilistic Index Maps

[Figure: images and their features.]

Page 38: Cluster Computing with Dryad

Language Summary

Where
Select
GroupBy
OrderBy
Aggregate
Join
Apply
Materialize

Page 39: Cluster Computing with Dryad

LINQ System Architecture

[Figure: on the local machine, a .Net program (C#, VB, F#, etc.) sends a query to a LINQ provider, which drives an execution engine and returns objects.]

• LINQ-to-obj
• PLINQ
• LINQ-to-SQL
• LINQ-to-WS
• DryadLINQ
• Flickr
• Oracle
• LINQ-to-XML
• Your own

Page 40: Cluster Computing with Dryad

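The DryadLINQ Provider

[Figure: on the client machine, a .Net query expression is passed to DryadLINQ (Invoke Query), which produces a distributed query plan, vertex code, and context. In the data center, the Dryad JM runs the Dryad execution over the input tables and writes the output tables. Results return to the client as an output DryadTable whose .Net objects are read with ToCollection and foreach.]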

Page 41: Cluster Computing with Dryad

Combining Query Providers

[Figure: on the local machine, a .Net program (C#, VB, F#, etc.) sends queries through several LINQ providers to their execution engines (PLINQ, SQL Server, DryadLINQ, LINQ-to-obj) and receives objects.]

Page 42: Cluster Computing with Dryad

Using PLINQ

[Figure: a query goes to DryadLINQ; each resulting local query runs under PLINQ.]

Page 43: Cluster Computing with Dryad

Using LINQ to SQL Server

[Figure: a query goes to DryadLINQ, which produces several subqueries; these are passed on through LINQ to SQL.]

Page 44: Cluster Computing with Dryad

Using LINQ-to-objects

[Figure: the same query runs on the local machine through LINQ to obj (debug) or on the cluster through DryadLINQ (production).]

Page 45: Cluster Computing with Dryad

Outline

• Introduction
• Dryad
• DryadLINQ
• Building on/for DryadLINQ
  – System monitoring with Artemis
  – Privacy-preserving query language (PINQ)
  – Machine learning
• Conclusions

Page 46: Cluster Computing with Dryad

Artemis: measuring clusters

[Figure: Artemis architecture. Components: Cosmos, HPC, and Azure clusters; Cluster/Job State API; log collection; DB; DryadLINQ; statistics; cluster browser/manager; job browser; visualization; plug-ins.]

Page 47: Cluster Computing with Dryad

DryadLINQ job browser

Page 48: Cluster Computing with Dryad

Automated diagnostics

Page 49: Cluster Computing with Dryad

Job statistics: schedule and critical path

Page 50: Cluster Computing with Dryad

Running time distribution

Page 51: Cluster Computing with Dryad

Performance counters

Page 52: Cluster Computing with Dryad

CPU Utilization

Page 53: Cluster Computing with Dryad

Load imbalance: rack assignment

Page 54: Cluster Computing with Dryad

PINQ

[Figure: queries (LINQ) reach the privacy-sensitive database through PINQ, which returns the answer.]

Page 55: Cluster Computing with Dryad

PINQ = Privacy-Preserving LINQ

• “Type-safety” for privacy
• Provides an interface to data that looks very much like LINQ.
• All access through the interface gives differential privacy.
• Analysts write arbitrary C# code against data sets, like in LINQ.
• No privacy expertise needed to produce analyses.
• Privacy currency is used to limit per-record information released.
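
The last two ideas, differentially private answers and a privacy currency, can be illustrated with a budgeted noisy count. This is a minimal sketch and not PINQ's implementation: `PrivateDataSketch<T>` and its members are hypothetical names. It uses the standard Laplace mechanism, in which a count (sensitivity 1) released with Laplace noise of scale 1/ε is ε-differentially private.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper that only releases noisy aggregates.
public sealed class PrivateDataSketch<T>
{
    private readonly List<T> records;
    private readonly Random random;

    // Privacy currency still available for queries.
    public double RemainingBudget { get; private set; }

    public PrivateDataSketch(IEnumerable<T> records, double budget, Random random)
    {
        this.records = records.ToList();
        this.RemainingBudget = budget;
        this.random = random;
    }

    // Laplace(0, scale) sample: an exponential magnitude with a random sign.
    private double SampleLaplace(double scale)
    {
        // 1 - NextDouble() lies in (0, 1], so the logarithm is finite.
        double magnitude = -scale * Math.Log(1.0 - random.NextDouble());
        return random.Next(2) == 0 ? magnitude : -magnitude;
    }

    // Spends epsilon from the budget and returns a noisy count.
    public double NoisyCount(Func<T, bool> predicate, double epsilon)
    {
        if (epsilon <= 0.0)
            throw new ArgumentOutOfRangeException(nameof(epsilon));
        if (epsilon > RemainingBudget)
            throw new InvalidOperationException("Privacy budget exhausted.");

        RemainingBudget -= epsilon;
        return records.Count(predicate) + SampleLaplace(1.0 / epsilon);
    }
}
```

Once the budget is spent, further queries are refused. A production system would also need to guard against floating-point and side-channel leaks, which this sketch ignores.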

Page 56: Cluster Computing with Dryad

Example: search logs mining

Distribution of queries about “Cricket”

// Open sensitive data set with state-of-the-art security
PINQueryable<VisitRecord> visits = OpenSecretData(password);

// Group visits by patient and identify frequent patients.
var patients = visits.GroupBy(x => x.Patient.SSN)
                     .Where(x => x.Count() > 5);

// Map each patient to their post code using their SSN (the group key).
var locations = patients.Join(SSNtoPost, x => x.Key, y => y.SSN, (x,y) => y.PostCode);

// Count post codes containing more than 10 frequent patients.
var activity = locations.GroupBy(x => x)
                        .Where(x => x.Count() > 10);
Visualize(activity); // Who knows what this does???

Page 57: Cluster Computing with Dryad

PINQ Download

• Implemented on top of DryadLINQ
• Allows mining very sensitive datasets privately
• Code is available
• http://research.microsoft.com/en-us/projects/PINQ/
• Frank McSherry, Privacy Integrated Queries, SIGMOD 2009

Page 58: Cluster Computing with Dryad

Natal Training

Page 59: Cluster Computing with Dryad

Natal Problem

• Recognize players from depth map
• At frame rate
• Using 15% of one Xbox CPU core

Page 60: Cluster Computing with Dryad

Learn from Data

[Figure: motion capture (ground truth) is rasterized into training examples; machine learning turns the training examples into a classifier.]

Page 61: Cluster Computing with Dryad

Running on Xbox

Page 62: Cluster Computing with Dryad

Learning from data

[Figure: machine learning, running on DryadLINQ and Dryad, turns training examples into a classifier.]

Page 63: Cluster Computing with Dryad

Large-Scale Machine Learning

• > 10^22 objects
• Sparse, multi-dimensional data structures
• Complex datatypes (images, video, matrices, etc.)
• Complex application logic and dataflow
  – > 35000 lines of .Net
  – 140 CPU days
  – > 10^5 processes
  – 30 TB data analyzed
  – 140 avg parallelism (235 machines)
  – 300% CPU utilization (4 cores/machine)

Page 64: Cluster Computing with Dryad

Highly efficient parallelization

Page 65: Cluster Computing with Dryad

Outline

• Introduction
• Dryad
• DryadLINQ
• Building on DryadLINQ
• Conclusions

Page 66: Cluster Computing with Dryad

Lessons Learned

• Complete separation of storage / execution / language
• Using LINQ + .Net (language integration)
• Static typing
  – No protocol buffers (serialization code)
• Allowing flexible and powerful policies
• Centralized job manager: no replication, no consensus, no checkpointing
• Porting (HPC, Cosmos, Azure, SQL Server)

Page 67: Cluster Computing with Dryad

Conclusions

[Figure: an equation combining Visual Studio, LINQ, and Dryad.]

Page 68: Cluster Computing with Dryad

“What’s the point if I can’t have it?”

• Dryad+DryadLINQ available for download
  – Academic license
  – Commercial evaluation license
• Runs on Windows HPC platform
• Dryad is in binary form, DryadLINQ in source
• Requires signing a 3-page licensing agreement
• http://connect.microsoft.com/site/sitehome.aspx?SiteID=891

Page 69: Cluster Computing with Dryad

Backup Slides

Page 70: Cluster Computing with Dryad

What does DryadLINQ do?

User code:

public struct Data {
    …
    public static int Compare(Data left, Data right);
}

Data g = new Data();
var result = table.Where(s => Data.Compare(s, g) < 0);

Generated code:

// Data serialization
public static void Read(this DryadBinaryReader reader, out Data obj);
public static int Write(this DryadBinaryWriter writer, Data obj);

// Data factory
public class DryadFactoryType__0 : LinqToDryad.DryadFactory<Data>

DryadVertexEnv denv = new DryadVertexEnv(args);
var dwriter__2 = denv.MakeWriter(FactoryType__0);   // Channel writer
var dreader__3 = denv.MakeReader(FactoryType__0);   // Channel reader
// LINQ code; DryadLinqObjectStore.Get(0) supplies the serialized context
var source__4 = DryadLinqVertex.Where(dreader__3,
    s => (Data.Compare(s, ((Data)DryadLinqObjectStore.Get(0))) < ((System.Int32)(0))),
    false);
dwriter__2.WriteItemSequence(source__4);

Page 71: Cluster Computing with Dryad

Ongoing Dryad/DryadLINQ Research

• Performance modeling
• Scheduling and resource allocation
• Profiling and performance debugging
• Incremental computation
• Hardware acceleration
• High-level programming abstractions
• Many domain-specific applications

Page 72: Cluster Computing with Dryad

Sample applications written using DryadLINQ | Class
Distributed linear algebra | Numerical
Accelerated Page-Rank computation | Web graph
Privacy-preserving query language | Data mining
Expectation maximization for a mixture of Gaussians | Clustering
K-means | Clustering
Linear regression | Statistics
Probabilistic Index Maps | Image processing
Principal component analysis | Data mining
Probabilistic Latent Semantic Indexing | Data mining
Performance analysis and visualization | Debugging
Road network shortest-path preprocessing | Graph
Botnet detection | Data mining
Epitome computation | Image processing
Neural network training | Statistics
Parallel machine learning framework infer.net | Machine learning
Distributed query caching | Optimization
Image indexing | Image processing
Web indexing structure | Web graph

Page 73: Cluster Computing with Dryad

Staging

1. Build
2. Send .exe
3. Start JM
4. Query cluster resources
5. Generate graph
6. Initialize vertices
7. Serialize vertices
8. Monitor vertex execution

[Figure labels: JM code, vertex code, cluster services.]

Page 74: Cluster Computing with Dryad

Bibliography

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007.

DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. Symposium on Operating System Design and Implementation (OSDI), San Diego, CA, December 8-10, 2008.

SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou. Very Large Databases Conference (VLDB), Auckland, New Zealand, August 23-28, 2008.

Hunting for problems with Artemis. Gabriela F. Creţu-Ciocârlie, Mihai Budiu, and Moises Goldszmidt. USENIX Workshop on the Analysis of System Logs (WASL), San Diego, CA, December 7, 2008.

DryadInc: Reusing work in large-scale computations. Lucian Popa, Mihai Budiu, Yuan Yu, and Michael Isard. Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, CA, June 15, 2009.

Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations. Yuan Yu, Pradeep Kumar Gunda, and Michael Isard. ACM Symposium on Operating Systems Principles (SOSP), October 2009.

Quincy: Fair Scheduling for Distributed Computing Clusters. Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg. ACM Symposium on Operating Systems Principles (SOSP), October 2009.

Page 75: Cluster Computing with Dryad

Incremental Computation

Goal: Reuse (part of) prior computations to:
- Speed up the current job
- Increase cluster throughput
- Reduce energy and costs

[Figure: a distributed computation maps inputs (append-only data) to outputs.]

Page 76: Cluster Computing with Dryad

Propose Two Approaches

1. Reuse identical computations from the past (like make or memoization)

2. Do only incremental computation on the new data and merge results with the previous ones (like patch)

Page 77: Cluster Computing with Dryad

Context

• Implemented for Dryad
  – Dryad Job = Computational DAG
    • Vertex: arbitrary computation + inputs/outputs
    • Edge: data flows

Simple Example: Record Count

[Figure: inputs (partitions) I1 and I2 each feed a Count vertex (C); both Count vertices feed an Add vertex (A), which produces the outputs.]

Page 78: Cluster Computing with Dryad

Identical Computation

Record Count

[Figure: first execution DAG. Inputs (partitions) I1 and I2 each feed a Count vertex (C); both Count vertices feed an Add vertex (A), which produces the outputs.]

Page 79: Cluster Computing with Dryad

Identical Computation

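Record Count

[Figure: second execution DAG. A new input I3 is added: inputs (partitions) I1, I2, and I3 each feed a Count vertex (C); all Count vertices feed an Add vertex (A), which produces the outputs.]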

Page 80: Cluster Computing with Dryad

IDE – IDEntical Computation

Record Count

[Figure: second execution DAG over inputs (partitions) I1, I2, and I3. The Count vertices over I1 and I2 are marked as the identical subDAG.]

Page 81: Cluster Computing with Dryad

Identical Computation

Replace identical computational subDAG with edge data cached from previous execution

[Figure: IDE modified DAG. The identical subDAG is replaced with cached data; only input I3 feeds a Count vertex (C), and the Add vertex (A) produces the outputs.]

Page 82: Cluster Computing with Dryad

Identical Computation

Replace identical computational subDAG with edge data cached from previous execution

Use DAG fingerprints to determine if computations are identical

[Figure: IDE modified DAG. Only input I3 feeds a Count vertex (C), and the Add vertex (A) produces the outputs.]
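
A toy version of the fingerprint-and-cache idea for the record-count example is sketched below. It is an illustration under stated assumptions, not the DryadInc implementation: the class and method names are hypothetical, a partition name stands in for an input fingerprint, and the fingerprint is simply a SHA-256 hash over the vertex name and its input fingerprints.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical cache keyed by fingerprints of (vertex code, inputs).
public sealed class IdeCacheSketch
{
    private readonly Dictionary<string, long> cache = new Dictionary<string, long>();

    // Number of Count vertices that actually ran.
    public int VerticesExecuted { get; private set; }

    // Fingerprint of a vertex: hash of its code name and input fingerprints.
    public static string Fingerprint(string vertexCode, params string[] inputFingerprints)
    {
        string description = vertexCode + "(" + string.Join(",", inputFingerprints) + ")";
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(description));
            return Convert.ToBase64String(hash);
        }
    }

    // Runs the Count vertex on one partition, or reuses a cached result
    // when a vertex with the same fingerprint has already run.
    public long RunCount(string inputFingerprint, IReadOnlyCollection<string> records)
    {
        string key = Fingerprint("Count", inputFingerprint);
        if (cache.TryGetValue(key, out long cached)) return cached;

        VerticesExecuted++;
        long result = records.Count;
        cache[key] = result;
        return result;
    }
}
```

On a second execution over I1, I2, and a new I3, only the Count vertex for I3 runs, and the other two results come from the cache.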

Page 83: Cluster Computing with Dryad

Semantic Knowledge Can Help

[Figure: Count vertices (C) over inputs I1 and I2 feed an Add vertex (A); label: reuse output.]

Page 84: Cluster Computing with Dryad

Semantic Knowledge Can Help

[Figure: incremental DAG. The previous output (Add over the counts of I1 and I2) and a new Count vertex (C) over I3 feed a Merge (Add) vertex.]

Page 85: Cluster Computing with Dryad

Mergeable Computation

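[Figure: incremental DAG in which the previous output (Add over the counts of I1 and I2) and a Count vertex (C) over I3 feed a Merge (Add) vertex. Callouts: automatically inferred, automatically built, user-specified.]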
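
For record count, the merge step is plain addition, so the incremental result can be checked against a full recomputation. The sketch below assumes in-memory partitions, and the class and method names are hypothetical. It illustrates the idea only and is not DryadInc code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeableCountSketch
{
    // Count vertex (C): number of records in one partition.
    public static long CountRecords(IEnumerable<string> partition) =>
        partition.LongCount();

    // Merge function (Add): combines a previous output with a new partial result.
    public static long Merge(long previousOutput, long delta) =>
        previousOutput + delta;

    // Full computation: count every partition, then add the counts.
    public static long FullCount(IEnumerable<IEnumerable<string>> partitions) =>
        partitions.Sum(p => CountRecords(p));

    // Incremental computation: process only the new partition and merge.
    public static long IncrementalCount(long previousOutput, IEnumerable<string> newPartition) =>
        Merge(previousOutput, CountRecords(newPartition));
}
```

With previous output `FullCount({I1, I2})`, calling `IncrementalCount(previous, I3)` gives the same value as `FullCount({I1, I2, I3})` while reading only I3. This equivalence holds because addition is associative. Other computations need a merge function with the same property.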

Page 86: Cluster Computing with Dryad

Mergeable Computation

[Figure: Incremental DAG – Remove Old Inputs. Labels: inputs I1, I2, I3; Count vertices (C); Add vertices (A); merge vertex; empty; save to cache.]