On the Energy (In)efficiency of Hadoop: Scale-down Efficiency
Jacob Leverich and Christos Kozyrakis, Stanford University

csl.stanford.edu/~christos/publications/2009.hadoopenergy.hotpowe…

Page 1

Jacob Leverich and Christos Kozyrakis
Stanford University

On the Energy (In)efficiency of Hadoop: Scale-down Efficiency

Page 2

The current design of Hadoop precludes scale-down of commodity clusters.

Page 3

Outline


Hadoop crash-course

Scale-down efficiency

How Hadoop precludes scale-down

How to fix it

Did we fix it?

Future work

Page 4

Hadoop crash-course

Hadoop == Distributed Processing Framework
  1000s of nodes, PBs of data.

Hadoop MapReduce (after Google MapReduce)
  Tasks are automatically distributed by the framework.

Hadoop Distributed File System (after Google File System)
  Files are divided into large (64 MB) blocks, which amortizes overheads.
  Blocks are replicated for availability and durability.
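To make the programming model concrete, here is a minimal word-count sketch of the map and reduce phases in plain Python. This is not Hadoop's Java API; in real Hadoop the framework distributes the map tasks and shuffles the intermediate pairs to reducers.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair per word, as a Hadoop mapper would."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts per key (Hadoop shuffles pairs to reducers)."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

print(reduce_phase(map_phase(["hadoop scales", "hadoop sleeps"])))
# {'hadoop': 2, 'scales': 1, 'sleeps': 1}
```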

Page 5

Scale-down motivation

[Figure: power vs. utilization for an HP ProLiant DL140 G3, with the typical utilization range highlighted; after Barroso and Hölzle, 2007]

Page 6

Scale-down for energy proportionality

Four nodes at 40% utilization: 4 × P(40%) = 4 × 325 W = 1300 W
Two nodes at 80% utilization (two powered off): 2 × P(80%) = 2 × 365 W = 730 W
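The arithmetic above checks out directly; the per-node wattages are the slide's measured values for the DL140 G3:

```python
def cluster_power(active_nodes, per_node_watts):
    """Total power for identical active nodes; sleeping nodes draw ~0 W here."""
    return active_nodes * per_node_watts

# From the slide: P(40%) = 325 W, P(80%) = 365 W per node
spread = cluster_power(4, 325)  # four nodes at 40% load
packed = cluster_power(2, 365)  # two nodes at 80%, two asleep
print(spread, packed)                 # 1300 730
print(round(1 - packed / spread, 2))  # 0.44, i.e. ~44% power saved
```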

Page 7

The problem: storage consolidation

With the Hadoop Distributed File System:
  Consolidate computation? Easy.
  Consolidate storage? Not (as) easy.

"All servers must be available, even during low-load periods." [Barroso and Holzle, 2007]

Hadoop inherited this "feature" from Google File System. :-(

Page 8

HDFS and block replication

The block replication table records which nodes hold a replica of each block.
1st replica local, others remote; allocate evenly across nodes.
Replication factor = 3.

[Figure: block replication table for blocks A–H, each with 3 replicas spread evenly across nodes 1–9]

Page 9

Attempted scale-down

[Figure: the same block replication table with several nodes disabled, leaving some replicas unreachable]

Problems:
  Scale-down vs. self-healing: sleeping replicas are not lost replicas, yet HDFS re-replicates them anyway, wasting capacity and causing a flurry of net & disk activity.
  Which nodes to disable? We must maintain data availability.

Page 10

How to fix it

Problem: scale-down vs. self-healing (sleeping replicas != lost replicas; wasted capacity; flurry of net & disk activity).
  Solution: self-non-healing.
Problem: which nodes to disable? (must maintain data availability)

Page 11

Self-non-healing

[Figure: block replication table with sleeping nodes marked "Zzzzz…"; their replicas stay in place rather than being re-replicated]

Coordinate with Hadoop when we put a node to sleep.
Prevent block re-replications.

Page 12

New RPCs in the HDFS primary node (the NameNode)

sleepNode(String hostname)
  Similar to node decommissioning, but don't re-replicate blocks.
  % hadoop dfsadmin -sleepNode 10.10.1.80:50020
  Save the node's blocks to a "sleeping blocks" map for bookkeeping.
  Ignore heartbeats and block reports from this node.

wakeNode(String hostname)
  Watch for heartbeats; force the node to send a block report.
  Execute arbitrary commands (e.g. send a wake-on-LAN packet).

wakeBlock(Block target)
  Wake a sleeping node that has a particular block.
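The three RPC names and the "sleeping blocks" map come from the slide; the bookkeeping below is a hypothetical Python sketch of that state machine, not HDFS's actual Java implementation (a real NameNode would wait for a block report on wake rather than restoring state itself):

```python
class NameNodeSketch:
    """Hypothetical bookkeeping for sleepNode/wakeNode/wakeBlock."""

    def __init__(self):
        self.block_map = {}        # hostname -> set of block IDs stored there
        self.sleeping_blocks = {}  # hostname -> blocks parked while asleep

    def sleep_node(self, hostname):
        # Like decommissioning, but do NOT re-replicate: park the node's
        # blocks and (in real HDFS) ignore its heartbeats/block reports.
        self.sleeping_blocks[hostname] = self.block_map.pop(hostname)

    def wake_node(self, hostname):
        # Restore the parked blocks; the slide's design instead forces the
        # awakened DataNode to send a fresh block report.
        self.block_map[hostname] = self.sleeping_blocks.pop(hostname)

    def wake_block(self, block):
        # Wake some sleeping node that holds the requested block.
        for host, blocks in list(self.sleeping_blocks.items()):
            if block in blocks:
                self.wake_node(host)
                return host
        return None

nn = NameNodeSketch()
nn.block_map = {"node1": {"blk_1", "blk_2"}, "node2": {"blk_2", "blk_3"}}
nn.sleep_node("node1")          # blk_1 now lives only on a sleeping node
print(nn.wake_block("blk_1"))   # node1
```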

Page 13

How to fix it

Problem: scale-down vs. self-healing (sleeping replicas != lost replicas; wasted capacity; flurry of net & disk activity).
  Solution: self-non-healing.
Problem: which nodes to disable? (must maintain data availability)
  Solution: the "Covering Subset" replication invariant.

Page 14

Replication placement invariants

Hadoop uses simple invariants to direct block placement.

Example: rack-aware block placement protects against common-mode failures (e.g. switch failure, power-delivery failure). Invariant: blocks must have replicas on at least 2 racks.

Is there some energy-efficient replication invariant? It must inform our decision on which nodes we can disable.

Page 15

Covering subset replication invariant

Goal: maximize the number of servers that can simultaneously sleep.

Strategy: aggregate live data onto a "covering subset" of nodes; never turn off a node in the covering subset.

Invariant: every block must have one replica in the covering subset.
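A placement routine that respects the invariant might look like the following sketch. The node numbering, subset choice, and random tie-breaking are illustrative only; real HDFS placement also weighs racks, write locality, load, and free space.

```python
import random

def place_replicas(nodes, covering_subset, replication=3):
    """Pick replica targets so one replica always lands in the covering subset.

    Sketch of the invariant only, not HDFS's actual placement policy.
    """
    first = random.choice(covering_subset)   # satisfy the covering invariant
    others = [n for n in nodes if n != first]
    return [first] + random.sample(others, replication - 1)

nodes = list(range(9))   # a 9-node cluster, as in the earlier figure
covering = [0, 1, 2]     # an example 3-node covering subset
targets = place_replicas(nodes, covering)
assert len(set(targets)) == 3 and any(t in covering for t in targets)
```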

Page 16

Covering subset replication invariant

[Figure: block replication table under the covering-subset invariant; every block keeps one replica on the covering subset, so all nodes outside it can sleep ("Zzzzz…")]


Page 18

Evaluation


Page 19

Methodology

Disable n nodes; compare Hadoop job energy and performance.
  Individual runs of webdata_sort / webdata_scan from GridMix.
  30-minute job batches (with some idle time!).

Cluster: 36 nodes, HP ProLiant DL140 G3.
  2 quad-core Xeon 5335s each, 32 GB RAM, 500 GB disk.
  9-node covering subset (1/4 of the cluster).

Energy model: validated estimate based on CPU utilization.
  Disabled node = 0 Watts.
  Makes it possible to evaluate hypothetical hardware.
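The slides validate an estimate driven by CPU utilization but do not give its coefficients; a common linear idle-to-peak approximation, with placeholder wattages rather than the paper's validated values, would look like:

```python
def node_power(util, idle_w=250.0, peak_w=365.0):
    """Linear power-vs-CPU-utilization model.

    idle_w/peak_w are placeholder coefficients, NOT the paper's
    validated numbers for the DL140 G3.
    """
    return idle_w + (peak_w - idle_w) * util

def cluster_energy_joules(utils, seconds):
    """Energy over an interval; disabled nodes (util=None) count as 0 W."""
    watts = sum(node_power(u) for u in utils if u is not None)
    return watts * seconds

# 9 nodes asleep, 27 active at 50% utilization, over a 30-minute batch
utils = [None] * 9 + [0.5] * 27
print(cluster_energy_joules(utils, 30 * 60) / 3.6e6, "kWh")
```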

Page 20

Results: Performance

It slows down (obviously).
This is a peak-performance benchmark.
Sort (network-intensive) is worse off than Scan (Amdahl's Law).

Page 21

Results: Energy

Less energy is consumed for the same amount of work: 9% to 51% saved.
Extra nodes consume more energy than they improve performance.
Slower systems are usually more efficient; high performance is a trade-off!

Page 22

Results: Power

Scale-down is an excellent knob for cluster-level power capping.
It offers a much larger dynamic range than tweaking frequency/voltage at the server level.

Page 23

Results: The Bottom Line

Operational Hadoop clusters can scale down.
We reduce energy consumption at the expense of single-job latency.

Page 24

Continuing Work


Page 25

Covering subset: mechanism vs. policy

The replication invariant is a mechanism.
Which nodes constitute the subset is policy (an open question).

Size trade-off:
  Too small: low capacity and a performance bottleneck.
  Too large: wasted energy on idle nodes.
  1 / (replication factor) of the cluster is a reasonable starting point.

How many covering subsets?
  Invariant: blocks must have a replica in each covering subset.
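The sizing rule of thumb above is easy to state in code; the functions below are an illustrative sketch, not tuning guidance:

```python
def covering_subset_size(cluster_size, replication_factor):
    """Rule-of-thumb sizing from the slide: 1/replication of the cluster."""
    return max(1, cluster_size // replication_factor)

def max_sleepable(cluster_size, replication_factor):
    """Everything outside the covering subset may sleep simultaneously."""
    return cluster_size - covering_subset_size(cluster_size, replication_factor)

print(covering_subset_size(36, 3), max_sleepable(36, 3))  # 12 24
```

Note that the evaluation earlier in the deck actually used a 9-node (1/4) subset on the 36-node cluster, so the 1/replication rule is only a starting point, not what was deployed.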

Page 26

Quantify Trade-offs

Random fault-injection experiments:
  What happens when a covering-subset node fails?
  How much do you trust idle disks?

Dimensions to quantify: performance, availability, energy consumption, durability.

Page 27

Dynamic Power Management

Algorithmically decide which nodes to sleep or wake up.

What signals to use? CPU utilization? Disk/net utilization? Job-queue length?

MapReduce and HDFS must cooperate (e.g. idle nodes may host transient Map outputs).
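The slides leave this policy open; as one hypothetical example, a queue-length-driven hysteresis policy (thresholds entirely made up) could look like:

```python
def scale_decision(queue_len, active, total, covering, low=2, high=8):
    """Toy hysteresis policy on job-queue length; thresholds are placeholders.

    Never sleeps below the covering subset, so data stays available.
    """
    if queue_len >= high and active < total:
        return "wake"
    if queue_len <= low and active > covering:
        return "sleep"
    return "hold"

print(scale_decision(10, active=27, total=36, covering=9))  # wake
print(scale_decision(1, active=27, total=36, covering=9))   # sleep
print(scale_decision(1, active=9, total=36, covering=9))    # hold
```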

Page 28

Workloads

Benchmarks:
  HBase/BigTable vs. MapReduce: short, unpredictable data accesses vs. long streaming accesses. Quality of service and throughput are both important.
  Pig vs. Sort+Scan.
  Recorded job traces vs. random job traces.
  Peak performance vs. fractional utilization.

What are typical usage patterns?

Page 29

Scale

From 36 nodes to 1000 nodes: emergent behaviors?
  Network hierarchy.
  Hadoop framework inefficiencies.
  Computational overhead (must process many block reports!).

Experiments on Amazon EC2:
  Awarded an Amazon Web Services grant.
  Can't measure power! Must use a model.
  Any Amazonians here? Let's make a validated energy model.