Can High-Performance Interconnects Benefit Hadoop Distributed File System?

Sayantan Sur, Hao Wang, Jian Huang, Xiangyong Ouyang, D. K. Panda

Network-Based Computing Laboratory, Department of Computer Science and Engineering, The Ohio State University, USA

Can High-Performance Interconnects Benefit Hadoop Distributed File System?nowlab.cse.ohio-state.edu/.../slide/masvdc10-hdfs-ib_1.pdf · 2017. 7. 18. · – Hadoop Distributed Filesystem



Introduction

•  MapReduce – scalable model to process Petabytes

•  Hadoop MapReduce framework widely adopted

–  Hadoop Distributed Filesystem (HDFS) provides core storage, distribution and fault-tolerance features

–  Designed with Gigabit Ethernet and Sockets in mind

•  The field of High-Performance Computing (HPC) has adopted advanced interconnects

–  Low latency, High Bandwidth

–  Low CPU cycle requirement

–  InfiniBand, 10 Gigabit Ethernet are two examples

•  Solid State Drives providing improved IO characteristics

•  Can HDFS benefit from these two emerging technologies?


Typical HPC and Cloud Computing Deployments

•  HPC system design is interconnect centric

•  Cloud computing environments have complex software and have historically relied on sockets and Ethernet

[Figure: typical HPC and cloud deployments. Labels: Compute cluster; High-Performance Interconnects; Frontend; GigE; Physical Machines hosting VMs; Local Storage; Virtual FS; Meta-Data server; I/O Servers with Data.]


HDFS Architecture

•  HDFS is a distributed user-level file system

•  Provides fault-tolerance by replicating data blocks

•  Blocks are typically 64MB and replicated three times (possibly in different racks)

•  Dedicated NameNode to store information on data blocks

•  DataNodes just store blocks and schedule MapReduce computation jobs

•  Dedicated JobTracker to track jobs (and failures)
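The block and replication figures above translate into a simple storage calculation. The following is a minimal sketch, not part of the original slides: the function name `hdfs_footprint` is hypothetical, and it assumes the final, partially filled block only occupies the bytes actually written.

```python
# 64 MB blocks and three replicas, as stated on the slide.
BLOCK_SIZE = 64 * 1024**2
REPLICATION = 3

def hdfs_footprint(file_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (logical blocks, stored block replicas, raw bytes stored)."""
    # Integer ceiling division: a partial last block still counts as a block.
    blocks = (file_bytes + block_size - 1) // block_size
    replicas = blocks * replication
    # Assumption: the last block is not padded to the full block size.
    raw_bytes = file_bytes * replication
    return blocks, replicas, raw_bytes

if __name__ == "__main__":
    blocks, replicas, raw = hdfs_footprint(10 * 1024**3)  # a 10 GB file
    print(f"{blocks} blocks, {replicas} replicas, {raw / 1024**3:.0f} GB raw")
```

Under these assumptions, every byte written by a client results in three bytes stored across DataNodes, so write-heavy workloads also generate replication traffic on the network.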


InfiniBand and 10 Gigabit Ethernet

•  InfiniBand is an industry-standard packet-switched network

•  Has been increasingly adopted in HPC systems

•  User-level networking with OS-bypass (verbs)

•  10 Gigabit Ethernet is the follow-up to Gigabit Ethernet

•  Provides user-level networking with OS-bypass (iWARP)

•  Some vendors have accelerated TCP/IP by putting it on the network card (hardware offload)

•  Convergence: possible to use both through the OpenFabrics networking stack

–  Same software, different interconnects


InfiniBand Efficiency in Top500 List

[Figure: Computer Cluster Efficiency Comparison. x-axis: Top 500 Systems (0 to 500); y-axis: Efficiency (%) (0 to 100). Series: IB-CPU, IB-GPU/Cell, GigE, 10GigE, IBM-BlueGene, Cray.]


Outline

•  Introduction

•  Problem Statement

•  Modern Interconnects and Protocols

•  Experimental Results & Analysis

•  Conclusions & Future Work


Problem Statement

•  Can High-Performance Interconnects help HDFS performance significantly?

•  How much benefit is possible without any software modifications?

•  Can emerging SSD technology complement performance advantages of high-performance interconnects?


Modern Interconnects and Protocols

[Figure: protocol stacks between the application and the network. Layers: Application; Application Interface (Sockets, Verbs); Protocol Implementation (kernel-space TCP/IP with Ethernet driver, TCP/IP with hardware offload, IPoIB, SDP, user-space RDMA); Network Adapter (1/10 GigE, 10 GigE, InfiniBand); Network Switch (Ethernet, 10 GigE, InfiniBand). Stacks shown: 1/10 GigE, 10 GigE-TOE, IPoIB, SDP, Verbs.]


MVAPICH and MVAPICH2 Software

•  High Performance MPI Library for IB and HSE

–  MVAPICH (MPI-1) and MVAPICH2 (MPI-2.2)

–  Used by more than 1,300 organizations in 60 countries

–  More than 46,000 downloads from OSU site directly

–  Empowering many TOP500 clusters

•  11th ranked 81,920-core cluster (Pleiades) at NASA

•  15th ranked 62,976-core cluster (Ranger) at TACC

–  Available with software stacks of many IB, HSE and server vendors including Open Fabrics Enterprise Distribution (OFED)

–  http://mvapich.cse.ohio-state.edu


One-way Latency: MPI over IB

[Figure: two panels, Small Message Latency and Large Message Latency. x-axis: Message Size (bytes); y-axis: Latency (us). Series: MVAPICH-Qlogic-DDR, MVAPICH-Qlogic-QDR-PCIe2, MVAPICH-ConnectX-DDR. Data labels on the small-message panel: 1.96, 1.54, 1.60, 2.17.]

All numbers taken on 2.4 GHz Quad-core (Nehalem) Intel with IB switch


Bandwidth: MPI over IB

[Figure: two panels, Unidirectional Bandwidth and Bidirectional Bandwidth. x-axis: Message Size (bytes); y-axis: MillionBytes/sec. Series: MVAPICH-Qlogic-DDR, MVAPICH-Qlogic-QDR-PCIe2, MVAPICH-ConnectX-DDR. Data labels on the unidirectional panel: 2665.6, 3023.7, 1901.1, 1553.2. Data labels on the bidirectional panel: 2990.1, 3244.1, 3642.8, 5835.7.]

All numbers taken on 2.4 GHz Quad-core (Nehalem) Intel with IB switch


Experimental Setup

•  Hadoop 0.20.2

•  Sun/Oracle Java 1.6.0

•  Intel Xeon 2.33GHz Quad Core CPUs

•  Main memory 6GB, 250GB Hard disk

•  Intel X25-E 64GB SSD

•  Mellanox MT25208 DDR (16Gbps) InfiniBand

•  Chelsio T320 (10GbE)

•  We dedicate one node as the NameNode and another as the JobTracker

•  We vary the number of DataNodes: 2, 4 and 8


Microbenchmark Level Evaluation


•  Sockets level ping-pong bandwidth test

•  Client sends data to server; server receives data; sends an ack back

•  Java performance depends on usage of NIO (allocateDirect)

•  C and Java versions of the benchmark have similar performance

•  HDFS does not use direct allocated blocks or NIO on DataNode
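The ping-pong pattern described above (send data, receive, ack) can be sketched in a few lines. This is an illustrative Python version over the loopback interface, not the authors' C or Java benchmark; the names `pingpong_bandwidth`, `_server` and `_recv_exact`, the one-byte ack, and the default message size are all assumptions. It does not demonstrate Java NIO direct buffers.

```python
import socket
import threading
import time

def _recv_exact(conn, nbytes):
    """Read and discard exactly nbytes from conn."""
    remaining = nbytes
    while remaining > 0:
        chunk = conn.recv(min(remaining, 1 << 16))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        remaining -= len(chunk)

def _server(listener, msg_size, iterations):
    """Receive each message in full, then send a one-byte ack."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(iterations):
            _recv_exact(conn, msg_size)
            conn.sendall(b"A")

def pingpong_bandwidth(msg_size=1 << 20, iterations=20):
    """Return the measured bandwidth in million bytes per second."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=_server,
                              args=(listener, msg_size, iterations))
    worker.start()
    payload = bytes(msg_size)
    with socket.create_connection(("127.0.0.1", port)) as client:
        start = time.perf_counter()
        for _ in range(iterations):
            client.sendall(payload)   # client sends data
            _recv_exact(client, 1)    # wait for the server's ack
        elapsed = time.perf_counter() - start
    worker.join()
    listener.close()
    return msg_size * iterations / elapsed / 1e6

if __name__ == "__main__":
    print(f"{pingpong_bandwidth():.1f} million bytes/sec over loopback")
```

To compare interconnects, the server and client halves would run on separate hosts, with the client connecting to the address bound to the interface under test (for example the IPoIB address).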

[Figure panels: Bandwidth with C version; Bandwidth with Java version]

DFS IO Write Performance


•  DFS IO included in Hadoop, measures sequential access throughput

•  We have two map tasks each writing to a file of increasing size (1-10GB)

•  Eight data nodes for HDD (left), four data nodes for SSD (right)

•  Significant improvement with IPoIB, SDP and 10GigE

•  With SSD, performance improvement is almost seven or eight fold!

•  SSD benefits not seen without using high-performance interconnect!
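One way to see why a faster disk may not help behind a slow network is a first-order bottleneck model: a write stream can go no faster than the slower of the link and the storage device. The sketch below is not the authors' analysis. The link rates are nominal data rates (16 Gbps for DDR InfiniBand is the figure quoted in the experimental setup), and the device write rates are hypothetical placeholders, not measurements from the slides.

```python
def gbps_to_mbytes(gbps):
    """Convert a link data rate in Gbit/s to million bytes/s."""
    return gbps * 1000 / 8

# Nominal link data rates in Gbit/s; protocol overheads are ignored.
LINKS_GBPS = {"1GigE": 1, "10GigE": 10, "IB-DDR": 16}

# HYPOTHETICAL sustained write rates in million bytes/s.
# Placeholders for illustration only, not values from the slides.
DEVICES_MBPS = {"HDD": 80, "SSD": 200}

def write_ceiling(link, device):
    """Upper bound on one write stream: the slower of link and device."""
    return min(gbps_to_mbytes(LINKS_GBPS[link]), DEVICES_MBPS[device])

if __name__ == "__main__":
    for link in LINKS_GBPS:
        for device in DEVICES_MBPS:
            print(f"{link:7s} + {device}: "
                  f"{write_ceiling(link, device):.0f} million bytes/s")
```

With these placeholder rates, the SSD is capped by the 1GigE link, while the faster links leave the storage device as the limit. The model ignores replication traffic, concurrent tasks and software overheads, all of which affect the measured results.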


RandomWriter Performance

•  Each map generates 1GB of random binary data and writes to HDFS

•  30% improvement for HDD using 8 DataNodes (IPoIB, SDP, 10GigE)

•  SSD improves execution time by 50% with 1GigE for two DataNodes

•  However, when using four DataNodes, unless IPoIB, SDP or 10GigE is used, benefits are not observed

•  IPoIB, SDP and 10GigE can improve performance by 59% on four DataNodes
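The percentages above can be read in two ways, and the slides do not say which is meant. The sketch below converts a percentage into a speedup factor under each reading; both function names are hypothetical.

```python
def speedup_from_time_reduction(percent):
    """Speedup factor if execution time shrinks by `percent` percent."""
    if not 0 <= percent < 100:
        raise ValueError("time reduction must be in [0, 100)")
    return 1.0 / (1.0 - percent / 100.0)

def speedup_from_throughput_gain(percent):
    """Speedup factor if throughput grows by `percent` percent."""
    return 1.0 + percent / 100.0

if __name__ == "__main__":
    # The three percentages quoted for RandomWriter.
    for pct in (30, 50, 59):
        print(f"{pct}%: "
              f"{speedup_from_time_reduction(pct):.2f}x if time reduction, "
              f"{speedup_from_throughput_gain(pct):.2f}x if throughput gain")
```

The two readings diverge as the percentage grows, so the interpretation matters most for the larger improvements.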


Sort Benchmark


•  Sort: baseline benchmark for Hadoop

•  Bound by disk IO bandwidth in the sort phase, but by network performance in the reduce phase

•  SSD improves performance by 28% using 1GigE with two DataNodes

•  Benefit of 48% on four DataNodes using SDP, IPoIB or 10GigE


Conclusions and Future Work


•  High-Performance interconnects can be used to boost performance of HDFS workloads

•  Benefits are observed when SSD is used instead of Hard disk in combination with fast network

•  Disk IO is still a bottleneck in HDFS

•  Using SSDs alone cannot improve performance

•  Must couple High-Performance interconnect with SSDs

•  HDFS design can be improved to use lower level communication (verbs) to further leverage advanced networks

•  We are currently looking at more workloads and HBase performance


Thank You!

{surs, wangh, huangjia, ouyangx, panda}@cse.ohio-state.edu

Network-Based Computing Laboratory

http://nowlab.cse.ohio-state.edu/

MVAPICH Web Page http://mvapich.cse.ohio-state.edu/