Analytics Shipping with MapReduce
on HPC Backend Storage Servers
Weikuan Yu
Florida State University
Supercomputers
◼ Scientific applications from different disciplines: computational genomics, drug design, etc.
◼ These applications generate colossal amounts of data.
◼ Visualization and analysis are critical for scientific discovery.
[Images: Sequoia Supercomputer; Titan Supercomputer]
Supercomputer Architecture
Typical HPC System Architecture
◼ Overview of typical HPC system architecture
➢ Processing Elements (PEs) provide computational power
➢ Storage servers are leveraged to store data
◼ Need to improve communication and data analytics!
[Diagram callouts: work executes as multiple processes that communicate using MPI; datasets are stored on storage servers and need to be analyzed]
Issues in Data Analytics
◼ In-situ analytics solutions to avoid costly data movement
➢ Analyze temporary datasets as they are generated in memory
◼ Analytics for permanent data
➢ Such data is already stored on the backend storage
➢ It must be retrieved from the HPC backend servers
➢ And stored back to the servers afterwards
◼ Our focus: analytics of permanent datasets
➢ Avoid costly data transfers
➢ Use underutilized computation resources on the storage servers
Overarching Goal
◼ Case Study: MapReduce and Lustre
➢ MapReduce: Analytics model
➢ Lustre: HPC backend storage server
◼ Analytics Shipping
➢ Retaining the default I/O model for scientific applications and
storing data on Lustre
➢ Shipping analytics to Lustre storage servers
MapReduce - YARN
◼ A simple data processing model to process big data
◼ Designed for commodity off-the-shelf hardware components.
◼ Strong merits for big data analytics
➢ Scalability: increase throughput by increasing # of nodes
➢ Fault tolerance: quick, low-cost recovery from task failures
◼ YARN: the next-generation Hadoop MapReduce implementation
[Diagram: YARN architecture — the ResourceManager and MRAppMaster launch Map Tasks and Reduce Tasks, which read input from and write output to HDFS]
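To make the model concrete, below is a minimal word-count mapper/reducer sketch in the Hadoop Streaming style (an illustration added here, not part of the original slides); it assumes Streaming's tab-separated stdin/stdout convention:

```python
#!/usr/bin/env python3
# Minimal MapReduce word count in the Hadoop Streaming style: the mapper
# emits (word, 1) pairs, the framework sorts by key, and the reducer sums
# the counts for each word.
import itertools
import sys

def mapper(lines):
    for line in lines:
        for word in line.split():
            print(f"{word}\t1")

def reducer(lines):
    # Streaming delivers mapper output sorted by key, so equal words
    # arrive on consecutive lines and can be grouped directly.
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    (mapper if role == "map" else reducer)(sys.stdin)
```

The framework supplies the partitioning, sorting, and fault-tolerance pieces, which is what makes the model attractive at scale.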
KVM-based Segregation for YARN
◼ Leverage KVM to create VM (virtual machine) instances on storage nodes (SNs)
◼ Create Lustre storage servers on the physical
machines (PMs)
◼ Run YARN programs and Lustre clients on the VMs
[Diagram: a physical machine hosting the Lustre OSS with a total of 8 OSTs, plus a KVM guest running YARN]
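As a rough sketch of the VM-creation step (not the paper's actual setup), a libvirt virt-install invocation might look like this; every name, size, and path is hypothetical:

```python
#!/usr/bin/env python3
# Illustrative sketch only: create one KVM guest on a storage node using
# libvirt's virt-install. All names, sizes, and paths are hypothetical.
import subprocess

def create_yarn_vm(name: str, memory_mb: int = 4096, vcpus: int = 4) -> None:
    subprocess.run([
        "virt-install",
        "--name", name,
        "--memory", str(memory_mb),
        "--vcpus", str(vcpus),
        # VIRTIO-backed disk image for paravirtualized disk I/O.
        "--disk", f"path=/var/lib/libvirt/images/{name}.qcow2,bus=virtio",
        # Bridge the guest onto the node's network with a VIRTIO NIC.
        "--network", "bridge=br0,model=virtio",
        "--import", "--noautoconsole",
    ], check=True)

if __name__ == "__main__":
    create_yarn_vm("yarn-vm0")
```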
YARN on KVM and OSS/OST on PM
◼ We run YARN jobs in KVM guests to process 40 GB of data.
◼ Lustre runs on the physical machines; 4 clients write 320 GB into Lustre.
◼ When the YARN jobs complete, KVM releases 1 GB.
Virtualized Analytics Shipping (VAS)
◼ Create VMs on storage servers for parallel analytics
◼ Launch MapReduce programs on the VM cluster
➢ Besides interference segregation, good performance is needed
➢ Seamless integration of MapReduce with Lustre servers
[Diagram: MapReduce-based Analytics Shipping Model — MapReduce-based analytics jobs run in KVM guests co-located with the Lustre OSS/OST on each storage server, with the MDS/MDT providing metadata]
Achieving Efficient Analytics Shipping
◼ High-performance network and disk I/O
➢ KVM segregation introduces virtualization overhead
➢ To ensure good network I/O across KVM
➢ To achieve good disk I/O from KVM to disks
◼ Intermediate Data Placement and Shuffling
➢ Utilize Lustre to store intermediate data
➢ To avoid unnecessary data spilling in YARN shuffling
◼ Distributing Data and exploiting Task Locality
➢ YARN MapReduce (data-centric) vs. HPC storage (compute-centric)
➢ To address mismatches of data and task scheduling
Fast Network and Disk I/O in KVM
◼ Network Configuration for KVM
➢ VIRTIO vs. SR-IOV: SR-IOV bypasses the software switch
➢ Configure routing table:
o VIRTIO for local communication and SR-IOV for remote connection
◼ Disk Configuration for KVM
➢ Apply VIRTIO for attaching storage devices to KVM
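A minimal sketch of this routing split, assuming hypothetical interface names (eth0 for VIRTIO, eth1 for the SR-IOV virtual function) and addresses:

```python
#!/usr/bin/env python3
# Sketch of the split routing policy inside a guest: traffic to VMs on the
# same physical machine stays on the VIRTIO NIC, while routes to remote
# nodes use the SR-IOV virtual function, bypassing the software switch.
# Interface names and addresses are hypothetical.
import subprocess

VIRTIO_IF = "eth0"  # paravirtualized NIC: fast path between co-located VMs
SRIOV_IF = "eth1"   # SR-IOV virtual function: near-native remote bandwidth

def add_route(dest: str, dev: str) -> None:
    subprocess.run(["ip", "route", "replace", dest, "dev", dev], check=True)

# Local subnet (VMs sharing this physical machine) via VIRTIO.
add_route("192.168.10.0/24", VIRTIO_IF)
# Host routes to remote machines via SR-IOV.
for remote in ("10.10.1.2", "10.10.1.3"):
    add_route(f"{remote}/32", SRIOV_IF)
```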
[Charts: Bandwidth of Different Communication Cases; Disk I/O Performance of PM and VM — bandwidth (MB/s) for SeqWrite, SeqRead, RandWrite, and RandRead; series: I/O-PM, I/O-VM]
KVM Interference Validation
◼ Evaluate the effects of the VAS framework on the storage servers
➢ IOzone generates a heavy I/O workload while YARN runs simultaneously
➢ Measure Lustre I/O bandwidth under 3 cases
o Lustre-Alone, Lustre-YARNinPM and Lustre-YARNinVM
◼ Lustre with KVM segregation achieves good I/O performance
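For reference, an IOzone invocation of this kind might look like the following sketch; the record size, file size, thread count, and paths are illustrative, not the paper's exact parameters:

```python
#!/usr/bin/env python3
# Illustrative IOzone run covering the four access patterns in the chart:
# -i 0 = write/rewrite, -i 1 = read/reread, -i 2 = random read/write;
# -t enables throughput mode with N threads, -F names one file per thread.
import subprocess

threads = 4
subprocess.run([
    "iozone",
    "-i", "0", "-i", "1", "-i", "2",
    "-r", "1m",   # 1 MB record size
    "-s", "4g",   # 4 GB file per thread
    "-t", str(threads),
    "-F", *[f"/mnt/lustre/iozone.{i}" for i in range(threads)],
], check=True)
```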
[Chart: Using KVM to Alleviate YARN’s Interference to Lustre — Lustre bandwidth (MB/s) for SeqWrite, SeqRead, RandWrite, and RandRead under Lustre-Alone, Lustre-YARNinPM, and Lustre-YARNinVM; running YARN in VMs has a negligible impact on the Lustre servers]
Data Flow in YARN over HDFS
◼ Map Task: Input Split, Spilled Data, MapOutput
◼ Reduce Task: Shuffled Data, Merged Data, Output Data
◼ A maximum of 10 steps in YARN Hadoop over HDFS
[Diagram: data flow — MapTask: Input Split (from HDFS) → map → sortAndSpill → Spilled → mergeParts → MapOutput; ReduceTask: shuffle → spilled shuffle data → repetitive merge → Merged → reduce → Output (to HDFS)]
Pipelined Merging and Reducing
◼ Decouples segment sorting from data merging
◼ Pipelines merging and reducing of the data
➢ Merges data when all segments are sorted
➢ Avoids repetitive merging
[Diagram: pipelined data flow — MapTask: Input Split (from Lustre) → map → sortAndSpill → mergeParts → Spilled/MapOutput; ReduceTask: Sort-Merge → reduce → Output]
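A plain-Python sketch of the idea: with every segment already sorted, one k-way merge feeds reduce directly, with no repeated intermediate merging:

```python
#!/usr/bin/env python3
# Sketch of pipelined merge-and-reduce: once every map-output segment is
# sorted, a single k-way heap merge streams records into reduce as they
# are produced -- no repetitive intermediate merging.
import heapq
import itertools

def merge_and_reduce(segments, reduce_fn):
    """segments: iterables of (key, value) pairs, each sorted by key."""
    merged = heapq.merge(*segments, key=lambda kv: kv[0])  # one-pass k-way merge
    for key, group in itertools.groupby(merged, key=lambda kv: kv[0]):
        yield key, reduce_fn(v for _, v in group)  # reduce consumes the pipeline

# Example: sum values per key across three sorted segments.
segs = [[("a", 1), ("c", 2)], [("a", 3), ("b", 1)], [("b", 2), ("c", 5)]]
print(list(merge_and_reduce(segs, sum)))  # [('a', 4), ('b', 3), ('c', 7)]
```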
Stripe-Aligned Data & Task Scheduling
◼ Lustre stripe-location-aware scheduler in VAS:
➢ Get each input split's location and schedule the MapTask onto it
➢ Set the MapTask input split size equal to the Lustre stripe size
◼ Write spilled data, map output files (MOFs), and final output locally
◼ A total of 6 steps in the new implementation
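A minimal sketch of the alignment step, assuming a hypothetical dataset path: query the stripe size with lfs getstripe and pin Hadoop's standard split-size properties to it:

```python
#!/usr/bin/env python3
# Sketch: align MapReduce input splits with the Lustre stripe size so each
# MapTask reads exactly one stripe and can be placed on the server holding it.
import subprocess

def lustre_stripe_size(path: str) -> int:
    # `lfs getstripe -S` prints the stripe size in bytes.
    out = subprocess.run(["lfs", "getstripe", "-S", path],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

stripe = lustre_stripe_size("/mnt/lustre/dataset")
# Pin both the min and max split size to the stripe size, so
# FileInputFormat produces exactly one split per stripe.
conf = {
    "mapreduce.input.fileinputformat.split.minsize": str(stripe),
    "mapreduce.input.fileinputformat.split.maxsize": str(stripe),
}
print(conf)
```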
Evaluation Environment
◼ Hardware Configuration
➢ A 22-node cluster
➢ Each node is equipped with:
o Dual-socket quad-core 2.13 GHz Intel Xeon processors
o 8GB DDR2 800 MHz memory, 500GB 7200RPM SATA hard-drives
➢ All nodes are connected by 10Gbit Ethernet
◼ System Configuration
➢ YARN version 2.0.4, Lustre version 2.5, CentOS 6.4
◼ Benchmarks and applications
➢ Terasort, WordCount, SecondarySort and TestDFSIO
➢ NAS BT-IO and YARN K-Means
Overall Performance
◼ TeraSort Benchmark: VAS outperforms YARN-HDFS and YARN-Lustre by
16.8% and 30.3% respectively
◼ TestDFSIO Benchmark: compared to YARN-HDFS and YARN-Lustre
➢ VAS achieves the highest read and write bandwidth
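For context, these benchmarks are typically launched as sketched below; the jar names, paths, and data sizes are illustrative assumptions, not the exact experimental settings:

```python
#!/usr/bin/env python3
# Illustrative launch commands for the two benchmarks.
import subprocess

EXAMPLES_JAR = "hadoop-mapreduce-examples.jar"          # ships with Hadoop
TESTS_JAR = "hadoop-mapreduce-client-jobclient-tests.jar"

# TeraSort: teragen writes N 100-byte rows, then terasort sorts them.
subprocess.run(["hadoop", "jar", EXAMPLES_JAR, "teragen",
                "400000000", "/bench/tera-in"], check=True)   # ~40 GB
subprocess.run(["hadoop", "jar", EXAMPLES_JAR, "terasort",
                "/bench/tera-in", "/bench/tera-out"], check=True)

# TestDFSIO: write then read a set of large files, reporting throughput.
# (Older Hadoop releases spell the size flag `-fileSize <MB>` instead.)
for mode in ("-write", "-read"):
    subprocess.run(["hadoop", "jar", TESTS_JAR, "TestDFSIO",
                    mode, "-nrFiles", "16", "-size", "1GB"], check=True)
```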
[Charts: Job Execution — job execution time (s) for TeraSort, WordCount, and SecondarySort; TestDFSIO — throughput (MB/s) for TestDFSIO-Write and TestDFSIO-Read; series: YARN-HDFS, YARN-Lustre, VAS]
Locality-aware Read/Write Performance
◼ VAS achieves much higher task locality
➢ VAS reduces both MapTask read time and ReduceTask write time
◼ The benefits of achieving locality grow as data size increases
[Charts: avg. MapTask read time (s) and avg. ReduceTask write time (s) for 10G–40G inputs, YARN-Lustre vs. VAS_LocalityOnly, with annotated gains of 21.4% and 19.3%]
Network and Disk Utilization
◼ VAS reduces both network and disk utilization
◼ On average, VAS cuts total network throughput by 26.7% compared to YARN-Lustre
◼ VAS reduces the number of disk I/O requests by 62.3% on average
[Charts: network throughput (MB/s) and number of I/O requests over time (0–500 s), YARN-Lustre vs. VAS]
Conclusion
◼ Exploited analytics shipping for analysis of persistent datasets on HPC
backend storage servers.
◼ Designed Virtualized Analytics Shipping (VAS) with three component
techniques for efficient data analytics
◼ Provided end-to-end optimizations on data organization, movement, and task scheduling in the VAS framework.
◼ Evaluated the performance of VAS with benchmarks and applications
Acknowledgment
◼ Participants
➢ Robin Goldstone from Lawrence Livermore National Lab
➢ Eric Barton, Bryon Neitzel and Omkar Kulkarni from Intel
➢ Cong Xu, Yandong Wang and Hui Chen