Analytics Shipping with MapReduce
on HPC Backend Storage Servers
Weikuan Yu
Florida State University
Supercomputers
◼ Scientific applications from different disciplines: computational genomics, drug design, etc.
◼ These applications generate colossal amounts of data.
◼ Visualization and analysis are critical for scientific discovery.
[Images: Sequoia Supercomputer; Titan Supercomputer]
Supercomputer Architecture
Typical HPC System Architecture
◼ Overview of typical HPC system architecture
➢ Processing Elements (PEs) provide computational power
➢ Storage servers are leveraged to store data
◼ Need to improve communication and data analytics!
[Diagram callouts: work executes as multiple processes that communicate using MPI; datasets are stored on storage servers and need to be analyzed]
Issues in Data Analytics
◼ In-situ analytics solutions to avoid costly data movement
➢ Analyze temporary datasets as they are generated in memory
◼ Analytics for permanent data
➢ Such data is already stored on the backend storage
➢ It must be retrieved from the HPC backend servers
➢ And stored back to the servers afterwards
◼ Our focus: analytics of permanent datasets
➢ Avoid costly data transfers
➢ Use underutilized computation resources on the storage servers
Overarching Goal
◼ Case Study: MapReduce and Lustre
➢ MapReduce: Analytics model
➢ Lustre: HPC backend storage server
◼ Analytics Shipping
➢ Retaining the default I/O model for scientific applications and
storing data on Lustre
➢ Shipping analytics to Lustre storage servers
MapReduce - YARN
◼ A simple data processing model to process big data
◼ Designed for commodity off-the-shelf hardware components.
◼ Strong merits for big data analytics
➢ Scalability: increase throughput by increasing # of nodes
➢ Fault tolerance: quick, low-cost recovery from task failures
◼ YARN: the next-generation Hadoop MapReduce implementation
[Diagram: YARN architecture — the ResourceManager and MRAppMaster launch Map Tasks and Reduce Tasks, which read input from and write output to HDFS]
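To make the model concrete, below is a minimal word-count mapper/reducer sketch in the Hadoop Streaming style (an illustration added here, not part of the original slides); it assumes Streaming's tab-separated stdin/stdout convention:

```python
#!/usr/bin/env python3
# Minimal MapReduce word count in the Hadoop Streaming style: the mapper
# emits (word, 1) pairs, the framework sorts by key, and the reducer sums
# the counts for each word.
import itertools
import sys

def mapper(lines):
    for line in lines:
        for word in line.split():
            print(f"{word}\t1")

def reducer(lines):
    # Streaming delivers mapper output sorted by key, so equal words
    # arrive on consecutive lines and can be grouped directly.
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        print(f"{word}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    (mapper if role == "map" else reducer)(sys.stdin)
```

The framework supplies the partitioning, sorting, and fault-tolerance pieces, which is what makes the model attractive at scale.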
KVM-based Segregation for YARN
◼ Leverage KVM to create VM (virtual machine) instances on storage nodes (SNs)
◼ Create Lustre storage servers on the physical
machines (PMs)
◼ Run YARN programs and Lustre clients on the VMs
[Diagram: a physical machine hosting the Lustre OSS with a total of 8 OSTs, plus a KVM guest running YARN]
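As a rough sketch of the VM-creation step (not the paper's actual setup), a libvirt virt-install invocation might look like this; every name, size, and path is hypothetical:

```python
#!/usr/bin/env python3
# Illustrative sketch only: create one KVM guest on a storage node using
# libvirt's virt-install. All names, sizes, and paths are hypothetical.
import subprocess

def create_yarn_vm(name: str, memory_mb: int = 4096, vcpus: int = 4) -> None:
    subprocess.run([
        "virt-install",
        "--name", name,
        "--memory", str(memory_mb),
        "--vcpus", str(vcpus),
        # VIRTIO-backed disk image for paravirtualized disk I/O.
        "--disk", f"path=/var/lib/libvirt/images/{name}.qcow2,bus=virtio",
        # Bridge the guest onto the node's network with a VIRTIO NIC.
        "--network", "bridge=br0,model=virtio",
        "--import", "--noautoconsole",
    ], check=True)

if __name__ == "__main__":
    create_yarn_vm("yarn-vm0")
```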
YARN on KVM and OSS/OST on PM
◼ We run YARN jobs in KVM guests to process 40 GB of data.
◼ Lustre runs on the physical machines; 4 clients write 320 GB into Lustre.
◼ When the YARN jobs complete, KVM releases 1 GB.
Virtualized Analytics Shipping (VAS)
◼ Create VMs on storage servers for parallel analytics
◼ Launch MapReduce programs on the VM cluster
➢ Besides interference segregation, good performance is needed
➢ Seamless integration of MapReduce with Lustre servers
[Diagram: MapReduce-based Analytics Shipping Model — MapReduce-based analytics jobs run in KVM guests co-located with the Lustre OSS/OST on each storage server, with the MDS/MDT providing metadata]
Achieving Efficient Analytics Shipping
◼ High-performance network and disk I/O
➢ KVM segregation introduces virtualization overhead
➢ To ensure good network I/O across KVM
➢ To achieve good disk I/O from KVM to disks
◼ Intermediate Data Placement and Shuffling
➢ Utilize Lustre to store intermediate data
➢ To avoid unnecessary data spilling in YARN shuffling
◼ Distributing Data and exploiting Task Locality
➢ YARN MapReduce (data-centric) vs. HPC storage (compute-centric)
➢ To address mismatches of data and task scheduling
Fast Network and Disk I/O in KVM
◼ Network Configuration for KVM
➢ VIRTIO vs. SR-IOV: SR-IOV bypasses the software switch
➢ Configure routing table:
o VIRTIO for local communication and SR-IOV for remote connection
◼ Disk Configuration for KVM
➢ Apply VIRTIO for attaching storage devices to KVM
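A minimal sketch of this routing split, assuming hypothetical interface names (eth0 for VIRTIO, eth1 for the SR-IOV virtual function) and addresses:

```python
#!/usr/bin/env python3
# Sketch of the split routing policy inside a guest: traffic to VMs on the
# same physical machine stays on the VIRTIO NIC, while routes to remote
# nodes use the SR-IOV virtual function, bypassing the software switch.
# Interface names and addresses are hypothetical.
import subprocess

VIRTIO_IF = "eth0"  # paravirtualized NIC: fast path between co-located VMs
SRIOV_IF = "eth1"   # SR-IOV virtual function: near-native remote bandwidth

def add_route(dest: str, dev: str) -> None:
    subprocess.run(["ip", "route", "replace", dest, "dev", dev], check=True)

# Local subnet (VMs sharing this physical machine) via VIRTIO.
add_route("192.168.10.0/24", VIRTIO_IF)
# Host routes to remote machines via SR-IOV.
for remote in ("10.10.1.2", "10.10.1.3"):
    add_route(f"{remote}/32", SRIOV_IF)
```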
[Charts: Bandwidth of Different Communication Cases; Disk I/O Performance of PM and VM — bandwidth (MB/s) for SeqWrite, SeqRead, RandWrite, and RandRead; series: I/O-PM, I/O-VM]
KVM Interference Validation
◼ Evaluate the effects of the VAS framework on the storage servers
➢ IOzone generates a heavy I/O workload while YARN runs simultaneously
➢ Measure Lustre I/O bandwidth under 3 cases
o Lustre-Alone, Lustre-YARNinPM and Lustre-YARNinVM
◼ Lustre with KVM segregation achieves good I/O performance
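For reference, an IOzone invocation of this kind might look like the following sketch; the record size, file size, thread count, and paths are illustrative, not the paper's exact parameters:

```python
#!/usr/bin/env python3
# Illustrative IOzone run covering the four access patterns in the chart:
# -i 0 = write/rewrite, -i 1 = read/reread, -i 2 = random read/write;
# -t enables throughput mode with N threads, -F names one file per thread.
import subprocess

threads = 4
subprocess.run([
    "iozone",
    "-i", "0", "-i", "1", "-i", "2",
    "-r", "1m",   # 1 MB record size
    "-s", "4g",   # 4 GB file per thread
    "-t", str(threads),
    "-F", *[f"/mnt/lustre/iozone.{i}" for i in range(threads)],
], check=True)
```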
[Chart: Using KVM to Alleviate YARN’s Interference to Lustre — Lustre bandwidth (MB/s) for SeqWrite, SeqRead, RandWrite, and RandRead under Lustre-Alone, Lustre-YARNinPM, and Lustre-YARNinVM; running YARN in VMs has a negligible impact on the Lustre servers]
Data Flow in YARN over HDFS
◼ Map Task: Input Split, Spilled Data, MapOutput
◼ Reduce Task: Shuffled Data, Merged Data, Output Data
◼ A maximum of 10 steps in YARN Hadoop over HDFS
[Diagram: data flow — MapTask: Input Split (from HDFS) → map → sortAndSpill → Spilled → mergeParts → MapOutput; ReduceTask: shuffle → spilled shuffle data → repetitive merge → Merged → reduce → Output (to HDFS)]
Pipelined Merging and Reducing
◼ Decouples segment sorting from data merging
◼ Pipelines merging and reducing of the data
➢ Merges data when all segments are sorted
➢ Avoids repetitive merging
[Diagram: pipelined data flow — MapTask: Input Split (from Lustre) → map → sortAndSpill → mergeParts → Spilled/MapOutput; ReduceTask: Sort-Merge → reduce → Output]
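A plain-Python sketch of the idea: with every segment already sorted, one k-way merge feeds reduce directly, with no repeated intermediate merging:

```python
#!/usr/bin/env python3
# Sketch of pipelined merge-and-reduce: once every map-output segment is
# sorted, a single k-way heap merge streams records into reduce as they
# are produced -- no repetitive intermediate merging.
import heapq
import itertools

def merge_and_reduce(segments, reduce_fn):
    """segments: iterables of (key, value) pairs, each sorted by key."""
    merged = heapq.merge(*segments, key=lambda kv: kv[0])  # one-pass k-way merge
    for key, group in itertools.groupby(merged, key=lambda kv: kv[0]):
        yield key, reduce_fn(v for _, v in group)  # reduce consumes the pipeline

# Example: sum values per key across three sorted segments.
segs = [[("a", 1), ("c", 2)], [("a", 3), ("b", 1)], [("b", 2), ("c", 5)]]
print(list(merge_and_reduce(segs, sum)))  # [('a', 4), ('b', 3), ('c', 7)]
```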
Stripe-Aligned Data & Task Scheduling
◼ Lustre stripe-location-aware scheduler in VAS:
➢ Get each input split's location and schedule the MapTask onto it
➢ Set the MapTask input split size equal to the Lustre stripe size
◼ Write spilled data, map output files (MOFs), and final output locally
◼ A total of 6 steps in the new implementation
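A minimal sketch of the alignment step, assuming a hypothetical dataset path: query the stripe size with lfs getstripe and pin Hadoop's standard split-size properties to it:

```python
#!/usr/bin/env python3
# Sketch: align MapReduce input splits with the Lustre stripe size so each
# MapTask reads exactly one stripe and can be placed on the server holding it.
import subprocess

def lustre_stripe_size(path: str) -> int:
    # `lfs getstripe -S` prints the stripe size in bytes.
    out = subprocess.run(["lfs", "getstripe", "-S", path],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

stripe = lustre_stripe_size("/mnt/lustre/dataset")
# Pin both the min and max split size to the stripe size, so
# FileInputFormat produces exactly one split per stripe.
conf = {
    "mapreduce.input.fileinputformat.split.minsize": str(stripe),
    "mapreduce.input.fileinputformat.split.maxsize": str(stripe),
}
print(conf)
```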
Evaluation Environment
◼ Hardware Configuration
➢ A 22-node cluster
➢ Each node is equipped with:
o Dual-socket quad-core 2.13 GHz Intel Xeon processors
o 8GB DDR2 800 MHz memory, 500GB 7200RPM SATA hard-drives
➢ All nodes are connected by 10Gbit Ethernet
◼ System Configuration
➢ YARN version 2.0.4, Lustre version 2.5, CentOS 6.4
◼ Benchmarks and applications
➢ Terasort, WordCount, SecondarySort and TestDFSIO
➢ NAS BT-IO and YARN K-Means
Overall Performance
◼ TeraSort Benchmark: VAS outperforms YARN-HDFS and YARN-Lustre by
16.8% and 30.3% respectively
◼ TestDFSIO Benchmark: compared to YARN-HDFS and YARN-Lustre
➢ VAS achieves the highest read and write bandwidth
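For context, these benchmarks are typically launched as sketched below; the jar names, paths, and data sizes are illustrative assumptions, not the exact experimental settings:

```python
#!/usr/bin/env python3
# Illustrative launch commands for the two benchmarks.
import subprocess

EXAMPLES_JAR = "hadoop-mapreduce-examples.jar"          # ships with Hadoop
TESTS_JAR = "hadoop-mapreduce-client-jobclient-tests.jar"

# TeraSort: teragen writes N 100-byte rows, then terasort sorts them.
subprocess.run(["hadoop", "jar", EXAMPLES_JAR, "teragen",
                "400000000", "/bench/tera-in"], check=True)   # ~40 GB
subprocess.run(["hadoop", "jar", EXAMPLES_JAR, "terasort",
                "/bench/tera-in", "/bench/tera-out"], check=True)

# TestDFSIO: write then read a set of large files, reporting throughput.
# (Older Hadoop releases spell the size flag `-fileSize <MB>` instead.)
for mode in ("-write", "-read"):
    subprocess.run(["hadoop", "jar", TESTS_JAR, "TestDFSIO",
                    mode, "-nrFiles", "16", "-size", "1GB"], check=True)
```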
[Charts: Job Execution — job execution time (s) for TeraSort, WordCount, and SecondarySort; TestDFSIO — throughput (MB/s) for TestDFSIO-Write and TestDFSIO-Read; series: YARN-HDFS, YARN-Lustre, VAS]
Locality-aware Read/Write Performance
◼ VAS achieves much higher task locality
➢ VAS reduces both MapTask read time and ReduceTask write time
◼ The benefits of achieving locality grow as data size increases
[Charts: avg. MapTask read time (s) and avg. ReduceTask write time (s) for 10G–40G inputs, YARN-Lustre vs. VAS_LocalityOnly, with annotated gains of 21.4% and 19.3%]
Network and Disk Utilization
◼ VAS reduces both network and disk utilization
◼ On average, VAS cuts total network throughput by 26.7% compared to YARN-Lustre
◼ VAS reduces the number of disk I/O requests by 62.3% on average
[Charts: network throughput (MB/s) and number of I/O requests over time (0–500 s), YARN-Lustre vs. VAS]
Conclusion
◼ Exploited analytics shipping for analysis of persistent datasets on HPC
backend storage servers.
◼ Designed Virtualized Analytics Shipping (VAS) with three component
techniques for efficient data analytics
◼ Provided end-to-end optimizations on data organization, movement, and task scheduling in the VAS framework.
◼ Evaluated the performance of VAS with benchmarks and applications
Acknowledgment
◼ Participants
➢ Robin Goldstone from Lawrence Livermore National Lab
➢ Eric Barton, Bryon Neitzel and Omkar Kulkarni from Intel
➢ Cong Xu, Yandong Wang and Hui Chen