
http://www.gsic.titech.ac.jp/sc18

Inter-Cloud Infrastructure for Big Data Analysis

[Figure: Inter-cloud infrastructure diagram. Sites: Academic Cloud @ Hokkaido Univ.; Cloud Resource @ NII; National Institute of Genetics; Cloud Platform @ Kyushu Univ.; TSUBAME3.0 @ Tokyo Tech. (TSUBAME3 big data storage); Public Cloud Resource @ SINET Inter-Cloud; etc. Exabyte DB on global object storage; high-bandwidth, secure network by SINET5. Tight connection of supercomputer and inter-cloud resources via SINET L2VPN.]

Background
HPC: HPC users can cause congestion on the PFS, and not all HPC systems have a fast PFS.
Cloud: cloud storage cannot provide enough I/O throughput for data-intensive applications, and it offers only a loose consistency model.

System
HuronFS (Hierarchical, UseR-level and ON-demand File system) serves as a new tier in the storage hierarchy.
Several dedicated nodes/instances act as burst buffers that accelerate accesses by buffering intermediate data.
Implemented on CCI: utilizes the high-performance networks of HPC systems while remaining highly portable.
Implemented with FUSE: supports POSIX, so no code modification is required.

[Figure: HuronFS architecture. Compute nodes select a sub-HuronFS (SHFS 0 .. SHFS M-1) via consistent hashing; each SHFS consists of one Master and multiple IOnodes, backed by the parallel file system / shared cloud storage.]
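Consistent hashing lets every compute node resolve, on its own, which of the M SHFS masters is responsible for a given file. The sketch below illustrates that kind of path-to-master lookup; the ring construction, virtual-node count, and class/function names are illustrative assumptions, not HuronFS's actual implementation.

```cpp
// Minimal consistent-hashing sketch: map a file path to one of M SHFS masters.
// The ring layout and virtual-node count are illustrative assumptions.
#include <cstddef>
#include <functional>
#include <iostream>
#include <map>
#include <string>

class ConsistentHashRing {
public:
    explicit ConsistentHashRing(int num_masters, int vnodes_per_master = 64) {
        std::hash<std::string> h;
        for (int m = 0; m < num_masters; ++m)
            for (int v = 0; v < vnodes_per_master; ++v)
                ring_[h("SHFS-" + std::to_string(m) + "#" + std::to_string(v))] = m;
    }

    // Return the index of the SHFS master responsible for this path.
    int master_for(const std::string& path) const {
        std::size_t key = std::hash<std::string>{}(path);
        auto it = ring_.lower_bound(key);          // first virtual node clockwise from the key
        if (it == ring_.end()) it = ring_.begin(); // wrap around the ring
        return it->second;
    }

private:
    std::map<std::size_t, int> ring_;  // hash position on the ring -> master index
};

int main() {
    ConsistentHashRing ring(/*num_masters=*/4);
    std::cout << "/data/input.bin    -> SHFS " << ring.master_for("/data/input.bin") << "\n";
    std::cout << "/data/out/part-0007 -> SHFS " << ring.master_for("/data/out/part-0007") << "\n";
}
```

Because the mapping depends only on the path and the ring, adding or removing an SHFS instance relocates only the files whose positions fall on the affected ring segments.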

HuronFS: A New Storage Hierarchy for HPC & Cloud

Background
Cloud platforms exhibit elasticity, flexibility, usability, and scalability, which have been attracting users to these environments as a cost-effective way to run their applications or businesses. However, the feasibility of running high-performance computing applications on clouds has always been a concern, mainly due to virtualization overheads and high-latency interconnection networks.

Goals
To investigate the potential role of these virtual machines in addressing the needs of HPC and data-intensive workloads.

Experimentation
Performance evaluation of applications on AWS C4 instances against the baseline results of a supercomputer, TSUBAME-KFC.

Evaluation of HPC-Big Data Applications

Acknowledgments: This research was supported by JST CREST Grant Numbers JPMJCR1303 and JPMJCR1501 (Research Area: Advanced Core Technologies for Big Data Integration).

* 2 MPI ranks and 12 OpenMP threads per node achieved the best performance on TSUBAME-KFC; on AWS EC2, it was 8 MPI ranks and 12 OpenMP threads per instance.

Graph500 Benchmark at Scale 26

NICAM-DC-MINI: a compute-intensive miniapp from the Fiber miniapp suite

TSUBAME-KFC: 2x Intel Xeon E5-2620 v2, InfiniBand FDR. Amazon EC2 c4.8xlarge instance: Intel Xeon E5-2666 v3, 10 Gbps Ethernet.
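The NICAM-DC-MINI figures below report execution time split into App Time and MPI Time for hybrid MPI + OpenMP runs. As a rough, self-contained sketch of how such a split can be measured, consider the following; the dummy kernel, iteration count, and timer placement are assumptions and are unrelated to the actual benchmark codes.

```cpp
// Hedged sketch: hybrid MPI + OpenMP run with a coarse App/MPI time split.
// The dummy kernel and timers are illustrative, not NICAM-DC-MINI code.
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    std::vector<double> local(1 << 20, 1.0);
    double t_total = MPI_Wtime();
    double t_mpi = 0.0;

    for (int step = 0; step < 100; ++step) {
        // Compute phase: OpenMP threads inside each MPI rank.
        #pragma omp parallel for
        for (std::size_t i = 0; i < local.size(); ++i)
            local[i] = local[i] * 1.0000001 + 0.5;

        // Communication phase: time spent inside MPI counts as "MPI time".
        double sum = 0.0, global = 0.0;
        for (double v : local) sum += v;
        double t0 = MPI_Wtime();
        MPI_Allreduce(&sum, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        t_mpi += MPI_Wtime() - t0;
    }

    t_total = MPI_Wtime() - t_total;
    if (rank == 0)
        std::printf("ranks=%d threads=%d app=%.3fs mpi=%.3fs\n",
                    nprocs, omp_get_max_threads(), t_total, t_mpi);
    MPI_Finalize();
}
```

With Open MPI, for example, the 2-rank / 12-thread layout used on TSUBAME-KFC would be launched roughly as `OMP_NUM_THREADS=12 mpirun -np 2 ./app`; how the environment variable is forwarded to remote nodes depends on the launcher.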

[Figure: Graph500 (Scale 26) execution time [sec], broken down into Computation, Communication, and Stall, on 1, 4, and 16 TSUBAME-KFC nodes (10 MPI ranks per node) and on 1, 4, and 16 AWS EC2 c4.8xlarge instances (10 MPI ranks per instance).]

[Figure: NICAM-DC-MINI execution time [sec], split into App Time and MPI Time, on 2, 4, 8, and 16 TSUBAME-KFC nodes (2 MPI ranks, 12 OpenMP threads per node) and on 2, 4, 8, and 16 AWS EC2 c4.8xlarge instances (8 MPI ranks, 12 OpenMP threads per instance).]

[Figure: Aggregate throughput (MiB/sec) of CloudBB (read, write) vs. Amazon S3 (read, write) for 1 to 64 compute nodes; the number of compute nodes equals the number of IONs.]

[Figure: Multiple Client Sequential I/O Performance: read and write throughput (MiB/s) on 1 to 16 nodes, 8 processes per node.]
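The throughput plots above come from clients performing large sequential reads and writes through the mounted file system. Below is a minimal sketch of a single client's sequential-write measurement; the mount path, file size, and 5 MiB chunk size are placeholder assumptions, not the actual benchmark driver.

```cpp
// Hedged sketch of a single-client sequential-write throughput measurement
// against a mounted burst-buffer path; size, chunk, and path are placeholders.
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const char* path = "/mnt/huronfs/bench.dat";      // assumed mount point
    const std::size_t chunk = 5 * 1024 * 1024;        // 5 MiB per write call
    const std::size_t total = 1024ULL * 1024 * 1024;  // 1 GiB per client

    std::vector<char> buf(chunk, 'x');
    std::FILE* f = std::fopen(path, "wb");
    if (!f) { std::perror("fopen"); return 1; }

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t written = 0; written < total; written += chunk)
        std::fwrite(buf.data(), 1, chunk, f);
    std::fclose(f);  // include flush/close in the timed region
    auto t1 = std::chrono::steady_clock::now();

    double sec = std::chrono::duration<double>(t1 - t0).count();
    std::printf("sequential write: %.1f MiB/s\n", (total / (1024.0 * 1024.0)) / sec);
}
```

Aggregate numbers like those plotted would come from launching many such clients concurrently (e.g. 8 processes per node across all nodes) and dividing the total bytes moved by the overall wall-clock time.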

Building a Testbed Infrastructure on Overlay Cloud
Using the SINET5 network infrastructure (100 Gbps network)
Cooperation between cloud and supercomputer
Providing the science data repository
Testbed: petabyte-class object storage
Real system: exabyte-class object storage

Cloud and Supercomputer Federation
Development of collaboration technology between cloud and supercomputer, using TSUBAME 3.0 as a test case.

Docker Container
Applying cloud container technology to the supercomputer for resource cooperation with the Inter-Cloud.

Code available at: https://github.com/EBD-CREST/HuronFS

System: Amazon EC2 Tokyo Region; Instance type: m3.xlarge; Cloud storage: Amazon S3; Mount method: s3fs; Chunk size: 5 MB; Client local buffer size: 100 MB

Experimental platforms: TSUBAME2 and Amazon Web Services