Novel Architectures for Applications in Data Science and Beyond
Jason Riedy, Jeffrey Young, Tom Conte
Center for Research into Novel Computing Hierarchies at Georgia Tech
1 March 2019



Apps: Massive+-scale data analysis

Cyber-security: Identify anomalies, malicious actors

Health care: Find outbreaks, population epidemiology, similar patient association

Social networks: Advertising, searching, grouping

Intelligence: Decisions at scale, regulating markets, smart & sustainable cities

Systems biology: Understanding interactions, drug design

Power grid / Smart cities: Disruptions, conservation, prediction

Irregular data access. Changing data.

Rogues Gallery — 1 Mar 2019 2/23


High-Performance Data Analysis (HPDA)

Novel applications:

• Data at scale and speed needs new ideas for computing analysis.

• “Big data” platforms fare poorly v. a single thread plus large SSD, even for static data sets. (McSherry, Isard, Murray. “Scalability! But at what COST?” HotOS XV, 2015.)

• Many high-level codes are written and re-written to answer one question: need flexibility.

• Some primitives may be tuned and re-used.
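The COST comparison rests on simple single-threaded baselines. As an illustration in that spirit, here is a minimal union-find connected-components sketch (our own code, not the paper's), assuming the graph arrives as (u, v) edge pairs over vertex IDs 0..n-1.

```python
from typing import Iterable, List, Tuple


def connected_components(n: int, edges: Iterable[Tuple[int, int]]) -> List[int]:
    """Label each of n vertices with a component representative.

    Single-threaded union-find with path halving and union by size.
    `edges` may be any iterable, e.g. a generator streaming pairs from disk.
    """
    parent = list(range(n))
    size = [1] * n

    def find(x: int) -> int:
        # Path halving: point every other node on the path at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        # Union by size: attach the smaller tree under the larger one.
        if size[ru] < size[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]

    # Final pass so every vertex reports its root directly.
    return [find(x) for x in range(n)]


if __name__ == "__main__":
    print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))
```

Because `edges` is consumed only once, a generator that reads pairs lazily from a file on a large SSD works as well as an in-memory list.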


HPDA and Architectures

• Current architectures are hitting limits on manufacturing, heat dissipation, memory latency...

• New architecture proposals are difficult to evaluate via simulation and modeling alone.

• What happens when novel prototypes hit reality?

• Architects/designers need rapid feedback on new ideas.

• New ideas only become successful with a community: software ecosystem, trained students.

Need bridges between apps and architects.


Introducing the CRNCH Rogues Gallery

A physical & virtual space for hosting novel computing architectures, systems, and accelerators.

[Pictured: Emu Chick; FPGAs & HMC/HBM; FPAA]

Amortize effort and cost of trying novel architectures. Break the “but it’s too much work” barrier.

http://crnch.gatech.edu/rogues-gallery


Emu Technology’s PGAS Architecture

[Figure: Emu Chick organization. One nodelet contains Gossamer Core 1 ... Core 4, a memory-side processor, and a migration engine; 8 nodelets per node, 64 nodelets per Chick. Other labels: stationary core, RapidIO, disk I/O.]

• Multithreaded multicore

• Memory-side “processor” for operations in narrow-channel DRAM

• Stationary core for OS

• Threads migrate in hardware on reads!

• Optimize for weak locality
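To make “threads migrate on reads” concrete, the toy model below counts how many migrations one thread incurs while scanning an array under two hypothetical layouts over 8 nodelets: one contiguous chunk per nodelet versus round-robin striping. It is a counting sketch under our own assumptions (a migration whenever consecutive reads hit different nodelets; no caching, no remote writes), not the Emu toolchain's allocation API or a hardware performance model.

```python
from typing import Callable, Sequence

NODELETS = 8  # nodelets per node, as on the slide


def block_layout(n: int) -> Callable[[int], int]:
    """Hypothetical layout: one contiguous chunk of the array per nodelet."""
    chunk = -(-n // NODELETS)  # ceiling division
    return lambda i: i // chunk


def cyclic_layout(n: int) -> Callable[[int], int]:
    """Hypothetical layout: element i lives on nodelet i mod NODELETS."""
    return lambda i: i % NODELETS


def count_migrations(trace: Sequence[int], home_of: Callable[[int], int]) -> int:
    """Count migrations for a thread that reads the elements in `trace`.

    The thread starts on the nodelet holding the first element and, in this
    model, migrates each time the next read targets a different nodelet.
    """
    if not trace:
        return 0
    here = home_of(trace[0])
    migrations = 0
    for i in trace[1:]:
        target = home_of(i)
        if target != here:
            migrations += 1
            here = target
    return migrations


if __name__ == "__main__":
    n = 1024
    scan = list(range(n))
    print("block :", count_migrations(scan, block_layout(n)))
    print("cyclic:", count_migrations(scan, cyclic_layout(n)))
```

In this model a sequential scan migrates only 7 times under the chunked layout but on every read under striping; striping, in exchange, can spread many independent threads' work across all nodelets.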


3D Stacked Memory and FPGAs

FPGAs enable flexible “near-memory” research for both hardware and programming models:
• Micron/Pico EX700 & HMC
• Nallatech 385s (Arria 10)
• Intel Arria 10 DevKit
• Nallatech 520N (Stratix 10)
• Xilinx MPSoC


Neuromorphic Systems

• Field-Programmable Analog Array (FPAA) System-on-Chip, designed in the lab of Dr. Jennifer Hasler.

• Analog arrays can achieve unprecedented power and size reductions.

[Photo: FPAA “driven” by a Raspberry Pi, Neuromorphic Workshop, 27 Apr 2018]


Flexible Infrastructure

[Figure: Rogues Gallery infrastructure. Scheduling, tools, and admin: login / notebook, rg-adm (Slurm Ctl), rg-db (Slurm DBD), toolbox (NFS). User resources: fpaa-host and fpaa-dev, power-host, nvidia-tegra-1 ... nvidia-tegra-N, emu-dev and emu-chick, fpga-dev-1 ... fpga-dev-N, fpga-hmc, fpga-intel. Key distinguishes schedulable resources, physical resources, VMs, and USB devices.]

Powell, Riedy, Young, and Conte. “Wrangling Rogues: Managing Experimental Post-Moore Architectures.”

https://arxiv.org/abs/1808.06334

• Available. Plans to integrate with NSF XSEDE.
• Scheduler being deployed.
• Incorporates Singularity and virtual machines for OS/library versioning.
• Fun tools: usbip for FPAA... Analog inputs?


Rogues Gallery Press

On The Next Platform, a partner of The Register:

https://www.nextplatform.com/2018/08/27/a-rogues-gallery-of-post-moores-law-options/


Neuromorphic Workshop - April 2018

FPAA-focused workshop led by Dr. Hasler introduced the wider community to FPAA programming for neuromorphic systems.
• Tutorial-style class
• Lightning talks from GT and DoE researchers

• http://crnch.gatech.edu/neuro-workshop18


Rogues Gallery ASPLOS Tutorial

Programming Novel Architectures in the Post-Moore Era with The Rogues Gallery

https://crnch-rg.gitlab.io/rg/asplos-2019/

• To be held at ASPLOS 2019 in Providence, RI.
• 14 April 2019, 8.30am – 12.00pm
• Detailed architecture descriptions
• Hands-on with the Emu Chick and (hopefully) the FPAA


Rogues Gallery VIP Team

The Rogues Gallery VIP team started in Spring 2019. This course allows undergraduates to get research experience working with software for novel architectures.

http://www.vip.gatech.edu/teams/new-team-rogues-gallery


Selected Results: Emu Pointer-Chasing Benchmark

Data-dependent loads, fine-grained access¹

[Figure: visit order of 16 elements (0 ... 15) under three patterns: Ordered; Intra-block shuffle (weak locality); Full block shuffle (weak locality).]

¹ Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Vuduc, Riedy. “An Initial Characterization of the Emu Chick,” Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018.
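These access patterns can be sketched by building a linked list over an array and permuting the visit order. The code below is our reading of the pattern names used in these benchmarks (ordered, intra_block_shuffle, block_shuffle, full_block_shuffle), assuming the array splits into equal blocks of `block_size` elements; it is not the emu-microbench implementation.

```python
import random
from typing import List

PATTERNS = ("ordered", "intra_block_shuffle", "block_shuffle", "full_block_shuffle")


def visit_order(n: int, block_size: int, pattern: str, seed: int = 0) -> List[int]:
    """Return the order in which the n array slots are visited.

    Our interpretation of the pattern names (assumes n % block_size == 0):
      ordered             - 0, 1, 2, ..., n-1
      intra_block_shuffle - blocks in order, slots shuffled within each block
      block_shuffle       - blocks shuffled, slots in order within each block
      full_block_shuffle  - blocks shuffled and slots shuffled within blocks
    """
    if pattern not in PATTERNS:
        raise ValueError(f"unknown pattern: {pattern}")
    if block_size <= 0 or n % block_size != 0:
        raise ValueError("n must be a multiple of a positive block_size")
    rng = random.Random(seed)
    blocks = [list(range(b, b + block_size)) for b in range(0, n, block_size)]
    if pattern in ("block_shuffle", "full_block_shuffle"):
        rng.shuffle(blocks)
    if pattern in ("intra_block_shuffle", "full_block_shuffle"):
        for blk in blocks:
            rng.shuffle(blk)
    return [slot for blk in blocks for slot in blk]


def build_list(order: List[int]) -> List[int]:
    """Link slots so that next_of[order[k]] == order[k + 1]; the tail holds -1."""
    next_of = [-1] * len(order)
    for cur, nxt in zip(order, order[1:]):
        next_of[cur] = nxt
    return next_of


def chase(next_of: List[int], head: int) -> int:
    """Follow the chain from head; each load's address depends on the previous load."""
    visited = 0
    node = head
    while node != -1:
        visited += 1
        node = next_of[node]
    return visited


if __name__ == "__main__":
    order = visit_order(16, 4, "intra_block_shuffle", seed=1)
    print(order)
    print(chase(build_list(order), order[0]))  # visits all 16 slots
```

Shrinking `block_size` toward 1 removes spatial locality, while large blocks keep long sequential runs; the bandwidth plots sweep exactly this block-size parameter.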


Selected Results: x86 Pointer-Chasing Benchmark

[Figure: memory bandwidth (GB/s, 0 to 100) vs. block size (number of 16 B elements, 1 to 4M) on x86 Haswell, panels for 56 and 112 threads; series block_shuffle, intra_block_shuffle, full_block_shuffle, with peak STREAM bandwidth marked.]

Haswell results: every pattern performs differently.²

² Eric Hein, Young, Srinivas Eswar, Jiajia Li, Patrick Lavin, Riedy, Vuduc, Conte. “A Microbenchmark Characterization of the Emu Chick.” https://arxiv.org/abs/1809.07696


Selected Results: Emu Pointer-Chasing Benchmark

[Figure: memory bandwidth (GB/s, 0 to 12) vs. block size (number of 16 B elements, 1 to 4M) on the Emu Chick, panels for 2048 and 4096 threads; series block_shuffle, intra_block_shuffle, full_block_shuffle, with peak STREAM bandwidth marked.]

Mostly flat performance across block sizes, with high bandwidth utilization.²


Selected Results: BFS on a Dynamic Data Structure

[Figure: BFS performance vs. graph scale (15 to 21), in MTEPS (0 to 100) and edge bandwidth (MB/s, 0 to 1500), for Emu single node (Cilk), Emu multi-node (Cilk), x86 Haswell (STINGER), and x86 Haswell (Cilk).]

Note: Streaming data structure, not statically optimized.³

³ Hein, Eswar, Abdurrahman Yasar, Prasanth Chatarasi, Li, Young, Conte, Ümit Çatalyürek, Vuduc, Riedy, Bora Uçar. “Programming Strategies for Irregular Algorithms on the Emu Chick.” https://arxiv.org/abs/1901.02775
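A minimal sketch of the idea behind this experiment: a graph held in a structure that accepts edge insertions at any time, traversed by a level-synchronous top-down BFS, with performance reported in traversed edges per second. The adjacency-dict class is a simple stand-in chosen for illustration (not STINGER, not the Emu Cilk code), and the TEPS figure uses our simplified convention of counting every adjacency entry examined.

```python
import time
from collections import defaultdict
from typing import Dict, List, Tuple


class DynamicGraph:
    """Undirected graph that accepts edge insertions at any time (illustrative stand-in)."""

    def __init__(self) -> None:
        self.adj: Dict[int, List[int]] = defaultdict(list)

    def insert_edge(self, u: int, v: int) -> None:
        self.adj[u].append(v)
        self.adj[v].append(u)


def bfs(g: DynamicGraph, source: int) -> Tuple[Dict[int, int], int]:
    """Level-synchronous top-down BFS.

    Returns (level of each reached vertex, number of adjacency entries examined).
    """
    level = {source: 0}
    frontier = [source]
    examined = 0
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            for v in g.adj.get(u, ()):  # .get does not create empty entries
                examined += 1
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level, examined


def teps(g: DynamicGraph, source: int) -> float:
    """Time one BFS and report examined edges per second (simplified TEPS)."""
    start = time.perf_counter()
    _, examined = bfs(g, source)
    elapsed = time.perf_counter() - start
    return examined / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    g = DynamicGraph()
    for u, v in [(0, 1), (1, 2), (2, 3)]:
        g.insert_edge(u, v)
    print(bfs(g, 0)[0])
    g.insert_edge(0, 3)  # the structure keeps changing between traversals
    print(bfs(g, 0)[0])
```

The structure is never rebuilt or sorted before a traversal, so the BFS runs on whatever layout the insertions produced, matching the note that the data structure is streaming rather than statically optimized.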


Selected Results: Labeled Subgraph Alignment

[Figure: speedup (0 to 50) vs. number of threads (1 to 128) for Multi-BLK, Multi-HCB, Single-BLK, and Single-HCB.]

gsaNA, the first parallel algorithm: strong scaling on the DBLP graph (2048 vertices).³
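Strong scaling fixes the problem size (here, the 2048-vertex DBLP graph) and varies the thread count. The usual quantities are speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p; the generic helper below computes both from a table of measured run times. Which single-thread run serves as T(1) is the user's choice and changes how the numbers read.

```python
from typing import Dict, Tuple


def strong_scaling(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map thread count p -> (speedup, efficiency) for a fixed problem size.

    `times` maps thread counts to wall-clock seconds and must include p = 1,
    the baseline T(1).
    """
    if 1 not in times:
        raise ValueError("need the single-thread time T(1) as the baseline")
    t1 = times[1]
    result: Dict[int, Tuple[float, float]] = {}
    for p, tp in sorted(times.items()):
        speedup = t1 / tp                    # S(p) = T(1) / T(p)
        result[p] = (speedup, speedup / p)   # E(p) = S(p) / p
    return result
```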


Selected Results: FPGAs

Hadidi, Asgari, Young, Mudassar, Garg, Krishna, Kim. “Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube (HMC),” ISPASS 2018

• Characterizations with FPGA and Hybrid Memory Cube show a latency/bandwidth tradeoff.
• Other FPGA work is focused on compilers, HPC prototyping, and sparse algorithms for Intel and Xilinx FPGAs.
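One standard way to reason about a latency/bandwidth tradeoff is Little's law: sustaining bandwidth B at access latency L requires about B × L bytes in flight, i.e. B × L / s outstanding requests of s bytes each. The helpers below apply it; this is generic arithmetic, not the method of the HMC paper above.

```python
import math


def bytes_in_flight(bandwidth_bytes_per_s: float, latency_s: float) -> float:
    """Little's law: concurrency = throughput x latency, expressed in bytes."""
    return bandwidth_bytes_per_s * latency_s


def outstanding_requests(bandwidth_bytes_per_s: float, latency_s: float,
                         request_bytes: int) -> int:
    """Minimum number of concurrent requests of `request_bytes` each
    needed to sustain the target bandwidth at the given latency."""
    return math.ceil(bytes_in_flight(bandwidth_bytes_per_s, latency_s) / request_bytes)


def achievable_bandwidth(outstanding: int, request_bytes: int, latency_s: float) -> float:
    """Inverse view: bandwidth that a fixed number of outstanding requests can sustain."""
    return outstanding * request_bytes / latency_s
```

For example, with hypothetical figures of 10 GB/s, 100 ns latency, and 16-byte requests, 1000 bytes must be in flight, so at least 63 requests must be outstanding; if higher bandwidth comes with higher latency, the required concurrency grows on both counts.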


Lessons Learned, Emu Chick i

• Finding appropriate metrics is difficult, e.g. for the Emu:

  • Comparing ASICs (e.g. x86) to FPGA-based prototypes can be unfair either way.

  • Fraction of peak bandwidth for the idealized problem?
    • SpMV: FLOP/s ∝ BW, level 2 sparse BLAS op.
    • Graph500 BFS: TEPS ∝ BW
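To see why SpMV's FLOP/s tracks bandwidth, count work and traffic for a compressed sparse row (CSR) product y = Ax: each nonzero costs 2 flops but must stream in its value and column index. The traffic model below is our assumption (8-byte values, 4-byte indices, x read once, y written once); real traffic also depends on how well x is cached.

```python
from typing import List, Sequence


def csr_spmv(indptr: Sequence[int], indices: Sequence[int],
             data: Sequence[float], x: Sequence[float]) -> List[float]:
    """y = A x for A stored in CSR form."""
    nrows = len(indptr) - 1
    y = [0.0] * nrows
    for i in range(nrows):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]  # one multiply + one add per nonzero
        y[i] = acc
    return y


def spmv_flops(nnz: int) -> int:
    """Two flops per stored nonzero."""
    return 2 * nnz


def spmv_bytes(nrows: int, ncols: int, nnz: int,
               value_bytes: int = 8, index_bytes: int = 4) -> int:
    """Assumed minimum traffic: values + column indices + row pointers,
    plus reading x once and writing y once."""
    return (nnz * (value_bytes + index_bytes)
            + (nrows + 1) * index_bytes
            + ncols * value_bytes
            + nrows * value_bytes)


def bandwidth_bound_flops(nrows: int, ncols: int, nnz: int,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on FLOP/s when the kernel is limited purely by bandwidth."""
    intensity = spmv_flops(nnz) / spmv_bytes(nrows, ncols, nnz)  # flops per byte
    return intensity * bandwidth_bytes_per_s


def fraction_of_peak(measured: float, peak: float) -> float:
    """E.g. sustained bandwidth divided by peak STREAM bandwidth."""
    return measured / peak
```

Under these assumptions the intensity approaches 2 flops per 12 bytes for large matrices, a small constant, so attainable FLOP/s scales directly with sustained bandwidth and fraction of peak bandwidth becomes a natural yardstick.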


Lessons Learned, Emu Chick ii

• Distilling observations on architecture ↔ programming model:
  • Program data location for load (BW) balance.
  • Choosing remote memory operations v. migration exposes the architecture.
  • Migrations cost more than they appear. Computation?
  • Stack spills/access can cause ping-ponging.
• How does HW support for top-down (Cilk-ish) affect bottom-up (UPC/SHMEM) PGAS programming?
  • Memory allocation similar to UPC, SHMEM
  • UPC++ rpc_ff v. Emu thread migration?
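The ping-ponging effect can be illustrated with a small counting model: if a thread's stack lives on its home nodelet and a loop alternates between a remote data read and a stack reload, every step migrates. The sketch rests on our own assumptions (reads of another nodelet's memory move the thread; writes act as remote memory operations that leave it in place; nodelet IDs are hypothetical) and does not describe the Emu compiler's actual code generation.

```python
from typing import List, Tuple

# One memory access in the toy trace: ("read" or "write", nodelet id).
Access = Tuple[str, int]


def count_migrations(trace: List[Access], start: int) -> int:
    """Count migrations for a thread starting on nodelet `start`.

    Toy assumptions: a read of another nodelet's memory moves the thread
    there; a write is treated as a remote memory operation that leaves the
    thread where it is.
    """
    here = start
    migrations = 0
    for kind, nodelet in trace:
        if kind == "read" and nodelet != here:
            migrations += 1
            here = nodelet
    return migrations


HOME, REMOTE = 0, 3  # hypothetical nodelet ids

# Stack reload on HOME after every remote data read: ping-pong.
ping_pong: List[Access] = [("read", REMOTE), ("read", HOME)] * 4
# The same remote reads with the stack traffic removed.
no_stack_traffic: List[Access] = [("read", REMOTE)] * 4
# Sending updates to REMOTE as writes instead of reading there.
remote_writes: List[Access] = [("write", REMOTE), ("read", HOME)] * 4

if __name__ == "__main__":
    for name, trace in [("ping_pong", ping_pong),
                        ("no_stack_traffic", no_stack_traffic),
                        ("remote_writes", remote_writes)]:
        print(name, count_migrations(trace, HOME))
```

In this model the interleaved trace migrates twice per iteration while the write-based variant never moves, one way to read the warning that migrations cost more than they appear.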


Acknowledgments

Fantastic students and colleagues:

• Srinivas Eswar (GT CSE)
• Dr. Eric Hein (GT ECE ⇒ Emu)
• Patrick Lavin (GT CSE)
• Dr. Jiajia Li (GT CSE ⇒ PNNL)
• Abdurrahman Yaşar (GT CSE)
• Dr. Ümit Çatalyürek (GT CSE)
• Dr. Tom Conte (GT CS/ECE)
• Dr. Bora Uçar (ENS Lyon CNRS)
• Dr. Rich Vuduc (GT CSE)
• Dr. Jeffrey S. Young (GT CS)

Code (ideally will have links from crnch.gatech.edu):

• https://gitlab.com/crnch-rg (soon)
• https://github.com/ehein6/emu-microbench

Other testbeds:

• ORNL: ExCL
• PNNL: CENATE
• Argonne
• Sandia
• Berkeley: AQCT
• (others?)


Rogues Gallery: Active and Growing

• Added FPGA resources, integrating FPAAs
• Tight development loop with Emu
• Active research projects and publications
• Community outreach and education underway
• One GT student has already made the leap to industry...

CRNCH Rogues Gallery connects researchers and students with novel architectures, and architects with upcoming applications.

Let us host / manage your neat stuff! http://crnch.gatech.edu/rogues-gallery
