Wes Showfety
Open Source Database & HPC strategist, North [email protected]
770-617-7377
LinkedIn: https://www.linkedin.com/in/wes-showfety-2399444
Twitter: @Wes_Show
IBM Power User Group - Atlanta
Two Major New Product Announcements
New Chip: “POWER8 with NVLink”
Novel Embedded NVLink Interface to Tightly Integrate with NVIDIA Tesla GPUs
New Power LC Linux Servers
3 New Linux Servers Built for Big Data, Analytics, Machine Learning, and HPC
New Power Linux Servers
S822LC for High Performance Computing
System Details
2-socket, 2U
Up to 20 cores (2.86-3.26 GHz)
1 TB Memory (32 DIMMs)
230 GB/sec memory bandwidth
2x SFF (HDD/SSD), SATA
Up to 4 integrated NVIDIA Pascal GPUs
3 PCIe slots, 3 CAPI enabled, IB Add-in
Air or water cooled
New POWER8 with NVLink Processor
New Tesla P100 with NVLink
S822LC for Big Data
System Details
2-socket, 2U
Up to 20 cores (2.9-3.3 GHz)
512 GB Memory (16 DIMMs)
115 GB/sec memory bandwidth
12 SFF/LFF (HDD/SSD), 96 TB storage
5 PCIe slots, 4 CAPI enabled
2 NVIDIA K80 GPU capable
S821LC
System Details
2-socket, 1U
Up to 20 cores (2.09-2.32 GHz)
512 GB Memory (16 DIMMs)
115 GB/sec memory bandwidth
4 SFF/LFF (HDD/SSD), 32 TB storage
4 PCIe slots, 3 CAPI enabled
1 NVIDIA K80 GPU capable
250+ OpenPOWER Foundation Members
Implementation, HPC & Research
Software
System Integration
I/O, Storage & Acceleration
Boards & Systems
Chips & SoCs
2300+ Linux Applications on POWER
Big Data & Machine Learning
Cloud Mobile Enterprise
Major Linux Distros
HPC
miniDFT
CTH
BLAST
Bowtie
BWA
FASTA
HMMER
GATK
SOAP3
STAC-A2
SHOC
Graph500
Ilog
CHARMM
GROMACS
NAMD
AMBER
RTM
GAMESS
WRF
HYCOM
HOMME
LES
MiniGhost
AMG2013
OpenFOAM
Emerging technologies drive business transformation
Cloud
60% of banks process
most transactions
in cloud by 2016
Collaboration
60 million US
households conducting
P2P payments
Big Data
2.5 billion gigabytes of
data generated every day
Intelligent/
Connected
Systems
7.9 million in U.S.
adopted NFC e-Wallets
Mobile
35% transaction growth
driven through mobile
annually through 2017
Analytics: moving to real time
+7.6% in customer lifetime
value for firms using engagement
analytics
$226B – annual cost of health
care fraud
Security
$5.65 million – average cost
of a security breach in the US
Enterprises must learn to “Innovate”
FROM… TO…
“The ‘Uber syndrome’ – where a competitor with a completely different business model enters your industry and flattens you.” Judy Lemke, CIO, Schneider, United States
“52% of the Fortune 500 firms since 2000 are gone.” – R. Ray Wang, http://blog.softwareinsider.org/2014/02/18/research-summary-sneak-peaks-from-constellations-futurist-framework-and-2014-outlook-on-digital-disruption/
What does High Performance Computing mean to you?
It is no longer limited to the Federal Government and Research Universities.
Perception of “HPC” should be changing a bit…
Advanced Business Intelligence, Streaming Analytics, Big Data Visualization, Real-Time Fraud Detection, Logistics, Inventory Tracking, IoT, etc.
When you hear these terms, think HPC...
IBM Innovates with POWER8
Designed for Big Data
Core, cache and memory design all suited to HPC
Matched to markets where HPC and Big Data Intersect:
• High Performance Data Analytics
• Life Sciences
• Oil and Gas/Seismic Processing
• Machine Learning
• Business Intelligence
• Insights at the speed of thought
[Figure: POWER8 chip diagram showing 12 cores, each with its own L2 cache; L3 Cache & Chip Interconnect with 8M L3 regions; memory controllers; SMP links; accelerators; PCIe]
Massive IO bandwidth
Continuous data load
Superior parallel processing
Large-scale memory processing
Processors: flexible, fast execution of analytics algorithms
Memory: large, fast workspace to maximize business insight
Cache: ensure continuous data load for fast responses
How?
Tesla P100 accelerator
Compute: 5.3 TF DP ∙ 10.6 TF SP ∙ 21.2 TF HP
Memory: HBM2, 720 GB/s ∙ 16 GB
Interconnect: NVLink (up to 8 way) + PCIe Gen3
Programmability: Page Migration Engine, Unified Memory
Availability: Ships in IBM 822LC for HPC System, September 2016
Pascal Architecture: Highest Compute Performance
NVLink: GPU Interconnect for Maximum Scalability
CoWoS HBM2: Unifying Compute & Memory in Single Package
Page Migration Engine: Simple Parallel Programming with Virtually Unlimited Memory Space
[Diagram: Unified Memory shared between CPU and Tesla P100]
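The compute and memory figures quoted for the Tesla P100 can be combined into a rough machine-balance estimate: how many floating-point operations a kernel would need to perform per byte read from HBM2 before peak compute, rather than memory bandwidth, becomes the limit. The sketch below is illustrative arithmetic on the slide's peak numbers only; the helper name `machine_balance` and the roofline-style framing are assumptions added here, not part of the original deck.

```python
# Peak figures as quoted on the Tesla P100 slide.
PEAK_TFLOPS = {"DP": 5.3, "SP": 10.6, "HP": 21.2}  # teraflops per precision
HBM2_GB_PER_S = 720.0                              # HBM2 memory bandwidth


def machine_balance(tflops: float, gb_per_s: float) -> float:
    """FLOPs per byte at which peak compute and peak bandwidth meet.

    Hypothetical helper for illustration; uses decimal units
    (1 TF = 1e12 FLOP/s, 1 GB/s = 1e9 bytes/s).
    """
    if gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return (tflops * 1e12) / (gb_per_s * 1e9)


# One balance point per precision quoted on the slide.
balance = {p: machine_balance(t, HBM2_GB_PER_S) for p, t in PEAK_TFLOPS.items()}

for precision, flops_per_byte in balance.items():
    print(f"{precision}: {flops_per_byte:.1f} FLOPs per byte")
```

Under these assumptions a double-precision kernel needs roughly 7 FLOPs per byte of HBM2 traffic to approach the quoted peak; kernels below that intensity would be limited by memory bandwidth instead.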
Introducing Kinetica (formerly GIS Federal and GPUdb), which has the industry’s first GPU-powered full enterprise database and visualization engine for exploring, analyzing, and visualizing petabyte-scale data 100 to 1,000x faster than traditional in-memory data systems.
Originally developed as a geospatial and temporal computational engine for the United States Army Intelligence and Security Command (INSCOM), it has since added a fully integrated and rich visualization engine, including native geospatial object type support.
But it’s not all about hardware, right?
What’s the SOLUTION?
Introducing New “POWER8 with NVLink” Chip
First CPU Designed for Accelerated Computing
High Performance Cores
Fast & Large Memory System
Fast PowerAccel Interconnects for Accelerators
Faster Cores than x86
Larger Caches Per Core than x86
5x Faster Data Communication between POWER8 & GPUs
[Diagram: POWER8 (P8) chip with CAPI, NVLink, and PCIe interfaces]
Introducing 822LC Power System for HPC
First Custom-Built GPU Accelerator Server with NVLink
2.5x Faster CPU-GPU Data Communication via NVLink
“Minsky” POWER8 NVLink Server: NVLink, 80 GB/s
[Diagram: two P8 CPUs, each connected to two GPUs]
x86 Servers with PCIe: PCIe, 32 GB/s
[Diagram: two x86 CPUs, each connected to two GPUs]
No NVLink between CPU & GPU for x86 Servers: PCIe Bottleneck
NVIDIA P100 Tesla GPU
• Custom-built GPU Accelerator Server
• High-Speed NVLink Connections between CPUs & GPUs and among GPUs
• Features novel NVIDIA P100 Tesla GPU accelerator
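The 2.5x figure follows directly from the two bandwidths quoted on this slide (80 GB/s for NVLink, 32 GB/s for PCIe). A minimal sketch of that arithmetic is shown below, assuming idealized transfers at peak bandwidth with no latency or protocol overhead; the helper `transfer_seconds` and the 16 GB example payload are hypothetical choices made for illustration.

```python
NVLINK_GB_PER_S = 80.0  # CPU-GPU NVLink bandwidth quoted on the slide
PCIE_GB_PER_S = 32.0    # CPU-GPU PCIe bandwidth quoted on the slide


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized transfer time: payload size divided by peak bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


# Ratio of the two quoted bandwidths.
speedup = NVLINK_GB_PER_S / PCIE_GB_PER_S

# Assumed payload for illustration only.
example_size_gb = 16.0
nvlink_time = transfer_seconds(example_size_gb, NVLINK_GB_PER_S)
pcie_time = transfer_seconds(example_size_gb, PCIE_GB_PER_S)

print(f"NVLink vs PCIe bandwidth ratio: {speedup}x")
print(f"{example_size_gb} GB over NVLink: {nvlink_time:.2f} s")
print(f"{example_size_gb} GB over PCIe:   {pcie_time:.2f} s")
```

Real transfers would fall short of both peak numbers, so the ratio is best read as an upper bound on the advantage rather than a measured result.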
Detailed Diagram of 822LC for HPC
[Diagram: two POWER8 CPUs, each attached to System Memory (115 GB/s, up to 0.5 TB) and to two P100 GPUs over NVLink (80 GB/s); each P100 GPU has 16GB HBM2 GPU Memory at 720GB/s, and the GPUs are also linked to each other by NVLink]
NVLink between CPUs and GPUs enables fast memory access to large data sets in system memory
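One way to read the bandwidth labels in this diagram is as stages of a data path: data in system memory must cross both the memory interface and NVLink to reach a GPU, so the slowest stage sets the ceiling. The sketch below applies that rule to the numbers in the diagram. It is a simplified model assuming peak rates and a single path; the helper `path_bandwidth` is a hypothetical name introduced here.

```python
# Bandwidths read off the 822LC for HPC diagram, in GB/s.
SYSTEM_MEMORY = 115.0  # POWER8 CPU to system memory
NVLINK = 80.0          # POWER8 CPU to P100 GPU
HBM2 = 720.0           # P100 GPU to its own HBM2 memory


def path_bandwidth(*stages: float) -> float:
    """A multi-stage path can go no faster than its slowest stage."""
    if not stages:
        raise ValueError("at least one stage is required")
    return min(stages)


# System memory -> CPU -> NVLink -> GPU.
host_to_gpu = path_bandwidth(SYSTEM_MEMORY, NVLINK)

# How much faster the GPU reads its local HBM2 than data in system memory.
local_vs_host = HBM2 / host_to_gpu

print(f"System memory to GPU ceiling: {host_to_gpu} GB/s")
print(f"HBM2 is {local_vs_host}x faster than that path")
```

In this model NVLink, not system memory, is the limiting stage, and data already resident in HBM2 is still far cheaper to access, which is why keeping the working set in GPU memory matters even with a fast CPU-GPU link.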
Kinetica is already seeing a three- to four-times performance increase using Power chips with NVLink, as well as significant gains in other areas.
Amit Vij, CEO & Co-Founder, Kinetica, Accelerated Database Company
Kinetica: 10x Faster Relational Database w/ Analytics
Retail / Supply Chain Use Case Example
• Fuse real-time data from multiple sources
– Point-of-Sale (POS) data
– Distribution centers inventory
– Historical buying patterns
– Demographics
– What’s trending on Twitter
– Weather data
Logistics Use Case Example
• Delivery route planning
• Monitor delivery / collection
• Contingency planning (traffic, accidents, employee sickness)
Telcos: Analyze log information from OTA cell phone updates
Many more use cases for Finance, Defense, Healthcare, Ad-tech, Insurance, etc.
High Performance Data Analysis – Kinetica
Ultrafast ingest and analysis of billions of objects using GPUs
Advantage – Performance, Cost, Scale
Logistics: United States Postal Service was billed $100M by Oracle for Exadata and could get only 20% of its 220,000 mail carriers online with real-time geospatial data before the system would crash. GPUdb handles 100% at 95% less cost.
Retail: A large retailer estimates $3B in lost sales last year due to empty shelves and lost purchase opportunities. It spent $100M on HANA and could ingest 1B purchase records per hour. With Kinetica it ingested 4.5-6B records per minute.
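The retail ingest figures are quoted in different time units (records per hour versus records per minute), which hides the size of the gap. The sketch below simply normalizes both to records per second and takes the ratio. The numbers are the vendor claims from the slide, not independent measurements, and the helper `per_second` is a hypothetical name used for illustration.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_HOUR = 3600

# Vendor-claimed ingest figures from the slide.
HANA_RECORDS_PER_HOUR = 1e9
KINETICA_RECORDS_PER_MINUTE = (4.5e9, 6e9)  # low and high end of the range


def per_second(records: float, seconds: float) -> float:
    """Convert a record count over a period into records per second."""
    if seconds <= 0:
        raise ValueError("period must be positive")
    return records / seconds


hana_rate = per_second(HANA_RECORDS_PER_HOUR, SECONDS_PER_HOUR)
kinetica_rates = tuple(
    per_second(r, SECONDS_PER_MINUTE) for r in KINETICA_RECORDS_PER_MINUTE
)

# Ratio of claimed Kinetica ingest to claimed HANA ingest.
speedups = tuple(rate / hana_rate for rate in kinetica_rates)

print(f"HANA: {hana_rate:,.0f} records/s")
print(f"Kinetica: {kinetica_rates[0]:,.0f} to {kinetica_rates[1]:,.0f} records/s")
print(f"Claimed ratio: {speedups[0]:.0f}x to {speedups[1]:.0f}x")
```

Taken at face value, the quoted figures imply an ingest ratio in the hundreds, well beyond the "10x" headline on the preceding slide, so the two claims presumably describe different workloads.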
Performance Leadership Roadmap for HPC / HPDA
IBM CPUs
• 2015: POWER8 (OpenPOWER CAPI Interface)
• 2016: POWER8 with NVLink (PowerAccel Interfaces: NVLink, CAPI, PCIe Gen3)
• 2017: POWER9 (PowerAccel: Enhanced CAPI, NVLink Next Gen, PCIe Gen4)
Mellanox Interconnect Technology
• 2015: Connect-IB, FDR InfiniBand, PCIe Gen3
• 2016: ConnectX-4, EDR InfiniBand, CAPI over PCIe Gen3
• 2017: ConnectX-5, Next-Gen InfiniBand, Enhanced CAPI over PCIe Gen4
NVIDIA GPUs
• 2015: Kepler, PCIe Gen3
• 2016: Pascal & Tesla, NVLink
• 2017: Volta, NVLink Next Gen
IBM Nodes
[Figure: IBM node images for each year]
Accelerating NoSQL Databases with CAPI-Attached Flash
Before: NoSQL In-Memory (x86), 24U
[Diagram: Load Balancer in front of multiple 500GB Cache Nodes, with Backup Nodes]
After: NoSQL POWER8 + CAPI Flash, 4U
[Diagram: POWER8 Server with a CAPI Device attached to a Flash Array w/ up to 40TB]
Flash Acts As Extension of System Memory
3x Lower Cost
From Technical Computing to Machine Learning
High Performance Computing: Research + Commercial HPC
• Seismic Processing, Reservoir Simulation
• Risk Analytics, Options Pricing
• Scientific Research, Genomics
Accelerated Databases: 10x Faster Databases
• Kinetica, BlazeGraph
• Logistics, Retail
• Utilities, Telcos
• Finance, Defense
Machine Learning: Deep Learning
• Computer vision, Speech, NLP
• Use cases: Retail, Customer Service, Text Analytics
HPC Pre-Sales Centers and Technical Support
• PADC centers with IBM, NVIDIA and Mellanox focused on accelerated applications and technical collaborations
• IBM Systems Client Centers
• HPC Briefings
• HPC Workshops
• HPC Benchmarks
UK Science and Technology Facilities Council (STFC) PADC
IBM PADC Montpellier joint center with NVIDIA and Mellanox
IBM PADC Boeblingen joint center with NVIDIA
IBM Poughkeepsie POWER HPC Benchmark Center
IBM Austin POWER HPC Executive Briefing Center
NEW! NVIDIA/IBM Acceleration Lab
For latest HPC information refer to the IBM Systems Client Centers HPC page
IBM Research Paving the Path to Next-Generation HPC
Programming Models for Exascale
Enhancing Open Interconnects
POWER
Data Centric System, Node, & Processor Innovations
Scalable High Performance Storage & File Systems
Integrated Clusters: Validated, Tested, Complete
The Power Systems HPC Cluster is a modular solution that combines
• High performance compute nodes
• Low latency interconnect fabric
• High performance parallel storage
• System software
into a single, integrated solution that scales from 5 to 64 compute nodes
Assembled, tested, and provisioned in IBM manufacturing for faster time to compute
Modular and available with or without GPU and FPGA accelerators for compute intensive, data intensive, or balanced cluster performance