Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Advantages of a Bare-Metal Cloud For GPU Workloads
Karan Batta, Product Management, Oracle Cloud Infrastructure
Our Journey…
• Bare-Metal Cloud: announced at Oracle Open World 2016
• Oracle Cloud Infrastructure rebranding and launch: October 2017
• GPU instances with P100: generally available at Open World 2017
Oracle Cloud Infrastructure’s
first Super Computing in
Our Architecture
REGION
• US-Phoenix: AD-1, AD-2, AD-3
• US-Ashburn: AD-1, AD-2, AD-3
• EU-London: AD-1, AD-2, AD-3
• EU-Frankfurt: AD-1, AD-2, AD-3
Our Architecture
DATACENTERS
AD-1, AD-2, AD-3
• Multiple fault domains
• Completely independent datacenters
• Predictable low latency and high-speed, encrypted interconnect between ADs
• Enables zero-data-loss architectures (e.g. Oracle MAA) and high-availability scale-out architectures (e.g. Cassandra)
Our Architecture
PHYSICAL NETWORK
• Non-oversubscribed network: flat, fast, predictable
• Very high scale: ~1 million network ports in an AD
• Predictable low latency and high-speed interconnect between hosts in an AD
• < 100 µs expected one-way latency, 2 x 25 Gb/s bandwidth
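To put the 2 x 25 Gb/s figure in perspective, here is a rough back-of-the-envelope sketch of the best-case time to move a dataset between two hosts in an AD. It assumes decimal units, no protocol overhead, and that both interfaces can be driven in parallel at line rate; none of these assumptions come from the slides, and the function name is purely illustrative.

```python
def transfer_seconds(size_bytes: float, links: int = 2, gbit_per_link: float = 25.0) -> float:
    """Ideal (line-rate, zero-overhead) time to move size_bytes over `links` interfaces."""
    bits = size_bytes * 8
    rate_bits_per_s = links * gbit_per_link * 1e9  # decimal gigabits per second
    return bits / rate_bits_per_s


one_tb = 1e12  # 1 TB, decimal
print(f"{transfer_seconds(one_tb):.0f} s")  # 160 s
```

Real transfers would be slower once protocol overhead, storage speed, and contention are taken into account.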
Our Architecture
VIRTUAL NETWORK
• Highly configurable private overlay networks: moving management and I/O out of the hypervisor lowers overhead and enables bare-metal instances
Our Architecture
COMPUTE, STORAGE, DATABASE…
• Bare-Metal
• NVMe Storage
• VMs
• Exadata
• Load Balancer
OCI HPC Capabilities
• Bare Metal Standard: 52 cores, 768 GB RAM, up to 512 TB Block Storage, 2x 25 GbE network interfaces
• Bare Metal DenseIO: 52 cores, 768 GB RAM, 51.2 TB of local NVMe SSD, 2x 25 GbE network interfaces
• Bare Metal Pascal GPU: 28 cores, 192 GB RAM, 2x Tesla P100 GPUs, up to 512 TB Block Storage, 2x 25 GbE network interfaces, pre-configured images
• Block Storage: 50 GB to 2 TB volumes, up to 25K IOPS per volume, 400K IOPS per host
• File Storage Service: managed distributed file service, NFSv3 mount point, pay for what you use
• Bare Metal Volta GPU: 52 cores, 768 GB RAM, 8x Tesla V100 GPUs, NVLINK interconnect, up to 512 TB Block Storage, 2x 25 GbE network interfaces, pre-configured images
Available Today
• GPU Visualization: 12 cores, 256 GB RAM, VM or BM GPU instances, NVIDIA Quadro enabled, Teradici & Citrix support
Preview Today
Tesla Volta on OCI
• Generally Available today in US-Ashburn Region
• Bare-Metal Instance with 8x Tesla V100 (SXM2) GPUs all interconnected with NVLINK
• Virtual Machines with 1, 2 or 4 GPUs in an instance coming over the next few weeks
• Uses HGX-1 Open Compute Platform Design as a reference architecture
Instance Offerings
Shape     | Cores | Memory | GPUs    | Network    | Storage               | Cost
BM.GPU3.8 | 52    | 768 GB | 8x V100 | 2x 25 Gbps | Up to 512 TB of Block | $2.25/GPU/hr
VM.GPU3.4 | 24    | 360 GB | 4x V100 | 1x 25 Gbps | Up to 512 TB of Block | $2.25/GPU/hr
VM.GPU3.2 | 12    | 180 GB | 2x V100 | 800 Mbps   | Up to 512 TB of Block | $2.25/GPU/hr
VM.GPU3.1 | 6     | 90 GB  | 1x V100 | 400 Mbps   | Up to 512 TB of Block | $2.25/GPU/hr
BM.GPU2.2 | 28    | 192 GB | 2x P100 | 2x 25 Gbps | Up to 512 TB of Block | $1.25/GPU/hr
VM.GPU2.1 | 12    | 104 GB | 1x P100 | 25 Gbps    | Up to 512 TB of Block | $1.25/GPU/hr
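Because every shape is priced per GPU per hour, the hourly cost of an instance is simply its GPU count times the per-GPU rate. The sketch below works that out from the GPU counts and prices in the table above; the `Shape` class and the helper names are illustrative only and are not part of any OCI API, and the figures cover GPU pricing alone (no storage or network charges).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    name: str
    gpus: int
    usd_per_gpu_hour: float


# GPU counts and per-GPU prices copied from the Instance Offerings table.
SHAPES = {
    s.name: s
    for s in [
        Shape("BM.GPU3.8", 8, 2.25),
        Shape("VM.GPU3.4", 4, 2.25),
        Shape("VM.GPU3.2", 2, 2.25),
        Shape("VM.GPU3.1", 1, 2.25),
        Shape("BM.GPU2.2", 2, 1.25),
        Shape("VM.GPU2.1", 1, 1.25),
    ]
}


def hourly_cost(shape_name: str) -> float:
    """GPU cost of one instance for one hour, in USD."""
    shape = SHAPES[shape_name]
    return shape.gpus * shape.usd_per_gpu_hour


def job_cost(shape_name: str, hours: float, instances: int = 1) -> float:
    """GPU cost of running `instances` instances for `hours` hours, in USD."""
    return hourly_cost(shape_name) * hours * instances


print(f"${hourly_cost('BM.GPU3.8'):.2f}/hr")  # $18.00/hr
```

For example, `job_cost("VM.GPU3.4", hours=10, instances=2)` gives the GPU cost of a ten-hour run on two 4-GPU virtual machines.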
NGC Deployment
• Limited Availability
• Pascal & Volta Instance Shapes supported
• Deploy Deep Learning Frameworks, HPC Applications and Visualization Applications seamlessly
• Pre-Configured NGC Image available to deploy on OCI
• Flexibility to run dev/test on Virtual Machine GPU Instances and run production workloads on Bare-Metal Instances
https://cloud.oracle.com/iaas/gpu
NVIDIA GRID on OCI
• Limited Availability
• vDWS (Virtual Datacenter Workstation) on Pascal or Volta based GPU Instances
• Use Windows Server 2012/2016 or any Linux distribution
• Citrix HDX 3D Pro supported
• Teradici Cloud Access Software Supported
Workload time reduced by hours!
Bare-Metal Matters for HPC
[Chart: OCI Bare-Metal Performance vs Other Public Cloud Provider VMs. Series: BM, VM. Workloads: LS-DYNA, ANSYS Fluent, MILC, WRF, HPL, Stream. Data labels (order as extracted): 0.86, 0.9, 0.88, 0.93, 0.89, 0.82. Vertical axis: 0 to 1.2.]
Big Data Challenges
• <1% of unstructured data is analyzed or used at all
• <50% of structured data is actively used in making decisions
• 80% of analysts’ time is spent discovering and preparing data
• >70% of employees have access to data they should NOT
Data Volume Growth
[Chart: cost of compute and data volume / cheap storage plotted over time, 1950s to 2010s, with the timeline split into "No Machine Learning" and "Machine/Deep Learning" periods.]
On Premises Big Data Analytics Challenges
Scale infrastructure as demand grows
Pay only for what you use
Get access to the latest hardware on demand
• (High memory, GPUs, etc.)
Avoid large CapEx spend
Our Strategy for Big Data Analytics on the Cloud
High Performance/Low Cost Infrastructure Offering
Big Data/Analytics/Data Management ISV Ecosystem
Big Data/Analytics Native Cloud Services offering
OCI Storage Options for Big Data Applications
HDFS over Block Storage
• Guaranteed performance with SLA
• Enterprise-grade features: clones, etc.
HDFS over Object Store
• Low-cost long-term storage
• Scale compute independent of storage
• Ease of data sharing
HDFS over Local NVMe Storage
• Lowest cost offering
• Colocation of compute & storage
• Highest performance with guaranteed SLA
Data Lake in the Cloud
• With the cloud, we can separate compute and storage
− Easy to share data between different clusters / applications
− On-demand cluster deployment for different workloads and tenants
[Diagram: an Object Store data lake shared by separate compute clusters: Data Exploration Spark Cluster, Hive Workload, Streaming Workload.]
The Industry’s First End-to-End SLA
PERFORMANCE Covered No coverage No coverage
MANAGEABILITY Covered No coverage No coverage
AVAILABILITY Covered Covered Covered
OCI Differentiators for Big Data Applications on IaaS
• Lower TCO
• Bare Metal Servers
• Fault Tolerant Architectures
• Superior Storage Offering
• No Vendor Lock-in options
• Pay only for what you use
Building Big Data Application
Data Integration/Streaming
Data Lake
Data Analytics/Filtering
Processed Data Storage/Sharing
Data Visualization/Dashboards
AI/Machine Learning
Data Management/Big Data ISV Solution Partner Ecosystem
Oracle Cloud Infrastructure
Data Management / Big Data DevOps
Optimized & Certified Terraform Template solutions for every partner
Come Partner with us!
Qubole on Oracle Cloud Infrastructure
Qubole is a Turnkey Big Data Service on Oracle Cloud Infrastructure
Simple
• A complete data platform solution
• No need to manage infrastructure
• Self-service data access across the enterprise
Agile and Fast
• Spark and Hadoop clusters in minutes
• Builds on Oracle Cloud Infrastructure performance advantages
• Get business insights faster
Cost
• Stand up your Spark or Hadoop infrastructure at a fraction of the cost
• Reduce operation and management cost
Cloudera Enterprise on Oracle Cloud
The modern platform for machine learning and analytics optimized for the cloud
[Diagram labels: EXTENSIBLE SERVICES; CORE SERVICES; DATA ENGINEERING; OPERATIONAL DATABASE; ANALYTIC DATABASE; DATA SCIENCE; DATA CATALOG; INGEST & REPLICATION; SECURITY; GOVERNANCE; WORKLOAD MANAGEMENT; SHARED DATA EXPERIENCE; SHARED STORAGE (S3 | ADLS | HDFS | KUDU)]
Customer: Oracle Data Cloud on OCI
• Cloud-based data management platform and 3rd party marketplace for top marketers
• Platform used by top retailers, banks, and tech companies
• Moved significant infrastructure to Oracle Cloud Infrastructure (OCI)
• 30 billion API calls per day
• 9 billion global profiles
• 7.5 trillion data points collected monthly
• 50,000 categories
• Pipeline aggregates & summarizes logs: 300TB read, 150TB write per day
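To give a feel for the pipeline volumes quoted above, the daily totals can be converted into an average sustained throughput. This is a rough sketch assuming decimal terabytes and a load spread evenly across 24 hours; real pipelines are bursty, so peak rates would be higher. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def avg_gb_per_s(tb_per_day: float) -> float:
    """Average throughput in GB/s for a daily volume given in decimal TB."""
    bytes_per_day = tb_per_day * 1e12
    return bytes_per_day / SECONDS_PER_DAY / 1e9


print(f"read:  {avg_gb_per_s(300):.2f} GB/s")  # read:  3.47 GB/s
print(f"write: {avg_gb_per_s(150):.2f} GB/s")  # write: 1.74 GB/s
```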
Recap
NVIDIA Tesla Volta GPUs Available Today in US Regions in Bare-Metal Shapes
Limited Availability of NVIDIA GPU CLOUD for easy deployment of ML & HPC Applications
Limited Availability of NVIDIA GRID vDWS on OCI GPU Instances with Citrix & Teradici
QUESTIONS?
[email protected] | @Karan_Batta
[email protected] | @_cloudguy
https://oracle.cloud.com/iaas/gpu
100+ hours of GPU for free!