7 October 2016 | Seoul
AUTOMOTIVE ROADMAP
2
NVIDIA AUTOMOTIVE
10M CARS ON THE ROAD
20+ BRANDS
100+ MODELS
3
SELF-DRIVING CARS ARE AN AI CHALLENGE
PERCEPTION AI | LOCALIZATION | DRIVING AI
DEEP LEARNING
4
Training on DGX-1
Driving with DriveWorks
KALDI
LOCALIZATION
MAPPING
DRIVENET
NVIDIA DGX-1 NVIDIA DRIVE PX 2
NVIDIA AI SYSTEM FOR AUTONOMOUS DRIVING
5
Engineered for Deep Learning | 170TF FP16 | 8x Tesla P100
NVLink Hybrid Cube Mesh | Accelerates Major AI Frameworks
NVIDIA DGX-1 WORLD’S FIRST DEEP LEARNING SUPERCOMPUTER
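A quick arithmetic check of the slide's headline figure, as a minimal sketch: dividing the quoted 170 TF FP16 evenly across the 8 Tesla P100 GPUs gives the implied per-GPU peak. The function name is illustrative only, and the even split is an assumption.

```python
# Figures taken from the slide; the even split across GPUs is an assumption.
SYSTEM_FP16_TFLOPS = 170.0
NUM_GPUS = 8


def per_gpu_tflops(system_tflops: float, num_gpus: int) -> float:
    """Divide a system's peak throughput evenly across its GPUs."""
    return system_tflops / num_gpus


print(per_gpu_tflops(SYSTEM_FP16_TFLOPS, NUM_GPUS))  # 21.25
```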
6
DRIVE PLATFORMS
Full Autonomy
AutoChauffeur
AutoCruise
7
NVIDIA DRIVE PX 2 AUTOCRUISE
10W AI Car Computer | Passive Cooling | Automotive IO
Multiple Cameras & Sensors | DriveWorks SW/SDK
AI Highway Driving | Localization & HD Mapping
Tegra Parker SoC — 1.3 TFLOPS, 6 CPU Cores, Integrated ISP
8
Scalable from 1 to 4 Processors to Multiple DRIVE PX 2s
2 Parker + 2 Pascal GPU, 20 TOPS DL, 120 SPECINT, 80W
Up to 12 Cameras; plus LIDAR, Radar, Ultrasonic sensors
DriveWorks SW/SDK | AI Perception | Localization & Mapping
NVIDIA DRIVE PX 2 AUTOCHAUFFEUR & FULLY AUTONOMOUS
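The peak figures on the two DRIVE PX 2 slides can be turned into a rough performance-per-watt number. This is a sketch using only the values quoted on the slides; the class and field names are hypothetical, and because the two configurations are quoted in different units (TFLOPS versus deep learning TOPS), the results are not directly comparable with each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveConfig:
    """One DRIVE PX 2 configuration, with figures as quoted on the slides."""
    name: str
    throughput: float  # peak throughput, expressed in `unit`
    unit: str
    watts: float

    def per_watt(self) -> float:
        """Peak throughput divided by the quoted power figure."""
        return self.throughput / self.watts


CONFIGS = [
    DriveConfig("DRIVE PX 2 AutoCruise", 1.3, "TFLOPS", 10.0),
    DriveConfig("DRIVE PX 2 AutoChauffeur", 20.0, "DL TOPS", 80.0),
]

for cfg in CONFIGS:
    print(f"{cfg.name}: {cfg.per_watt():.2f} {cfg.unit}/W")
```

These are ratios of peak marketing figures, not measured efficiency under a real workload.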
9
INTRODUCING XAVIER AI SUPERCOMPUTER SOC
7 Billion Transistors 16nm FF
8 Core Custom ARM64 CPU
512 Core Volta GPU
New Computer Vision Accelerator
Dual 8K HDR Video Processors
Designed for ASIL C Functional Safety
DEEP LEARNING FOR INTELLIGENT MACHINES
11
APPLICATIONS OF DEEP LEARNING
Image Classification, Object Detection, Localization
Face Recognition
Speech & Natural Language Processing
Medical Imaging & Interpretation
Seismic Imaging & Interpretation
Recommendation
12
AMAZING RATE OF IMPROVEMENT
[Chart: Top Score, CV-based vs. DNN-based, on NVIDIA GPU. Panels: Pedestrian Detection (CALTECH), Object Detection (KITTI), Image Recognition (IMAGENET). Labeled entry: NVIDIA DRIVENet]
13
AI FOR THE AGRICULTURAL AND INDUSTRIAL SUPPLY CHAIN
Intelligent Factory
Pick and place
Complex/custom tasks
Visual inspection
Task consolidation
Dynamic reconfiguration
Collaborative robotics
Efficiency optimization
Factory simulation
Smart Operations
Infrastructure inspection
Predictive maintenance
Physical security
Logistics
Autopilot/self-driving trucks
Robot/drone delivery and support
Intelligent Warehouse
Inventory management
Bin picking
Pallet movement
Mining and Agriculture
Equipment automation
Operational safety
14
COMMERCIAL/INDUSTRIAL UAVS
Logistics
Warehouse automation
Package delivery
Inspection
Wind turbines
Bridges
Oil rigs
Pipelines
High-voltage power lines
Cell towers
Precision Agriculture
Planting
Spraying
Security
Enterprise security
Ad hoc security systems
Emergency Response
First Responder
Search and Rescue
15
Jetson TX1 AI Computer on a Module
Advanced tech for intelligent machines
Unmatched performance under 10W
Smaller than a credit card
16
JETSON TX1
GPU: 1 TFLOP/s, 256-core Maxwell
CPU: 64-bit ARM A57 CPUs
Memory: 4 GB LPDDR4 | 25.6 GB/s
Storage: 16 GB eMMC
Wi-Fi/BT: 802.11ac 2x2, BT ready
Networking: 1 Gigabit Ethernet
Size: 50 mm x 87 mm
Interface: 400-pin board-to-board connector
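The GPU and memory figures above can be sanity-checked from first principles. The sketch below assumes values that are not on the slide: a GPU clock of about 998 MHz, a fused multiply-add counted as 2 FLOPs, two FP16 operations per core per cycle, and a 64-bit LPDDR4 bus at 3200 MT/s. Under those assumptions the slide's 1 TFLOP/s and 25.6 GB/s both fall out.

```python
def peak_flops(cores: int, clock_hz: float, flops_per_core_per_cycle: int) -> float:
    """Theoretical peak: every core retires its maximum FLOPs every cycle."""
    return cores * clock_hz * flops_per_core_per_cycle


def bandwidth_bytes_per_s(bus_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak memory bandwidth for a bus of the given width."""
    return bus_bits / 8 * transfers_per_s


# 256 cores is from the slide; clock and FLOPs/cycle are ASSUMPTIONS.
GPU_CORES = 256
ASSUMED_CLOCK_HZ = 998e6          # assumed maximum GPU clock
ASSUMED_FLOPS_PER_CYCLE = 2 * 2   # FMA (2 FLOPs) x 2-wide FP16 (assumed)

# Bus width and transfer rate are ASSUMPTIONS chosen to match LPDDR4-3200.
ASSUMED_BUS_BITS = 64
ASSUMED_TRANSFERS_PER_S = 3200e6

gflops = peak_flops(GPU_CORES, ASSUMED_CLOCK_HZ, ASSUMED_FLOPS_PER_CYCLE) / 1e9
gbytes = bandwidth_bytes_per_s(ASSUMED_BUS_BITS, ASSUMED_TRANSFERS_PER_S) / 1e9
print(f"peak FP16: {gflops:.0f} GFLOP/s, memory: {gbytes:.1f} GB/s")
```

Note that the footnoted efficiency measurements on a later slide use a GPU frequency of 691 MHz, so measured throughput there would sit below this theoretical peak.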
17
NVIDIA JETPACK 2.3
• TensorRT: Up to 2x performance for Deep Learning inference
• cuDNN: The most flexible library for Deep Learning
• CUDA 8.0: Improved performance and EGL interoperability
• Multimedia API: Encode/decode, scaling, color space conversion, camera imaging and ISP support
• Linux4Tegra 24.2: Complete systems software
SDK for embedded AI
18
UP TO 2X DEEP LEARNING PERF AND EFFICIENCY
Footnotes
• The efficiency was measured using the methodology outlined in the whitepaper: https://www.nvidia.com/content/tegra/embedded-systems/pdf/jetson_tx1_whitepaper.pdf
• Jetson TX1 efficiency was measured at a GPU frequency of 691 MHz.
• Intel Core i7-6700K efficiency was measured at a 4 GHz CPU clock.
• GoogLeNet batch size was limited to 64, the maximum supported by Jetpack 2.0. With Jetpack 2.3 and TensorRT, GoogLeNet batch size 128 is also supported for higher performance.
• FP16 results for Jetson TX1 are comparable to FP32 results for Intel Core i7-6700K, as FP16 incurs no classification accuracy loss over FP32.
• The latest publicly available software versions of IntelCaffe and MKL2017 beta were used.
• For Jetpack 2.0 and Intel Core i7, non-zero data was used for both weights and input images. For Jetpack 2.3 (TensorRT), real images and weights were used.
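One intuition for the FP16 footnote above: rounding a classifier's scores to half precision changes them only slightly, so the top-scoring class usually stays the same. The sketch below is an illustration of that rounding effect only, not the whitepaper's methodology. It rounds made-up scores with Python's standard `struct` module (format code `e` is IEEE 754 half precision); a real FP16 network computes every layer in reduced precision, which this does not model.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def argmax(values):
    """Index of the largest element."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up class scores for illustration; not data from the benchmark.
scores_fp32 = [1.2345678, -0.5, 3.1415927, 3.0, 0.0001]
scores_fp16 = [to_fp16(v) for v in scores_fp32]

print(scores_fp16)
print(argmax(scores_fp32), argmax(scores_fp16))  # 2 2
```

The predicted class could only flip if two scores were closer together than the FP16 rounding step at that magnitude.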
19
Jetson TX1 Developer Kit
Jetson TX1
Developer Board
5MP Camera
20
Jetpack SDK
Libraries
Developer tools
Design collateral
Developer Forum
Training and Tutorials
Ecosystem
http://developer.nvidia.com/embedded-computing
COMPREHENSIVE DEVELOPER SITE
21
DEEP LEARNING END-TO-END
Train | Optimize | Deploy
Train: NVIDIA DGX-1. Train your model with NVIDIA DIGITS software on DGX-1, the highest-performance training solution for DNNs.
Optimize: TensorRT. Dramatically speed up your model and reduce its memory usage.
Deploy: Jetson TX1. Deploy your model to your fleet of Jetson TX1-enabled products. Jetson TX1 is the highest-performance inference solution under 10W.
22
Jetson TX1 Module & Developer Kit
Available now from our
regional partners
MDS Technology
http://www.embedsolution.com/main/solutions/sol02_10_01.asp
23
A COMPLETE DEEP LEARNING PLATFORM
MANAGE | TRAIN | DEPLOY
[Diagram labels: DIGITS (MANAGE / AUGMENT, TRAIN, TEST); PROTOTXT; GPU INFERENCE ENGINE; DATA CENTER, AUTOMOTIVE, EMBEDDED]
24
May 8 - 11, 2017 | Silicon Valley | #GTC17 www.gputechconf.com
CONNECT
Connect with technology experts from NVIDIA and other leading organizations
LEARN
Gain insight and valuable hands-on training through hundreds of sessions and research posters
DISCOVER
See how GPU technologies are creating amazing breakthroughs in important fields such as deep learning
INNOVATE
Hear about disruptive innovations as early-stage companies and startups present their work
Don’t miss the world’s most important event for GPU developers May 8 – 11, 2017 in Silicon Valley
JOIN THE ACTION! PRESENT A TALK, LAB OR POSTER AT GTC 2017. APPLY AT WWW.GPUTECHCONF.COM