© Copyright Khronos Group 2013 - Page 1
Khronos Overview
The State of the Art in Open Standards for Visual Computing
Neil Trevett
Khronos President; Vice President Mobile Content, NVIDIA
Khronos Connects Software to Silicon
ROYALTY-FREE, OPEN STANDARD APIs for
advanced hardware acceleration
Low level silicon to software interfaces needed on every platform
Graphics, video, audio, compute,
vision, sensor and camera processing
Defines the forward looking roadmap for
the silicon community
Shipping on billions of devices across
multiple operating systems
Rigorous conformance tests for
cross-vendor consistency
Khronos is OPEN for any company to
join and participate
Acceleration APIs BY the Industry
FOR the Industry
Making a Difference – One API at a Time
Well over 1 BILLION people are using what
the Khronos members have created
together - Every Day…
Khronos Standards
Visual Computing
- Object and Terrain Visualization
- Advanced scene construction
3D Asset Handling
- Advanced Authoring pipelines
- 3D Asset Transmission Format with streaming and compression
Acceleration in the Browser
- WebGL for 3D in browsers
- WebCL: Heterogeneous Computing for the web
Sensor Processing
- Mobile Vision Acceleration
- On-device Sensor Fusion
Camera Control API
Highlights:
- OpenCL 2.0 Finalized!
- glTF cooperation with MPEG for 3D Asset Compression!
- OpenVX 1.0 Provisional Released!
- WebGL and WebCL Momentum!
Over 100 companies defining royalty-free
APIs to connect software to silicon
OpenCL Milestones
• 24 month cadence for major OpenCL 2.0 update
- Slightly longer than 18 month cadence between versions of OpenCL 1.X
• Significant feedback from the developer community on Provisional Specification
- Many suggestions were incorporated into the final 2.0 specification
- Other feedback will be considered for future specification versions
Timeline:
- Dec08: OpenCL 1.0 released. Conformance tests released
- Jun10: OpenCL 1.1 Specification and conformance tests released
- Nov11: OpenCL 1.2 Specification and conformance tests released
- Jul13: OpenCL 2.0 Provisional Specification released for public review
- Nov13: OpenCL 2.0 Specification finalized and conformance tests released
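The cadence figures quoted on this slide can be checked from the milestone dates. The following is a minimal sketch, assuming each final-specification milestone falls on the first day of the month shown on the timeline (the slide gives only month and year, so the day is an assumption, and the function names are illustrative):

```python
from datetime import date

# Final-specification milestones from the timeline; day-of-month is assumed.
RELEASES = [
    ("OpenCL 1.0", date(2008, 12, 1)),
    ("OpenCL 1.1", date(2010, 6, 1)),
    ("OpenCL 1.2", date(2011, 11, 1)),
    ("OpenCL 2.0", date(2013, 11, 1)),
]


def months_between(earlier, later):
    """Whole calendar months from one date to another."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def release_gaps(releases):
    """Pair each release (after the first) with the months since the previous one."""
    gaps = []
    for (_, previous), (name, current) in zip(releases, releases[1:]):
        gaps.append((name, months_between(previous, current)))
    return gaps


if __name__ == "__main__":
    for name, months in release_gaps(RELEASES):
        print(f"{name}: {months} months after the previous version")
```

Under these assumptions the 1.X releases come out 17 to 18 months apart, and OpenCL 2.0 follows OpenCL 1.2 by 24 months.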
Key OpenCL 2.0 Features
• Shared Virtual Memory
- Host and device kernels can directly share complex, pointer-containing data
structures such as trees and linked lists, providing significant programming
flexibility and eliminating costly data transfers between host and devices
• Nested Parallelism
- Device kernels can enqueue kernels to the same device with no host interaction,
enabling flexible work scheduling paradigms and avoiding the need to transfer
execution control and data between the device and host, often significantly
relieving host processor bottlenecks
• Generic Address Space
- Functions can be written without specifying a named address space for
arguments, especially useful for those arguments that are declared to be a
pointer to a type, eliminating the need for multiple functions to be written for
each named address space used in an application
Broad OpenCL Implementer Adoption
• Multiple conformant implementations shipping on desktop and mobile
- For CPUs and GPUs on multiple operating systems
• Android ICD extension released in latest extension specification
- OpenCL implementations can be discovered and loaded as a shared object
• Multiple implementations shipping in Android NDK
- ARM, Imagination, Vivante, Qualcomm, Samsung …
OpenCL as Parallel Compute Foundation
• 100+ tool chains and languages leveraging OpenCL
- Heterogeneous solutions emerging for the most popular programming languages
• OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources
Languages and tool chains shown:
- OpenCL HLM: C++ syntax/compiler extensions
- WebCL: JavaScript binding to OpenCL for initiation of OpenCL C kernels
- River Trail: Language extensions to JavaScript
- C++ AMP: Shevlin Park uses Clang and LLVM
- Harlan: High level language for GPU programming
- Compiler directives for Fortran, C and C++
- Aparapi: Java language extensions for parallelism
- PyOpenCL: Python wrapper around OpenCL
Widespread Developers Leveraging OpenCL
• Broad uptake of OpenCL in commercial applications
- For desktop and increasingly mobile apps
• Searching for “OpenCL” on SourceForge, GitHub, Google Code and Bitbucket finds over 2,000 projects
- x264
- Handbrake
- FFMPEG
- JPEG
- VLC
- OpenCV
- GIMP
- ImageMagick
- IrfanView
- Hadoop, Memcached
- Aparapi – A parallel API (for Java)
- Bolt – a Unified Heterogeneous Library
- Sumatra – next generation of compute enabled Java
- WinZip
- Crypto++
- Bullet physics library
- Etc. Etc.
OpenCL Academic Traction
• OpenCL at over 100 Universities Worldwide
- Teaching multi-faceted programming courses
- Research with top-tier Universities globally
• Complete University Kits available
- Presentation w/instructor & speaker notes
- Example code, & sample application
• Growing textbook ecosystem
- US, Japan, Europe, China and India
• Number of papers referencing OpenCL on
Google Scholar is growing rapidly
- Over 2000 papers in 2012
• Commercial OpenCL training courses
- http://www.accelereyes.com/services/training
- http://developer.amd.com/Resources/library/Pages/default.aspx
Leveraging Proven Native APIs into HTML5
• Khronos and W3C liaison
- Leverage proven native API investments into the Web
- Fast API development and deployment
- Designed by the hardware community
- Familiar foundation reduces developer learning curve
[Diagram: native APIs alongside their JavaScript counterparts]
Legend:
- Native APIs shipping or Khronos working group
- JavaScript API shipping, acceleration being developed or work underway
- Possible future JavaScript APIs or acceleration
Diagram labels: Native; JavaScript; HTML; Canvas; Path Rendering; Camera Control; WebVX? (Vision Processing); WebCAM(!) (Camera control and video processing); WebStream? (Sensor Fusion)
Mobile Web is a Real Time Application
Buttery smooth touch interaction needs continuous
60Hz updates
- Apple iPhone: 320x480, 153K pixels, 163 DPI
- Apple iPad: 1024x768, 786K pixels, 132 DPI
- Apple iPad Mini: 2048x1536, 3100K pixels, 326 DPI
In 5 years the number of pixels to process on mobile screens has gone up by a factor of TWENTY
Need GPU Acceleration for everything Web!
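The pixel counts and the "factor of twenty" can be reproduced from the resolutions quoted on this slide. This is a small illustrative sketch; the names `SCREENS`, `pixel_count` and `pixels_per_second` are made up for the example, and the per-second figure simply multiplies by the 60 Hz update rate mentioned above:

```python
# Screen resolutions quoted on the slide: (width, height) in pixels.
SCREENS = {
    "iPhone": (320, 480),
    "iPad": (1024, 768),
    "iPad Mini": (2048, 1536),
}
REFRESH_HZ = 60  # update rate named on the slide


def pixel_count(width, height):
    """Total pixels on one full-screen frame."""
    return width * height


def pixels_per_second(width, height, refresh_hz=REFRESH_HZ):
    """Pixels that must be produced each second to redraw every frame."""
    return pixel_count(width, height) * refresh_hz


# Ratio between the largest and smallest screens listed.
growth = pixel_count(*SCREENS["iPad Mini"]) / pixel_count(*SCREENS["iPhone"])

if __name__ == "__main__":
    for name, (w, h) in SCREENS.items():
        print(f"{name}: {pixel_count(w, h):,} pixels, "
              f"{pixels_per_second(w, h):,} pixels/s at {REFRESH_HZ} Hz")
    print(f"Growth factor: {growth:.2f}x")
```

The ratio comes out at about 20.5, which matches the rounded figure on the slide.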
WebGL Availability in Browsers
- Microsoft: “where you have IE11, you have WebGL – turned on by default and working all the time”
- Microsoft: WebGL also enabled for Windows applications (web app framework and web view)
- Apple: WebGL must be explicitly turned on in Mac Safari and is only exposed on iOS for iAds
- Chrome OS: WebGL is the only cross-platform API to program the GPU
- Google I/O announcement: Chrome on Android will soon launch with WebGL
Much WebGL content uses three.js library:
http://threejs.org/
Microsoft PhotoSynth2
• Demonstrated at Build 2013
http://channel9.msdn.com/Events/Build/2013/4-072 1:50
Cross-OS Portability
[Diagram: per-platform software stacks. Labels: C/C++; SDK; Dalvik (Java); Objective C; C#; DirectX; HTML/CSS on each platform]
- HTML5 provides cross platform portability. GPU accessibility through WebGL available soon on ~90% mobile systems
- Preferred development environments not designed for portability
- Native code is portable, but apps must cope with different available APIs and libraries
OpenGL 3D API Family Tree
[Timeline figure, 2002 to 2013]
- Desktop 3D: OpenGL 1.3, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3, 4.4, GL-Next
- Mobile 3D: OpenGL ES 1.0, 1.1, 2.0, 3.0, ES-Next
- WebGL: WebGL 1.0, WebGL 2.0
Notes on the figure:
- OpenGL ES 1.1 content: fixed function 3D pipeline
- OpenGL ES 2.0 content: programmable vertex and fragment shaders
- OpenGL ES 3.0 content: ES3 is backward compatible so new features can be added incrementally
- OpenGL 4.4 is a superset of DX11
- WebGL 2.0 is in development now and will bring OpenGL ES 3.0 functionality to the Web
http://www.khronos.org/webgl/public-mailing-list/
http://www.khronos.org/registry/webgl/specs/latest/
http://www.khronos.org/webgl/wiki/Testing/Conformance
OpenGL ES 3.0 Highlights
• Better looking, faster performing games and apps – at lower power
- Incorporates proven features from OpenGL 3.3 / 4.x
- 32-bit integers and floats in shader programs
- NPOT, 3D textures, depth textures, texture arrays
- Multiple Render Targets for deferred rendering, Occlusion Queries
- Instanced Rendering, Transform Feedback …
• Make life better for the programmer
- Tighter requirements for supported features to reduce implementation variability
• Backward compatible with OpenGL ES 2.0
- OpenGL ES 2.0 apps continue to run unmodified
• Standardized Texture Compression
- #1 developer request!
3D Needs a Transmission Format!
• Compression and streaming of 3D assets becoming essential
- Mobile and connected devices need access to increasingly large asset databases
• 3D is the last media type to define a compressed format
- 3D is more complex – diverse asset types and use cases
• Needs to be royalty-free
- Avoid an ‘internet video codec war’ scenario
• Eventually enable hardware implementations of successful codecs
- High-performance and low power – but pragmatic adoption strategy is key
Media type | Audio | Video | Images | 3D
Codec      | MP3   | H.264 | JPEG   | ?
An effective and widely adopted codec ignites previously
unimagined opportunities for a media type
glTF – OpenGL Transmission Format
• Binary file format for efficient transmission of 3D assets
- Reduce network bandwidth and minimize client processing overhead
• Run-time neutral - does NOT imply or mandate any run-time behavior
- Can be used by any app or run-time – usually WebGL accelerated
• Scalable to handle compression and streaming
- Though baseline format does not include compression
• ‘Direct load efficiency’ for WebGL
- Little or NO processing to drop glTF data into WebGL client
• Carry conditioned data from any authoring format
- Prototyping and optimizing efficient handling of COLLADA assets
A standards-based content pipeline for rich native and Web 3D applications
[Diagram labels: Authoring; Playback]
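The "direct load" idea, where binary data can be handed to the client with little or no processing, can be illustrated with a generic sketch. This is not the glTF layout: the buffer format here (tightly packed x, y, z values as native-endian 32-bit floats) and the function names are assumptions made purely for illustration.

```python
from array import array


def pack_positions(positions):
    """Flatten (x, y, z) tuples into one contiguous float32 byte buffer."""
    flat = array("f")  # 32-bit floats in the machine's native byte order
    for xyz in positions:
        flat.extend(xyz)
    return flat.tobytes()


def view_positions(buffer):
    """Reinterpret the bytes as floats without copying or text parsing."""
    return memoryview(buffer).cast("f")


# Hypothetical asset: a single triangle.
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.5, 0.0)]
payload = pack_positions(triangle)
floats = view_positions(payload)

if __name__ == "__main__":
    print(f"{len(payload)} bytes, {len(floats)} floats ready for upload")
```

The point of the sketch is that the consumer only reinterprets bytes; a text-based authoring format would first need to be parsed and converted.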
COLLADA and glTF Open Source Ecosystem
[Diagram labels: Tool Interop; Web-based Tools; Other authoring formats; Pervasive WebGL deployment]
- Three.js glTF Importer
- Rest3D initiative
- COLLADA2GLTF Translator
- OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests on GitHub
https://github.com/KhronosGroup/glTF
https://github.com/KhronosGroup/OpenCOLLADA
https://github.com/KhronosGroup/COLLADA-CTS
WebGL as Test-bed for 3D Asset Compression
• Integrating and benchmarking 3D geometry compression formats with glTF
- Baseline is GZIP
• Scalable Complexity 3D Mesh Compression codec MPEG-SC3DMC
- Royalty-free graphics compression technology from MPEG (MIT License)
- Open3DGC is an efficient JavaScript and C/C++ implementation
- Converter using Open3DGC to compress 3D Meshes, Skinning, Animations
- https://github.com/amd/rest3d/tree/master/server/o3dgc
• WebGL-loader is Google's lightweight compression for WebGL content
• OpenCTM uses LZMA compression
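A GZIP baseline of the kind named above can be measured with a few lines of standard-library code. The sketch below is only illustrative: the synthetic grid mesh, its float32 packing and the helper names are assumptions, and the ratio it prints says nothing about the real datasets used in the benchmark.

```python
import gzip
import struct


def make_grid_mesh(n):
    """Vertex positions for a flat n x n grid, packed as little-endian float32."""
    out = bytearray()
    for y in range(n):
        for x in range(n):
            out += struct.pack("<3f", float(x), float(y), 0.0)
    return bytes(out)


def gzip_baseline(raw, level=6):
    """Compress with gzip and return (compressed bytes, compression ratio)."""
    compressed = gzip.compress(raw, compresslevel=level)
    return compressed, len(raw) / len(compressed)


raw = make_grid_mesh(32)  # 32 * 32 vertices * 12 bytes each
compressed, ratio = gzip_baseline(raw)

if __name__ == "__main__":
    print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.1f}x")
```

A dedicated mesh codec would be compared against this figure on the same input.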
Initial Compression Results
• Compression Efficiency
- Gzip (default level=6)
- OpenCTM (default settings)
- Open3DGC and Webgl-loader: positions on 14 bits, normals and texCoords on 10 bits
Open3DGC is 5x-9x more efficient than Gzip,
1.3x-2.4x more efficient than OpenCTM and
1.2x-1.5x more efficient than webgl-loader
[Bar chart: compressed size in MBytes (axis 0 to 400) for three datasets: CAD (3748 models), 3D Scanned (78 models), MPEG dataset (1211 models). Series: Gzip; OpenCTM; Webgl-loader + Gzip; Open3DGC-ASCII + Gzip; Open3DGC-Binary]
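Storing positions "on 14 bits" means mapping each coordinate to one of 2^14 integer levels before entropy coding. The following is a minimal sketch of plain uniform quantization, assuming the range is taken from the minimum and maximum of the data; the codecs named on this slide may quantize differently, and the function names are illustrative.

```python
def quantize(values, bits):
    """Map floats to integer codes in [0, 2**bits - 1] over the data's own range."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / step) for v in values]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Recover approximate floats from the integer codes."""
    return [lo + c * step for c in codes]


# Hypothetical coordinate values along one axis.
positions = [-1.0, -0.25, 0.0, 0.3337, 1.0]
codes, lo, step = quantize(positions, bits=14)
restored = dequantize(codes, lo, step)
max_error = max(abs(a - b) for a, b in zip(positions, restored))

if __name__ == "__main__":
    print(f"codes={codes}")
    print(f"step={step:.6g}, max error={max_error:.6g}")
```

The reconstruction error is bounded by half a quantization step, which is why fewer bits (10 for normals and texCoords) are acceptable for attributes that tolerate less precision.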
OpenVX – Power Efficient Vision Processing
• Acceleration API for real-time vision
- Focus on mobile and embedded systems
• Diversity of efficient implementations
- From programmable processors, through
GPUs to dedicated hardware pipelines
• Tightly specified API with conformance
- Portable, production-grade vision functions
• Complementary to OpenCV
- Which is great for prototyping
[Diagram: software stack. Labels: Application; OpenCV open source library; Other higher-level CV libraries; Acceleration for power-efficient vision processing; Open source sample implementation; Hardware vendor implementations]
OpenVX Graphs
• Vision processing directed graphs for power and performance efficiency
- Each Node can be implemented in software or accelerated hardware
- Nodes may be fused by the implementation to eliminate memory transfers
- Tiling extension enables user nodes (extensions) to also run in local memory
• VXU Utility Library for access to single nodes
- Easy way to start using OpenVX
• EGLStreams can provide data and event interop with other APIs
- BUT use of other Khronos APIs is not mandated
[Diagram: Example Graph and Flow. Labels: Native Camera Control; four OpenVX Nodes; Heterogeneous Processing]
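The value of declaring a whole graph before running it is that the implementation sees every node and dependency up front, so it is free to schedule, place or fuse them. The sketch below illustrates only that general idea in Python; it is not the OpenVX API, and the `Graph` class, node names and toy image are all hypothetical.

```python
class Graph:
    """Toy dataflow graph: nodes are declared first and executed later."""

    def __init__(self):
        self._nodes = {}  # node name -> (function, names of its inputs)

    def add_node(self, name, func, inputs):
        self._nodes[name] = (func, tuple(inputs))

    def process(self, **sources):
        """Run every node once all of its inputs are available."""
        results = dict(sources)
        pending = dict(self._nodes)
        while pending:
            ready = [name for name, (_, ins) in pending.items()
                     if all(i in results for i in ins)]
            if not ready:
                raise ValueError("cycle or missing input in graph")
            for name in ready:
                func, ins = pending.pop(name)
                results[name] = func(*(results[i] for i in ins))
        return results


# Nodes may be declared in any order; execution order follows the data.
graph = Graph()
graph.add_node("count", sum, ["mask"])
graph.add_node("mask", lambda px: [1 if p > 100 else 0 for p in px], ["gain"])
graph.add_node("gain", lambda px: [2 * p for p in px], ["image"])
out = graph.process(image=[10, 60, 80, 20])

if __name__ == "__main__":
    print(out["mask"], out["count"])
```

Because the executor, not the caller, decides when each node runs, a real implementation could merge "gain" and "mask" into one pass and never write the intermediate image to memory.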
OpenVX 1.0 Function Overview
• Core data structures
- Images and Image Pyramids
- Processing Graphs, Kernels, Parameters
• Image Processing
- Arithmetic, Logical, and statistical operations
- Multichannel Color and BitDepth Extraction and Conversion
- 2D Filtering and Morphological operations
- Image Resizing and Warping
• Core Computer Vision
- Pyramid computation
- Integral Image computation
• Feature Extraction and Tracking
- Histogram Computation and Equalization
- Canny Edge Detection
- Harris and FAST Corner detection
- Sparse Optical Flow
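As a concrete instance of one function on this list, an integral image stores at each position the sum of all pixels above and to the left, so the sum over any rectangle takes four lookups. This is a plain-Python sketch of the standard technique, not the OpenVX implementation; the function names and the padded layout are choices made for the example.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, in constant time."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)

if __name__ == "__main__":
    print(box_sum(ii, 0, 0, 3, 3))  # whole image
    print(box_sum(ii, 1, 1, 3, 3))  # bottom-right 2x2 block
```

The extra row and column of zeros avoid special cases at the image border.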
OpenVX Participants and Timeline
• Aiming for specification finalization by mid-2014
• Itseez is working group chair
• Qualcomm and TI are specification editors
OpenVX and OpenCV are Complementary
Governance
- OpenCV: Open source, community driven; no formal specification
- OpenVX: Formal specification and conformance tests; implemented by hardware vendors
Scope
- OpenCV: Very wide; 1000s of functions of imaging and vision; multiple camera APIs/interfaces
- OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API
Conformance
- OpenCV: No conformance testing; every vendor implements different subset
- OpenVX: Full conformance test suite / process; reliable acceleration platform
Use Case
- OpenCV: Rapid prototyping
- OpenVX: Production deployment
Efficiency
- OpenCV: Memory-based architecture; each operation reads and writes memory
- OpenVX: Graph-based execution; optimizable computation, data transfer
Portability
- OpenCV: APIs can vary depending on processor
- OpenVX: Hardware abstracted for portability
OpenVX and OpenCL are Complementary
Use Case
- OpenCL: General heterogeneous programming
- OpenVX: Domain targeted - vision processing
Architecture
- OpenCL: Language-based - needs online compilation
- OpenVX: Library-based - no online compiler required
Target Hardware
- OpenCL: ‘Exposed’ architected memory model - can impact performance portability
- OpenVX: Abstracted node and memory model - diverse implementations can be optimized for power and performance
Precision
- OpenCL: Full IEEE floating point mandated
- OpenVX: Minimal floating point requirements - optimized for vision operators
Ease of Use
- OpenCL: Focus on general-purpose math libraries with no built-in vision functions
- OpenVX: Fully implemented vision operators and framework ‘out of the box’
Typical Imaging Pipeline
• Pre- and Post-processing can be done on CPU, GPU, DSP…
• ISP controls camera via 3A algorithms
- Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
• ISP may be a separate chip or within Application Processor
[Diagram labels: Lens; Color Filter Array; CMOS sensor; Bayer; Pre-processing; Image Signal Processor (ISP); RGB/YUV; Post-processing; App; 3A; Lens, sensor, aperture control]
Need for advanced camera control API:
- to drive more flexible app camera control
- over more types of camera sensors
- with tighter integration with the rest of the system
Khronos Camera API
• Catalyze camera functionality not available on any current platform
- Open API that aligns with future platform direction for easy adoption
- E.g. could be used to implement future versions of Android Camera HAL
• More detailed control per frame
- Focus, flash, format, Region of Interest (ROI) selection
• Global Timing & Synchronization
- E.g. Between cameras and MEMS sensors
• Application control over ISP processing (including 3A)
- Including multiple, re-entrant ISPs
• Control multiple sensors with synchronization and alignment
- Stereo pairs, Plenoptic arrays, TOF or structured light depth cameras
• Flexible processing/streaming
- Multiple output streams and streaming rows (not just frames)
- RAW, Bayer and YUV Processing
Camera API Design Philosophy
• C-language API starting from proven designs
- e.g. FCAM, Android Camera HAL V3
• Design alignment with widely used hardware standards
- e.g. MIPI CSI
• Focus on mobile, power-limited devices
- But do not preclude other use cases such as automotive, surveillance, DSLR…
• Minimize overlap and maximize interoperability with other Khronos APIs
- But other Khronos APIs are not required
• Provide support for vendor-specific extensions
[Timeline figure. Dates shown: Apr13, Jul13, 4Q13, 1Q14, 2Q14, 3Q14. Milestones shown: Group charter approved; First draft specification; Provisional specification; Sample implementation and tests; Specification ratification]
‘Always On’ Camera and Sensor Processing
• Visual sensor revolution – driving need for significant vision acceleration
- Multi-sensors: Stereo pairs -> Plenoptic arrays -> Active depth cameras
• Devices should be always environmentally-aware – e.g. ‘wave to wake’
- BUT many sensor use cases consume too much power to actually run 24/7
• Smart use of sensors to trigger levels of processing capability
- ‘Scanners’ - very low power, always on, detect events in the environment
Processing levels:
- ARM 7: 1 MIP and accelerometers can detect someone in the vicinity
- DSP / Hardware: Low power activation of camera to detect someone in field of view
- GPU / Hardware: Maximum acceleration for processing full depth sensor capability
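The tiered triggering described on this slide can be pictured as a small state machine in which each level only wakes the next, more power-hungry level when it detects something. The sketch below is illustrative: the tier names, and the policy of dropping straight back to the lowest tier when nothing is detected, are assumptions rather than anything specified in the talk.

```python
# Tiers ordered from lowest to highest power; the names are hypothetical.
TIERS = ("scanner", "camera", "depth")


def next_tier(current, event_detected):
    """Escalate one tier on a detection, otherwise fall back to the lowest tier."""
    index = TIERS.index(current)
    if event_detected:
        return TIERS[min(index + 1, len(TIERS) - 1)]
    return TIERS[0]


def run(events, start="scanner"):
    """Return the tier that is active after each detection result."""
    tier, history = start, []
    for event in events:
        tier = next_tier(tier, event)
        history.append(tier)
    return history


if __name__ == "__main__":
    print(run([False, True, True, True, False]))
```

With this policy the expensive depth processing runs only while the cheaper tiers keep confirming that something is there.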
Sensor Industry Fragmentation …
StreamInput - Sensor Fusion
• Defines access to high-quality fused sensor stream and context changes
- Implementers can optimize and innovate generation of the sensor stream
[Diagram: software stack. Labels: Applications; Middleware (e.g. Augmented Reality engines, gaming engines); OS sensor APIs (e.g. Android SensorManager or iOS CoreMotion); Sensor Hubs; Sensors]
- Low-level native API defines access to fused sensor data stream and context-awareness
- StreamInput implementations compete on sensor stream quality, reduced power consumption, environment triggering and context detection, enabling sensor subsystem vendors to deliver increased ADDED VALUE
- Platforms can provide increased access to improved sensor data stream, driving faster, deeper sensor usage by applications
- Middleware engines need platform-portable access to native, low-level sensor data stream
- Mobile or embedded platforms without sensor fusion APIs can provide direct application access to StreamInput
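To make "fused sensor stream" concrete, here is one simple and widely used fusion technique, a complementary filter that blends an integrated gyroscope rate with an accelerometer-derived angle. The talk does not say which algorithms StreamInput implementations use, so this is an assumption chosen for illustration, and the function names and the `alpha` value are hypothetical.

```python
def complementary_filter(angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """Blend the gyro-integrated angle (smooth, drifts) with the accel angle (noisy, stable)."""
    return alpha * (angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle


def fuse(samples, dt, alpha=0.98):
    """Run the filter over (gyro_rate, accel_angle) pairs, starting from 0 degrees."""
    angle = 0.0
    for gyro_rate, accel_angle in samples:
        angle = complementary_filter(angle, gyro_rate, accel_angle, dt, alpha)
    return angle


# Hypothetical input: device held still at a 10 degree tilt for 200 samples.
steady = [(0.0, 10.0)] * 200
fused = fuse(steady, dt=0.01)

if __name__ == "__main__":
    print(f"fused angle after 200 samples: {fused:.2f} degrees")
```

With the device at rest the estimate converges toward the accelerometer reading, while short-term motion would be tracked mainly by the gyroscope term.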
Khronos APIs for Augmented Reality
[Diagram: Khronos APIs working together in an augmented reality application]
- Advanced Camera Control and stream generation (Camera Control API)
- Vision Processing
- Sensor Fusion (MEMS Sensors)
- 3D Rendering and Video Composition on GPU
- Audio Rendering
- Application on CPUs, GPUs and DSPs
- EGLStream - stream data between APIs
- Precision timestamps on all sensor samples
AR needs not just advanced sensor processing, vision
acceleration, computation and rendering - but also for
all these subsystems to work efficiently together
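One reason precise timestamps on every sensor sample matter is that subsystems running at different rates have to be lined up before their data can be combined. The sketch below matches a camera frame to the nearest motion-sensor sample by timestamp; the sample rates, microsecond units and function name are assumptions for illustration and do not come from any Khronos API.

```python
from bisect import bisect_left


def nearest_sample(timestamps, t):
    """Index of the entry in the sorted list `timestamps` that is closest to t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    return i if after - t < t - before else i - 1


# Hypothetical motion-sensor timestamps in microseconds (one every 5 ms).
imu_timestamps = [0, 5000, 10000, 15000, 20000]

# Hypothetical camera frame timestamps at roughly 60 Hz.
frame_timestamps = [0, 16667, 33333]
matches = [nearest_sample(imu_timestamps, t) for t in frame_timestamps]

if __name__ == "__main__":
    for t, idx in zip(frame_timestamps, matches):
        print(f"frame at {t} us -> sensor sample at {imu_timestamps[idx]} us")
```

If timestamps were missing or imprecise, the renderer could pair a frame with the wrong pose, which shows up as visible lag or jitter in the overlay.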
Khronos DevU In Depth Sessions Today