on-demand.gputechconf.com/gtc/2017/presentation/s7787...

Autonomous Driving on Benchmarks

Xiaodi Hou

TWO DECADES OF BENCHMARKING

Two decades of benchmarking

• MNIST

– 1998

– Character recognition

– 60,000 images

• Inspired Convolutional Neural Net


• PASCAL-VOC

– 2005

– Object detection & classification

– 3787 images

• Inspired Deformable Part-based Model


• ImageNet

– 2010

– Object classification

– 1,000,000 images

• Inspired deep learning

LIMITATIONS OF BENCHMARKS

Upper bounds of benchmarks

Objective tasks

• Measuring physical reality

• Bounded by measurement accuracy

• Stereo/Optical flow/Face recognition

Intermediate tasks

Subjective tasks

• Measuring human cognition

• Bounded by subject agreement

• Saliency/Memorability/Image captioning

Imperfect benchmarks

• Marriage market in China

– Tall, rich, and handsome

• 80% of girls are forced to choose among

– tall poor ugly guy

– short rich ugly guy

– short poor handsome guy

• Dimensionality reduction

– Guaranteed information loss!

– A projection of R^n → R

• Red or Blue?
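The information loss can be illustrated with a minimal sketch. It assumes a hypothetical benchmark that ranks candidates by a weighted sum of their attributes; the weights and attribute values are invented for illustration and do not come from the talk.

```python
# Hypothetical scoring function: a linear projection from R^3 to R.
# Weights and attribute values are illustrative assumptions.
WEIGHTS = (1.0, 1.0, 1.0)  # (tall, rich, handsome)


def score(candidate):
    """Project an attribute vector onto a single number."""
    return sum(w * x for w, x in zip(WEIGHTS, candidate))


tall_poor_ugly = (1.0, 0.0, 0.0)
short_rich_ugly = (0.0, 1.0, 0.0)

# Different attribute vectors, identical score: once projected,
# the ranking can no longer tell the two candidates apart.
print(score(tall_poor_ugly), score(short_rich_ugly))
```

Any choice of weights has the same problem: some pairs of distinct vectors will always map to the same number.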

Signs of a fading benchmark

• Saturated competition

– Labeled Faces in the Wild (0.9978 ± 0.0007)

• Weak transferability

– Middlebury Optical Flow → KITTI Optical Flow

• Poor inter-subject consistency

– Image captioning and BLEU scores

• A man throwing a frisbee in a park.

• A man holding a frisbee in his hand.

• A man standing in the grass with a frisbee.
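To make the consistency problem concrete, the sketch below scores the three captions above against each other. It computes only the clipped n-gram precision that BLEU is built on, not the full BLEU score (no brevity penalty, no geometric mean over n), and the tokenization (lowercase, whitespace split, trailing period dropped) is an assumption.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference,
    with each n-gram credited at most as often as the reference has it."""
    cand = Counter(ngrams(candidate.lower().rstrip(".").split(), n))
    ref = Counter(ngrams(reference.lower().rstrip(".").split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


captions = [
    "A man throwing a frisbee in a park.",
    "A man holding a frisbee in his hand.",
    "A man standing in the grass with a frisbee.",
]

# Score every caption against every other caption as if it were the reference.
for i, cand in enumerate(captions):
    for j, ref in enumerate(captions):
        if i != j:
            p1 = clipped_precision(cand, ref, 1)
            p2 = clipped_precision(cand, ref, 2)
            print(f"caption {i} vs {j}: unigram {p1:.3f}, bigram {p2:.3f}")
```

All three captions are plausible descriptions of one scene, yet none of them comes close to a perfect score when judged against another.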

BENCHMARKS AND AUTONOMOUS DRIVING

Vision-based autonomous driving benchmarks

• KITTI & CityScapes

– Detection

– Tracking

– Stereo/Flow

– SLAM

– Semantic segmentation

• 100% traditional vision challenges

• Are we ready?

Not yet…

Challenge 1: Data distribution

• Academia

– Average performance

• Silicon Valley startup

– Demo oriented

– Best case performance

• Real products

– Murphy’s law

– Worst case performance
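The three viewpoints correspond to three different summaries of the same results. A minimal sketch, assuming hypothetical per-scenario accuracies (the scenario names and numbers are invented for illustration):

```python
from statistics import mean

# Hypothetical per-scenario accuracies; not measured results.
accuracy_by_scenario = {
    "clear day": 0.99,
    "rain": 0.93,
    "night": 0.90,
    "sun glare": 0.62,
}

scores = list(accuracy_by_scenario.values())
summary = {
    "average (academia)": mean(scores),
    "best case (demo)": max(scores),
    "worst case (product)": min(scores),
}
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same system looks very different depending on which of the three numbers is reported.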

Challenge 2: Ground-truth representation

• Bbox

– Almost no bbox in real world!

– Missing hidden variables (distance & velocity)

• Semantic segmentation

– “pixel classification”

– How to assemble all the pixels?

• Stixels

– Representing the world using matchsticks

– Distance and 3D shape

– Missing the notion of whole objects

Challenge 3: Structured prior

• What’s wrong with end-to-end learning?

Challenge 3: Structured prior

• Two types of priors:

– Implicit prior

• Data driven (e.g. images)

• Good for deep learning models

– Explicit prior

• Rule driven (e.g. cars cannot fly)

• Good for probabilistic models

• The road ahead

– An image based problem with strong explicit priors

TUSIMPLE CHALLENGES! WORKSHOP@CVPR 2017

TuSimple Challenge 1: Lane challenge

• Deep learning for lane?

– Parametrization of pixels

• Strong structure priors

– ~ 3.75m lane width

– Parallel lines

– (almost) flat road surface

• Over-representing corner cases

– 20% hard cases (heavy occlusion/strong light condition change/bad markings) are unlikely to occur, if sampled uniformly
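One way to over-represent corner cases is to sample easy and hard clips from separate pools at a fixed ratio. The sketch below is an assumption about how such a split could be built, not a description of how the TuSimple dataset was assembled; the function name, pool sizes, and the 1% raw share of hard clips are hypothetical.

```python
import random


def build_benchmark(easy_pool, hard_pool, size, hard_fraction=0.2, seed=0):
    """Draw a benchmark with a fixed share of hard cases (stratified sampling)."""
    rng = random.Random(seed)
    n_hard = round(size * hard_fraction)
    n_easy = size - n_hard
    sample = rng.sample(hard_pool, n_hard) + rng.sample(easy_pool, n_easy)
    rng.shuffle(sample)
    return sample


# Hypothetical raw data: hard clips make up only 1% of what was recorded,
# so uniform sampling would yield about 2 hard clips out of 200.
easy_pool = [("easy", i) for i in range(9900)]
hard_pool = [("hard", i) for i in range(100)]

benchmark = build_benchmark(easy_pool, hard_pool, size=200)
hard_share = sum(1 for kind, _ in benchmark if kind == "hard") / len(benchmark)
print(hard_share)
```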

TuSimple Challenge 2: Velocity estimation

• Representing the world with cam + LiDAR

TuSimple Challenge 2: Velocity estimation

• Object-level representation for motion planning

– Stereo map?

– SLAM?

– Estimation based on bbox size?

• LiDAR vs Camera

– No LiDAR solution for 200m perception
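The bbox-size option can be sketched with a pinhole camera model. This is a minimal illustration under stated assumptions, not the challenge baseline: the focal length in pixels, the real height of the vehicle, and the frame interval are all assumed known, and the function names and numbers are hypothetical.

```python
def distance_from_bbox(focal_px, real_height_m, bbox_height_px):
    """Pinhole model: distance = focal length * real height / image height."""
    return focal_px * real_height_m / bbox_height_px


def relative_velocity(focal_px, real_height_m, h_prev_px, h_curr_px, dt_s):
    """Finite-difference range rate between two frames, in m/s.
    Positive means the object is moving away from the camera."""
    z_prev = distance_from_bbox(focal_px, real_height_m, h_prev_px)
    z_curr = distance_from_bbox(focal_px, real_height_m, h_curr_px)
    return (z_curr - z_prev) / dt_s


# Assumed values: 1000 px focal length, 1.5 m tall vehicle, frames 1 s apart.
focal_px, real_height_m = 1000.0, 1.5
print(distance_from_bbox(focal_px, real_height_m, 30.0))             # distance in metres
print(relative_velocity(focal_px, real_height_m, 30.0, 25.0, 1.0))   # range rate in m/s
```

Under these assumed numbers a 1.5 m tall vehicle at 200 m spans only 7.5 pixels, so a one-pixel error in the box height shifts the distance estimate by tens of metres. That sensitivity is one reason a single-frame bbox estimate is a weak basis for long-range velocity.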

TuSimple challenges

• Video clip based

– We expect non-trivial temporal aggregation!

• Confidence based

– Each entry has a “confidence” field

– We evaluate the most confident 80% of entries

• Run-time

– Must report single GPU runtime speed

– Slow algorithms (< 3 fps) will not be included in the leaderboard
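The confidence and run-time rules can be sketched as follows. Only the "confidence" field, the 80% cut, and the 3 fps threshold come from the slides; the "correct" field, the accuracy metric, the rounding rule, and the function names are assumptions made for illustration, since the actual scoring code is not shown here.

```python
def most_confident(entries, keep_fraction=0.8):
    """Keep the most confident share of entries (rounding rule is an assumption)."""
    ranked = sorted(entries, key=lambda e: e["confidence"], reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def evaluate(entries, fps, min_fps=3.0, keep_fraction=0.8):
    """Return accuracy over the kept entries, or None if the method is too slow."""
    if fps < min_fps:
        return None  # excluded from the leaderboard
    kept = most_confident(entries, keep_fraction)
    return sum(e["correct"] for e in kept) / len(kept)


# Hypothetical submission: ten entries, the three least confident are wrong.
confidences = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.2, 0.1]
entries = [
    {"confidence": c, "correct": i < 7}
    for i, c in enumerate(confidences)
]

print(evaluate(entries, fps=10.0))  # scored on the 8 most confident entries
print(evaluate(entries, fps=2.0))   # too slow, not scored
```

A method that knows when it is unsure is rewarded: here two of the three wrong entries fall in the discarded 20%.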

HTTP://BENCHMARK.TUSIMPLE.AI

Available now!!

Xiaodi Hou
