Autonomous Driving on Benchmarks
Xiaodi Hou
TWO DECADES OF BENCHMARKING
Two decades of benchmarking
• MNIST
– 1998
– Character recognition
– 60,000 images
• Inspired Convolutional Neural Net
• PASCAL-VOC
– 2005
– Object detection & classification
– 3787 images
• Inspired Deformable Part-based Model
• ImageNet
– 2010
– Object classification
– 1,000,000 images
• Inspired deep learning
LIMITATIONS OF BENCHMARKS
Upper bounds of benchmarks
• Objective tasks
– Measuring physical reality
– Bounded by measurement accuracy
– Stereo/Optical flow/Face recognition
• Intermediate tasks
• Subjective tasks
– Measuring human cognition
– Bounded by subject agreement
– Saliency/Memorability/Image captioning
Imperfect benchmarks
• Marriage market in China
– Tall, rich, and handsome
• 80% of girls are forced to choose among
– tall poor ugly guy
– short rich ugly guy
– short poor handsome guy
• Dimensionality reduction
– Guaranteed information loss!
– A projection R^n → R
• Red or Blue?
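A minimal sketch of why projecting a score vector in R^n onto a single number loses information: the same three methods get a different winner under each weighting. The method names, scores, and weights below are made up for illustration and are not from the talk.

```python
# Hypothetical scores for three methods on three metrics (higher is better).
scores = {
    "method_a": (0.9, 0.5, 0.4),
    "method_b": (0.5, 0.9, 0.5),
    "method_c": (0.7, 0.7, 0.7),
}


def rank(weights):
    """Project each score vector onto a scalar and sort, best first."""
    def scalar(name):
        return sum(w * s for w, s in zip(weights, scores[name]))
    return sorted(scores, key=scalar, reverse=True)


# Three different projections R^3 -> R, three different leaderboards.
ranking_metric_1 = rank((1.0, 0.0, 0.0))
ranking_metric_2 = rank((0.0, 1.0, 0.0))
ranking_sum = rank((1.0, 1.0, 1.0))

for label, ranking in [("metric 1 only", ranking_metric_1),
                       ("metric 2 only", ranking_metric_2),
                       ("plain sum", ranking_sum)]:
    print(label, "->", ranking)
```

Whichever weighting the benchmark picks decides the winner, so the leaderboard says as much about the projection as about the methods.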
Signs of a fading benchmark
• Saturated competition
– Labeled Faces in the Wild (0.9978 ± 0.0007)
• Weak transferability
– Middlebury Optical Flow → KITTI Optical Flow
• Poor inter-subject consistency
– Image captioning and BLEU scores
• A man throwing a frisbee in a park.
• A man holding a frisbee in his hand.
• A man standing in the grass with a frisbee.
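A rough sketch of the consistency problem, using the three captions above. It computes clipped n-gram precision, which is only one ingredient of BLEU (no brevity penalty, no averaging over n-gram orders), so the numbers are not BLEU scores. The function names are illustrative.

```python
from collections import Counter

captions = [
    "A man throwing a frisbee in a park.",
    "A man holding a frisbee in his hand.",
    "A man standing in the grass with a frisbee.",
]


def ngrams(text, n):
    """Count the n-grams of a lowercased caption, ignoring the final period."""
    tokens = text.lower().rstrip(".").split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference (clipped counts)."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Score every caption against every other caption as if it were the reference.
for i, cand in enumerate(captions):
    for j, ref in enumerate(captions):
        if i != j:
            p2 = clipped_precision(cand, ref, 2)
            print(f"caption {i + 1} vs caption {j + 1}: bigram precision {p2:.2f}")
```

All three captions are acceptable descriptions of one image, yet each shares well under half of its bigrams with the others, so a score built on n-gram overlap penalizes valid answers.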
BENCHMARKS AND AUTONOMOUS DRIVING
Vision-based autonomous driving benchmarks
• KITTI & Cityscapes
– Detection
– Tracking
– Stereo/Flow
– SLAM
– Semantic segmentation
• 100% traditional vision challenges
• Are we ready?
Not yet…
Challenge 1: Data distribution
• Academia
– Average performance
• Silicon Valley startup
– Demo oriented
– Best case performance
• Real products
– Murphy’s law
– Worst case performance
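The three viewpoints above can be read as three statistics over the same per-scenario results. The sketch below assumes a made-up table of accuracies; the scenario names and numbers are illustrative only.

```python
# Hypothetical per-scenario accuracy of one perception model.
per_scenario_accuracy = {
    "clear_day": 0.98,
    "night": 0.91,
    "rain": 0.85,
    "sun_glare": 0.62,
}

values = list(per_scenario_accuracy.values())

average_case = sum(values) / len(values)  # what an academic benchmark reports
best_case = max(values)                   # what a demo shows
worst_case = min(values)                  # what a product has to survive
worst_scenario = min(per_scenario_accuracy, key=per_scenario_accuracy.get)

print(f"average {average_case:.2f}, best {best_case:.2f}, "
      f"worst {worst_case:.2f} ({worst_scenario})")
```

The average hides the failing scenario, which is the one a deployed system will eventually meet.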
Challenge 2: Ground-truth representation
• Bbox
– Almost no bbox in real world!
– Missing hidden variables (distance & velocity)
• Semantic segmentation
– “pixel classification”
– How to assemble all the pixels?
• Stixels
– Representing the world using matchsticks
– Distance and 3D shape
– Missing the notion of whole objects
Challenge 3: Structured prior
• What’s wrong with end-to-end learning?
• Two types of priors:
– Implicit prior
• Data driven (e.g. images)
• Good for deep learning models
– Explicit prior
• Rule driven (e.g. cars cannot fly)
• Good for probabilistic models
• The road ahead
– An image based problem with strong explicit priors
TUSIMPLE CHALLENGES! WORKSHOP @ CVPR 2017
TuSimple Challenge 1: Lane challenge
• Deep learning for lane?
– Parametrization of pixels
• Strong structure priors
– ~ 3.75m lane width
– Parallel lines
– (almost) flat road surface
• Over-representing corner cases
– 20% hard cases (heavy occlusion/strong light condition change/bad markings), which would be unlikely to occur if sampled uniformly
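One way the structure priors above could be turned into an explicit check, as a sketch rather than the challenge's method. It assumes a pinhole camera with square pixels, zero pitch and roll, and a flat road: a ground point at depth Z appears at row v with v - cy = f·h/Z, and a lane of width W spans f·W/Z pixels there, so the expected width is W·(v - cy)/h and the focal length cancels. The camera height, principal-point row, and tolerance are assumed values.

```python
LANE_WIDTH_M = 3.75  # lane width prior from the slide


def lane_width_px(v, cy, camera_height_m, lane_width_m=LANE_WIDTH_M):
    """Expected lane width in pixels at image row v (flat road, zero-pitch pinhole camera)."""
    if v <= cy:
        raise ValueError("row must lie below the horizon row cy")
    return lane_width_m * (v - cy) / camera_height_m


def is_plausible_lane(left_u, right_u, v, cy, camera_height_m, tolerance=0.2):
    """Check a detected pair of lane boundaries against the width prior at row v."""
    expected = lane_width_px(v, cy, camera_height_m)
    return abs((right_u - left_u) - expected) <= tolerance * expected


# Assumed setup: horizon at row 360, camera mounted 1.5 m above the road.
print(lane_width_px(v=460, cy=360, camera_height_m=1.5))
print(is_plausible_lane(500, 750, v=460, cy=360, camera_height_m=1.5))
print(is_plausible_lane(500, 900, v=460, cy=360, camera_height_m=1.5))
```

Under these assumptions the expected lane width grows linearly with the row offset from the horizon, which gives a cheap filter for per-pixel lane predictions that ignore the geometry.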
TuSimple Challenge 2: Velocity estimation
• Representing the world with cam + LiDAR
• Object-level representation for motion planning
– Stereo map?
– SLAM?
– Estimation based on bbox size?
• LiDAR vs Camera
– No LiDAR solution for 200m perception
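A sketch of the "estimation based on bbox size" option, assuming a pinhole camera and a known real-world vehicle height. The focal length, vehicle height, and frame interval are assumed values, and the functions are illustrative rather than part of the challenge.

```python
def distance_from_bbox(focal_px, real_height_m, bbox_height_px):
    """Pinhole estimate: an object of known height H that spans h pixels is at Z = f * H / h."""
    return focal_px * real_height_m / bbox_height_px


def relative_velocity(distances_m, frame_interval_s):
    """Average range rate over a clip in m/s (negative means the gap is closing)."""
    elapsed = frame_interval_s * (len(distances_m) - 1)
    return (distances_m[-1] - distances_m[0]) / elapsed


FOCAL_PX = 1000.0      # assumed focal length in pixels
CAR_HEIGHT_M = 1.5     # assumed vehicle height in metres

# Near range: a 30 px tall box corresponds to 50 m.
print(distance_from_bbox(FOCAL_PX, CAR_HEIGHT_M, 30.0))

# Far range: at 200 m the box is only 7.5 px tall, so a 1 px labelling
# error moves the distance estimate by tens of metres.
print(distance_from_bbox(FOCAL_PX, CAR_HEIGHT_M, 7.5))
print(distance_from_bbox(FOCAL_PX, CAR_HEIGHT_M, 6.5))

# Velocity from three distance estimates taken 0.5 s apart.
print(relative_velocity([50.0, 48.0, 46.0], frame_interval_s=0.5))
```

The far-range case shows why bbox size alone is a fragile cue at the distances where LiDAR is also unavailable, and why aggregating over a clip matters.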
TuSimple challenges
• Video clip based
– We expect non-trivial temporal aggregation!
• Confidence based
– Each entry has a “confidence” field
– We evaluate the most confident 80% of entries
• Run-time
– Must report single GPU runtime speed
– Slow algorithms (< 3fps) will not be included in the leaderboard
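The confidence and run-time rules above could be applied roughly as follows. This is a sketch only: the entry format, the error metric, and rounding the cutoff down are assumptions, since the slides give just the 80% and 3 fps thresholds.

```python
def score_most_confident(entries, keep_fraction=0.8):
    """Mean error over the most confident fraction of entries.

    Each entry is a dict with a "confidence" and an "error" field
    (hypothetical layout; the cutoff is rounded down, also an assumption).
    """
    if not entries:
        raise ValueError("no entries to evaluate")
    ranked = sorted(entries, key=lambda e: e["confidence"], reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    kept = ranked[:keep]
    return sum(e["error"] for e in kept) / len(kept)


def on_leaderboard(fps, min_fps=3.0):
    """Algorithms slower than 3 fps on a single GPU are left off the leaderboard."""
    return fps >= min_fps


entries = [
    {"confidence": 0.9, "error": 1.0},
    {"confidence": 0.8, "error": 1.0},
    {"confidence": 0.7, "error": 1.0},
    {"confidence": 0.6, "error": 1.0},
    {"confidence": 0.1, "error": 10.0},  # low confidence, large error: dropped
]

print(score_most_confident(entries))
print(on_leaderboard(2.5))
```

Scoring only the most confident entries rewards a method for knowing when it is likely wrong, not just for being right on average.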
HTTP://BENCHMARK.TUSIMPLE.AI
Available now!!
Xiaodi Hou