Upload
others
View
14
Download
0
Embed Size (px)
Citation preview
2016年9月13日GTC大会
清华大学电子系 马惠敏
智能驾驶与视觉感知
http://3dimage.ee.tsinghua.edu.cn [email protected] Email: Tel: 62781432
Google Mobileye
Tesla NVIDIA
智能驾驶与视觉感知 Visual Perception for Semi-Autonomous Driving
Google Mobileye
Tesla
NVIDIA
• LIDAR
• Radar/ Sonar
• Stereo Camera
• Maps
• Mobileye EyeQ3
• Forward Radar
• Forward Camera
• 360°Ultrasonic Sonar
• GPS
• EyeQ3 (Now)
• EyeQ4 (2018)
8 cameras
• Drive PX:2 Jetson TX1, 12 cameras
• Drive PX2:12 CPU cores, 2 Pascal GPUs
DriveNet
智能驾驶与视觉感知 Visual Perception for Semi-Autonomous Driving
技术/Techniques
• 车道检测 Line Detection
• 交通标识/灯检测 Traffic Sign/Light Detection
• 驾驶道路标识 Drivable Space Labeling
• 通用物体检测 General Object Detection
• 整体路径规划 Holistic Path Planning
功能/Functions
• 前向碰撞预警 Forward Collision Warning
• 自适应巡航控制 Adaptive Cruise Control
• 自动变道 Automatic Lane Changing
• 交通拥堵辅助 Traffic Jam Assistance
• 自动泊车 Autonomous parking
智能驾驶与视觉感知 Visual Perception for Semi-Autonomous Driving
计算力
大数据
DGX-1 Drive PX2
深度模型训练:Nvidia Tesla GPUs 嵌入式平台:Nvidia Tegra GPUs
智能驾驶的驱动力
Multiple Sensors in Cars
核心问题:小目标、强遮挡、高动态 Small objects, Heavy occlusion, Rapid moving
智能驾驶与视觉感知 Visual Perception for Semi-Autonomous Driving
智能驾驶与视觉感知
ImageNet: • 1000 classes, ~2M images
PASCAL VOC: • 20 classes, ~20K images
Professional photographs, simple scenes, large objects
KITTI: • 3 classes, ~20K images
COCO: • 80 classes, ~200K images
smaller objects, heavy occlusion, high resolution
recall> 95% (1K Prop.)
recall < 75% recall < 60% MSRA 152层 AP ~40%
物体检测
8
显著性物体检测:语义注意认知模型
部件与结构认知模型:抵抗遮挡能力
3D 场景物体识别:适应复杂环境
图像认知心理学:心理特征量化提取
让机器学习人的思考模式 认知
心理 识别
智慧决策
PRL : Semantic parts based top-down pyramid for action recognition ,2016 PRL : Geodesic weighted bayesian model for saliency optimization,2016 PR : Manifold topological multi-resolution analysis method, 2011 CVPR: Monocular 3D object detection for autonomous driving, 2016 CVPR: Improving object proposals with multi-thresholding straddling expansion, 2015 NIPS : 3D object proposals for accurate object class detection, 2015
代表性文章与专利
专利 :201310221563.9 一种图像认知心理分析系统 专利 :201310222323.0 基于MMPI心理量表的图像库及其构建方法
http://3dimage.ee.tsinghua.edu.cn
智能
显著性物体检测
关键部件检测
三维目标识别
中层: discriminative parts
低层: parts
高层: structure
智能驾驶与视觉感知
(一)显著性物体检测
提出自顶向下的认知模型
利用贝叶斯框架抑制噪声
利用区域空间关系统一标出整个显著目标
Ours
RC
Ground Truth
Saliency map
Bayesian inference
0 2 4
x 104
0
0.05
0.1
0 2 4
x 104
0
0.05
0.1
0 2 4
x 104
0
0.1
0.2
0 2 4
x 104
0
0.5
1
0 2 4
x 104
0
0.5
1
0 2 4
x 104
0
0.5
1
0 2 4
x 104
0
0.5
1
0 2 4
x 104
0
0.5
1
0 2 4
x 104
0
0.5
Geodesic Similarity
ν:color, texture
Initial salient regions
original saliency map
Dense CRF
X. Wang, H. Ma, X. Chen. ICIP’15.
挑战 如何抑制噪声?
如何统一标出有语义的整个物体?
引入自顶向下认知模型
怎样抑制噪声?
Bayesian framework (Xie et al. TIP’13)
怎样统一标出整个目标?
分析两个区域是一个物体的两个部分的可能性
颜色、纹理等特征的相似度和公共边界空间关系
测地距离的引入Geodesic Weighted Bayesian Model (GWB)
Positive samples
Negative samples
(一)显著性物体检测
RC(CVPR’11) GS(ECCV’12) HDCT(CVPR’14)
Improved
Improved
Improved
Salie
ncy
map
s
(一)显著性物体检测
(一)显著性物体检测
局部 整体
显著 弱化
特征 语义
物体特征
环境关系
结构映射
Deep
ConvNet
RoI
Projection
RoI
Pooling
LayerFCs
RoI
Feature
For each RoI
FC
Softmax
(1)显著性物体检测
Ours Ground Truth
Saliency map
• 利用边缘检测将图像分割成区域,用mask-based Fast R-CNN提取每个区域的特征并预测其显著性值,得到整个图像的saliency map,并保持原有的边缘信息
• 在ROI-pooling层加上区域轮廓的mask, 提取每个区域的高层语义
• 但还缺乏上下文语义 (Contextual)
X. Wang, H. Ma, X. Chen. ICIP’16.
(1)显著性物体检测
Ours
RC
物体的显著性很大程度上决定于其所在的场景:
• 以前工作多采用扩大框的大小,提取上下文信息
• 提出多尺度空间上下文,融合不同层次的语义
神经网络不同层的特征可视化
Xiang Wang, Huimin Ma, Shaodi You, Xiaozhi Chen, “Edge Preserving and Multi-Scale Contextual Neural Network for Salient Object Detection”, arXiv preprint
arXiv:1608.08029, https://arxiv.org/abs/1608.08029
(1)显著性物体检测
Ours
RC
Xiang Wang, Huimin Ma, Shaodi You, Xiaozhi Chen, “Edge
Preserving and Multi-Scale Contextual Neural Network for
Salient Object Detection”, arXiv preprint arXiv:1608.08029,
https://arxiv.org/abs/1608.08029
多尺度空间上下文的融合
• 同时实现显著性物体的精确定位和边缘保持
(1)显著性物体检测
Ours
RC
Ours
RC
(1)显著性物体检测
显著性物体检测
部件检测识别
3D 物体检测识别
物体所在区域
智能驾驶与视觉感知
在人的物体图像认知中什么是最重要的因素?
部件
环境 结构
color
texture corner
shape edge
part pose
category structure
action location
scene
Low level
High level
Image
Semantics
Mid level
(二)部件检测识别
中层: discriminative parts
低层: 边、角和区域
高层: structure
怎样在复杂环境中识别目标
显著性物体检测
部件检测
3D 物体检测
(二)部件检测识别
为什么要用中层可鉴别部件特征?
• 高层特征: 语义结构, 可变形物体不具备结构不变性
• 中层特征: bag-of-parts ,可鉴别部件相对稳定,有一定语义
• 低层特征: bag-of-words,具有通用性,缺乏语义
用中层特征聚类 用低层特征聚类
Singh et al. ECCV’12
(二)部件检测识别
vs
vs
vs
关键部件
中层:可鉴别部件
Low-level: words
High-level: structure
(二)部件检测识别
CNN学习
认知模型
Human-level concept learning through probabilistic program induction Science 350(6266):1332-1339, 2015
Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum, Department of Brain and Cognitive Sciences, MIT
Single object → Classification for new → Generating new examplar → Parts and relation → New concept
(2)部件与结构认知模型
提出通用对称对结构模型
建立了动作图象中的对称对结构模型
降低了结构对部件准确检测的要求
T1
T2
c
提出部件的Top-down空间结构模型
确定部件的空间关系提高动作检测准确性
建立空间结构神经网络和分级分辨神经网络
用全卷积网络确定并定位部件
对称对结构模型
动作部件空间关系
Human-level concept learning
2015.12.12 Science
显著性物体检测
部件检测识别
3D 物体检测识别
智能驾驶与视觉感知 物体所在区域
KITTI (Geiger et al., CVPR’12)
Categories: Car, Pedestrian, Cyclist
Images: 2 views, 7,481 images for training, 7,518 images for testing
Annotations: 2D/3D bounding boxes, occlusion/truncation labels
引入3D和环境先验知识
KITTI: evaluation protocols Tasks:
object detection
object detection and orientation estimation
Overlap criteria: 0.7 for Car, 0.5 for Pedestrian/Cyclist
Difficulties: easy/moderate/hard
自动驾驶中的目标检测
Easy Moderate Hard
• Clutter scene • Heavy occlusion • Small objects
KITTI: evaluation protocols Tasks:
object detection
object detection and orientation estimation
Overlap criteria: 0.7 for Car, 0.5 for Pedestrian/Cyclist
Difficulties: easy/moderate/hard Easy Moderate Hard
• Clutter scene • Heavy occlusion • Small objects
自动驾驶中的目标检测
3D object proposals: Think in 3D as human!
Basic idea 利用三维信息: 立体视觉 三维空间中的物体语义模型
• point density, free space, size prior, road plane, etc.
Yamaguchi et al., ECCV’14
Left image
Right image
Disparity
occupancy
stereo
depth features visible free space
Prior
Proposals generation as energy minimization
point cloud density
Proposals represented by 3D cuboid:
• : 3D coordinates
• : azimuth angle
• : class, : template
( , , , ), ,x y z c ty
, ,x y z
c t
3D contextual model:
free space height prior
height contrast [Chen & Ma, et al. NIPS’15]
CNN (VGG)
Joint box regression and orientation estimation
Project to 2D
3D object proposals
三维环境模型:
高度先验
高度对比度
点云密度
自由空间
3D Proposal Free Space Height prior
立体/Stereo • 3D Sampling • Road Estimation from 3D • Point Cloud Features • Exhaustive Search • Structured SVM
单目/Monocular • 3D Sampling • Road Estimation from 2D • Semantic Features • Exhaustive Search • Structured SVM
Scoring & NMS
Stereo images 3D proposals
CNN Scoring
3D Proposal Generation 3D Object Detection Network
[Chen & Ma, et al. NIPS’15 & CVPR’16]
Single-Stream Network
Co
ncaten
ation
Softmax classification
Box regression
Orientation regression
Box proposal
Context region
ROI pooling
Conv layers
ROI pooling
FCs
FCs
FC
FC
FC
Two-Stream Network
Input: • RGB
• RGB, HHA
3D Object Detection Network 包含上下文信息:Incorporating context information
多任务监督:Multi-task supervision
多通道特征学习:Multi-stream feature learning
[Chen & Ma, et al. NIPS’15 & CVPR’16]
• RGB-HHA Input:
小目标
.
强遮挡
.
高动态
智能驾驶与视觉感知
Results: Object Detection and Orientation Estimation Object detection (AP平均精度)
http://www.cvlibs.net/datasets/kitti/eval_object.php
Object detection and orientation estimation (AOS 平均角度相似性)
Pedestrian
Car
Car
Pedestrian
Cyclist
Cyclist
*2016年3月
Car Object detection (AP) Object detection and orientation estimation (AOS)
Method Easy Moderate Hard Easy Moderate Hard
SubCat [1] 84.14 75.46 59.71 74.42 83.41 58.83
3DVP [2] 87.46 75.77 65.38 86.92 74.59 64.11
AOG [3] 84.80 75.94 60.70 - - -
Regionlets [4] 84.75 76.45 59.70 - - -
spLBP [5] 87.19 77.40 60.60 - - -
Faster R-CNN [6] 86.71 81.84 71.12 - - -
3DOP (Ours) 93.04 88.64 79.10 91.44 86.10 76.52
Mono3D (Ours) 92.33 88.66 78.96 91.01 86.62 76.84
Pedestrian Object detection (AP) Object detection and orientation estimation (AOS)
Method Easy Moderate Hard Easy Moderate Hard
DPM-VOC+VP [1] 59.48 44.86 40.37 53.55 39.83 35.73
FilteredICF [2] 67.65 56.75 51.12 - - -
DeepParts [3] 70.49 58.67 52.78 - - -
CompACT-Deep [4] 70.69 58.74 52.71 - - -
Regionlets [5] 73.14 61.15 55.21 - - -
Faster R-CNN [6] 78.86 65.90 61.18 - - -
Mono3D (Ours) 80.35 66.68 63.44 71.15 58.15 54.94
3DOP (Ours) 81.78 67.47 64.70 72.94 59.80 57.03
Cyclist Object detection (AP) Object detection and orientation estimation (AOS)
Method Easy Moderate Hard Easy Moderate Hard
DPM-VOC+VP [1] 42.43 31.08 28.23 30.52 23.17 21.58
Mono3D (Ours) 76.04 66.36 58.87 65.56 54.97 48.77
3DOP (Ours) 78.39 68.94 61.37 70.13 58.68 52.35
Stereo vs LIDAR
3D Object Detection on KITTI (Car)
0
20
40
60
80
100
RGB RGB-HHA, 1-stream RGB-HHA, 2-stream
Stereo LIDAR Hybrid
3D AP
路面用Stereo得到的稠密点云 物体用Lidar得到的稀疏精确点云 H: 点的深度 H: 点到路面的高度 A: 点的角度
Stereo vs LIDAR
3D Object Detection on KITTI (Car)
[Chen & Ma, et al. PAMI’16]
•M
ult
i-M
od
al
多模态 & 多任务 & 多视角 Multi-Modal & Multi-Task & Multi-View
•M
ult
i-Ta
sk
Multi-Modal & Multi-Task & Multi-View
多模态 & 多任务 & 多视角
Front view: LIDAR/RGB
Bird’s eye view
• Input representation
• Sparse points
• Extra small objects
• Oriented bounding boxes
多模型&多任务&多视角
Results: Sensor Fusion
显著性物体检测
部件检测识别
3D 物体检测识别
智能驾驶与视觉感知 物体所在区域
ICIP 2014: Learning a compact latent reprecentation of the bag-of –parts model
ICCVG 2014: Protected Pooling Method Of Sparse Coding In Visual Classication
ICIP 2015: Geodesic weighted Bayesian model for salient Object Detection.
CVPR 2015: Improving Object Proposals with Multi-Thresholding Straddling Expansion
NIPS 2015: 3D Object Proposals for Accurate Object Class Detection
CVPR 2016: Monocular 3D Object Detection for Autonomous Driving
ICIP 2016: Object Detection Via Fast R-CNN And Low-level Cues ICIP 2016: Region Candidate Combination for Action Recognition. PRL 2016: Geodesic Weighted Bayesian Model for Saliency Optimization
PRL 2016: Semantic parts based top-down pyramid for action recognition
color
texture corner
shape edge
part pose
category structure
action location
scene
低层
高层
中层
Image
Semantics
发表的相关文章: http://3dimage.ee.Tsinghua.edu.cn
智能+
神经网群
认知模型
统计学习
智能驾驶与视觉感知
显著性物体检测
部件检测识别
3D 物体检测识别
Thank You
电子工程系 Electronic Engineering
http://3dimage.ee.Tsinghua.edu.cn