43
TorontoCity: Seeing the World with a Million Eyes

TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

TorontoCity: Seeing the World

with a Million Eyes

Page 2: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Authors

Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang

Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun

* Project Completed by Summer 2016

Page 3: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Why Toronto? The best place to live in the world*

• Toronto 4

*According to 2015 Global Liveability Ranking

Page 4: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Why Toronto? The best place to live in the world*

• Toronto 4

The places you are working at:

• Boston 36

• Pittsburgh 39

• San Francisco 49

• Los Angeles 51

*According to 2015 Global Liveability Ranking

Page 5: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

A dataset over 700 km2 region!

Page 6: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

From all the views!

Page 7: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Dataset

Aerial

Data Source

Page 8: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Dataset

Aerial

Ground Level Panorama

Data Source

Page 9: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Dataset

Aerial LIDAR

Data Source

Ground Level Panorama

Page 10: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Dataset

Aerial LIDAR Stereo

Data Source

Ground Level Panorama

Page 11: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Dataset

Aerial LIDAR Stereo

Drone

Data Source

Ground Level Panorama

Page 12: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Dataset

Aerial Airborne LIDAR

Data Source

Ground Level Panorama

LIDAR Stereo

Drone

Page 13: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Why we need this?

• Mapping for Autonomous Driving

• Smart City

• Benchmarking:• Large-Scale Machine Learning / Deep Learning

• 3D Vision

• Remote Sensing

• Robotics

Source: Here 360

Page 14: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Why we need this?

• Mapping for Autonomous Driving

• Smart City

• Benchmarking:• Large-Scale Machine Learning / Deep Learning

• 3D Vision

• Remote Sensing

• Robotics

Source: Toronto SmartCity Summit

Page 15: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Why we need this?

• Mapping for Autonomous Driving

• Smart City

• Benchmarking:• Large-Scale Machine Learning / Deep Learning

• 3D Vision

• Remote Sensing

• Robotics

Page 16: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Annotations

• Manual annotation? Impossible!• Suppose each 500x500 image costs $1 to annotate pixel-wise

labels, we need to pay $11M to create ground-truth only for the aerial images.

Page 17: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Annotations

• Manual annotation? Impossible!• Suppose each 500x500 image costs $1 to annotate pixel-wise

labels, we need to pay $11M to create ground-truth only for the aerial images.

I’m not as

rich as

Jensen

Page 18: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Annotations

• Manual annotation? Impossible!• Suppose each 500x500 image costs $1 to annotate pixel-wise

labels, we need to pay $11M to create ground-truth only for the aerial images.

• However, humans already collect rich knowledge about the world!

I’m not as

rich as

Jensen

Page 19: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Annotations

• Manual annotation? Impossible!• Suppose each 500x500 image costs $1 to annotate pixel-wise

labels, we need to pay $1139200 to create ground-truth only for the aerial images.

• Humans already collect rich knowledge about the world!

Use maps!

I’m not as

rich as

Jensen

Page 20: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Map as Annotations

HD Map

Maps

Page 21: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Map as Annotations

3D BuildingHD Map

Maps

Page 22: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Map as Annotations

3D BuildingHD Map Meta Data

Maps

Page 23: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Together, the rich sources of data enable a plethora of exciting tasks!

Page 24: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Building Footprint Extraction

Page 25: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Road Curb and Centerline Extraction

Page 26: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Building Instance Segmentation

Page 27: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Zoning Prediction

InstitutionalResidential

Commercial

Page 28: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Technical DifficultiesMis-alignment and Data Noise

Aerial-ground images mis-alignment from raw GPS location data

Road centerline is shifted

Building’s shape/location is not accurate

Page 29: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Data Pre-processing and AlignmentAppearance based Ground-aerial Alignment

Before Alignment

After Alignment

Page 30: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Data Pre-processing and AlignmentInstance-wise Aerial-map Alignment

Before alignment

Page 31: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Data Pre-processing and AlignmentInstance-wise Aerial-map Alignment

After alignment

Page 32: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Data Pre-processing and AlignmentRobust Road Surface Generation

Input Road Curb and Centreline (Noisy)

Polygonized Road Surface

Page 33: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Pilot Study with Neural NetworksBuilding Contour and Road Curb/Centerline Extraction

GT ResNet

Page 34: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Pilot Study with Neural NetworksSemantic Segmentation

Method Road Building Mean

FCN 74.94% 73.88% 74.41%

ResNet-56 82.72% 78.80% 80.76%

Metric: Intersection-over-union (IOU), higher is better

Page 35: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Pilot Study with Neural NetworksBuilding Instance Segmentation

Input DWT

Page 36: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Pilot Study with Neural NetworksBuilding Instance Segmentation

Metric: Weighted Coverage, AP, Precision-50%, Recall-50%, higher is better

MethodWeighted

Coverage

Average

PrecisionRecall-50% Precision-50%

FCN 41.92% 11.37% 21.50% 36.00%

ResNet-56 40.65% 12.13% 18.90% 45.36%

Deep

Watershed

Transform

56.22% 21.22% 67.16% 63.67%

Page 37: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Pilot Study with Neural NetworksBuilding Instance Segmentation

Join the other talk today to know more about the deep watershed instance segmentation: Wednesday, May 10, 4:00 PM - 4:25 PM – Room 210G

MethodWeighted

Coverage

Average

PrecisionRecall-50% Precision-50%

FCN 41.92% 11.37% 21.50% 36.00%

ResNet-56 40.65% 12.13% 18.90% 45.36%

Deep

Watershed

Transform

56.22% 21.22% 67.16% 63.67%

Page 38: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Pilot Study with Neural NetworksGround-view Road Segmentation

True Positive: Yellow; False Negative: Green; False Positive: Red

Page 39: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Pilot Study with Neural NetworksGround-view Road Segmentation

Metric: Intersection-over-Union, higher is better

Method Non-Road IOU Road IOU Mean IOU

FCN 97.3% 95.8% 96.5%

ResNet-56 97.8% 96.6% 97.2%

Page 40: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Pilot Study with Neural NetworksGround-view Zoning Classification

Top-1 Accuracy

Method From-ScratchPre-trained from

ImageNet

AlexNet 66.48% 75.49

GoogLeNet 75.08% 77.95%

ResNet 75.65% 79.33%

Metric: Top-1 Accuracy, higher is better

Page 41: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

• # of buildings: 397846

• Total area: 712.5 km2

• Total length of road: 8439 km

Statistics

Page 42: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Building height distribution Zoning type distribution

Statistics

Page 43: TorontoCity: Seeing the World with a Million Eyes€¦ · Ground Level Panorama. Dataset Aerial Airborne LIDAR Data Source Ground Level Panorama LIDAR Stereo Drone. Why we need this?

Conclusion

• We propose a large dataset with from different views and sensors

• Maps are used to create GT annotations

• In future we have many more exciting tasks to come

• Check our paper for more details: https://arxiv.org/abs/1612.00423

• Data available soon. Stay tuned and welcome to over-fit

Join the other talk today to know more about the deep watershed instance segmentation:

Wednesday, May 10, 4:00 PM - 4:25 PM – Room 210G