
Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis
Yiyi Liao 1,2,∗  Katja Schwarz 1,2,∗  Lars Mescheder 1,2,3,†  Andreas Geiger 1,2

1 Max Planck Institute for Intelligent Systems, Tübingen   2 University of Tübingen   3 Amazon, Tübingen

Motivation

Task: 3D Controllable Image Synthesis
• 3D controllability is essential in many applications, e.g., gaming, simulation, virtual reality, and data augmentation
• 3D controllable properties: 3D pose, shape, and appearance of multiple objects, as well as the camera viewpoint
• Is it possible to learn the simulation pipeline, including 3D content creation, from raw 2D image observations?

[Figure: 3D controllable image synthesis (latent code z to image x) on the real-world Fruit Dataset, shown alongside training images.]

Generative Models

Classical Rendering Pipeline: 3D Design → Render
• 3D controllable
• Expensive and inefficient to design 3D models

2D Generative Model: Sample → 2D Generator
• Efficient and can be learned from only 2D images
• Not 3D controllable

Our Approach: Sample → 3D Generator → Render → 2D Generator
• Efficient and can be learned from only 2D images
• 3D controllable

Idea: Learning the image generation process jointly in 3D and 2D space

Method

[Figure: Pipeline overview. A 3D generator maps the latent code to a set of 3D primitives; a differentiable projection renders each primitive, a 2D generator refines the renderings, and the results are combined with the rendered background image via alpha composition. Composite and background images are compared against real images.]
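A minimal PyTorch-style sketch of this forward pass (all module names, interfaces, and the compositing order are illustrative assumptions, not the released implementation):

```python
import torch

def generate(z, g_3d, project, g_2d, g_bg):
    """Sketch of the pipeline: latent code z -> composite image.
    g_3d, project, g_2d, g_bg are hypothetical stand-ins for the 3D
    generator, differentiable projection, 2D generator, and background branch."""
    primitives = g_3d(z)           # N objects o_i = (s_i, R_i, t_i, phi_i)
    image = g_bg(z)                # background from the spherical environment map
    for o in primitives:
        feat = project(o)          # differentiable projection to a 2D feature map
        rgb, alpha = g_2d(feat)    # per-object color and alpha mask A_i
        image = alpha * rgb + (1 - alpha) * image  # alpha composition
    return image
```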

3D Representations:

Foreground objects o_i:
• o_i = (s_i, R_i, t_i, φ_i): scale, rotation, translation, and appearance
• φ_i: appearance feature
• Primitive types: point clouds, cuboids, spheres

Scene background o_bg:
• Spherical environment map
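A minimal sketch of this object parameterization as a data container (field shapes are assumptions; the repository's actual structures may differ):

```python
from dataclasses import dataclass
import torch

@dataclass
class Object3D:
    """One foreground primitive o_i = (s_i, R_i, t_i, phi_i)."""
    scale: torch.Tensor        # s_i: (3,) per-axis scale of the primitive
    rotation: torch.Tensor     # R_i: (3, 3) rotation matrix
    translation: torch.Tensor  # t_i: (3,) position in the scene
    appearance: torch.Tensor   # phi_i: (k,) latent appearance feature
```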

Loss Functions:

• Adversarial Loss:

$\mathcal{L}_{adv}(\theta, \psi, c) = \mathbb{E}_{p(z)}\!\left[f(d_\psi(g_\theta(z, c), c))\right] + \mathbb{E}_{p_D(I|c)}\!\left[f(-d_\psi(I, c))\right]$
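A direct transcription of this objective in PyTorch (the choice of f and the network interfaces are assumptions; softplus is shown only for illustration):

```python
import torch.nn.functional as F

def l_adv(d_psi, g_theta, z, real, c, f=F.softplus):
    """L_adv(theta, psi, c): d_psi is the discriminator, g_theta the
    generator, real a batch of real images I, c the condition."""
    fake_term = f(d_psi(g_theta(z, c), c)).mean()  # E_{p(z)}[f(d(g(z, c), c))]
    real_term = f(-d_psi(real, c)).mean()          # E_{p_D(I|c)}[f(-d(I, c))]
    return fake_term + real_term
```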

• Compactness Loss:

$\mathcal{L}_{com}(\theta) = \mathbb{E}_{p(z)}\!\left[\sum_{i=1}^{N} \max\!\left(\tau, \frac{\|A_i\|_1}{H \times W}\right)\right]$

• Geometric Consistency Loss:

$\mathcal{L}_{geo}(\theta) = \mathbb{E}_{p(z)}\!\left[\sum_{i=1}^{N} \|A'_i \odot (X'_i - \tilde{X}'_i)\|_1\right] + \mathbb{E}_{p(z)}\!\left[\sum_{i=1}^{N} \|A'_i \odot (D'_i - \tilde{D}'_i)\|_1\right]$
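A minimal sketch of the two regularizers, assuming alpha masks of shape (N, H, W) and broadcastable coordinate/depth maps (the tilde quantities are the re-projected counterparts from the formulas above):

```python
import torch

def l_com(alphas, tau):
    """Compactness: discourage masks A_i covering more than a fraction
    tau of the H x W image. alphas: (N, H, W) with values in [0, 1]."""
    n, h, w = alphas.shape
    area = alphas.flatten(1).sum(dim=1) / (h * w)  # ||A_i||_1 / (H x W)
    return torch.clamp(area, min=tau).sum()        # sum_i max(tau, area_i)

def l_geo(alpha_p, x_p, x_tilde, d_p, d_tilde):
    """Geometric consistency: refined positions X' and depths D' should
    agree with their re-projections X~', D~' inside the masks A'_i."""
    pos_term = (alpha_p * (x_p - x_tilde)).abs().sum()
    depth_term = (alpha_p * (d_p - d_tilde)).abs().sum()
    return pos_term + depth_term
```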

Quantitative Results

Ablation Study on Different 3D Representations

                                  FID   FID_t  FID_R  FID_i  MVC¹
Vanilla GAN                       50    –      –      41     –
Point cloud                       38    43     44     66     Good
Cuboid                            38    45     45     60     Good
Sphere                            33    45     45     53     Good
Deformable primitive w/o g_θ^2D   69    71     74     69     Good
Single primitive                  30    38     44     –      Poor

¹ Multi-view consistency

Comparison to Baselines

               Car                  Indoor
               FID   FID_t  FID_R   FID   FID_t  FID_R
Vanilla GAN    43    –      –       89    –      –
Layout2Im      43    56     –       84    93     –
2D Baseline    80    79     –       107   102    –
Ours (w/o c)   65    71     75      120   120    120
Ours           44    54     66      88    90     100
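For reference, the FID scores above are Fréchet distances between Gaussians fitted to Inception features of generated and real images; a minimal sketch of that distance, assuming precomputed feature means and covariances:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2),
    the quantity reported as FID."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)  # matrix square root
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean.real)
```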

Qualitative Results

Car Dataset

[Figure: rows compare Layout2Im (t), 2D Baseline (t), Ours (t), and Ours (R), where (t) denotes object translation and (R) rotation.]

Indoor Dataset

[Figure: same comparison as above: Layout2Im (t), 2D Baseline (t), Ours (t), Ours (R).]

Failure Cases

[Figure: failure case examples.]

https://github.com/autonomousvision/controllable_image_synthesis {firstname.lastname}@tue.mpg.de