
Neural Techniques for Pose Guided Image Generation

Ayush Gupta (ayushg@stanford.edu), Shreyash Pandey (shreyash@stanford.edu), Vivekkumar Patel (vivek14@stanford.edu)

2018

Introduction

• To synthesize person images based on a condition image and a desired pose.

• Extend the PG2 model [3] for better performance and faster training.

• Incorporate WGANs for robust and more stable training.

• Use a triplet-classification-based discriminator architecture for model simplification and speed.

Data Preparation

Dataset

• The DeepFashion (In-shop Clothes Retrieval Benchmark) dataset is used.

• It consists of 52,712 in-shop clothes images and around 200,000 cross-pose/scale pairs.

• Each image has a resolution of 256 x 256.

Processing

• Images were filtered to keep only single-person, complete images, then split into 32,031 training and 7,996 test images.

• The training set was augmented to 57,520 images by left-right flipping and filtering out pairs that are not actually different.

• This gives 127,022 training examples and 18,568 test examples.

• A state-of-the-art pose estimator [2] gives 18 pose keypoints per image; these are stored as pickle files.

• The keypoints were also used to generate a human body mask by morphological transformations for Stage-1 training (see the preprocessing sketch after this list).

• Thus, each example stored in HDF5 contained a condition image, a target image, 18-channel target pose information, and a single-channel pose mask.

• Each image pair was normalized with a mean of 127.5 and a standard deviation of 127.5.

• Poses were normalized to lie between -1 and +1.
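
A minimal sketch of this preprocessing in Python (NumPy/OpenCV), assuming keypoints arrive as a list of 18 (x, y) pairs with negative coordinates marking missing joints; the disk radius and kernel sizes are our illustrative choices, not values from the poster:

import numpy as np
import cv2

def make_pose_map(keypoints, h=256, w=256, r=4):
    # 18 channels, one per keypoint: background is -1, and a small disk
    # of +1 marks each visible joint, so the map already lies in [-1, 1].
    pose = np.full((h, w, 18), -1.0, dtype=np.float32)
    yy, xx = np.ogrid[:h, :w]
    for c, (x, y) in enumerate(keypoints):
        if x < 0 or y < 0:  # missing-keypoint convention (our assumption)
            continue
        disk = (xx - x) ** 2 + (yy - y) ** 2 <= r ** 2
        pose[disk, c] = 1.0
    return pose

def make_body_mask(keypoints, h=256, w=256):
    # Rough body mask: blobs at each joint, grown and closed with
    # morphological transforms, as the poster describes for Stage-1.
    mask = np.zeros((h, w), dtype=np.uint8)
    for x, y in keypoints:
        if x >= 0 and y >= 0:
            cv2.circle(mask, (int(x), int(y)), 10, 1, thickness=-1)
    kernel = np.ones((25, 25), np.uint8)
    mask = cv2.dilate(mask, kernel)  # merge joint blobs into one region
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill holes

def normalize(img):
    # (x - 127.5) / 127.5 maps uint8 pixels from [0, 255] to [-1, 1].
    return (img.astype(np.float32) - 127.5) / 127.5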

Original Architecture

• Stage-1 has a U-Net-like architecture. Skip connections help transfer appearance information to the output (see the sketch after this list).

• Stage-2 is a conditional DCGAN that generates a difference map.

• The Stage-2 discriminator differentiates between pairs of images: the real pair contains the condition and target images, while the fake pair contains the condition and generated images.
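
The poster does not give layer details; below is a minimal PyTorch sketch, with illustrative channel counts, of how a U-Net skip connection carries encoder features (appearance) directly to the decoder:

import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    # Illustrative only: one down/up level with a single skip connection.
    def __init__(self, in_ch=21, out_ch=3):  # 3 image + 18 pose channels
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU())
        self.mid = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.dec = nn.ConvTranspose2d(128, out_ch, 4, stride=2, padding=1)

    def forward(self, x):
        e = self.enc(x)                      # downsample
        m = self.mid(e)
        d = self.dec(torch.cat([m, e], 1))   # skip connection carries appearance
        return torch.tanh(d)                 # output in [-1, 1], matching the normalization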

Pose Mask Loss

• We calculate the loss for Stage-1 as follows:

  $L_1 = \lVert (\hat{I}_B - I_B) \odot (1 + M_B) \rVert_1$

  where $\hat{I}_B$ is the generated image, $I_B$ the target image, and $M_B$ the pose mask.

• Masking by pose forces the model to ignore background variations (a sketch of the loss follows below).
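
A one-line PyTorch rendering of this loss (our sketch; we use the mean rather than the raw sum so the value is independent of batch and image size):

import torch

def pose_mask_l1(generated, target, mask):
    # (1 + M_B) weights body pixels twice as much as background pixels,
    # since the mask is 1 inside the body region and 0 outside.
    return torch.mean(torch.abs(generated - target) * (1.0 + mask))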

Our Modifications

• The discriminator is now similar to a classifier: the target image has label 1, while the generated image and the condition image have label 0.

• We use WGAN training for the conditional DCGAN in Stage-2.

• With WGAN, the discriminator (critic) is trained for N (= 10) iterations for every generator iteration; see the training-loop sketch below.
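
A schematic PyTorch training step combining the two modifications, assuming G(cond, pose) returns a generated image and D returns a scalar critic score; the weight-clipping constant 0.01 comes from the original WGAN recipe and is our assumption, as the poster does not state it:

import torch

N_CRITIC, CLIP = 10, 0.01  # 10 critic steps per generator step; clip value assumed

def train_step(G, D, opt_g, opt_d, cond, pose, target):
    # Critic: score the real target high and both "fakes" low,
    # i.e. the generated image and the condition image itself (the triplet idea).
    for _ in range(N_CRITIC):
        fake = G(cond, pose).detach()
        loss_d = -(D(target).mean() - 0.5 * (D(fake).mean() + D(cond).mean()))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        for p in D.parameters():  # weight clipping, per the WGAN recipe
            p.data.clamp_(-CLIP, CLIP)
    # Generator: raise the critic's score on freshly generated images.
    loss_g = -D(G(cond, pose)).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()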

Implementation Details

• With a batch size of 4, Stage-1 is trained for 40k steps, followed by 25k steps for Stage-2.

• The Adam optimizer is used for Stage-1 and Stage-2, except with WGAN, where we use RMSProp for Stage-2.

• Learning rates for each of these stages are set to 5e-5 (see the sketch below).
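
A small helper reflecting these settings (a sketch; G1, G2, and D stand for the Stage-1 generator, Stage-2 generator, and discriminator):

import torch

def make_optimizers(G1, G2, D, use_wgan=True, lr=5e-5):
    # Adam everywhere, except RMSProp for the Stage-2 critic under WGAN.
    opt_g1 = torch.optim.Adam(G1.parameters(), lr=lr)
    opt_g2 = torch.optim.Adam(G2.parameters(), lr=lr)
    OptD = torch.optim.RMSprop if use_wgan else torch.optim.Adam
    return opt_g1, opt_g2, OptD(D.parameters(), lr=lr)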

SSIM and IS scores

No.  Model          IS     SSIM
1    G1             2.58   0.72
2    G1 + G2 + D    2.993  0.71
3    Triplet        3.01   0.727
4    WGAN-triplet   3.02   0.73
5    Ma et al.      3.091  0.76
6    Target         3.25   1

Table 1: Ma et al. [3] is the original paper. Target denotes the target test-set images.
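
For reference, SSIM can be computed per image pair with scikit-image (assuming version 0.19+ for the channel_axis argument); IS additionally requires a pretrained Inception network and is omitted here:

import numpy as np
from skimage.metrics import structural_similarity as ssim

def mean_ssim(generated, targets):
    # Both lists hold uint8 H x W x 3 images; data_range matches uint8.
    scores = [ssim(g, t, channel_axis=-1, data_range=255)
              for g, t in zip(generated, targets)]
    return float(np.mean(scores))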

Generation Results

• Shown above are results of the WGAN-triplet model in the following order, from left to right: condition image, target pose, Stage-1 result, Stage-2 result, and target image.

• Stage-1 combines the target pose and appearance.

• Stage-2 generates a sharper image using the coarse result.

Failures

Sharpness Visualized via Gradients

• Stage-2 helps in generating sharper images.

Conclusion and Future Work

• The two-stage architecture is able to transfer pose while retaining appearance.

• Incorporating triplet classification and WGAN leads to faster convergence and more robust training.

• Failure modes include target poses that involve close-ups of people, and attires such as jackets that look different from different angles. More data could potentially solve these problems.

• One extension would be a semi-supervised setting where pose is supervised and appearance is unsupervised. This would allow arbitrary manipulation of poses [1].

References

[1] Rodrigo de Bem et al. "DGPose: Disentangled Semi-supervised Deep Generative Models for Human Body Analysis". In: arXiv preprint arXiv:1804.06364 (2018).

[2] Zhe Cao et al. "Realtime multi-person 2D pose estimation using part affinity fields". In: CVPR. 2017.

[3] Liqian Ma et al. "Pose guided person image generation". In: Advances in Neural Information Processing Systems. 2017, pp. 405-415.
