Making Robots Draw A Vivid Portrait In Two Minutes

Fei Gao 1,2, Jingjie Zhu 1,2, Zeyuan Yu 1, Peng Li 1,†, Tao Wang 1,3, Senior Member, IEEE

1 Intelligent Computing Research Center, Advanced Institute of Information Technology (AIIT), Peking University, Hangzhou 311215, China. mhxyyuzeyuan@163.com, pli@aiit.org.cn
2 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China. gaofei@hdu.edu.cn, jjzhu@aiit.org.cn
3 Department of Computer Science and Technology, Peking University, Beijing 100871, China. wangtao@pku.edu.cn
† Corresponding Author.

Abstract— Significant progress has been made with artistic robots. However, existing robots fail to produce high-quality portraits in a short time. In this work, we present a drawing robot, which can automatically transfer a facial picture to a vivid portrait and then draw it on paper within two minutes on average. At the heart of our system is a novel portrait synthesis algorithm based on deep learning. Innovatively, we employ a self-consistency loss, which makes the algorithm capable of generating continuous and smooth brush-strokes. Besides, we propose a componential-sparsity constraint to reduce the number of brush-strokes over insignificant areas. We also implement a local sketch synthesis algorithm, and several pre- and post-processing techniques, to deal with the background and details. The portrait produced by our algorithm successfully captures individual characteristics by using a sparse set of continuous brush-strokes. Finally, the portrait is converted to a sequence of trajectories and reproduced by a 3-degree-of-freedom robotic arm. The whole portrait drawing robotic system is named AiSketcher. Extensive experiments show that AiSketcher can produce considerably high-quality sketches for a wide range of pictures, including faces in-the-wild and universal images of arbitrary content. To the best of our knowledge, AiSketcher is the first portrait drawing robot that uses deep learning techniques. AiSketcher has attended quite a number of exhibitions and shown remarkable performance under diverse circumstances.

I. INTRODUCTION

The ability of robots to create high-quality artworks is popularly considered a benchmark of progress in Artificial Intelligence (AI). Researchers have contributed great efforts to developing artistic robots, which can draw sketch portraits [1], colourful pictures [2], watercolors [3], etc. Some of them have attended various exhibitions and shown impressive performance [4], [5]. Drawing a colourful image typically costs a robot several hours [3]. In contrast, a portrait typically contains a sparse set of graphical elements (e.g., lines) and can be finished in a short time. Portrait drawing robots thus allow active interactions between robots and ordinary consumers.

In recent years, portrait drawing robots have attracted a lot of attention. Existing robots typically produce portraits by means of low-level image processing [1]. They are unable to draw high-quality portraits, especially for faces in-the-wild, i.e., faces presenting extreme poses, expressions, occlusions, etc. Portraits have a highly abstract style: even for artists, portrait drawing relies on professional training and experience [6]. Therefore, it remains a complicated and elaborate task for robots to draw high-quality portraits.

The other critical challenge is to balance the "vividness" of a sketch and the time-budget for drawing it. Vivid sketches can please human users, while a limited time-budget avoids their impatience. However, the vividness and the drawing time contradict each other in practice. Vividness is correlated with details about characteristics in a given face. Generating more details in a portrait generally improves the vividness, but prolongs the drawing procedure. Conversely, a portrait with a sparse set of elements can be quickly drawn but may fail to capture individualities.

To address the above challenges, in this work, we develop a novel portrait drawing robot by means of deep neural networks [7]. The proposed method is inspired by the great success of neural style transfer (NST) in creating various types of artworks [8]. To the best of our knowledge, no drawing robots have been developed based on these algorithms. Besides, preliminary experiments show that these algorithms cannot produce continuous and smooth brush-strokes.

In this work, we propose a novel portrait synthesis algorithm for drawing robots. Specifically, we propose two novel objectives for generating sparse and continuous brush-strokes. First, we make the algorithm capable of reconstructing real sketches (of arbitrary content) by using a self-consistency loss. This constraint considerably improves the continuity and realism of synthesised brush-strokes. Second, we apply a sparsity constraint to regions which are insignificant for characterizing individualities. The corresponding componential-sparsity loss significantly reduces the number of brush-strokes without degrading the quality of a portrait. Here, we use face parsing masks to represent facial components. We also implement a specific local synthesis algorithm, as well as several pre- and post-processing techniques, to deal with the background and details.

Finally, we develop a drawing robot, namely AiSketcher, by using a 3-degree-of-freedom (DOF) robotic arm. The synthesised sketch is converted into a sequence of trajectories that the robot reproduces on paper or other flat materials. In the robotic system, we implement several human-robot interaction techniques to facilitate the use of AiSketcher. Extensive experimental results show that AiSketcher significantly outperforms the previous state-of-the-art. Besides, AiSketcher works well over a wide range of images, including faces in-the-wild and universal images of arbitrary content. AiSketcher has attended quite a number of exhibitions and shown remarkable as well as robust performance under diverse circumstances.

In summary, we make the following contributions:


• AiSketcher is the first portrait drawing robot that uses deep learning techniques.

• We propose a self-consistency loss to make the algorithm produce continuous and realistic brush-strokes.

• We propose a componential-sparsity loss to balance the vividness and the time-budget.

• Our portrait synthesis algorithm doesn't need a large-scale set of photo-sketch pairs for training.

• Our robotic system works well for faces in-the-wild and a wide range of universal images.

The paper is organized as follows. Section II provides a brief description of related work. Section III presents an overview of the whole system. In Section IV and Section V, the portrait synthesis algorithms and the robotic system are detailed, respectively. Extensive experiments are presented in Section VI. Finally, we conclude this work in Section VII.

II. RELATED WORK

In this section, we briefly introduce related work, including portrait drawing robots, deep neural style transfer, generative adversarial networks, and face sketch synthesis.

A. Portrait Drawing Robots

A number of efforts have been devoted to developing portrait drawing robots. For example, Calinon et al. [9] develop a humanoid robot drawing binary facial pictures. Lin et al. [10] use a center-off algorithm and median filtering to generate a portrait, and then refine it based on low-level textural features. The resulting portraits are unduly simple and roughly present the geometric outlines of human faces. Mohammed et al. [11] use edge detection and contour tracing to synthesize sketches and plan the path. Additionally, Lau et al. [12] explore a portrait robot by means of point-to-point drawing instead of line-drawing. Tresset et al. [13] use Gabor kernels to extract salient lines from a detected face, and finish drawing by means of visual feedback.

Recently, Xue and Liu [14] use facial key-points and a mathematical morphology algorithm to generate a primal sketch, and then generate streaks to represent hair based on textural features. Similarly, Gao et al. [1] use facial key-points to synthesize an outline portrait, and then use random points and lines to represent eyebrows and hair. The portraits produced by these methods typically present stiff streaks. Besides, these methods rely heavily on the performance of key-point detection algorithms, which unfortunately don't work well for faces in-the-wild. Finally, the generated strokes are not in an artistic style.

B. Deep Neural Style Transfer

Neural Style Transfer (NST) means using Convolutional Neural Networks (CNNs) to render a content image in a target style. The problem of style transfer is closely related to texture synthesis and transfer [15]. Early approaches typically rely on low-level statistics. Recently, Gatys et al. [16] first used CNNs to solve this problem, leading to trending developments in both the academic literature and industrial applications [8]. Inspired by the success of NST, we develop a novel portrait synthesis algorithm in this work.

C. Generative Adversarial Networks

Recently, Generative Adversarial Networks (GANs) [17], [18] have shown inspiring results in generating facial pictures, natural images, facial sketches, paintings, etc. In a GAN, one neural network works as the generator G, which learns to transfer a source image x to a target image y. The other network works as the discriminator D, which aims at distinguishing y from the generated image G(x) [18]. G and D are alternately optimized in an adversarial manner. Finally, G becomes capable of producing images in the style of y. There have been a huge number of algorithms and applications of GANs; please refer to [19] for a comprehensive study.
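To make the alternation concrete, the following is a minimal PyTorch sketch of one training step of this generic scheme; `G`, `D`, their optimizers, and the image pair are assumed to be defined elsewhere, and `D` is assumed to end in a sigmoid.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, x, y):
    """One adversarial alternation: x is a source image, y a real target."""
    # Discriminator step: score real images as 1 and generated images as 0.
    fake = G(x).detach()                  # detach: no gradient flows into G
    real_score, fake_score = D(y), D(fake)
    d_loss = (F.binary_cross_entropy(real_score, torch.ones_like(real_score))
              + F.binary_cross_entropy(fake_score, torch.zeros_like(fake_score)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step: update G so that D scores G(x) as real.
    fake_score = D(G(x))
    g_loss = F.binary_cross_entropy(fake_score, torch.ones_like(fake_score))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```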

Recently, Yi et al. proposed the APDrawingGAN model for portrait drawing [6]. However, the resulting sketches present too many dilated details and black areas, which would significantly increase the drawing time of a robot. Besides, APDrawingGAN cannot produce high-quality portraits for faces in-the-wild. Corresponding experimental results will be shown in Section VI-C.

D. Facial Sketch Synthesis

Another topic close to our work is face sketch synthesis (FSS). FSS generally means synthesizing a pencil-drawing sketch based on an input face photo [20]. Recently, researchers have developed NST-based [21] and GAN-based [22] methods, which can produce fantastic pencil-drawing sketches. However, pencil-drawing sketches typically take a long time to draw. They are thus unsuitable for real-time drawing robots.

III. ARCHITECTURE OF THE SYSTEM

In this section, we overview the architecture of the robotic portrait drawing system. It is composed of both software (e.g., the algorithms for portrait synthesis) and hardware components (e.g., the robotic arm and the remote server). Fig. 1 shows an overview of the system.

First, a facial picture is uploaded by a user through a social media account and fed into the portrait synthesis algorithms, as explained in Section IV. Afterwards, the resulting portrait is converted to a sequence of trajectories by the path planning module. Finally, the list of commands is deployed on an NVIDIA Jetson Nano board to control the robotic arm, which reproduces the portrait on paper or other flat materials. Additionally, we provide a mobile device for users to choose a preferred portrait from a number of alternatives and to start the drawing procedure.

IV. ALGORITHMS FOR PORTRAIT SYNTHESIS

The pipeline of our portrait synthesis algorithms is shown in Fig. 1. Given a facial picture x, we first pre-process it by means of face detection, face alignment, face parsing, and background removal. The resulting picture is denoted by x̂. Afterwards, we feed x̂ into a global sketch synthesis algorithm, which produces a primary portrait. Additionally, we use a local sketch synthesis algorithm to specifically synthesize eyebrows. We then fuse the globally and locally synthesised sketches, and deal with details in the post-processing stage.

[Fig. 1: Overview of our portrait drawing robotic system, AiSketcher. A user uploads a photo via a social media account; it is transmitted to a remote server, which runs the portrait synthesis algorithms (pre-processing: face detection, face alignment, face parsing, background removal; global sketch synthesis via NST; local sketch synthesis via GAN; post-processing: image binarization, fusing global & local sketches, chin renewal, eyeball renewal) and path planning (skeleton extraction, edge detection, filling black marks). A mobile device displays the result and starts the drawing; an NVIDIA Jetson Nano board controls the robotic arm.]

[Fig. 2: Illustration of image pre-processing. (a) Input picture, (b) aligned picture, (c) background removal, (d) face parsing mask M, and (e) compositional sparsity mask M′.]

A. Pre-processing

We use the following approaches to align an input picture and to deal with the background.

Face Detection. First, we use the dlib toolbox in OpenCV [23] to detect the face in a given image, as well as 68 key-points in the face. In the default setting, if more than one face is detected, we choose the largest one. If no face is detected, we feed the original image into the global sketch synthesis algorithm to produce a final output.
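As an illustration, this step can be sketched with dlib's standard detector and its publicly distributed 68-point landmark model (the model file name and the fall-back behaviour below mirror the description above; this is a sketch, not the exact production code):

```python
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_face(img_rgb):
    """Return the largest detected face and its 68 key-points, or (None, None)."""
    faces = detector(img_rgb)
    if len(faces) == 0:
        return None, None        # caller falls back to global synthesis only
    face = max(faces, key=lambda r: r.width() * r.height())  # largest face
    shape = predictor(img_rgb, face)
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    return face, points
```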

Face Alignment. Afterwards, we geometrically align the facial picture by an affine transformation, relying on the centers of the eyes. The aligned image is automatically cropped to 512×512 pixels (1:1) or 512×680 pixels (3:4), according to the size of the original picture. Fig. 2(b) shows an aligned example.
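A minimal OpenCV sketch of such an alignment is given below; the eye centers would be averaged from the detected landmarks, and rotating about the mid-point between the eyes is an assumption about the exact affine transform used (cropping to the target resolution is omitted):

```python
import cv2
import numpy as np

def align_by_eyes(img, left_eye, right_eye, out_size=(512, 512)):
    """Rotate the image so that the two eye centers lie on a horizontal line."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))           # in-plane roll of the face
    center = ((left_eye[0] + right_eye[0]) / 2.0,    # mid-point between eyes
              (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)  # 2x3 affine matrix
    return cv2.warpAffine(img, M, out_size)
```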

Face Parsing. Then, we use face parsing to represent facial components. Specifically, we use MaskGAN [24] to segment the aligned picture into 19 components, including background, hair, two eyebrows, two eyes, nose, two lips, neck, clothes, etc. Fig. 2(d) illustrates a parsing mask M, where facial components are distinguished by colors.

Background Removal. Finally, based on the parsing mask, we replace the background with white pixels (Fig. 2(c)), so that no elements are generated in the background.

B. Global Sketch Synthesis

We first develop a global sketch synthesis algorithm to transfer the facial picture x̂ to a sketch. Here, we adopt the algorithm proposed in [25] as our base model, due to its inspiring performance on the arbitrary style transfer task. However, this algorithm cannot generate sparse and continuous brush-strokes for sketch synthesis. We thus propose two novel loss functions to boost the quality of sketch synthesis. The framework of our global sketch synthesis network is shown in Fig. 3. Details are presented below.

[Fig. 3: Pipeline of the global sketch synthesis algorithm. A pre-trained VGG encoder and an AdaIN layer feed a convolutional decoder (the global style transfer network), trained with L_content, L_style, L_consist, and L_sparse; a local style transfer network with a discriminator is trained with L_rec and L_GAN.]

Network Architecture. The global synthesis algorithm takes the pre-processed face photo x̂ and a style image s as inputs, and outputs a synthesised sketch. Here, we use a real sketch s (of arbitrary content) as the style image.

We use the same network architecture as that used in [25]. Specifically, we use the first few layers of a pre-trained VGG-19 [26] as an encoder f. Both the content and style images are fed into f and converted to feature maps f(x̂) and f(s). Afterwards, an AdaIN layer produces the target feature maps t by

t = σ(f(s)) · (f(x̂) − μ(f(x̂))) / σ(f(x̂)) + μ(f(s)),   (1)

where μ(·) and σ(·) denote the channel-wise mean and standard deviation of features. Finally, t is fed into a decoder g, generating a sketch portrait T(x̂, s):

T(x̂, s) = g(t).   (2)

g includes nine convolutional layers and three nearest-neighbor up-sampling layers. Reflection padding is applied to each layer.
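Eq. (1) translates directly into a few lines of PyTorch; the sketch below assumes feature maps shaped (N, C, H, W) and adds a small ε for numerical stability:

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive Instance Normalization, Eq. (1)."""
    # Channel-wise statistics over the spatial dimensions
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    # Normalize the content features, then rescale with the style statistics
    return s_std * (content_feat - c_mean) / c_std + s_mean
```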

In the implementation, f is fixed and g is randomly initialized. We use a pre-trained VGG-19 to compute the following four loss functions for training g.

Content Loss. The content loss encourages a synthesised sketch T(x̂, s) to have the same content as the input facial picture x̂. It is formulated as the Euclidean distance between the target features t and the features of the portrait T(x̂, s):

L_content = ‖f(T(x̂, s)) − t‖₂.   (3)

Style Loss. In AdaIN, the style is represented by the mean and standard deviation of deep features. The style loss is correspondingly formulated as

L_style = Σ_{i=1}^{L} ‖μ(φ_i(T(x̂, s))) − μ(φ_i(s))‖₂ + Σ_{i=1}^{L} ‖σ(φ_i(T(x̂, s))) − σ(φ_i(s))‖₂,   (4)

where φ_i(·) denotes the outputs of the i-th layer in VGG-19. In the experiments, we use the relu1_1, relu2_1, relu3_1, and relu4_1 layers to compute the style loss.

Self-consistency Loss. We argue that the learned mapping function T(·) should be self-consistent: for a real sketch image s, the algorithm should be able to reconstruct it, i.e., T(s, s) = g(f(s)) ≈ s. The corresponding self-consistency loss is formulated as

L_consist = ‖T(s, s) − s‖₂.   (5)

Experiments show that the self-consistency loss significantly improves the continuity of synthesised brush-strokes.

Compositional Sparsity. Finally, we employ a compositional sparsity to constrain the network to produce a sketch with a sparse set of brush-strokes. High-quality portraits mainly present brush-strokes for primary components or geometric boundaries. We therefore merely add a sparsity constraint to relatively unnecessary regions, such as the background, hair, skin, and body. The corresponding compositional-sparsity loss is formulated as

L_sparse = ‖M′ ⊙ (1 − T(x̂, s))‖₁,   (6)

where ⊙ denotes the element-wise product operation, and M′ is a binary sparsity mask derived from the parsing mask M. Specifically, M′ is of the same size as M and x̂. In M′, pixels corresponding to boundaries, eyes, eyebrows, and lips are assigned 0. All the remaining positions are assigned 1 in M′, and are encouraged to present white color in T(x̂, s). Since the predicted parsing mask might be imprecise, we slightly enlarge the black areas in M′ by image erosion. The final compositional-sparsity mask is illustrated in Fig. 2(e).

Using a global sparsity, i.e., M′ full of 1s, also leads to a sparse set of elements in T(x̂, s). However, it reduces the elements conveying individual characteristics as well. Corresponding experiments are presented in Section VI-F.

Full Objective. Our full objective is

L = λ₁ L_content + λ₂ L_style + λ₃ L_consist + λ₄ L_sparse,   (7)

where λ₁, λ₂, λ₃, λ₄ are weighting factors.
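For illustration, Eqs. (3)-(7) can be combined into one PyTorch function as sketched below; the encoder `f`, decoder `g`, the list `phi` of VGG sub-networks (up to relu1_1, ..., relu4_1), and the mask tensor `m` are assumed to be defined, `adain` is the helper sketched above, and mean-squared error stands in for the Euclidean distances:

```python
import torch
import torch.nn.functional as F

def feat_stats(feat, eps=1e-5):
    # Channel-wise mean and standard deviation over spatial dimensions
    return feat.mean(dim=(2, 3)), feat.std(dim=(2, 3)) + eps

def full_objective(f, g, phi, x, s, m, lams=(1.0, 1.0, 1.0, 10.0)):
    """x: pre-processed photo, s: real style sketch, m: sparsity mask M'."""
    t = adain(f(x), f(s))                              # Eq. (1)
    out = g(t)                                         # Eq. (2): T(x, s)
    l_content = F.mse_loss(f(out), t)                  # Eq. (3)
    l_style = 0.0                                      # Eq. (4)
    for layer in phi:
        mu_o, sd_o = feat_stats(layer(out))
        mu_s, sd_s = feat_stats(layer(s))
        l_style = l_style + F.mse_loss(mu_o, mu_s) + F.mse_loss(sd_o, sd_s)
    l_consist = F.mse_loss(g(adain(f(s), f(s))), s)    # Eq. (5): T(s, s) ≈ s
    l_sparse = (m * (1.0 - out)).abs().mean()          # Eq. (6), mean-scaled L1
    l1, l2, l3, l4 = lams
    return l1 * l_content + l2 * l_style + l3 * l_consist + l4 * l_sparse  # Eq. (7)
```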

C. Local Sketch Synthesis

By carefully examining the globally synthesised portraits, we find that eyebrows are occasionally not well generated. For light-colored eyebrows, the algorithm might produce no elements in this region; for eyebrows with non-uniform colors, the algorithm may produce a line with bifurcations.

To address this problem, we additionally use a GAN to specifically synthesize eyebrows. Here, we reproduce the local GAN in APDrawingGAN [6]. The local sketch synthesis network follows a U-Net architecture [18] and is trained on a small set of eyebrow photo-sketch pairs [6].

D. Post-processing

Finally, we implement a number of post-processing techniques to deal with the background and details.

Image Binarization. The outputs of both the global and local synthesis algorithms are gray-scale sketches. We convert them to binary images based on a threshold of 0.5.

Fusing Global and Local Sketches. Before fusion, we thin the locally synthesised eyebrows and remove delicate eyelashes [6] by using image dilation. Afterwards, we replace the eyebrows in the globally synthesised portrait with these dilated eyebrows. We use the face parsing mask M to determine the areas of the eyebrows.
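A sketch of this fusion step on binary, white-background images is given below (the 3×3 kernel and the single dilation iteration are illustrative choices, not reported settings):

```python
import cv2
import numpy as np

def fuse_eyebrows(global_sketch, local_brows, brow_mask):
    """uint8 images, ink = 0, paper = 255; brow_mask is 255 inside the
    eyebrow regions of the parsing mask M and 0 elsewhere."""
    kernel = np.ones((3, 3), np.uint8)
    # On white-background sketches, dilation expands the bright background,
    # which thins the dark strokes and removes delicate eyelashes.
    thinned = cv2.dilate(local_brows, kernel, iterations=1)
    fused = global_sketch.copy()
    fused[brow_mask > 0] = thinned[brow_mask > 0]    # swap in the eyebrows
    return fused
```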

Eyeball Renewal. Occasionally, our algorithm produces only a circle to represent an eyeball. Although this hardly harms the final sketch drawn by the robot, we propose a method to renew an eyeball. Specifically, we add a black spot to an eye center if it is blank. We determine the position of the eye center based on facial key-point detection.

Style Fusion. Given a facial picture, the global synthesis algorithm produces diverse sketches when using different style images. Such sketches may differ in the number, width, and continuity of brush-strokes. This provides a chance to fuse these sketches based on the facial parsing mask. In our implementation, we use one style image to synthesize the primal sketch, and another style image to synthesize the hair. Afterwards, we replace the hair region in the former sketch with that in the latter one.

V. THE ROBOTIC SYSTEM

In this section, we introduce the hardware components and the path planning module, which turn the synthesised portrait into a physical sketch.

A. Hardware

We use a commercial robotic arm (3 DOFs), i.e., uArm [28], to perform portrait drawing. We choose this robotic arm based on a trade-off between performance and cost: using better robots might boost the performance, but would dramatically increase the cost of developing a portrait drawing robot. The workspace size on paper is about 160 mm × 160 mm. In this work, we adopt writing brushes for portrait drawing. We integrate the brush with the robotic arm through plastic supports, which are manufactured by means of 3D printing.

To improve the mobility of the robot, we implement all the portrait synthesis and path planning algorithms on a remote server. In this way, the terminal of our drawing robot is composed of a robotic arm, an NVIDIA Jetson Nano board, and a mobile device. The light weight and small volume of our robotic terminal dramatically ease its deployment. Besides, by using an NVIDIA GeForce GTX 1060 GPU on the remote server, we can transfer an input picture to a sketch in one second on average. Such high-speed synthesis meets the requirements of real-time applications.

B. Path Planning

Artists typically draw graphical lines along geometrical boundaries, and render black marks for eyebrows and eyeballs. To imitate this behaviour, we use the following methods to convert a synthesised portrait to a trajectory.

We first extract the skeletons of the portrait image. The portrait typically contains brush-strokes of non-uniform width, and it is time-consuming for a robot to elaborately render such details. By extracting the skeletons of brush-strokes, we reduce the number of elements for the robot to reproduce. We implement this by means of mathematical morphology. Afterwards, we search for a sequence of points on the skeletons along the gradient orientations estimated by a Canny edge detector; here, breadth-first search is used. Finally, to fill the black marks representing eyeballs and eyebrows, we draw the closed loops of edges from outside to inside, iteratively.
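The paper gives no reference implementation of the planner; the sketch below, built on scikit-image, follows the described recipe of skeletonization followed by breadth-first ordering of skeleton pixels (8-connectivity is an assumption, and the gradient-orientation guidance and mark-filling steps are omitted for brevity):

```python
import numpy as np
from collections import deque
from skimage.morphology import skeletonize

def plan_strokes(sketch_gray):
    """sketch_gray: uint8 sketch, dark ink on white paper."""
    ink = sketch_gray < 128                    # binarize at a 0.5 threshold
    skel = skeletonize(ink)                    # 1-pixel-wide stroke skeletons
    strokes = []
    visited = np.zeros_like(skel, dtype=bool)
    for seed in zip(*np.nonzero(skel)):
        if visited[seed]:
            continue
        stroke, queue = [], deque([seed])
        while queue:                           # breadth-first search
            y, x = queue.popleft()
            if visited[y, x]:
                continue
            visited[y, x] = True
            stroke.append((int(x), int(y)))    # pen position in (x, y) order
            for dy in (-1, 0, 1):              # 8-connected neighbours
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < skel.shape[0] and 0 <= nx < skel.shape[1]
                            and skel[ny, nx] and not visited[ny, nx]):
                        queue.append((ny, nx))
        strokes.append(stroke)                 # one pen-down trajectory
    return strokes
```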

VI. EXPERIMENTS

In this section, we first introduce implementation details, and then present a number of experiments to verify the performance of our portrait drawing robot.

A. Implementation Details

Datasets. First, we need a large set of content images and a number of style images to train the global sketch synthesis algorithm. To this end, we randomly select 1,000 low-resolution facial pictures from the CelebA dataset [27] and 1,000 high-resolution facial pictures from the CelebA-HQ dataset [24] as the set of content images. All these pictures are resized to 512×512 pixels. We also download 20 real sketches of arbitrary content from the Web as style images. We randomly choose 95% of the facial pictures for training and use the rest for validation.

Second, to train our local sketch synthesis algorithm, we extract 280 eyebrow photo-sketch pairs from the dataset released by [6]. Each sample is of 60×60 pixels. We randomly choose 210 samples for training and the remaining 70 samples for validation.

Finally, we download about 200 images, including celebrity faces and universal images of arbitrary content, from the Web for testing. These facial pictures present faces in-the-wild, which are diverse in poses, expressions, etc. We apply AiSketcher to these images so as to evaluate its performance in practical applications. The corresponding results have been released at https://ricelll.github.io/AiSketcher.

Training. In the training stage, we use the Adam optimizer with a learning rate of 1e−4 and a batch size of 8 for both the global and local sketch synthesis algorithms. To optimize the global algorithm, we set λ₁ = λ₂ = λ₃ = 1 and λ₄ = 10 in Eq. (7), and run for 160,000 iterations. To optimize the local algorithm, we alternate between one gradient descent step on D and one step on G; we use the Adam optimizer with β = 0.5 and run for 600 epochs. We train our algorithms on a single NVIDIA GeForce GTX 1080Ti GPU. It takes about 22 hours and 1 hour to train the global and the local algorithms, respectively.
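A hypothetical loop that mirrors these reported settings is sketched below; `decoder`, the encoder `f`, the VGG layer list `phi`, the data `loader`, and the `full_objective` helper from the earlier sketch are all assumed to exist:

```python
import torch

optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-4)
for step in range(160_000):
    x, s, m = next(loader)   # photo batch (size 8), style sketch, mask M'
    loss = full_objective(f, decoder, phi, x, s, m, lams=(1.0, 1.0, 1.0, 10.0))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```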

B. Comparison with State-of-the-art

In this part, we compare our portrait synthesis algorithm with existing work on portrait drawing robots, i.e., [10], [14], and [1], and with a state-of-the-art (SOTA) portrait synthesis method, i.e., APDrawingGAN [6]. For this purpose, we collect the pictures and synthesised portraits presented in [10], [14], and [1]. Afterwards, we apply the officially released APDrawingGAN and our sketch synthesis algorithm to these facial pictures, respectively.

[Fig. 4: Comparison with existing portrait drawing methods. Input images are collected from [10], [14], and [1]. The second row shows the corresponding results in the original papers. The third and bottom rows show sketches synthesised by APDrawingGAN [6] and by our method, respectively.]

As shown in Fig. 4, the portraits produced by [10] are overly simple and present sinuate lines. Portraits produced by both [14] and [1] represent facial structures by stiff and unrealistic lines; besides, neither of them can properly synthesize the hair. The portraits generated by APDrawingGAN capture the distinctive appearance of a person, but present low-quality elements. Besides, APDrawingGAN cannot produce portraits with a sparse set of continuous lines and marks.

In contrast, the portraits synthesised by our algorithm successfully capture individual characteristics and feelings by using a sparse set of realistic brush-strokes. Moreover, our algorithm produces a primal sketch for the area of clothes, which further improves the vividness of the portraits. Note that these facial pictures are low-resolution and blurry. The results shown in Fig. 4 demonstrate the remarkable capacity of our algorithm in generating high-quality portraits.

C. Qualitative Evaluation

We further apply our robotic system to a wide range of faces in-the-wild. Such faces are diverse in illumination condition, pose, expression, etc., and may contain occlusions. Fig. 5 shows some examples. Here, we pre-process the photos in Figs. 5(a)-5(d) before feeding them into APDrawingGAN or our AiSketcher, while the photos in Figs. 5(e)-5(h) are directly input into APDrawingGAN or our global synthesis algorithm to produce portraits.

[Fig. 5: Sketchy portraits produced by our AiSketcher for national celebrities. From top to bottom: input images, sketches synthesised by APDrawingGAN [6], and sketches synthesised and drawn by AiSketcher. All the input images shown here are downloaded from the Web.]

Similarly, APDrawingGAN doesn't work properly for these photos. In contrast, our AiSketcher consistently produces high-quality sketches for all of them. Our algorithm successfully deals with complex backgrounds (Figs. 5(a) and 5(b)), glasses (Fig. 5(b)), extreme poses (Figs. 5(e), 5(f), and 5(h)), and occlusions (Fig. 5(f)). Notably, given a half-length photo (Fig. 5(h)), our algorithm is capable of producing a high-quality and abstract portrait.

By comparing our synthesised sketches with the final versions drawn by AiSketcher, we conclude that the robotic arm precisely reproduces the synthesised sketches. Although some details are slightly missing, due to the limited capability of our 3-DOF robotic arm, the final sketches successfully present the individual appearances.

D. Generalization Ability

We further apply AiSketcher to universal images, such as pencil-drawing images, cartoon images, etc. In this case, we don't use pre-processing or the local synthesis algorithm, because there might be no human face in a given image. As illustrated in Fig. 6, AiSketcher still produces considerably high-quality sketches. A wide range of tests shows that AiSketcher tends to produce a high-quality sketch for a universal image, unless the image contains a large area of delicate textures or is of low quality (e.g., low-resolution, blurred, or noisy).

[Fig. 6: Synthesised sketches for universal images. From top to bottom: input images, and sketches synthesised and drawn by AiSketcher. All the input images shown here are downloaded from the Web.]

E. Time Consumption

In the testing stage, given an input picture, it takes about 1 s for AiSketcher to synthesize a sketch, using a server with an NVIDIA GeForce GTX 1060 GPU. Afterwards, it takes about 1-3 minutes for the robotic arm to draw it. The total time consumption is about 2 minutes on average. Thus, AiSketcher fairly satisfies the time-budget in practical scenarios. Recalling the considerable performance of AiSketcher in generating high-quality portraits, we conclude that AiSketcher successfully balances vividness and time consumption.

F. Ablation Study

Finally, we present a series of ablation studies to verify the effects of our proposed techniques. For this purpose, we test several model variants derived from our portrait synthesis algorithm, including: (i) the base model, AdaIN [25]; (ii) AdaIN with the self-consistency loss, AdaIN w/ L_consist; (iii) AdaIN with both L_consist and the global sparsity, AdaIN w/ L_sparse^global; (iv) AdaIN with both L_consist and the compositional sparsity loss, AdaIN w/ L_sparse; and (v) our full model, including pre-processing, local synthesis, and post-processing.

[Fig. 7: Ablation study on our novel portrait synthesis algorithm. From left to right: (a) input photo, and sketches synthesised by (b) AdaIN [25], (c) AdaIN w/ L_consist, (d) AdaIN w/ L_sparse^global, (e) AdaIN w/ L_sparse, and (f) our full model; (g) parsing mask predicted by using MaskGAN [24].]

Effects of Self-consistency Loss. Fig. 7(b) shows that AdaIN typically produces black marks over the hair region and snatchy brush-strokes. In contrast, AdaIN w/ L_consist (Fig. 7(c)) produces smooth and continuous brush-strokes with few black marks. The apparent improvement demonstrates our motivation for using the self-consistency loss.

Effects of Compositional Sparsity. The sketches in Fig. 7(c) present too many brush-strokes, which require a long drawing time. Both Figs. 7(d) and 7(e) show that using a sparsity constraint dramatically reduces the number of brush-strokes. However, using a global sparsity also reduces the elements representing individual characteristics (e.g., the eyes and eyebrows in Fig. 7(d)). In contrast, our compositional sparsity successfully avoids this problem. In other words, the compositional sparsity dramatically decreases the drawing time without apparently decreasing the quality of synthesised portraits.

Effects of Local Synthesis. Fig. 7(e) shows that the globally synthesised eyebrows may present lines with bifurcations. Replacing the eyebrows with the locally synthesised ones results in a better portrait, as shown in Fig. 7(f).

Based on the above observations, we can safely draw the conclusion that our proposed techniques significantly improve the quality of synthesised portraits.

VII. CONCLUSIONS

In this work, we present a portrait drawing robot, named AiSketcher. AiSketcher has shown fantastic performance over a wide range of images. Besides, extensive experimental results demonstrate that AiSketcher achieves a balance between the quality and the drawing time of sketch portraits. By carefully examining the sketches produced by AiSketcher, we find that there is still substantial room for improvement. First, our sketch synthesis algorithm may not produce a satisfactory portrait for faces with shadows: it typically produces undesired brush-strokes along the boundaries of a shadow. Illumination normalization is a promising approach to solving this problem. Second, the robotic arm cannot precisely reproduce delicate elements in a synthesised sketch. It would be interesting to optimize both the path planning and the portrait synthesis algorithms by taking the limitations of the robotic arm into account. Besides, visual feedback [2], [29] is another promising solution. We will explore these issues in the near future.

REFERENCES

[1] Q. Gao et al., "A Robot Portraits Pencil Sketching Algorithm Based on Face Component and Texture Segmentation," IEEE ICIT, 2019, pp. 48-53.
[2] R. C. Luo, M.-J. Hong, and P.-C. Chung, "Robot artist for colorful picture painting with visual control system," IEEE/RSJ IROS, 2016, pp. 2998-3003.
[3] L. Scalera, S. Seriani, A. Gasparetto, and P. Gallina, "Watercolour Robotic Painting: a Novel Automatic System for Artistic Rendering," Journal of Intelligent & Robotic Systems, vol. 95, pp. 871-886, 2019.
[4] 2018 International Robotic Art Competition (RobotArt), http://robotart.org/archives/2018artworks, 2018.
[5] M. Gommel, M. Haitz, and J. Zappe, "Robotlab autoportrait project: Human portraits drawn by a robot," http://www.robotlab.de.
[6] R. Yi, Y.-J. Liu, Y.-K. Lai, and P. L. Rosin, "APDrawingGAN: Generating Artistic Portrait Drawings from Face Photos with Hierarchical GANs," IEEE CVPR, 2019, pp. 10743-10752.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, 2012, pp. 1097-1105.
[8] Y. Jing et al., "Neural Style Transfer: A Review," IEEE TVCG, 2019.
[9] S. Calinon, J. Epiney, and A. Billard, "A humanoid robot drawing human portraits," IEEE/RSJ IROS, 2005, pp. 161-166.
[10] C. Lin, L. Chuang, and T. T. Mac, "Human portrait generation system for robot arm drawing," IEEE AIM, 2009, pp. 1757-1762.
[11] A. Mohammed, L. Wang, and R. X. Gao, "Integrated Image Processing and Path Planning for Robotic Sketching," Procedia CIRP, 2013, pp. 199-204.
[12] M. C. Lau et al., "A portrait drawing robot using a geometric graph approach: Furthest Neighbour Theta-graphs," IEEE AIM, 2012, pp. 75-79.
[13] P. Tresset and F. F. Leymarie, "Portrait drawing by Paul the robot," Computers & Graphics, vol. 37, no. 5, pp. 348-363, 2013.
[14] T. Xue and Y. Liu, "Robot portrait rendering based on multi-features fusion method inspired by human painting," IEEE ROBIO, 2017.
[15] A. A. Efros and T. K. Leung, "Texture synthesis by non-parametric sampling," ICCV, 1999.
[16] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," CVPR, 2016.
[17] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," NIPS, 2014.
[18] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," CVPR, 2017.
[19] J. Gui et al., "A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications," arXiv:2001.06937, pp. 1-28, 2020.
[20] N. Wang, D. Tao, X. Gao, X. Li, and J. Li, "A comprehensive survey to face hallucination," IJCV, vol. 106, no. 1, pp. 9-30, 2014.
[21] C. Chen, X. Tan, and K. K. Wong, "Face Sketch Synthesis with Style Transfer Using Pyramid Column Feature," WACV, 2018, pp. 485-493.
[22] J. Yu, X. Xu, F. Gao, S. Shi, M. Wang, D. Tao, and Q. Huang, "Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs," IEEE Trans. Cybernetics, DOI: 10.1109/TCYB.2020.2972944, 2020.
[23] https://opencv.org
[24] C.-H. Lee, Z. Liu, L. Wu, and P. Luo, "MaskGAN: Towards Diverse and Interactive Facial Image Manipulation," arXiv:1907.11922, 2019.
[25] X. Huang and S. Belongie, "Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization," ICCV, 2017, pp. 1510-1519.
[26] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," ICLR, 2015.
[27] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," ICCV, 2015, pp. 3730-3738.
[28] https://www.ufactory.cc/en/uarmswift
[29] D. Berio, S. Calinon, and F. F. Leymarie, "Learning dynamic graffiti strokes with a compliant robot," IEEE/RSJ IROS, 2016, pp. 3981-3986.

  • I Introduction
  • II Related Work
    • II-A Portrait Drawing Robots
    • II-B Deep Neural Style Transfer
    • II-C Generative Adversarial Networks
    • II-D Facial Sketch Synthesis
      • III Architecture of The System
      • IV Algorithms for Portrait Synthesis
        • IV-A Pre-processing
        • IV-B Global Sketch Synthesis
        • IV-C Local Sketch Synthesis
        • IV-D Post-processing
          • V The Robotic System
            • V-A Hardware
            • V-B Path Planning
              • VI Experiments
                • VI-A Implementation Details
                • VI-B Comparison with State-of-the-art
                • VI-C Qualitative Evaluation
                • VI-D Generalization Ability
                • VI-E Time Consumption
                • VI-F Ablation Study
                  • VII CONCLUSIONS
                  • References
Page 2: Making Robots Draw A Vivid Portrait In Two Minutes · portraits in a short time. In this work, we present a drawing robot, which can automatically transfer a facial picture to a vivid

bull AiSketcher is the first portrait drawing robot that usesdeep learning techniques

bull We propose a self-consistency loss to make the algo-rithm produce continuous and realistic brush-strokes

bull We propose a componential-sparsity loss to balance thevividness and time-budget

bull Our portrait synthesis algorithm doesnrsquot need a large-scale photo-sketch pairs for training

bull Our robotic system works well for faces in-the-wild anda wide range of universal images

The paper is organized as follows Section II providesa brief description of related work Section III presents anoverview on the whole system In Section IV and Section Vthe portrait synthesis algorithms and the robotic system aredetailed respectively Extensive experiments are presented inSection VI Finally we conclude this work in Section VII

II RELATED WORK

In this section we briefly introduce some related workincluding portrait drawing robots face sketch synthesis deepneural style transfer and generative adversarial network

A Portrait Drawing RobotsA number of efforts have been devoted to develop portrait

drawing robots For example Calinon et al [9] developa humanoid robot drawing binary facial pictures Lin etal [10] use center-off algorithm and median filtering togenerate a portrait and then refine it based on low-leveltextural features The resulting portraits are unduly simpleand roughly present geometric outlines of human facesMohammed et al [11] use edge detection and counter tracingto synthesis sketches and plan the path Additionally Lau etal [12] explore a portrait robot by means of point-to-pointdrawing instead of line-drawing Tresset et al [13] extractGabor kernels to extract salient lines from a detected faceand finish drawing by means of visual feedback

Recenlty Xue and Liu [14] use facial key-points and amathematical morphology algorithm to generate a primalsketch Then they generate some steaks to represent hairsbased on textural features Similarly Gao et al [1] usefacial key-points to synthesis a outline portrait and thenuse random points and lines to represent eyebrows andhair The portraits produced by these methods typicallypresent stiff steaks Besides these methods rely heavily onthe performance of key-points detection algorithms whichunfortunately donrsquot work well for faces in-the-wild Finallythe generated strokes are not in an artistic style

B Deep Neural Style TransferNeural Style Transfer (NST) means using Convolutional

Neural Networks (CNNs) to render a content image in atarget style The problem of style transfer is closely relatedto texture synthesis and transfer [15] Early approachestypically rely on low-level statistics Recently Gatys et al[16] first use CNNs to solve this problem and have leadto a trending developments both in academic literature andindustrial applications [8] Inspired by the success of NSTwe develop a novel portrait synthesis algorithm in this work

C Generative Adversarial NetworksRecently Generative Adversarial Networks (GANs) [17]

[18] have show inspiring results in generating facial picturesnatural images facial sketches paintings etc In GAN a neu-ral network works as generator G which learns to transfera source image x to a target image y The other networkworks as discriminator D which aims at distinguishing ywith the generated image G(x) [18] G and D are alternatelyoptimized in an adversarial manner Finally G would becapable of producing images in the style of y There havebeen a huge number of algorithms and applications of GANsPlease refer to [19] for a comprehensive study

Recently Yi et al proposed a APDrawingGAN model forportrait drawing [6] However the resulting sketches presenttoo many dilated details and black areas which wouldsignificantly increase the drawing time of a robot BesidesAPDrawingGAN cannot produce high-quality portraits forfaces in-the-wild Corresponding experimental results will beshown in Section VI-C

D Facial Sketch SynthesisAnother close topic to our work is face sketch synthesis

(FSS) FSS generally means synthesizing a pencil-drawingsketch based on an input face photo [20] Recently re-searchers develop some NST based [21] or GAN based[22] methods which can produce fantastic pencil-drawingsketches However pencil-drawing sketches typically con-sume a long time to draw They are thus unsuitable for real-time drawing robots

III ARCHITECTURE OF THE SYSTEM

In this section we overview the architecture of the roboticportrait drawing system It is composed of both software(eg the algorithms for portrait synthesis) and hardwarecomponents (eg the robotic arm and remote server) Fig1 shows an overview on the system

First a facial picture is uploaded by users through a socialmedia account and fed into the portrait synthesis algorithmsas explained in Section IV Afterwards the resulting portraitis converted to a sequence of trajectories by the pathplanning module Finally the list of commands is deployedon a NVIDIA Jetson Nano board to control the robotic armwhich reproduces the portrait on paper or other flat materialsAdditionally we provide a mobile device for users to choosea preferred portrait from a number of alternatives and to startthe drawing procedure

IV ALGORITHMS FOR PORTRAIT SYNTHESIS

The pipeline of our portrait synthesis algorithms is asshown in Fig 1 Given a facial picture x we first pre-processit by means of face detection face alignment face parsingand background removal The resulting picture is denotedby x Afterwards we feed x into a global sketch synthesisalgorithm which produces a primary portrait Additionallywe use a local sketch synthesis algorithm to specificallysynthesis eyebrows We then fuse the globally and locallysynthesised ketches and deal with details in the post-processing stage

Social Media Account

Users (START)

Transmit

Remote Server

Start (YN)TrajectoryD

isplay

Mobile device

NVIDIA Jetson Nano

board

Upload

Users (END)

Robotic ArmPortrait Synthesis Algorithms

Pre-Pocessing

Face Dection

Face Parsing

Background Removal

Face Alighnment

Path Planning

Skeleton

Filling Black Marks

Edge Detction

Synthesised Sketch

Input Photo

Post-Processing

Image Binarization

Chin Renewal

Eyeball Renewal

Fusing Global amp Local Sketches

Global Sketch Synthesisvia NST

Local Sketch Synthesisvia GAN

Fig 1 Overview of our portrait drawing robotic system AiSketcher

(b)

input

(c)

global eye hair eyebrow

(d)(a) (e)

Fig 2 Illustration of image pre-processing (a) Input picture (b) alignedpicture (c) background removal (d) face parsing mask M and (e) com-posentional sparsity mask M prime

A Pre-processing

We use the following approaches to align an input pictureand to deal with the background

Face Detection First we use the dlib toolbox inOpenCV [23] to detect the face in a given image as well as68 key-points in the face In default settings if more thanone faces are detected we choose the largest one If no faceis detected we feed the original image into the global sketchsynthesis algorithm to produce a final output

Face Alignment Afterwards we geometrically align thefacial picture by affine transformation relying on centersof eyes The aligned image is automatically cropped to512times 512 pixels (11) or 512times 680 pixels (34) accordingto the size of the original picture Fig 2b shows an alignedexample

Face Parsing Then we use face parsing to representfacial components Specially we use MaskGAN [24] tosegment the aligned picture into 19 components includingbackground hair two eyebrows two eyes nose two lipsneck clothes etc Fig 2d illustrates a parsing mask M where facial components are distinguished by colors

Background Removal Finally based on the parsingmask we replace the background with white pixels (Fig2(c)) so no elements would be generated in the background

B Global Sketch Synthesis

We first develop a global sketch synthesis algorithm totransfer the facial picture x to a sketch Here we adoptthe algorithm propoed in [25] as our base model due toits inspiring performance in the arbitrary style transfer taskHowever this algorithm cannot generalize sparse and contin-uous brush-strokes for sketch synthesis We thus propose twonovel loss functions to boost the quality of sketch synthesisThe framework of our global sketch synthesis network is asshown in Fig 3 Details will be presented bellow

Discrim

intorLocal style transfer Network

Con

v

recL

GANLC

onv

Con

v

Con

v

Con

v

Con

v

contentLVG

GEncoder D

ecod

er

global style transfer Network

consistL

sparseL

VG

GEncoder

AdaINstyleL

Fig 3 Pipeline of the global sketch synthesis algorithm

Network Architecture The global synthesis algorithmtakes the pre-processed face photo x and a style image sas inputs and outputs a synthesised sketch Here we use areal sketch s (of arbitrary content) as the style image

We use the same network architecture as that used in [25]Specially we use the first few layers of a pre-trained VGG-19 [26] as an encoder f Both the content and style imagesare fed into f and converted to feature maps f(x) and f(s)Afterwards an AdaIN layer produces the target feature mapst by

t = σ(f(s))f(x)minus micro(x)σ(f(x))

+ micro(f(s)) (1)

where micro(middot) and σ(middot) denote the mean and variance offeatures Finally t is fed into a decoder g generating a sketchportrait T (x s)

T (x s) = g(t) (2)

g includes nine convolutional layers and three nearest up-sampling layers Reflection padding is applied to each layer

In the implementation f is fixed and g is randomlyinitialized We use a pre-trained VGG-19 to compute thefollowing four loss functions for training g

Content Loss The content loss encourages a synthesisedsketch T (x s) has the same content as the input facial picturex It is formulated as the Euclidean distance between thetarget features t and the features of the portrait T (x s)

Lcontent = f(T (x s))minus t2 (3)

Style Loss In AdaIN the style is represented by the meanand standard deviation of deep features The style loss iscorrespondingly formulated as

Lstyle =

Lsumi=1

micro(φi(T (x s)))minus micro(φi(s))2

+

Lsumi=1

σ(φi(T (x s)))minus σ(φi(s))2 (4)

where φi(middot) denotes the outputs over the i-th layer in VGG-19 In the experiments we use relu1 1 relu2 1 relu3 1and relu4 1 layers to compute the style loss

Self-consistency Loss We argue that the learned mappingfunction T (middot) should be self-consistent For a real sketchimage s the algorithm should be able to reconstruct it ieT (s s) = g(f(s)) asymp s The corresponding self-consistencyloss is formulated as

Lconsist = T (s s)minus s2 (5)

Experiments show that the self-consistency loss significantlyimproves the continuity of synthesised brush-strokes

Compositional Sparsity Finally we employ a componen-tial sparsity to constrain the network producing a sketch witha sparse set of brush-strokes High-quality portraits mainlypresent brush-strokes for primary components or geometricboundaries We therefore merely add a sparse constraint torelatively unnecessary regions such as the background hairskin and body The corresponding compostional-sparsityloss is formulated as

Lsparse = Mprime (1minus T (x s))1 (6)

where denotes the element-product operation M prime is abinary sparsity mask derived from the parsing mask M Specially M prime is of the same size as M and x In M prime pixelscorresponding to boundaries eyes eyebrows and lips areassigned 0 All the rest positions are assigned 1 in M primeand encouraged to present white color in T (x s) Sincethe predicted parsing mask might be imprecise we slightlyenlarge the black areas in M prime by image erosion The finalcomposional-sparsity mask is as illustrated in Fig 2(g)

Using a global sparsity ie M prime is full of 1 also leads to aparse set of elements in T (x s) However it reduces the ele-ments about individual characteristics as well Correspondingexperiments will be presented in Section VI-F

Full Objective Our full objective is

L = λ1Lcontent + λ2Lstyle + λ3Lconsist + λ4Lsparse (7)

where λ1 λ2 λ3 λ4 are weighting factors

C Local Sketch Synthesis

By carefully examining the globally synthesised portraitswe find eyebrows are not well generated occasionally Forlight-colored eyebrows the algorithm might produce noelements in this region For eyebrows with non-uniformcolors the algorithm may produce a line with bifurcations

To address this problem we additionally use a GAN tospecifically synthesis eyebrows Here we reproduce the localGAN in APDrawingGAN [6] The local sketch synthesisnetwork follows an U-Net architecture [18] and trained on asmall set of eyebrow photo-sketch pairs [6]

D Post-processing

Finally we implement a number of post-processing tech-niques to deal with the background and details

Image Binarization The output of both the global and lo-cal synthesis algorithms are gray-scale sketches We convertthem to binary images based on a threshold of 05

Fusing Global and Local Sketches Before fusion wethin the locally synthesised eyebrows and remove delicateeyelashes [6] by using image dilation Afterwards we re-place the eyebrows in the globally synthesised portrait withthese dilated eyebrows We here use the face parsing maskM to determine areas of eyebrows

Eyeball Renewal Occasionally our algorithm only pro-duce a circle to represent an eyeball Although this hardlyharm the final sketch drawn by the robot we propose amethod to renewal an eyeball Specially we add a black spotto an eye center if it is blank We determine the position ofeye center based on facial key-points detection

Style Fusion Given a facial picture the global synthesisalgorithm produces diverse sketches while using differentstyle images Such sketches might diverse in the numberwidth and continuity of brush-strokes It is therefore providea chance to fuse these sketches based on the facial parsingmask In our implementation we use a style image tosynthesis the primal sketch and use another style image tosynthesis the hair Afterwards we replace the hair region inthe former sketch by those in the latter one

V THE ROBOTIC SYSTEM

In this section, we introduce the hardware components and the path planning module, which together turn the synthesised portrait into a physical sketch.

A. Hardware

We use a commercial 3-DOF robotic arm, uArm [28], to perform portrait drawing. We choose this robotic arm as a trade-off between performance and cost: using better robots might boost performance but would dramatically increase the cost of developing a portrait drawing robot. The workspace on paper is about 160 mm × 160 mm. In this work, we adopt writing brushes for portrait drawing. We attach the brush to the robotic arm through plastic supports manufactured by 3D printing.

To improve the mobility of the robot, we implement all the portrait synthesis and path planning algorithms on a remote server. In this way, the terminal of our drawing robot consists of a robotic arm, an NVIDIA Jetson Nano board, and a mobile device. The light weight and small volume of the robotic terminal dramatically ease its deployment. Besides, by using an NVIDIA GeForce GTX 1060 GPU on the remote server, we can transfer an input picture to a sketch in about one second on average. Such high-speed synthesis meets the requirements of real-time applications.

B. Path Planning

Artists typically draw graphical lines along geometrical boundaries and render black marks for eyebrows and eyeballs. To imitate this behaviour, we use the following methods to convert a synthesised portrait into a trajectory.

We first extract the skeletons of the portrait image. The portrait typically contains brush-strokes of non-uniform width, and it is time-consuming for a robot to elaborately render such details. By extracting the skeletons of brush-strokes, we reduce the number of elements the robot must reproduce; we implement this by means of mathematical morphology. Afterwards, we search for a sequence of points on the skeletons along the gradient orientations estimated by a Canny edge detector; here, breadth-first search is used. Finally, to fill the black marks representing eyeballs and eyebrows, we draw closed edge loops iteratively from outside to inside.
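The following simplified sketch illustrates the skeleton-and-trace idea using scikit-image; it replaces the paper's Canny-guided point ordering with plain 8-neighbour breadth-first search, so it is a rough approximation rather than the actual path planner.

```python
import numpy as np
from collections import deque
from skimage.morphology import skeletonize

def sketch_to_paths(binary_sketch):
    """Thin strokes to 1-pixel skeletons, then group skeleton pixels into
    polylines with BFS. binary_sketch: HxW array, 0 = ink, 255 = paper."""
    skel = skeletonize(binary_sketch == 0)   # True on 1-pixel-wide strokes
    visited = np.zeros_like(skel, dtype=bool)
    paths = []
    for y, x in zip(*np.nonzero(skel)):
        if visited[y, x]:
            continue
        path, queue = [], deque([(y, x)])
        visited[y, x] = True
        while queue:                          # flood one connected stroke
            cy, cx = queue.popleft()
            path.append((cx, cy))             # (x, y) way-points for the arm
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < skel.shape[0] and 0 <= nx < skel.shape[1]
                            and skel[ny, nx] and not visited[ny, nx]):
                        visited[ny, nx] = True
                        queue.append((ny, nx))
        paths.append(path)
    return paths
```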

VI. EXPERIMENTS

In this section, we first introduce implementation details and then present a number of experiments to verify the performance of our portrait drawing robot.

A. Implementation Details

Datasets. First, we need a large set of content images and a number of style images to train the global sketch synthesis algorithm. To this end, we randomly select 1,000 low-resolution facial pictures from the CelebA dataset [27] and 1,000 high-resolution facial pictures from the CelebA-HQ dataset [24] as the set of content images. All these pictures are resized to 512 × 512 pixels. We also download 20 real sketches of arbitrary content from the Web as style images. We randomly choose 95% of the facial pictures for training and use the rest for validation.

Second, to train our local sketch synthesis algorithm, we extract 280 eyebrow photo-sketch pairs from the dataset released by [6]. Each sample is 60 × 60 pixels. We randomly choose 210 samples for training and the remaining 70 samples for validation.

Finally, we download about 200 images, including celebrity faces and universal images of arbitrary content, from the Web for testing. These facial pictures present faces in-the-wild, which are diverse in pose, expression, etc. We apply AiSketcher to these images to evaluate its performance in practical applications. The corresponding results have been released at https://ricelll.github.io/AiSketcher/.

Training. In the training stage, we use the Adam optimizer with a learning rate of 1e−4 and a batch size of 8 for both the global and local sketch synthesis algorithms. To optimize the global algorithm, we set λ1 = λ2 = λ3 = 1 and λ4 = 10 in Eq. (7) and run for 160,000 iterations. To optimize the local algorithm, we alternate between one gradient descent step on D and one step on G, using the Adam optimizer with β = 0.5, and run for 600 epochs. We train our algorithms on a single NVIDIA GeForce GTX 1080Ti GPU. It takes about 22 hours and 1 hour to train the global and local algorithms, respectively.
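A skeleton of the reported global-network schedule is sketched below; `f`, `g`, `adain`, `loader`, and `compute_losses` are stand-ins for the paper's modules, not released code.

```python
import torch

def train_global(f, g, adain, loader, compute_losses,
                 steps=160_000, lr=1e-4):
    """Training skeleton under the reported hyper-parameters (Adam,
    lr = 1e-4, batch size 8, 160k iterations). Only the decoder g is
    trained; the VGG encoder f stays frozen."""
    opt = torch.optim.Adam(g.parameters(), lr=lr)
    step = 0
    while step < steps:
        for x, s in loader:                        # photo / style-sketch batches
            t = adain(f(x), f(s))                  # target features, Eq. (1)
            sketch = g(t)                          # T(x, s), Eq. (2)
            loss = compute_losses(sketch, t, x, s) # full objective, Eq. (7)
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            if step >= steps:
                break
```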

B. Comparison with State-of-the-art

In this part, we compare our portrait synthesis algorithm with existing work on portrait drawing robots, i.e., [10], [14], and [1], and with a state-of-the-art (SOTA) portrait synthesis method, APDrawingGAN [6]. For this purpose, we collect the pictures and synthesised portraits presented in [10], [14], and [1]. Afterwards, we apply the officially released APDrawingGAN and our sketch synthesis algorithm to these facial pictures, respectively.


Fig. 4. Comparison with existing portrait drawing methods. Input images are collected from [10], [14], and [1]. The second row shows the corresponding results in the original papers. The third and bottom rows show sketches synthesised by APDrawingGAN [6] and our method, respectively.

As shown in Fig. 4, the portraits produced by [10] are over-simplified and present sinuate lines. Portraits produced by [14] and [1] represent facial structures with stiff and unrealistic lines; besides, neither can properly synthesise the hair. The portraits generated by APDrawingGAN capture the distinctive appearance of a person but present low-quality elements. Moreover, APDrawingGAN cannot produce portraits with a sparse set of continuous lines and marks.

In contrast, the portraits synthesised by our algorithm successfully capture individual characteristics and feelings using a sparse set of realistic brush-strokes. Moreover, our algorithm produces a primal sketch for the area of clothes, which further improves the vividness of facial portraits. Note that these facial pictures are low-resolution and blurry. The results shown in Fig. 4 demonstrate the remarkable capacity of our algorithm to generate high-quality portraits.

C. Qualitative Evaluation

We further apply our robotic system to a wide range of faces in-the-wild. Such faces are diverse in illumination condition, pose, expression, etc., and may contain occlusions. Fig. 5 shows some examples. Here, we pre-process the photos in Figs. 5a-5d before feeding them into APDrawingGAN or our AiSketcher, while the photos in Figs. 5e-5h are directly input into APDrawingGAN or our global synthesis algorithm to produce portraits.

Similarly, APDrawingGAN does not work properly for these photos.


Fig. 5. Sketchy portraits produced by our AiSketcher for national celebrities. From top to bottom: input images, sketches synthesised by APDrawingGAN [6], and sketches synthesised and drawn by AiSketcher. All the input images shown here are downloaded from the Web.

In contrast, our AiSketcher considerably and consistently produces high-quality sketches for all these photos. Our algorithm successfully deals with complex backgrounds (Figs. 5a and 5b), glasses (Fig. 5b), extreme poses (Figs. 5e, 5f, and 5h), and occlusions (Fig. 5f). Notably, given a half-length photo (Fig. 5h), our algorithm is capable of producing a high-quality and abstract portrait.

By comparing our synthesised sketches with the final versions drawn by AiSketcher, we conclude that the robotic arm precisely reproduces the synthesised sketches. Although some details are slightly missing due to the limited capability of our 3-DOF robotic arm, the final sketches successfully present the individual appearances.

D. Generalization Ability

We further apply AiSketcher to universal images, such as pencil drawings, cartoon images, etc. In this case, we use neither pre-processing nor the local synthesis algorithm, because there might be no human face in a given image. As illustrated in Fig. 6, AiSketcher still produces considerably high-quality sketches. A wide range of tests shows that AiSketcher tends to produce a high-quality sketch for a universal image, unless the image contains a large area of delicate textures or is of low quality (e.g., low-resolution, blurred, or noisy).

Fig. 6. Synthesised sketches for universal images. From top to bottom: input images, and sketches synthesised and drawn by AiSketcher. All the input images shown here are downloaded from the Web.

E. Time Consumption

In the testing stage, given an input picture, it takes about 1 s for AiSketcher to synthesise a sketch using a server with an NVIDIA GeForce GTX 1060 GPU. Afterwards, it takes about 1-3 minutes for the robotic arm to draw it. The total time consumption is about 2 minutes on average. Thus, AiSketcher satisfies the time budget of practical scenarios. Recalling the considerable performance of AiSketcher in generating high-quality portraits, we conclude that AiSketcher successfully balances vividness and time consumption.

F. Ablation Study

Finally, we present a series of ablation studies to verify the effects of our proposed techniques. For this purpose, we test several model variants derived from our portrait synthesis algorithm: (i) the base model, AdaIN [25]; (ii) AdaIN with the self-consistency loss, AdaIN w/ L_consist; (iii) AdaIN with both L_consist and a global sparsity, AdaIN w/ L_global_sparse; (iv) AdaIN with both L_consist and the compositional sparsity loss, AdaIN w/ L_sparse; and (v) our full model, including pre-processing, local synthesis, and post-processing.


Fig. 7. Ablation study on our portrait synthesis algorithm. From left to right: (a) input photo, and sketches synthesised by (b) AdaIN [25], (c) AdaIN w/ L_consist, (d) AdaIN w/ L_global_sparse, (e) AdaIN w/ L_sparse, and (f) our full model; (g) parsing mask predicted by MaskGAN [24].

Effects of Self-consistency Loss. Fig. 7b shows that AdaIN typically produces black marks over the hair region and snatchy brush-strokes. In contrast, AdaIN w/ L_consist (Fig. 7c) produces smooth and continuous brush-strokes with few black marks. This apparent improvement supports our motivation for using the self-consistency loss.

Effects of Compositional Sparsity. The sketches in Figs. 7b and 7c present too many brush-strokes, which require a long drawing time. Figs. 7d and 7e both show that a sparsity constraint dramatically reduces the number of brush-strokes. However, the global sparsity also removes elements representing individual characteristics (e.g., the eyes and eyebrows in Fig. 7d). In contrast, our compositional sparsity successfully avoids this problem. In other words, the compositional sparsity dramatically decreases the drawing time without noticeably degrading the quality of the synthesised portraits.

Effects of Local Synthesis. Fig. 7e shows that the globally synthesised eyebrows may present lines with bifurcations, while replacing them with the locally synthesised eyebrows results in a better portrait, as shown in Fig. 7f.

Based on the above observations, we conclude that our proposed techniques significantly improve the quality of the synthesised portraits.

VII. CONCLUSIONS

In this work, we presented a portrait drawing robot named AiSketcher. AiSketcher has shown remarkable performance over a wide range of images. Besides, extensive experimental results demonstrate that AiSketcher achieves a balance between the quality of a sketch portrait and the time needed to draw it. By carefully examining the sketches produced by AiSketcher, we find there is still substantial room for improvement. First, our sketch synthesis algorithm may not produce a primal portrait for faces with shadows: it typically produces undesired brush-strokes along the boundaries of a shadow. Illumination normalization is a promising approach to this problem. Second, the robotic arm cannot precisely reproduce delicate elements in a synthesised sketch. It would be interesting to optimize both the path planning and the portrait synthesis algorithms by taking the limitations of the robotic arm into account; visual feedback [2], [29] is another promising solution. We will explore these issues in the near future.

REFERENCES

[1] Q. Gao et al., "A Robot Portraits Pencil Sketching Algorithm Based on Face Component and Texture Segmentation," IEEE ICIT, 2019, pp. 48-53.

[2] R. C. Luo, M.-J. Hong, and P.-C. Chung, "Robot artist for colorful picture painting with visual control system," IEEE/RSJ IROS, 2016, pp. 2998-3003.

[3] L. Scalera, S. Seriani, A. Gasparetto, and P. Gallina, "Watercolour Robotic Painting: a Novel Automatic System for Artistic Rendering," Journal of Intelligent & Robotic Systems, vol. 95, pp. 871-886, 2019.

[4] 2018 International Robotic Art Competition (RobotArt), http://robotart.org/archives/2018/artworks, 2018.

[5] M. Gommel, M. Haitz, and J. Zappe, "Robotlab autoportrait project: Human portraits drawn by a robot," http://www.robotlab.de.

[6] R. Yi, Y.-J. Liu, Y.-K. Lai, and P. L. Rosin, "APDrawingGAN: Generating Artistic Portrait Drawings from Face Photos with Hierarchical GANs," IEEE CVPR, 2019, pp. 10743-10752.

[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, 2012, pp. 1097-1105.

[8] Y. Jing et al., "Neural Style Transfer: A Review," IEEE TVCG, 2019.

[9] S. Calinon, J. Epiney, and A. Billard, "A humanoid robot drawing human portraits," IEEE/RSJ IROS, 2005, pp. 161-166.

[10] C. Lin, L. Chuang, and T. T. Mac, "Human portrait generation system for robot arm drawing," IEEE AIM, 2009, pp. 1757-1762.

[11] A. Mohammed, L. Wang, and R. X. Gao, "Integrated Image Processing and Path Planning for Robotic Sketching," Procedia CIRP, 2013, pp. 199-204.

[12] M. C. Lau et al., "A portrait drawing robot using a geometric graph approach: Furthest Neighbour Theta-graphs," IEEE AIM, 2012, pp. 75-79.

[13] P. Tresset and F. F. Leymarie, "Portrait drawing by Paul the robot," Computers & Graphics, vol. 37, no. 5, pp. 348-363, 2013.

[14] T. Xue and Y. Liu, "Robot portrait rendering based on multi-features fusion method inspired by human painting," IEEE ROBIO, 2017.

[15] A. A. Efros and T. K. Leung, "Texture synthesis by non-parametric sampling," ICCV, 1999.

[16] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," CVPR, 2016.

[17] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," NIPS, 2014.

[18] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," CVPR, 2017.

[19] J. Gui et al., "A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications," arXiv:2001.06937, pp. 1-28, 2020.

[20] N. Wang, D. Tao, X. Gao, X. Li, and J. Li, "A comprehensive survey to face hallucination," IJCV, vol. 106, no. 1, pp. 9-30, 2014.

[21] C. Chen, X. Tan, and K.-Y. K. Wong, "Face Sketch Synthesis with Style Transfer Using Pyramid Column Feature," WACV, 2018, pp. 485-493.

[22] J. Yu, X. Xu, F. Gao, S. Shi, M. Wang, D. Tao, and Q. Huang, "Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs," IEEE Trans. Cybern., DOI: 10.1109/TCYB.2020.2972944, 2020.

[23] OpenCV, https://opencv.org.

[24] C.-H. Lee, Z. Liu, L. Wu, and P. Luo, "MaskGAN: Towards Diverse and Interactive Facial Image Manipulation," arXiv:1907.11922, 2019.

[25] X. Huang and S. Belongie, "Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization," ICCV, 2017, pp. 1510-1519.

[26] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," ICLR, 2015.

[27] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," ICCV, 2015, pp. 3730-3738.

[28] uArm Swift, https://www.ufactory.cc/en/uarmswift.

[29] D. Berio, S. Calinon, and F. F. Leymarie, "Learning dynamic graffiti strokes with a compliant robot," IEEE/RSJ IROS, 2016, pp. 3981-3986.


        • IV-C Local Sketch Synthesis
        • IV-D Post-processing
          • V The Robotic System
            • V-A Hardware
            • V-B Path Planning
              • VI Experiments
                • VI-A Implementation Details
                • VI-B Comparison with State-of-the-art
                • VI-C Qualitative Evaluation
                • VI-D Generalization Ability
                • VI-E Time Consumption
                • VI-F Ablation Study
                  • VII CONCLUSIONS
                  • References
Page 5: Making Robots Draw A Vivid Portrait In Two Minutes · portraits in a short time. In this work, we present a drawing robot, which can automatically transfer a facial picture to a vivid

B. Path Planning

Artists typically draw graphical lines along geometrical boundaries and render black marks for eyebrows and eyeballs. To imitate this behaviour, we use the following methods to convert a synthesised portrait into a trajectory.

We first extract the skeletons of the portrait image. A portrait typically contains brush-strokes of non-uniform width, and it is time-consuming for a robot to elaborately render such details. By extracting the skeletons of the brush-strokes, we reduce the number of elements the robot has to reproduce. We implement this by means of mathematical morphology. Afterwards, we search for a sequence of points on the skeletons along the gradient orientations estimated by a Canny edge detector; here, breadth-first search is used. Finally, to fill the black marks representing eyeballs and eyebrows, we iteratively draw closed loops of edges from the outside inwards.
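The following Python sketch illustrates the skeleton-then-trace idea under simplifying assumptions: it thins the strokes by morphological skeletonization and orders the pixels of each connected stroke with a breadth-first search. The gradient-guided point search and the outside-in filling of black marks described above are omitted, and the function name is ours, not the paper's.

```python
# Minimal sketch of skeletonization plus BFS stroke tracing.
# Assumes a grayscale portrait with dark ink on a white background.
from collections import deque

import numpy as np
from skimage.morphology import skeletonize

def portrait_to_strokes(sketch_gray):
    """Return one ordered list of (row, col) pixels per brush-stroke."""
    ink = sketch_gray < 128                  # 1. binarize: True where ink
    skel = skeletonize(ink)                  # 2. thin strokes to 1 px width
    visited = np.zeros_like(skel, dtype=bool)
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    strokes = []
    for r0, c0 in zip(*np.nonzero(skel)):    # 3. BFS per connected stroke
        if visited[r0, c0]:
            continue
        stroke, queue = [], deque([(r0, c0)])
        visited[r0, c0] = True
        while queue:
            r, c = queue.popleft()
            stroke.append((r, c))
            for dr, dc in neighbours:
                rr, cc = r + dr, c + dc
                if (0 <= rr < skel.shape[0] and 0 <= cc < skel.shape[1]
                        and skel[rr, cc] and not visited[rr, cc]):
                    visited[rr, cc] = True
                    queue.append((rr, cc))
        strokes.append(stroke)
    return strokes
```

Each returned stroke can then be mapped to pen-down/pen-up commands for the arm; a production system would additionally start traces at stroke endpoints and simplify the point sequences before path planning.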

VI. EXPERIMENTS

In this section, we first introduce implementation details and then present a number of experiments to verify the performance of our portrait drawing robot.

A. Implementation Details

Datasets. First, we need a large set of content images and a number of style images to train the global sketch synthesis algorithm. To this end, we randomly select 1,000 low-resolution facial pictures from the CelebA dataset [27] and 1,000 high-resolution facial pictures from the CelebA-HQ dataset [24] as the set of content images. All these pictures are resized to 512×512 pixels. We also download 20 real sketches of arbitrary content from the Web as style images. We randomly choose 95% of the facial pictures for training and use the rest for validation; a minimal preparation sketch is given below.
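As a concrete illustration, the following snippet prepares such a content set. The directory layout, file pattern, and use of OpenCV are our assumptions for illustration, not details from the paper.

```python
# Hypothetical data preparation: resize pictures to 512x512 and
# hold out 5% of them for validation, as described above.
import random
from pathlib import Path

import cv2

def prepare_content_set(src_dir, dst_dir, val_ratio=0.05, seed=0):
    paths = sorted(Path(src_dir).glob("*.jpg"))
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_ratio)
    for i, p in enumerate(paths):
        img = cv2.imread(str(p))
        img = cv2.resize(img, (512, 512), interpolation=cv2.INTER_AREA)
        split = "val" if i < n_val else "train"
        out = Path(dst_dir) / split / p.name
        out.parent.mkdir(parents=True, exist_ok=True)
        cv2.imwrite(str(out), img)
```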

Second, to train our local sketch synthesis algorithm, we extract 280 eyebrow photo-sketch pairs from the dataset released by [6]. Each sample is of 60×60 pixels. We randomly choose 210 samples for training and use the remaining 70 samples for validation.

Finally, we download about 200 images, including celebrity faces and universal images of arbitrary content, from the Web for testing. These facial pictures present faces in-the-wild, which are diverse in poses, expressions, etc. We apply AiSketcher to these images so as to evaluate its performance in practical applications. The corresponding results have been released at https://ricelll.github.io/AiSketcher/.

Training. In the training stage, we use the Adam optimizer with a learning rate of 1e-4 and a batch size of 8 for both the global and local sketch synthesis algorithms. Besides, to optimize the global algorithm, we set λ1 = λ2 = λ3 = 1 and λ4 = 10 in Eq. (7) and run for 160,000 iterations. To optimize the local algorithm, we alternate between one gradient descent step on D and one step on G, using the Adam optimizer with β = 0.5, and run for 600 epochs; a toy sketch of this schedule is given below. We train our algorithms on a single NVIDIA GeForce GTX 1080Ti GPU. It takes about 22 hours and 1 hour to train the global and the local algorithms, respectively.
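A hedged PyTorch sketch of the alternating D/G schedule follows. The networks are stand-ins, the adversarial loss is a generic one, and the second Adam moment is left at the library default; only the learning rate, β = 0.5, and the one-D-step-then-one-G-step alternation come from the text above.

```python
# Sketch of alternating one discriminator step then one generator step.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Tanh())   # stand-in generator
D = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1))              # stand-in discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

def train_step(photo, sketch):
    # Discriminator step: real sketches -> 1, generated sketches -> 0.
    fake = G(photo).detach()
    real_logits, fake_logits = D(sketch), D(fake)
    d_loss = (bce(real_logits, torch.ones_like(real_logits))
              + bce(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # Generator step: try to fool the updated discriminator.
    fake_logits = D(G(photo))
    g_loss = bce(fake_logits, torch.ones_like(fake_logits))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```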

B. Comparison with State-of-the-art

In this part, we compare our portrait synthesis algorithm with existing work on portrait drawing robots, i.e. [10], [14] and [1], and with a state-of-the-art (SOTA) portrait synthesis method, i.e. APDrawingGAN [6]. For this purpose, we collect the pictures and synthesised portraits presented in [10], [14] and [1]. Afterwards, we apply the officially released APDrawingGAN and our learned sketch synthesis algorithm to these facial pictures, respectively.

[Figure: rows show input photos, results in the original papers, APDrawingGAN, and ours; columns show examples from [10], [14], and [1].]
Fig. 4. Comparison with existing portrait drawing methods. Input images are collected from [10], [14], and [1]. The second row shows the corresponding results in the original papers. The third and bottom rows show sketches synthesised by APDrawingGAN [6] and by our method, respectively.

As shown in Fig. 4, the portraits produced by [10] are over-simple and present sinuate lines. Portraits produced by both [14] and [1] represent facial structures by stiff and unrealistic lines; besides, neither of them can properly synthesise the hair. The portraits generated by APDrawingGAN capture the distinctive appearance of a person but present low-quality elements. Besides, APDrawingGAN cannot produce portraits with a sparse set of continuous lines and marks.

In contrast, the portraits synthesised by our algorithm successfully capture individual characteristics and feelings by using a sparse set of realistic brush-strokes. Moreover, our algorithm produces a primal sketch for the area of clothes, which further improves the vividness of the facial portraits. Note that these facial pictures are low-resolution and blurry. The results shown in Fig. 4 demonstrate the remarkable capacity of our algorithm in generating high-quality portraits.

C. Qualitative Evaluation

We further apply our robotic system to a wide range of faces in-the-wild. Such faces are diverse in illumination condition, pose, expression, etc., and may contain occlusions. Fig. 5 shows some examples. Here, we pre-process the photos in Fig. 5a-5d before feeding them into APDrawingGAN or our AiSketcher, while the photos in Fig. 5e-5h are directly input into APDrawingGAN or our global synthesis algorithm for producing portraits.

Similarly, APDrawingGAN doesn't work properly for these photos. In contrast, our AiSketcher considerably and consistently produces high-quality sketches for all of them. Our algorithm successfully deals with complex backgrounds (Fig. 5a and 5b), glasses (Fig. 5b), extreme poses (Fig. 5e, 5f and 5h), and occlusions (Fig. 5f). Notably, given a half-length photo (Fig. 5h), our algorithm is capable of producing a high-quality and abstract portrait.

(a) (b) (c) (d) (e) (f) (g) (h)

Fig. 5. Sketch portraits produced by our AiSketcher for national celebrities. From top to bottom are input images, sketches synthesised by APDrawingGAN [6], and sketches synthesised and drawn by AiSketcher. All the input images shown here are downloaded from the Web.

By comparing our synthesised sketches with the final versions drawn by AiSketcher, we conclude that the robotic arm precisely reproduces the synthesised sketches. Although some details are slightly missing, due to the limited capability of our 3-DOF robotic arm, the final sketches successfully present the individual appearances.

D. Generalization Ability

We further apply AiSketcher to universal images, such as pencil-drawing images, cartoon images, etc. In this case, we use neither pre-processing nor the local synthesis algorithm, because there might be no human face in a given image. As illustrated in Fig. 6, AiSketcher still produces considerably high-quality sketches. A wide range of tests shows that AiSketcher tends to produce a high-quality sketch for a universal image, unless the image contains a large area of delicate textures or is of low quality (e.g. low-resolution, blurred, or noisy).

Fig. 6. Synthesised sketches for universal images. From top to bottom are input images and sketches synthesised and drawn by AiSketcher. All the input images shown here are downloaded from the Web.

E. Time Consumption

In the testing stage, given an input picture, it takes about 1 s for AiSketcher to synthesise a sketch using a server with an NVIDIA GeForce GTX 1060 GPU. Afterwards, it takes about 1-3 minutes for the robotic arm to draw it. The total time consumption is about 2 minutes on average; thus AiSketcher satisfies the time budget of practical scenarios. Recalling the considerable performance of AiSketcher in generating high-quality portraits, we conclude that AiSketcher successfully balances vividness and time consumption.

F. Ablation Study

Finally, we present a series of ablation studies to verify the effects of our proposed techniques. For this purpose, we test several model variants derived from our portrait synthesis algorithm, including: (i) the base model, AdaIN [25]; (ii) AdaIN with the self-consistency loss, AdaIN w/ L_consist; (iii) AdaIN with both L_consist and the global sparsity loss, AdaIN w/ L_sparse^global; (iv) AdaIN with both L_consist and the compositional sparsity loss, AdaIN w/ L_sparse; and (v) our full model, including pre-processing, local synthesis, and post-processing.

(a) (b) (c) (d) (e) (f) (g)

Fig. 7. Ablation study of our portrait synthesis algorithm. From left to right are (a) the input photo, and sketches synthesised by (b) AdaIN [25], (c) AdaIN w/ L_consist, (d) AdaIN w/ L_sparse^global, (e) AdaIN w/ L_sparse, and (f) our full model; (g) is the parsing mask predicted by MaskGAN [24].

Effects of Self-consistency Loss. Fig. 7b shows that AdaIN typically produces black marks over the hair region and snatchy brush-strokes. In contrast, AdaIN w/ L_consist produces smooth and continuous brush-strokes with few black marks (Fig. 7c). This apparent improvement confirms our motivation for using the self-consistency loss.

Effects of Compositional Sparsity. The sketches in Fig. 7c present too many brush-strokes, which would require a long drawing time. Both Fig. 7d and 7e show that using a sparsity constraint dramatically reduces the number of brush-strokes. However, the global sparsity also removes elements representing individual characteristics (e.g. the eyes and eyebrows in Fig. 7d). In contrast, our compositional sparsity successfully avoids this problem. In other words, the compositional sparsity dramatically decreases the drawing time without appreciably decreasing the quality of the synthesised portraits; an illustrative sketch of the two penalties follows.
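To make the distinction concrete, here is a toy sketch of the two penalties, not the paper's exact terms from Eq. (7): a global penalty charges for ink everywhere, whereas the compositional variant charges only inside regions that the parsing mask labels as insignificant.

```python
# Toy comparison of global vs. compositional sparsity penalties.
# S: synthesised sketch in [0, 1], where 0 = ink and 1 = blank paper.
# insignificant: binary mask (1 = region that may be left blank),
# derived from a face parsing mask; both are assumptions for illustration.
import torch

def global_sparsity(S):
    # Penalizes ink everywhere, so eyes and eyebrows fade as well.
    return (1.0 - S).mean()

def compositional_sparsity(S, insignificant):
    # Penalizes ink only in insignificant regions, preserving
    # identity-bearing strokes around key facial components.
    return ((1.0 - S) * insignificant).mean()
```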

Effects of Local Synthesis. Fig. 7e shows that the globally synthesised eyebrows may present lines with bifurcations, while replacing the eyebrows with the locally synthesised ones results in a better portrait, as shown in Fig. 7f.

Based on the above observations, we can safely conclude that our proposed techniques significantly improve the quality of the synthesised portraits.

VII. CONCLUSIONS

In this work, we present a portrait drawing robot named AiSketcher. AiSketcher has shown fantastic performance over a wide range of images. Besides, extensive experimental results demonstrate that AiSketcher achieves a balance between the quality and the drawing time of sketch portraits. By carefully examining the sketches produced by AiSketcher, we find there is still substantial room for improvement. First, our sketch synthesis algorithm may not produce a primal portrait for faces with shadows; it typically produces undesired brush-strokes along the boundaries of a shadow. Illumination normalization is a promising approach to solving this problem. Second, the robotic arm cannot precisely reproduce delicate elements in a synthesised sketch. It would be interesting to optimize both the path planning and the portrait synthesis algorithms by taking the limitations of a robotic arm into account. Besides, visual feedback [2], [29] is another promising solution. We will explore these issues in the near future.

REFERENCES

[1] Q. Gao et al., "A Robot Portrait Pencil Sketching Algorithm Based on Face Component and Texture Segmentation," IEEE ICIT, 2019, pp. 48-53.
[2] R. C. Luo, M.-J. Hong, and P.-C. Chung, "Robot artist for colorful picture painting with visual control system," IEEE/RSJ IROS, 2016, pp. 2998-3003.
[3] L. Scalera, S. Seriani, A. Gasparetto, and P. Gallina, "Watercolour Robotic Painting: a Novel Automatic System for Artistic Rendering," Journal of Intelligent & Robotic Systems, vol. 95, pp. 871-886, 2019.
[4] 2018 International Robotic Art Competition (RobotArt), http://robotart.org/archives/2018/artworks, 2018.
[5] M. Gommel, M. Haitz, and J. Zappe, "Robotlab autoportrait project: Human portraits drawn by a robot," http://www.robotlab.de/.
[6] R. Yi, Y.-J. Liu, Y.-K. Lai, and P. L. Rosin, "APDrawingGAN: Generating Artistic Portrait Drawings from Face Photos with Hierarchical GANs," IEEE CVPR, 2019, pp. 10743-10752.
[7] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, 2012, pp. 1097-1105.
[8] Y. Jing et al., "Neural Style Transfer: A Review," IEEE TVCG, 2019.
[9] S. Calinon, J. Epiney, and A. Billard, "A humanoid robot drawing human portraits," IEEE/RSJ IROS, 2005, pp. 161-166.
[10] C.-Y. Lin, L.-W. Chuang, and T. T. Mac, "Human portrait generation system for robot arm drawing," IEEE AIM, 2009, pp. 1757-1762.
[11] A. Mohammed, L. Wang, and R. X. Gao, "Integrated Image Processing and Path Planning for Robotic Sketching," Procedia CIRP, 2013, pp. 199-204.
[12] M. C. Lau et al., "A portrait drawing robot using a geometric graph approach: Furthest Neighbour Theta-graphs," IEEE AIM, 2012, pp. 75-79.
[13] P. Tresset and F. F. Leymarie, "Portrait drawing by Paul the robot," Computers & Graphics, vol. 37, no. 5, pp. 348-363, 2013.
[14] T. Xue and Y. Liu, "Robot portrait rendering based on multi-features fusion method inspired by human painting," IEEE ROBIO, 2017.
[15] A. A. Efros and T. K. Leung, "Texture synthesis by non-parametric sampling," ICCV, 1999.
[16] L. A. Gatys, A. S. Ecker, and M. Bethge, "Image style transfer using convolutional neural networks," CVPR, 2016.
[17] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," NIPS, 2014.
[18] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," CVPR, 2017.
[19] J. Gui et al., "A Review on Generative Adversarial Networks: Algorithms, Theory, and Applications," arXiv:2001.06937, pp. 1-28, 2020.
[20] N. Wang, D. Tao, X. Gao, X. Li, and J. Li, "A comprehensive survey to face hallucination," IJCV, vol. 106, no. 1, pp. 9-30, 2014.
[21] C. Chen, X. Tan, and K.-Y. K. Wong, "Face Sketch Synthesis with Style Transfer Using Pyramid Column Feature," WACV, 2018, pp. 485-493.
[22] J. Yu, X. Xu, F. Gao, S. Shi, M. Wang, D. Tao, and Q. Huang, "Towards Realistic Face Photo-Sketch Synthesis via Composition-Aided GANs," IEEE Trans. Cybernetics, DOI: 10.1109/TCYB.2020.2972944, 2020.
[23] OpenCV, https://opencv.org/.
[24] C.-H. Lee, Z. Liu, L. Wu, and P. Luo, "MaskGAN: Towards Diverse and Interactive Facial Image Manipulation," arXiv:1907.11922, 2019.
[25] X. Huang and S. Belongie, "Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization," ICCV, 2017, pp. 1510-1519.
[26] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," ICLR, 2015.
[27] Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," ICCV, 2015, pp. 3730-3738.
[28] uArm Swift, https://www.ufactory.cc/en/uarmswift.
[29] D. Berio, S. Calinon, and F. F. Leymarie, "Learning dynamic graffiti strokes with a compliant robot," IEEE/RSJ IROS, 2016, pp. 3981-3986.
