Eindhoven University of Technology

MASTER

3D face reconstruction using structured light on a hand-held device

Roa Villescas, M.

Award date: 2013

Link to publication

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.

Eindhoven University of Technology

Master Graduation Project

3D Face Reconstruction using Structured Light on a Hand-held Device

Author

Martin Roa Villescas

Supervisors

Dr. Ir. Frank van Heesch

Prof. Dr. Ir. Gerard de Haan

A thesis submitted in fulfilment of the requirements

for the degree of Master of Embedded Systems

in the

Smart Sensors & Analysis Research Group

Philips Research

August 2013

EINDHOVEN UNIVERSITY OF TECHNOLOGY

Abstract

Department of Mathematics and Computer Science

Master of Embedded Systems

3D Face Reconstruction using Structured Light on a Hand-held Device

by Martin Roa Villescas

A 3D hand-held scanner using the structured lighting technique has been developed by the Smart Sensors & Analysis research group (SSA) in Philips Research Eindhoven. This thesis presents an embedded implementation of such a scanner. A translation of the original MATLAB implementation into C language yielded a speedup of approximately 15 times running on a desktop computer. However, running the new implementation on an embedded platform increased the execution time from 0.5 seconds to more than 14 seconds. A wide range of optimizations were proposed and applied to improve the performance of the application. A final execution time of 5.1 seconds was achieved. Moreover, a visualization module was developed to display the reconstructed 3D models by means of the projector contained in the embedded device.

Acknowledgements

I owe a debt of gratitude to the many people who helped me during my years at TU/e.

First, I would like to thank Frank van Heesch, my supervisor at Philips, an excellent professional and even better person, who showed me the way through this challenging project while encouraging me at every step of the way. He was always generous with his time and steered me in the right direction whenever I felt I needed help. He has deeply influenced every aspect of my work.

I would also like to express my sincerest gratitude to my professor, Gerard de Haan, the person who was responsible for opening Philips' doors to my life. His achievements are a constant source of motivation. Gerard is a clear demonstration of how the collaboration between industry and academia can produce unprecedented and magnificent results.

My special thanks to all my fellow students at Philips Research, who made these eight months a wonderful time of my life. Their input and advice contributed significantly to the final result of my work. In particular, I would like to thank Koen de Laat for helping me set up an automated database system to keep track of the profiling results.

Furthermore, I would like to thank Catalina Suarez, my girlfriend, for her support during this year. Your company has translated into the happiness I need to perform well in the many aspects of my life.

Finally, I would like to thank my family for their permanent love and support. It is hard to find the right words to express the immense gratitude that I feel for those persons who have given me everything so that I could be standing where I am now. Mom and dad, my achievements are the result of the infinite love that you have given me throughout my life, and I will never stop feeling grateful for that.


Contents

Abstract ii
Acknowledgements iii
List of Figures ix

1 Introduction 1
1.1 3D Mask Sizing project 3
1.2 Objectives 3
1.3 Report organization 4

2 Literature study 5
2.1 Surface reconstruction 5
2.1.1 Stereo analysis 6
2.1.2 Structured lighting 9
2.1.2.1 Triangulation technique 10
2.1.2.2 Pattern coding strategies 11
2.1.2.3 3D human face reconstruction 12
2.2 Camera calibration 13
2.2.1 Definition 14
2.2.2 Popular techniques 14

3 3D face scanner application 17
3.1 Read binary file 18
3.2 Preprocessing 18
3.2.1 Parse XML file 18
3.2.2 Discard frames 19
3.2.3 Crop frames 19
3.2.4 Scale 19
3.3 Normalization 19
3.3.1 Normalization 20
3.3.2 Texture 2 21
3.3.3 Modulation 22
3.3.4 Texture 1 22
3.4 Global motion compensation 23
3.5 Decoding 24
3.6 Tessellation 25
3.7 Calibration 26
3.7.1 Offline process 27
3.7.2 Online process 27
3.8 Vertex filtering 28
3.8.1 Filter vertices based on decoding constraints 28
3.8.2 Filter vertices outside the measurement range 29
3.8.3 Filter vertices based on a maximum edge length 29
3.9 Hole filling 29
3.10 Smoothing 30

4 Embedded system development 31
4.1 Development tools 31
4.1.1 Hardware 32
4.1.1.1 Single-board computer survey 32
4.1.1.2 BeagleBoard-xM features 34
4.1.2 Software 34
4.1.2.1 Software libraries 35
4.1.2.2 Software development tools 36
4.2 MATLAB to C code translation 37
4.2.1 Motivation for developing in C language 37
4.2.2 Translation approach 38
4.3 Visualization 39

5 Performance optimizations 43
5.1 Double to single-precision floating-point numbers 44
5.2 Tuned compiler flags 44
5.3 Modified memory layout 45
5.4 Reimplementation of C's standard power function 45
5.5 Reduced memory accesses 47
5.6 GMC in y dimension only 49
5.7 Error in Delaunay triangulation 50
5.8 Modified line shifting in GMC stage 50
5.9 New tessellation algorithm 51
5.10 Modified decoding stage 52
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage 53
5.12 NEON assembly optimization 1 54
5.13 NEON assembly optimization 2 57

6 Results 61
6.1 MATLAB to C code translation 61
6.2 Visualization 62
6.3 Performance optimizations 62

7 Conclusions 67
7.1 Future work 68

Bibliography 71

List of Figures

1.1 A subset of the CPAP masks offered by Philips 2
1.2 A 3D hand-held scanner developed in Philips Research 4

2.1 Standard stereo geometry 7
2.2 Assumed model for triangulation as proposed in [4] 10
2.3 Examples of pattern coding strategies 12
2.4 A reference framework assumed in [25] 14

3.1 General flow diagram of the 3D face scanner application 17
3.2 Example of the 16 frames that are captured by the hand-held scanner 18
3.3 Flow diagram of the preprocessing stage 18
3.4 Flow diagram of the normalization stage 20
3.5 Example of the 18 frames produced in the normalization stage 21
3.6 Camera frame sequence in a coordinate system 22
3.7 Flow diagram for the calculation of the texture 1 image 22
3.8 Flow diagram for the global motion compensation process 23
3.9 Difference between pixel-based and edge-based decoding 24
3.10 Vertices before and after the tessellation process 25
3.11 The Delaunay tessellation with all the circumcircles and their centers [33] 26
3.12 The calibration chart 27
3.13 The 3D model before and after the calibration process 28
3.14 3D resulting models after various filtering steps 29
3.15 Forehead of the 3D model before and after applying the smoothing process 30

4.1 The BeagleBoard-xM offered by Texas Instruments 35
4.2 Simplified diagram of the 3D face scanner application 39
4.3 UV coordinate system 40
4.4 Diagram of the visualization module 41

5.1 Execution times of the MATLAB and C implementations when run on different platforms 44
5.3 Execution time before and after tuning GCC's compiler options 45
5.4 Modification of the memory layout of the camera frames 46
5.5 Execution time with a different memory layout 46
5.6 Execution time before and after reimplementing C's standard power function 47
5.7 Order of execution before and after the optimization 48
5.8 Difference in execution time before and after reordering the preprocessing stage 48
5.9 Flow diagram for the GMC process as implemented in the MATLAB code 49
5.10 Difference in execution time before and after modifying the GMC stage 49
5.11 Execution time of the application after fixing an error in the tessellation stage 50
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage 51
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted 52
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach 53
5.15 Execution time of the application before and after optimizing the decoding stage 54
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums 55
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage 55
5.18 NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology 56
5.19 Execution flow after the first NEON assembly optimization 58
5.20 Execution times of the application before and after applying the first NEON assembly optimization 59
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels 59
5.22 Execution times of the application before and after applying the second NEON assembly optimization 59
5.23 Final execution flow after the second NEON assembly optimization 60

6.1 Execution times of the MATLAB and C implementations when run on different platforms 62
6.2 Example of the visualization module developed 63
6.3 Performance evolution of the 3D face scanner's C implementation 64
6.4 Execution times for each stage of the application 65

Dedicated to my grandmother


Chapter 1

Introduction

The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.

Within the wide range of products offered by Philips, there is a specific group categorized under the name of sleep solutions that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4,206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]. It reported that sleep apnea was prevalent in more than 18 million Americans, i.e., 6.62% of the country's population.

While aiming to meet the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors, which include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia, amongst others. A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.

(a) Amara (b) ComfortClassic (c) ComfortGel Blue (d) ComfortLite 2 (e) FitLife (f) GoLife (g) ProfileLite Gel (h) Simplicity (i) ComfortGel

Figure 1.1: A subset of the CPAP masks offered by Philips

Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.

1.1 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop some technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.

2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.

3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.

4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.

The focus of this thesis project is on the first step.

As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique was already developed and is the base for the present project. Figure 1.2a shows the hardware setup of such a device. In short, this scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently being performed offline and are mostly implemented in MATLAB.

1.2 Objectives

The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application. Specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to an ARM Cortex-A8, the new implementation should be targeted to this processor in particular, by making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.

(a) Hardware (b) 3D model example

Figure 1.2: A 3D hand-held scanner developed in Philips Research

1.3 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module, and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.


Chapter 2

Literature study

This chapter presents a selective analysis of the state-of-the-art in the field of surface reconstruction, placing special emphasis on structured lighting techniques. A brief overview of the three main underlying technologies used for depth estimation is presented first. This is followed by an example of stereo analysis, which serves as the basis for the more specific structured lighting techniques. Moreover, this example helps to illustrate why stereo analysis is considered less preferable for 3D face reconstruction applications when compared with structured lighting techniques. Special emphasis is placed on the scientific principles underlying structured lighting techniques. Furthermore, a classification of the different types of pattern coding strategies available in the literature is given, along with an analysis of their suitability for our application. Finally, the chapter concludes with a brief discussion of camera calibration and its most representative techniques.

2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (such as those found in areas like architecture, mechanical engineering or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].

Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e., they are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in depth measurement. On the other hand, both reflecto-metric and stereo-metric techniques are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they require an estimation of the model of the environment. In the remainder of this section we will limit the discussion to the stereo-metric category and focus on the structured lighting techniques.

2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis is presented in this section. This example intends to show why the use of structured lighting becomes essential for our application. This example is presented in [4].

Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point in the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies from a certain reference. Note, however, that in order to calculate the triangulation, more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (extrinsic parameter) or to the focal length of each of the cameras (intrinsic parameter).

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and oriented towards the scene objects. Also note that, because the x-axes of both images are identically oriented, rows with the same row number in the two different images lie on the same straight line.

Figure 2.1: Standard stereo geometry

In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

$$p_{left} = (x_{left}, y_{left}) \quad \text{and} \quad p_{right} = (x_{right}, y_{right})$$

in the left and right images, respectively, assuming that the scene point is visible from both camera viewpoints. The disparity with respect to $p_{left}$ is a vector given by

$$\Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\; y_{left} - y_{right})^T \qquad (2.1)$$

between two corresponding image points.

In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

$$p = (x, y) = \left( \frac{f \cdot X}{Z},\; \frac{f \cdot Y}{Z} \right) \qquad (2.2)$$

assuming that Z > f.

According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that y = y_left = y_right. Therefore, for the left camera the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X-axis, the central projection equation is given by

$$(x_{right}, y) = \left( \frac{f \cdot (X - b)}{Z},\; \frac{f \cdot Y}{Z} \right)$$

Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points in each one of the images, with respect to $p_{left}$, is given by

$$\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}$$

However, because rows with the same row number in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

$$\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \qquad (2.3)$$

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map $\Delta(x, y)$ is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.

Finally, in order to come up with the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

$$Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}$$

and therefore

$$X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}$$

Using the previous equation, it follows that

$$Z = \frac{b \cdot f}{x_{left} - x_{right}}$$

By substituting this result into the projection equation for y, it follows that

$$Y = \frac{b \cdot y}{x_{left} - x_{right}}$$

The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and b are known and that the disparity map $\Delta(x, y)$ was measured for each pair of corresponding points in the two images. Note that a variety of methods exists to calibrate different types of camera configuration systems, i.e., to determine their intrinsic and extrinsic parameters. More on these calibration procedures is further discussed in Section 2.2.
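To make the use of these equations concrete, the following C sketch recovers the 3D coordinates of a point from its scalar disparity under the standard stereo geometry, assuming a known focal length f and base distance b. The function and variable names are illustrative only and are not taken from the thesis implementation.

    #include <math.h>

    /* Recovers the 3D coordinates (X, Y, Z) of a scene point from its image
     * coordinates (x_left, y) in the left camera and its scalar disparity
     * d = x_left - x_right, following the standard stereo geometry:
     *   Z = b * f / d,   X = b * x_left / d,   Y = b * y / d
     * Returns 0 on success and -1 when the disparity is undefined, e.g. for
     * occluded points without a correspondence in the right image. */
    static int reconstruct_point(double x_left, double y, double disparity,
                                 double f, double b,
                                 double *X, double *Y, double *Z)
    {
        if (!(disparity > 0.0) || !isfinite(disparity))
            return -1;
        *X = (b * x_left) / disparity;
        *Y = (b * y) / disparity;
        *Z = (b * f) / disparity;
        return 0;
    }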

The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques are used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images. These features are typically corners or edges contained within the images. Although these techniques are found to be appropriate for a certain number of applications, it turns out that they present a number of drawbacks that make their applicability unfeasible for many others. The main drawbacks are: (i) feature extraction and matching is generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. Such problems in solving the correspondence problem can generally be overcome by resorting to a different but similar type of technique, known by the name of structured lighting. While structured lighting techniques involve a completely different methodology for how to solve the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.

2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source which projects a light pattern actively into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to simplify the complexity of the correspondence analysis by actively manipulating the scene.

It is important to note that stereoscopic-based systems do not assume complex requirements for image acquisition, since they mostly rely on theoretical, mathematical and algorithmic analyses to solve the reconstruction problem. On the other hand, the idea behind structured lighting methods is to shift this complexity to another level, such as the engineering prerequisites of the overall system [4].

A wide variety of light patterns have been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2, a classification of the encoded patterns available is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.

2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is explained.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation as proposed in [4]

Note that the system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a consequence, the object, light source and camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known, and the origin of the coordinate system O coincides with the projection center of the camera.


The goal is to calculate the distance d between the origin O and the object point P = (X_0, Z_0). This can be done using the law of sines as follows:

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}$$

From $\gamma = \pi - (\alpha + \beta)$ and $\sin(\pi - \gamma) = \sin(\gamma)$, it holds that

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}$$

Therefore, the distance d is given by

$$d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}$$

which holds for any point P lying on the surface of the object.
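As a minimal illustration of this relation, the following C sketch computes the distance d from the calibrated angles and the base distance. The function name and the units (angles in radians, d in the same length unit as b) are assumptions made for this example, not part of the thesis code.

    #include <math.h>

    /* Distance from the camera origin O to the object point P, using
     * d = b * sin(alpha) / sin(alpha + beta). Angles are in radians and
     * b is the base distance between camera and light source. */
    static double triangulate_distance(double b, double alpha, double beta)
    {
        return b * sin(alpha) / sin(alpha + beta);
    }

For instance, with b = 0.1 and alpha = beta = pi/3, the function returns 0.1 · sin(60°)/sin(120°) = 0.1, i.e., in this symmetric configuration the point lies at a distance from the camera equal to the base length.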

2.1.2.2 Pattern coding strategies

As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill all requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19]:

• Time-multiplexing. This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or objects in the scene during the time period when the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a; a minimal decoding sketch is given after this list.

• Spatial neighborhood. In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12], or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow for all coding information to be condensed into a single projection pattern, making them highly suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding. In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16], or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not only depend on the projected colors, but also on the intrinsic colors of the measuring surface and light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].
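The sketch below illustrates the time-multiplexing principle referred to in the first item of the list above: each captured frame contributes one bit to a per-pixel codeword, which in turn identifies the projector stripe that illuminated the pixel. It is a simplified, hypothetical illustration (plain thresholding against a fixed level, illustrative names), not the decoding stage of the scanner described in Chapter 3.

    #include <stdint.h>
    #include <stddef.h>

    /* Assembles the codeword of one pixel from a sequence of binary coded
     * patterns: frame k contributes bit k, set when the pixel appears
     * illuminated (intensity above a fixed threshold) in that frame. */
    static uint32_t decode_codeword(const uint8_t *const *frames, /* frames[k]: k-th captured frame */
                                    size_t num_frames,
                                    size_t pixel_index,           /* row * width + column */
                                    uint8_t threshold)
    {
        uint32_t codeword = 0;
        for (size_t k = 0; k < num_frames; ++k) {
            if (frames[k][pixel_index] > threshold)
                codeword |= (uint32_t)1 << k;
        }
        return codeword;
    }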

(a) Time-multiplexing (b) Spatial neighborhood (c) Direct coding

Figure 23 Examples of pattern coding strategies

2123 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields such as security

forensics or even entertainment it is no surprise that special focus has been devoted

to this area by the research community over the last decades A comparative study

of three different 3D face reconstruction approaches is presented in [20] Here the

most representative techniques of three different domains are tested These domains are

binocular stereo structured lighting and photometric stereo The experimental results

show that active reconstruction techniques perform better than purely passive ones for

this application

The majority of analysis on vision based reconstruction has focused on general perfor-

mance for arbitrary scenes rather than on specific objects as reported in [20] Neverthe-

less some effort has been made on evaluating structured lighting techniques with special

focus on human face reconstruction In [21] a comparison is presented between three


structured lighting techniques (Gray Code Gray Code Shift and Stripe Boundary) to

assess 3D reconstruction for human faces by using mono and stereo systems The results

show that the Gray Code shift coding performs best given the high number of emitted

patterns it uses A further study on this topic was performed by the same author in

[22] Again it was found that time-multiplexing techniques such as binary encoding

using Gray Code provide the highest accuracy With a rather different objective than

that sought by Woodward et al in [21] and [22] Fechteler et al [23] also focus their

effort on presenting a framework that captures 3D models of faces in high resolutions

with low computational load Here the system uses a single colored stripe pattern for

the reconstruction purpose plus a picture of the face illuminated with regular white light

that is used as texture

Particular aspects of 3D human face reconstruction such as proximity size and texture

involved make structured lighting a suitable approach On the contrary other recon-

struction techniques might be less suitable when dealing with these particular aspects

For example stereoscopic approaches fail to provide positive results when the textures

involved do not contain features that can be easily extracted and matched by means of

algorithms as in the case of the human face On the other hand the concepts behind

structured lighting make it very convenient to reconstruct these kind of surfaces given

the proximity involved and the size limits of the object in question (appropriate for

projecting encoded patterns)

With regard to the suitability of the different pattern coding strategies for our application

(3D human face reconstruction by means of a hand-held scanner) there are several

factors to consider Spatial neighborhood strategies do not offer high spatial resolution

which is needed by the algorithms that assess the fit quality of the various mask models

Direct coding strategies suffer from practical problems that affect their robustness to

different scenarios This centers the attention on the time-multiplexing techniques which

are known to provide high spatial resolution. The problem with such techniques is
that they are highly sensitive to movement, which is likely to be present on a hand-
held device. Fortunately, there are several approaches by which this problem can be
solved. Consequently, it is a time-multiplexing technique that is employed in
our application.

22 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement

This section presents a review of some of the most popular techniques with special focus

on those that are regarded as adequate for our application


221 Definition

Camera calibration is the process of determining a mathematical approximation of the

physical and optical behavior of an imaging system by using a set of parameters These

parameters can be estimated by means of direct or iterative methods and they are divided

in two groups On the one hand intrinsic parameters determine how light is projected

through the lens onto the image plane of the sensor The focal length projection center

and lens distortion are all examples of intrinsic parameters On the other hand extrinsic

parameters measure the position and orientation of the camera with respect to a world

coordinate system as defined in [24] To better illustrate these ideas consider Figure

24 which corresponds to the optical system for the structured pattern projection and

triangulation considered in [25] The focal length fc and the projection center Oc are

examples of intrinsic parameters of the camera while the distance D between the camera

and the projector is an example of an extrinsic parameter

Figure 24 A reference framework assumed in [25]

222 Popular techniques

In 1982 Hall et al [18] proposed a technique consisting of an implicit camera calibration

that uses a 3times4 transformation matrix which maps 3D object points to their respective

2D image projections Here the model of the camera does not consider any lens distor-

tion For a detailed description of this method refer to [18] Some years later in 1986

Faugeras improved Hall's work by proposing a technique that was based on extracting

the physical parameters of the camera from the transformation technique proposed in

[18] The description of this technique is given in [26] and [27] A non-linear explicit

camera calibration that included radial lens distortion was proposed by Salvi in his PhD


thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' lin-

ear method However a method that would become much more popular and that is still

widely used was proposed by Tsai in 1987 [29] Here the author proposes a two-step

technique that models only radial lens distortion Also worth mentioning is the model

proposed by Weng [30] in 1992 which includes three different types of lens distortion

The calibration mechanism that is currently being used in our application is based on

the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although

this mechanism focuses on stereo camera calibration it was generalized for a system

with one camera and one projector It involves imaging a controlled scene from different

positions and orientations The controlled scene consists of a rigid calibration chart with

several markers The geometric and photometric properties of such markers are known

precisely so that they can be detected After corresponding markers in the different

images are found an algorithm searches the optimal set of camera parameters for which

triangulation of all corresponding marker-point pairs gives an accurate reconstruction of

the calibration chart This calibration mechanism is discussed further in Section 37

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed

by the Smart Sensing amp Analysis research group and provided as a starting point for the

current project Figure 31 presents the main steps involved in the 3D reconstruction

process

Figure 31 General flow diagram of the 3D face scanner application

The current scanner uses a total of 16 binary coded patterns that are sequentially pro-

jected onto the scene For each projection the scene is captured by means of the

embedded camera hence producing 16 different grayscale frames (Figure 32) that are

fed to the application in the form of a binary file This falls in line with the discussion

presented in Section 2123 of the literature study of why time-multiplexing strategies
are more suitable than spatial neighborhood or direct coding strategies for face recon-

struction applications In Sections 31 to 39 each of the steps shown in Figure 31 is

described



Figure 32 Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application

31 Read binary file

The first step of the application is to read the binary file that contains the required

information for the 3D reconstruction The binary file is composed of two parts the

header and the actual data The header contains metadata of the acquired frames such

as the number of frames and the resolution of each one The second part contains the

actual data of the captured frames Figure 32 shows an example of such frame sequence

which from now on will be referred to as camera frames

32 Preprocessing

The preprocessing stage comprises the four steps shown in figure 33 Each of these steps

is described in the following subsections

Figure 33 Flow diagram of the preprocessing stage (parse XML file, discard frames, crop frames, scale)

321 Parse XML file

In this stage the application first reads an XML file that is included for every scan

This file contains relevant information for the structured light reconstruction This


information includes (i) the type of structured light patterns that were projected when

acquiring the data (ii) the number of frames captured while structured light patterns

were being projected (iii) the image resolution of each frame to be considered and (iv)

the calibration data

322 Discard frames

Based on the number of frames value read from the XML file the application discards

extra frames that do not contain relevant information for the structured light approach

but that are provided as part of the input

323 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain
a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border

of the images Note that this operation does not imply a loss of information in this

application in particular This is because pixels near the frame borders do not contain

facial information and therefore can be safely removed

324 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is

represented by an 8-bit unsigned integer value that ranges from 0 to 255 In this stage

the data type is transformed from unsigned integer to floating point while dividing each

pixel value by 255 The new set of values range between 0 and 1

33 Normalization

Even though this section is entitled Normalization a few more tasks are being performed

in this stage of the application as shown by the blue rectangles in Figure 34 Here wide

arrows represent flow of data whereas dashed lines represent the order of execution The

numbers inside the small data arrows pointing towards the different tasks represent the

number of frames used as input by each task The dashed line rectangle that encloses

the normalization and texture 2 tasks represents that there is not a clear sequential

execution between these two but rather that these are executed in an alternating fashion

This type of diagram will prove particularly useful in Chapter 5 in order to explain the


Figure 34 Flow diagram of the normalization stage

modifications that were made to the application to improve its performance An example

of the different frames that are produced in this stage are visualized in Figure 35 A

brief description of each of the tasks involved in this stage follows

331 Normalization

The purpose of this stage is to extract the reflectivity component (texture information)

from the camera frames while aiming at enhancing the deformed illumination patterns

in the resulting frame sequence Figure 35a illustrates the result of this process The

deformed patterns are essential for the 3D reconstruction process

In order to understand how this process takes place we need to look back at Figure

32 Here it is possible to observe that the projected patterns in the top row frames are

equal to their corresponding frame in the bottom row with the only difference being

that the values of the projected pattern are inverted For each corresponding pair a

new image frame is generated according to the following equation

Fnorm(x, y) = (Fcamera(x, y, a) − Fcamera(x, y, b)) / (Fcamera(x, y, a) + Fcamera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 32 respectively

An example of the resulting frame sequence is shown in Figure 35a
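A minimal C sketch of this per-pixel operation is given below; the function and variable names are illustrative and do not correspond to the actual scanner source, and a small epsilon is added here to guard against division by zero in unlit regions:

/* Compute one normalized frame from a pair of frames with inverted
 * patterns (a: original pattern, b: inverted pattern). All buffers
 * hold width*height float pixels in the range [0, 1]. */
static void normalize_pair(const float *a, const float *b,
                           float *out, int width, int height)
{
    const float eps = 1e-6f;
    for (int i = 0; i < width * height; i++) {
        float sum = a[i] + b[i];
        out[i] = (a[i] - b[i]) / (sum > eps ? sum : eps);
    }
}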


(a) Normalized frame sequence

(b) Texture 2 frame sequence

(c) Modulation frame (d) Texture 1 frame

Figure 35 Example of the 18 frames produced in the normalization stage

332 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one

used to calculate the normalized frame sequence In fact the output of this process is an

intermediate step in the calculation of the normalized frames being this the reason why

the two processes are said to be performed in an alternating fashion The mathematical

equation that describes the calculation of the texture 2 frame sequence is

Ftexture2(x, y) = Fcamera(x, y, a) + Fcamera(x, y, b)

The resulting frame sequence (Figure 35b) is used later in the global motion compen-

sation stage


333 Modulation

The purpose of this stage is to find the range of measured values for each (x y) pixel of

the camera frame sequence along the time dimension This is done in two steps First

two frames are generated by finding the maximum and minimum values along the time

(t) dimension (Figure 36) for every (x y) value in a frame

Figure 36 Camera frame sequence in a coordinate system

Second a modulation frame is produced by finding the difference between the previously

generated frames ie

Fmod(x, y) = Fmax(x, y) − Fmin(x, y)

Such modulation frame (Figure 35c) is required later during the decoding stage
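Under the same illustrative conventions as before, a sketch of the two steps (temporal maximum and minimum followed by their difference) could look as follows; names and data layout are assumptions made for this example:

/* frames: array of n_frames pointers, each to width*height float pixels.
 * mod: output modulation frame (width*height floats). */
static void modulation_frame(const float *const *frames, int n_frames,
                             float *mod, int width, int height)
{
    for (int i = 0; i < width * height; i++) {
        float fmin = frames[0][i];
        float fmax = frames[0][i];
        for (int t = 1; t < n_frames; t++) {
            float v = frames[t][i];
            if (v < fmin) fmin = v;
            if (v > fmax) fmax = v;
        }
        mod[i] = fmax - fmin;
    }
}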

334 Texture 1

Finally the last task in the Normalization stage corresponds to the generation of the

texture image that will be mapped onto the final 3D model In contrast to the previous

three tasks this subprocess does not take the complete set of 16 camera frames as input

but only the 2 with finest projection patterns Figure 37 shows the four processing

steps that are applied to the input in order to generate a texture image such as the one

presented in Figure 35d

Figure 37 Flow diagram for the calculation of the texture 1 image (average frames, gamma correction, 5×5 mean filter, histogram stretch)
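As an illustration of the first two of these steps, the following hedged C sketch averages the two finest-pattern frames and applies the 0.85 gamma correction; the 5×5 mean filter and histogram stretch are omitted for brevity, and all names are hypothetical:

#include <math.h>

/* Average two frames and apply gamma correction (exponent 0.85).
 * a, b: input frames; tex: output texture frame (width*height floats). */
static void texture1_first_steps(const float *a, const float *b,
                                 float *tex, int width, int height)
{
    for (int i = 0; i < width * height; i++) {
        float avg = 0.5f * (a[i] + b[i]);
        tex[i] = powf(avg, 0.85f);
    }
}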


34 Global motion compensation

The major drawback of time-multiplexing strategies is its high sensitivity to movement

In fact if no measures are taken to correct the slight amount of movement of the scanner

or of the objects in the scene during the acquisition process the complete reconstruction

process fails Although the global motion compensation stage is only a minor part of

the mechanism that makes the entire application robust to motion it is not negligible

in the final result

Global motion compensation is an extensive field of research for which many different

approaches and methods have been contributed The approach used in this application

is amongst the simplest in terms of complexity. Nevertheless, it suffices for the needs of the

current application

Figure 38 presents an overview of the algorithm used to achieve the global motion

compensation This process takes as input the normalized frame sequence introduced in

the previous section As noted at the bottom of the figure these steps are repeated for

every pair of consecutive frames As a first step the pixels in each column are added for

both frames This results in two vectors that hold the cumulative sums of each frame

The second step is to determine by how many pixels the second image is displaced with

respect to the first one In order to achieve this the sum of absolute differences between

elements of the two column-sum vectors is calculated while slowly displacing the two

vectors with respect to each other The result is a new vector containing the SAD value

for each displacement Subsequently the index of the smallest element in the SAD

values vector is searched in order to determine the number of pixels that the second

image needs to be shifted The process concludes by performing the actual shift of the

second frame
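The following C sketch outlines the column-sum and SAD-minimization steps for one pair of frames; the search range, names and data layout are illustrative assumptions rather than the parameters used in the actual implementation:

#include <stdlib.h>
#include <math.h>
#include <float.h>

/* Estimate the horizontal shift (in pixels) of frame b with respect to
 * frame a by minimizing the SAD between their column-sum vectors.
 * max_shift limits the search range. */
static int estimate_shift(const float *a, const float *b,
                          int width, int height, int max_shift)
{
    float *col_a = calloc(width, sizeof(float));
    float *col_b = calloc(width, sizeof(float));
    int best_shift = 0;
    float best_sad = FLT_MAX;

    /* Step 1: cumulative sum of every column in both frames */
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            col_a[x] += a[y * width + x];
            col_b[x] += b[y * width + x];
        }

    /* Step 2: SAD between the column-sum vectors for every displacement */
    for (int s = -max_shift; s <= max_shift; s++) {
        float sad = 0.0f;
        for (int x = 0; x < width; x++) {
            int xs = x + s;
            if (xs < 0 || xs >= width)
                continue;
            sad += fabsf(col_a[x] - col_b[xs]);
        }
        if (sad < best_sad) {
            best_sad = sad;
            best_shift = s;
        }
    }

    free(col_a);
    free(col_b);
    return best_shift;   /* frame b is shifted by this amount afterwards */
}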

Figure 38 Flow diagram for the global motion compensation process


35 Decoding

In Section 211 of the literature study the correspondence problem was defined as the

process of determining corresponding point pairs between the captured images and the

projected patterns This is exactly what is being accomplished during the decoding

stage

A novel approach has been implemented in which the identification of the projector

stripes is based not on the values of the pixels themselves (as it is typically done) but

rather on the edges formed by the transitions of the projected patterns Figure 39

illustrates the different sets of decoded values that result with each of these methods

Here it is possible to observe that the pixel-based method produces a stair-casing effect

due to the decoding of neighboring pixels that lie on the same stripe of the projected

pattern On the other hand the edge-based method removes this undesirable effect by

decoding values for only parts of the image in which a transition occurs Furthermore

this approach enables sub-pixel accuracy for the determination of the positions where the

transitions occur meaning that the overall resolution of the 3D reconstruction increases

considerably
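A simplified sketch of how a transition position could be located with sub-pixel accuracy is shown below; it linearly interpolates the zero crossings of a normalized pattern profile along one image column and is only meant to illustrate the idea, not the actual decoding code:

/* Find zero crossings of a normalized pattern profile (one image column)
 * and store their sub-pixel positions. Returns the number of transitions
 * found (at most max_out). */
static int find_transitions(const float *profile, int length,
                            float *positions, int max_out)
{
    int count = 0;
    for (int y = 0; y + 1 < length && count < max_out; y++) {
        float v0 = profile[y];
        float v1 = profile[y + 1];
        if ((v0 <= 0.0f && v1 > 0.0f) || (v0 >= 0.0f && v1 < 0.0f)) {
            /* linear interpolation of the zero crossing between y and y+1 */
            float frac = v0 / (v0 - v1);
            positions[count++] = (float)y + frac;
        }
    }
    return count;
}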

Figure 39 Edge-based vs pixel-based decoding (decoded values along the y dimension of the image). The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used

The decoding process results in a set of vertices each one associated with a depth code

Note however that the unit of measurement used to describe the position and depth of

each vertex is based on camera pixels and code values respectively meaning that these

vertices still do not represent the actual geometry of the face The calibration process

explained in a later section is the part of the application that translates the pixel and


code values to standard units (such as millimeters) thus recreating the actual shape of

the human face

36 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in

a manner such that no overlaps occur In computer graphics these geometric shapes

are generally chosen to be triangles, also called "faces". The reason for using triangles
is that they have, by definition, their vertices on the same plane. This, in turn, avoids
the generation of non-simple convex polygons that are not guaranteed to be rendered
correctly. A complete example illustrating this point can be found in [32].

A set of 3D vertices calculated in the decoding stage is the input to the tessellation

process Here however the third dimension does not play a role and hence the z

coordinate for each of the vertices can be thought of as being equal to 0 This implies

that the new set of vertices consist only of (x y) coordinates that lie on the same plane

as shown in Figure 310a This graph corresponds to a very close view of the nose area

in the reconstructed face example

Figure 310 Close view of the vertices in the nose area before and after the tessellation process: (a) vertices before applying the Delaunay triangulation, (b) result after applying the Delaunay triangulation

The question that arises here is how to connect the vertices in such a way that the com-

plete surface is covered with triangles The answer is to use the Delaunay triangulation

which is probably the most common triangulation used in computer vision The main

advantages that it has over other methods is that the Delaunay triangulation avoids

"skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the

Delaunay triangulation is independent of the order in which the vertices are processed


Figure 310b shows the result of applying the Delaunay triangulation to the vertices

shown in Figure 310a

Although there exists a number of different algorithms used to achieve the Delaunay

triangulation the final outcome of each conforms to the following definition a Delaunay

triangulation for a set P of points in a plane is a triangulation DT(P) such that no

point in P is inside the circumcircle of any triangle in DT(P) [33] Such definition can

be understood by examining Figure 311

Figure 311 The Delaunay tessellation with all the circumcircles and their centers [33]
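The empty-circumcircle condition can be expressed as a simple sign test. The following C sketch uses the standard computational-geometry formulation of the in-circle predicate (it is not taken from the application code) and returns a positive value when point p lies inside the circumcircle of the counter-clockwise triangle (a, b, c):

/* In-circle predicate: > 0 if p is inside the circumcircle of the
 * counter-clockwise triangle (ax,ay), (bx,by), (cx,cy). */
static double in_circumcircle(double ax, double ay, double bx, double by,
                              double cx, double cy, double px, double py)
{
    double adx = ax - px, ady = ay - py;
    double bdx = bx - px, bdy = by - py;
    double cdx = cx - px, cdy = cy - py;
    double ad = adx * adx + ady * ady;
    double bd = bdx * bdx + bdy * bdy;
    double cd = cdx * cdx + cdy * cdy;
    return adx * (bdy * cd - bd * cdy)
         - ady * (bdx * cd - bd * cdx)
         + ad  * (bdx * cdy - bdy * cdx);
}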

37 Calibration

The set of (x y) vertices with their corresponding depth code values that result from

the decoding process do not represent standard units of measure ie these still have to

be translated into standard units such as millimeters This is precisely the objective of

the calibration process

The calibration mechanism that is used in the application is based on the work of Peter-

Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts

an offline and an online process Moreover the offline process consists of two stages

the camera calibration and the system calibration It is important to clarify that while

the offline process is performed only once (camera properties and distances within the

system do not change with every scan) the online process is carried out for every scan

instance The calibration stage referred to in Figure 31 is the latter


371 Offline process

As already mentioned the offline process comprises the two stages described below

Camera calibration This part of the process is concerned with the calculation of the

intrinsic parameters of the camera as explained in Section 22 of the literature

study In short the objective is to precisely quantify the optical properties of the

camera The manner in which the current approach accomplishes this is by imag-

ing the special calibration chart shown in Figure 312 from different orientations

and distances After corresponding markers in the different images are found an

algorithm searches the optimal set of camera parameters for which triangulation

of all corresponding marker-point pairs gives an accurate reconstruction of the

calibration chart

Figure 312 The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely

System calibration The second part of the calibration process refers to the camera-

projector system calibration ie the determination of the extrinsic parameters

of the system Again this part of the process images the calibration chart from

different distances However this time structured light patterns are emitted by

the projector while the acquisition process takes place The result is that each

projector code is associated with a known depth and camera position

372 Online process

The result of the offline calibration is a set of parameters that model the optical proper-

ties of the scanner system These are passed to the application inside the XML file for

every scan Such parameters represent the coefficients of a fifth-order polynomial used

for translating the set of (x y) vertices with their corresponding depth code values into


standard units of measure In other words the online process consists of evaluating a

polynomial with all the x y and depth code values calculated in the decoding stage in

order to reconstruct the geometry of the face Figure 313 shows the state of the 3D

model before and after the reconstruction process

(a) Before reconstruction (b) After reconstruction

Figure 313 The 3D model before and after the calibration process
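To illustrate the kind of computation involved in the online step, a fifth-order polynomial in a single variable can be evaluated with Horner's scheme as in the sketch below. The single-variable form, function name and coefficient layout are assumptions made for this example; the actual calibration polynomial combines the x, y and depth code values with the coefficients read from the XML file.

/* Evaluate a fifth-order polynomial c[0] + c[1]*v + ... + c[5]*v^5
 * using Horner's scheme. */
static float poly5(const float c[6], float v)
{
    float r = c[5];
    for (int i = 4; i >= 0; i--)
        r = r * v + c[i];
    return r;
}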

38 Vertex filtering

As it can be seen from Figure 313b there are a number of extra vertices (and faces)

that have not been correctly reconstructed and therefore should be removed from the

model Vertex filtering is applied to remove all these noisy vertices and faces based on

different criteria The process is divided in the following three steps

381 Filter vertices based on decoding constraints

First if the distance between consecutive decoded points is larger than a maximum

threshold in the (x) or (z) dimensions then these are removed Second in order to

avoid false decoded vertices due to camera noise (especially in the parts of the images

where light does not hit directly) a minimal modulation threshold needs to be exceeded

or else the associated decoded point is discarded Finally if the decoded vertices lie

outside a margin defined in accordance to the image dimensions then these are removed

as well


382 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum

and maximum values that each decoded point can have in the z dimension These values

are read from the XML file The long triangles shown in Figure 313b that either extend

far into the picture or on the other hand come close to the camera are all removed in

this stage The resulting 3D model after being filtered with the two previously described

criteria is shown in Figure 314a

383 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length

criterion Initially the length of every edge contained in the model is calculated This

is followed by determining a new set of edges L that contains the longest edge in each

face After this operation the mean length value for the longest edge set is calculated

Finally, only faces whose longest edge is less than seven times the mean value,
i.e., L < 7 × mean(L), are kept. Figure 314b shows the result after this operation.
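A compact sketch of this criterion is given below, assuming an indexed triangle representation; the data layout and names are illustrative only:

#include <math.h>

/* Keep a face only if its longest edge is below 7 times the mean of the
 * longest edges of all faces. vertices: x,y,z triplets; faces: vertex
 * index triplets; keep[f] is set to 1 or 0 for every face. */
static void filter_long_edges(const float *vertices, const int *faces,
                              int n_faces, float *longest, int *keep)
{
    float mean = 0.0f;
    for (int f = 0; f < n_faces; f++) {
        float lmax = 0.0f;
        for (int e = 0; e < 3; e++) {
            const float *p = &vertices[3 * faces[3 * f + e]];
            const float *q = &vertices[3 * faces[3 * f + (e + 1) % 3]];
            float dx = p[0] - q[0], dy = p[1] - q[1], dz = p[2] - q[2];
            float len = sqrtf(dx * dx + dy * dy + dz * dz);
            if (len > lmax)
                lmax = len;
        }
        longest[f] = lmax;
        mean += lmax;
    }
    mean /= (float)n_faces;
    for (int f = 0; f < n_faces; f++)
        keep[f] = (longest[f] < 7.0f * mean) ? 1 : 0;
}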

Figure 314 Resulting 3D models after various filtering steps: (a) after the filtering steps described in Subsections 381 and 382, (b) after the filtering step described in Subsection 383, (c) after the filtering step described in Section 39

39 Hole filling

In the last processing step of the 3D face scanner application two actions are performed

The first one is concerned with an algorithm that takes care of filling undesirable holes

that appear due to the removal of vertices and faces that were part of face surface This

is accomplished by adding a vertex in the middle of the hole and then connecting every

surrounding edge with this point The second action refers to another filtering step of


vertices and faces In this last part of the application the program removes all but the

largest group of connected faces The final 3D model is shown in Figure 314c

310 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes

but not for the overall goal of the 3D mask sizing project this process was not taken

into account as part of the 3D face scanner application This is also the reason why it

is not included in Figure 31 Nevertheless this section provides a brief explanation of

the smoothing process that is currently used along with an example

A complete explanation of the algorithm that is being used to achieve the smoothing

effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian

operator that diffuses the vertices along the surface An example of the resulting model

before and after applying the smoothing process is shown in Figure 315

(a) The 3D model before smoothing (b) The 3D model after smoothing

Figure 315 Forehead of the 3D model before and after applying the smoothing process

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as

two different domains but rather as two complementary parts of a whole There are two

important trends that have made such unified view possible First integrated circuit

(IC) technology has evolved to the point where multiple processors of different types

coexist in a single IC Second the increasing complexity and average size of programs

added to the evolution of compiler technologies raised C compilers (and even C++ or

Java in some cases) to become commonplace in the development of embedded systems

[35]

This chapter discusses the embedded hardware and software implementation of the 3D

face scanner A brief account of the hardware and software tools that were used during

the development of the application is presented first Subsequently the first stage of the

development process is described which consists mainly of translating the algorithms

and methods described in Chapter 3 into a different programming language more suitable

for embedded systems Finally a preview of the developed visualization module that

displays the 3D reconstructed face is presented along with a brief description of its

functionality

41 Development tools

This section describes the set of tools used in the development of the embedded applica-

tion First an overview of the hardware is presented highlighting the most important

aspects that are of interest to the 3D face scanner application This is then followed by

a list of the software tools along with a short motivation for their selection A so called

remote development methodology was used for the compilation process The idea is to



run an integrated development environment (IDE) on a client system for the creation of

the project editing of the files and usage of code assistance features in the same manner

as done with local projects However when the project is built run or debugged the

process runs on a remote server with output and input transferred to the client system

411 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as

development platforms SBCs combine most features of a conventional desktop computer

into a single board which can be as small as a credit card One or more processors of

different types memory on-board peripherals for multiple USB devices single or dual

gigabit Ethernet connections integrated graphics and audio capabilities amongst others

are common features included in these devices But perhaps what is most interesting

for embedded developers is the availability of several SBCs that come under open source

hardware category [36] Such SBCs are suitable for the implementation of a wide range

of applications on the basis of open operating systems

Two different hardware environments were used in the development of the current em-

bedded application a conventional desktop personal computer (PC) with an Intel x86

architecture and a SBC that was selected according to the following survey

4111 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention

of finding the most suitable model for our application Table 41 presents a subset of the

considered models highlighting the most relevant characteristics for the 3D face scanner

application Refer to [37] for the complete survey

The model to be chosen has to comply with several requirements imposed by the 3D

face scanner application First support for both a camera and a projector had to be

offered While all of the considered models showed special support for video output

not all of them provided suitable characteristics for camera signal acquisition In fact

most of them rely on USB or Ethernet connections for this purpose The problem of

using USB technology for camera acquisition is that it is highly resource demanding On

the other hand Ethernet connections imply streaming video in formats such as MPEG

which require additional computational resources and buffering for decoding the video

stream Explicit periphery support for camera acquisition was only offered by two of

the considered models the BeagleBoard-xM and the PandaBoard


Table 41 Single-board computer survey

BeagleBoard-xM
CPU: ARM Cortex-A8, 1000 MHz
RAM: 512 MB
Video output: DVI-D, HDMI, S-Video
GPU: PowerVR SGX, OpenGL ES 2.0
Camera port: Yes

Raspberry Pi Model B
CPU: ARM1176, 700 MHz
RAM: 256 MB
Video output: Composite RCA, HDMI, DSI
GPU: Broadcom VideoCore IV, OpenGL ES 2.0
Camera port: No

Cotton Candy
CPU: dual-core ARM Cortex-A9, 1200 MHz
RAM: 1 GB
Video output: HDMI
GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
Camera port: No

PandaBoard
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, DVI-D, LCD
GPU: PowerVR SGX540, OpenGL ES 2.0
Camera port: Yes

Via APC
CPU: ARM11, 800 MHz
RAM: 512 MB
Video output: HDMI, VGA
GPU: built-in 2D/3D graphics, OpenGL ES 2.0
Camera port: No

MK802
CPU: ARM Cortex-A8, 1000 MHz
RAM: 1 GB
Video output: HDMI
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No

Snowball
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, CVBS
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No


A second issue in the selection of the SBC was concerned with the project objective of

developing a module capable of visualizing the 3D reconstructed model by means of the

embedded projector It was considered that the achievement of this objective could be

greatly simplified by selecting an SBC model that offered support for rendering of 3D

computer graphics by means of an API preferably OpenGL ES Nevertheless all of the

SBC models considered in the survey featured a graphical processor unit (GPU) with

such support

Finally one last important motivation for the selection came from the experience gath-

ered through related projects The BeagleBoard-xM had been used as the embedded

computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-

able implementation effort could be saved if this option were adopted Consequently it

was the BeagleBoard-xM that was selected as the SBC model for the development of

the current project

4112 BeagleBoard-xM features

The BeagleBoard-xM (Figure 41) is an SBC produced by Texas Instruments. It is
a low-power open-source hardware system that was designed specifically to address
the open source community. It measures 82.55 by 82.55 mm and offers most of the
functionality of a desktop computer. It is based on Texas Instruments' DM3730 system
on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1

GHz and 512 MB of LPDDR RAM Several open operating systems have been made

compatible with such processor including Linux FreeBSD RISC OS Symbian and

Android Moreover the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated

video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to

provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].

In addition to the previously mentioned characteristics the ARM Cortex-A8 processor

comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as
NEON. This technology is based on a 128-bit SIMD architecture extension that provides
flexible and powerful acceleration for consumer multimedia products, as described in [39].

412 Software

The main factors involved in the selection of software tools were (i) available support by

a large development community and (ii) acquisition costs and licensing charges Open

source software was adopted where possible Moreover prior experience with the tools

was also taken into account The software can be divided in two categories (i) software


Figure 41 The BeagleBoard-xM offered by Texas Instruments

libraries that are used within the application and therefore are necessary for its execution

and (ii) software tools used specifically for the development of the application and hence

are not required for its execution In what follows each of these is briefly described

4121 Software libraries

The following software libraries are being used throughout the implementation of the

embedded application

libxml2 It is a software library used for parsing XML documents which was originally

developed for the Gnome project and was later made available for outside projects

as well The current application makes use of such tool for extracting the required

information from the XML file that is included for each scan

OpenCV Is an open source computer vision and machine learning software library

initiated by Intel It provides the necessary functionality to construct the Delaunay

triangulation described in Chapter 3 Though it was used in the initial versions of

the application later optimizations replaced OpenCV implementations

CGAL Consists of a software library that aims to provide access to algorithms in

computational geometry It is being used in the current application as a means

to simplify the resulting mesh surface ie to reduce the number of faces used to

represent the surface while keeping the overall shape of the reconstructed model

OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-

cally for embedded systems It consists of a cross-language multi-platform Appli-

cation Programming Interface (API) for rendering 2D and 3D computer graphics


It is used in the current application as the means to visualize the 3D reconstructed

model

GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL

used to create windows andor frame buffers It is being used in the visualization

module of the application as well

4122 Software development tools

The following list presents a description of the most important software tools used for

the development of the embedded application

GNU toolchain It refers to a collection of programming tools produced by the GNU

Project that provide developing facilities for applications and operating systems

Among the several projects that comprise the GNU toolchain the following were

used

GNU Make It is a utility that automates the building process of executable

programs by reading the so-called makefiles which specify how to create the

target program

GCC It is the official compiler of the GNU operating system and has been

adopted as standard by most modern Unix-like computer operating systems

GNU Binutils Involves a set of programming tools that are used in the develop-

ment process of creating and managing programs object files libraries profile

data and assembly source code The commands as (assembler) ld (linker)

and gprof (profiler) were used among the complete set of binutil commands

GNU Project debugger It is the standard debugger for the GNU operating

system which was made available for the development of applications outside

this project as well

Valgrind It is a programming tool that can automatically detect memory management

errors It also provides the functionality of a profiler

Ubuntu A Linux based operating system that is distributed as free and open source

software It was installed in both the desktop PC and the SBC


42 MATLAB to C code translation

This section describes the first stage of the embedded application development that

involves the translation of a series of algorithms originally written in MATLAB code to

C

Despite the fact that there are a number of available tools that automatically translate

MATLAB code to C language such as MATLAB Coder by MathWorks MATLAB-to-

C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx these have a number

of pitfalls that compromise their applicability, especially when the performance aspect

is of ultimate importance Perhaps what is most concerning is that each one of these

tools only supports a subset of the MATLAB language and functions meaning that

the complete functionality of MATLAB is immediately constrained by this requirement

In many cases this would imply a modification to the MATLAB code prior to the

translation process in order to filter out any feature or function not included in the

subset which adds overhead to the development process Examples of features not

supported by automatic translation tools are amongst others objects cell arrays nested

functions visualization or trycatch statements The use of an automatic translation

tool was discarded for this project taking into account that several of these unsupported

features are present in the MATLAB code

421 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular pro-

gramming languages used for the development of embedded systems The first is that

C language lies in an intermediate point between higher and lower level languages pro-

viding suitable characteristics for embedded system development from both sides The

problem with higher level languages relies on the fact that they do not provide suitable

characteristics for optimizing performance of the applications such as low-level memory

manipulation Furthermore unlike many of these higher level programming languages

C provides deterministic resource use which is an important feature when the target de-

vices contain limited resources On the other hand C outperforms lower level languages

in a number of aspects such as scalability and maintainability Two final motivations

for using C are (i) C compilers are available for almost all embedded devices which are

supported by a large pool of experienced C programmers and (ii) the vast majority of

hardware APIdrivers are written in C


422 Translation approach

As mentioned earlier a manual translation approach of the code was chosen over the

use of automatic translation tools A key part in the process of manually translating

MATLAB to C code is the verification process There are two major techniques used

to achieve such verification The first one consists of a systematic method of converting

the translated C code into a compiled MEX-file that can be merged into the original

MATLAB project Then by comparing the results generated by the MATLAB project

containing the C implementation wrapped in a MEX-file with those generated by the

original MATLAB project one should be able to verify the correctness of the translation

The second approach consists of writing corresponding intermediate results of both the

MATLAB and C implementations to external files and then using a file comparison tool

such as diff for Linux environments in order to validate equality of both results It was

the latter approach that was chosen for the development of the current application for

the following reason The former approach requires the C implementation to be wrapped

in a so called MEX wrapper which takes care of the communication between MATLAB

and C This task is considered to be error prone since crashes segmentation violations

or incorrect results can easily occur if the MEX wrapper does not allocate and access

the data properly as reported by Marc Barberis in [40] from Catalytic Inc

A number of pitfalls that add complexity to the manual translation process were iden-

tified throughout the development of this stage The most important are

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e., the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in performance of the resulting code (see the sketch after this list).

• MATLAB is an interpreted language, i.e., data types and variable dimensions are only known at run-time, thus these cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions and sizes. On the contrary, C language requires all variables to be cast to a specific data type (or declared, as it is known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C and hence requires the programmer to implement them while relying on structure constructs of primitive types.
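To make the first two pitfalls concrete, the following small C fragment (illustrative only, with an assumed frame size) shows a 0-based, row-major traversal; the equivalent MATLAB element would be A(r+1, c+1), and MATLAB stores A column by column, so directly porting a MATLAB loop nest can easily destroy spatial locality in C:

#define ROWS 480
#define COLS 754

/* Scale every pixel of a frame, iterating rows in the outer loop and
 * columns in the inner loop so that accesses are contiguous in C's
 * row-major layout. */
static void scale_frame(float frame[ROWS][COLS], float factor)
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            frame[r][c] *= factor;
}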

43 Visualization

This section describes the different steps involved in the visualization module developed

to display the reconstructed 3D models by means of the embedded projector contained

in the hand-held device Figure 42 extends the general overview of the application

presented in 31 by incorporating the visualization module This figure shows that a

resulting 3D model of the face reconstruction process consists of 4 different elements a

set of vertices a set of faces a set of UV coordinates and a texture image

Figure 42 Simplified diagram of the 3D face scanner application

Vertices and faces describe the geometry of the reconstructed model Each face consists

of three index values that determine the vertices that conform a triangle On the other

hand UV coordinates together with the texture image describe the texture of the model

Figure 43 shows how UV coordinates are used to map portions of the texture image


to individual parts of the model Each vertex is associated with an UV coordinate

When a triangle is rendered the corresponding UV coordinates of each vertex are used

to extract a portion of the texture image to place it on top of the triangle

Figure 43 UV coordinate system

Figure 44 presents an overview of the visualization module The first step of the process

is to simplify the 3D model ie to reduce the number of triangles (and vertices) used

to represent the surface Note that while a high resolution is needed for the algorithms

that determine the fit quality of the different mask models a much lower resolution can

be used for visualization purposes In fact due to the limited available resources in

embedded systems such simplification becomes necessary to avoid lag when zooming

rotating or panning the model Edge collapse is a common term used for the simpli-

fication process which is shown in Figure 44 Input vertices and faces of this block

are converted into a smaller set denoted as New vertices and New faces on the diagram

However since the new set of vertices and faces do not have a one-to-one correspondence

to the original set of UV coordinates such coordinates have to be updated as well The

manner in which this is accomplished is by using the Nearest Neighbor algorithm Every

new vertex is assigned the UV coordinate of its closest original vertex

The next stage of the process is to format the new set of vertices faces and UV co-

ordinates together with the texture 1 image such that OpenGL can render the model


Subsequently normal vectors are calculated for every triangle which are mainly used

by OpenGL for lighting calculations Every vertex of the model has to be associated

with one normal vector To do this an average normal vector is calculated for each

vertex based on the normal vectors of the triangles that are connected to it Moreover

a cross-product multiplication is used to calculate the normal vector of each triangle

Once these four elements that characterize the 3D model are provided to OpenGL the

program enters in an infinite running state where the model is redrawn every time a

timer expires or when an interactive operation is sent to the program
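A hedged sketch of the per-face cross product and the per-vertex averaging is shown below; names and data layout are illustrative assumptions and do not correspond to the actual implementation:

#include <math.h>
#include <string.h>

/* Accumulate per-vertex normals from per-face normals.
 * vertices: x,y,z per vertex; faces: 3 vertex indices per face;
 * normals: x,y,z per vertex, overwritten by this function. */
static void compute_vertex_normals(const float *vertices, int n_vertices,
                                   const int *faces, int n_faces,
                                   float *normals)
{
    memset(normals, 0, 3 * (size_t)n_vertices * sizeof(float));
    for (int f = 0; f < n_faces; f++) {
        const float *a = &vertices[3 * faces[3 * f + 0]];
        const float *b = &vertices[3 * faces[3 * f + 1]];
        const float *c = &vertices[3 * faces[3 * f + 2]];
        float u[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
        float v[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
        /* face normal via cross product u x v */
        float n[3] = { u[1] * v[2] - u[2] * v[1],
                       u[2] * v[0] - u[0] * v[2],
                       u[0] * v[1] - u[1] * v[0] };
        for (int k = 0; k < 3; k++) {
            normals[3 * faces[3 * f + 0] + k] += n[k];
            normals[3 * faces[3 * f + 1] + k] += n[k];
            normals[3 * faces[3 * f + 2] + k] += n[k];
        }
    }
    /* normalize each accumulated vertex normal */
    for (int i = 0; i < n_vertices; i++) {
        float *n = &normals[3 * i];
        float len = sqrtf(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
        if (len > 1e-12f) {
            n[0] /= len; n[1] /= len; n[2] /= len;
        }
    }
}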

Figure 44 Diagram of the visualization module

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner

application ranging from high-level optimizations such as modification of the algo-

rithms to low-level optimizations such as the implementation of time-consuming parts

in assembly language

In order to verify that the achieved optimizations were valid in general and not for

specific cases 10 scans of different persons were used for profiling the performance of the

application Every profile consisted of running the application 10 times for each scan and

then averaging the results in order to reduce the influence that external factors might

have in the measured times Figure 51 presents an example of the graphs that will be

used throughout this and the following chapters to represent the changes in performance

Here each bar is divided into different colors that represent the distribution of the total

execution time among the various stages of the application described in Chapter 3 and

summarized in Figure 31

The translation from MATLAB to C code corresponds to the first optimization per-

formed The top two bars in Figure 51 show that the C implementation resulted in

a speedup of approximately 15 times over the MATLAB implementation running on

a desktop computer On the other hand the bottom two bars reflect the difference

in execution time after running the C implementation in two different platforms The

much more limited resources available in the BeagleBoard-xM have a clear impact on

the execution time. The C code was compiled with GCC's -O2 optimization level.

The bottom bar in Figure 51 represents the starting point for a set of optimization

procedures that will be described in the following sections The order in which these are

presented corresponds to the same order in which they were applied to the application



Figure 51 Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM. Each bar is broken down into the stages of Figure 31 (read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering, hole filling, other)

51 Double to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C

implementations was necessary to compare both results in each step of the translation

process The original C implementation was implemented using double-precision format

because this is the format used in the MATLAB code Taking into account that the

additional precision offered by double-precision format over single-precision was not

essential and that the ARM Cortex-A8 processor features a 32 bit architecture the

conversion from double to single-precision format was made Figure 52 shows that with

this modification the total execution time decreased from 1453 to 1252 sec

Figure 5.2: Difference in execution time when double-precision format is changed to single-precision.

5.2 Tuned compiler flags

While the previous versions of the C code were compiled with the O2 optimization level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution

-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp

Figure 5.3: Execution time before and after tuning GCC's compiler options.

5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality of the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e., first and second, third and fourth, fifth and sixth, and so on. Data of the camera frames were placed in memory in such a manner that corresponding pixels between frame pairs lay next to each other in memory. The procedure is shown in Figure 5.4.

However, this modification yielded no improvement in the execution time of the application, as can be seen from Figure 5.5.
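A minimal sketch of this interleaved layout is shown below; the buffer and function names are hypothetical and only serve to illustrate how corresponding pixels of a frame pair end up adjacent in memory.

/* Store pixel i of frame A and pixel i of frame B next to each other, so
 * that operations on a frame pair touch contiguous memory locations. */
void interleave_frame_pair(const unsigned char *frame_a,
                           const unsigned char *frame_b,
                           unsigned char *interleaved, int n_pixels)
{
    for (int i = 0; i < n_pixels; i++) {
        interleaved[2 * i]     = frame_a[i];   /* pixel i of the first frame  */
        interleaved[2 * i + 1] = frame_b[i];   /* pixel i of the second frame */
    }
}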

Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green and purple circles represent pixels of the first, second, third and fourth frames, respectively.

Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.

5.4 Reimplementation of C's standard power function

The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the power of 0.85. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process. Taking into account that the high accuracy offered by such a function was not required, and that the overhead involved in validating the input could be removed, a different implementation of this function was adopted.

A novel approach was proposed by Ian Stephenson in [42], which can be explained as follows. The power function is usually implemented using logarithms as

\[ \mathrm{pow}(a, b) = x^{\log_x(a) \cdot b}, \]

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a,

\[ a = M \cdot 2^E, \]

where M is the mantissa and E is the exponent. Taking the logarithm of both sides gives

\[ \log_2(a) = \log_2(M) + E, \]

and since M is normalized, log2(M) is always small, therefore

\[ \log_2(a) \approx E. \]
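A minimal C sketch of this idea is shown below. It refines the coarse log2(a) ≈ E estimate with a linear interpolation of the mantissa bits; it is only an illustration in the spirit of [42], under the assumption of positive, normalized single-precision inputs, and not the exact routine used in the application.

#include <stdint.h>
#include <math.h>

/* Approximate log2(a) from the IEEE 754 bit pattern: the exponent field gives
 * the integer part, the mantissa bits a linear estimate of the fraction. */
static inline float fast_log2(float a)          /* valid for a > 0 */
{
    union { float f; uint32_t i; } u = { a };
    return (float)((int32_t)((u.i >> 23) & 0xFF) - 127)
           + (float)(u.i & 0x7FFFFF) / (float)(1 << 23);
}

/* Approximate 2^x by building the bit pattern directly: the integer part of x
 * goes into the exponent field, the fractional part into the mantissa. */
static inline float fast_pow2(float x)          /* assumes the result stays in normalized range */
{
    union { float f; uint32_t i; } u;
    int32_t ipart = (int32_t)floorf(x);
    float   fpart = x - (float)ipart;
    u.i = (uint32_t)((ipart + 127) << 23) + (uint32_t)(fpart * (float)(1 << 23));
    return u.f;
}

/* pow(a, b) = 2^(b * log2(a)), used here for the gamma correction exponent 0.85. */
static inline float fast_pow(float a, float b)
{
    return fast_pow2(b * fast_log2(a));
}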

This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.

Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.

5.5 Reduced memory accesses

The original order of execution was modified to reduce the amount of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering allowed floating-point calculations to be replaced with integer calculations in the modulation stage, which are known to typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modulation frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping and scaling frames are now performed in an alternating fashion together with the calculation of the modulation frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.
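The following sketch illustrates the loop merging for one row of a frame: the modulation update is performed in the same traversal that produces the preprocessed pixel (represented here by a plain copy for brevity). The names and the initialization convention (mod_min preset to 255 and mod_max to 0 before the first frame) are assumptions of the example, not the application's actual code.

/* One pass per row: write the preprocessed pixel and update the per-pixel
 * running minimum/maximum needed by the modulation stage while the value is
 * still in a register, instead of re-reading the frame in a second loop. */
void preprocess_and_modulate(const unsigned char *raw_row, unsigned char *out_row,
                             unsigned char *mod_min, unsigned char *mod_max,
                             int row_len)
{
    for (int i = 0; i < row_len; i++) {
        unsigned char v = raw_row[i];         /* stand-in for the crop/scale step */
        out_row[i] = v;
        if (v < mod_min[i]) mod_min[i] = v;   /* modulation: running minimum      */
        if (v > mod_max[i]) mod_max[i] = v;   /* modulation: running maximum      */
    }
}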

Figure 5.7: Order of execution before and after the optimization. (a) Original order of execution; (b) modified order of execution.

Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.

5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, this figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results were achieved when GMC is applied in the y direction only. This was implemented by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted in removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement of the execution time shown in Figure 5.10.

Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code.

Figure 5.10: Difference in execution time before and after modifying the GMC stage.

5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation. A series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation. It is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. An error was detected at this point of the optimization process in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.

Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.

5.8 Modified line shifting in GMC stage

A series of optimizations performed on the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop. Later, this was optimized even further by replacing such a for loop with the more optimized memcpy function available in the standard C library. This, in turn, led to a faster execution time.

A further optimization was obtained in the GMC stage, which yielded better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included in order to determine whether the shift has to be performed in the positive or negative direction. In case the shift is negative, i.e., upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. In case the shift is positive, i.e., downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
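A sketch of the in-place shifting logic described above is given below; the row-major 8-bit image layout and the function name are assumptions of this example rather than the application's own routine.

#include <string.h>

/* In-place vertical shift of a row-major grayscale image. A positive shift
 * moves content downwards, a negative shift upwards; rows shifted out of the
 * frame are discarded (no circular wrap). The traversal direction is chosen
 * so that every source row is read before it is overwritten. */
static void shift_rows_inplace(unsigned char *img, int width, int height, int shift)
{
    if (shift < 0) {
        /* Upward shift: top to bottom, each row is copied |shift| rows above. */
        for (int y = -shift; y < height; y++)
            memcpy(img + (y + shift) * width, img + y * width, width);
    } else if (shift > 0) {
        /* Downward shift: bottom to top, each row is copied shift rows below. */
        for (int y = height - 1 - shift; y >= 0; y--)
            memcpy(img + (y + shift) * width, img + y * width, width);
    }
}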

Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.

5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic refers to the fact that the set of vertices resulting from the decoding stage is sorted in an increasing manner. This, in turn, removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered in increasing order from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies the algorithm needed to connect such vertices into triangles even further.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Moreover, each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles. The algorithm is presented in Algorithm 1. Note that, for each pair of rows, this algorithm describes the connection of vertices until the moment in which the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).

Algorithm 1 New tessellation algorithm

1:  for all pairs of rows do
2:      find the left-most vertices in both rows and store them in vertex row A and vertex row B
3:      while the last vertex in either row has not been reached do
4:          if vertex row A is more to the left than vertex row B then
5:              connect vertex row A with the next vertex on the same row and with vertex row B
6:              change vertex row A to the next vertex on the same row
7:          else
8:              connect vertex row B with the next vertex on the same row and with vertex row A
9:              change vertex row B to the next vertex on the same row
10:         end if
11:     end while
12: end for
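A C sketch of Algorithm 1 might look as follows; the row bookkeeping arrays (row_start, row_len) and the face-emission helper are hypothetical and only serve to illustrate the traversal.

typedef struct { float x, y, z; } Vertex;

/* Append one triangle (three vertex indices) to the faces list. */
static void emit_triangle(int (*faces)[3], int *n_faces, int a, int b, int c)
{
    faces[*n_faces][0] = a;
    faces[*n_faces][1] = b;
    faces[*n_faces][2] = c;
    (*n_faces)++;
}

/* Vertices are assumed sorted bottom-to-top, left-to-right; row_start[r] and
 * row_len[r] give the index range of row r in the vertex array. */
static void tessellate_rows(const Vertex *v,
                            const int *row_start, const int *row_len, int n_rows,
                            int (*faces)[3], int *n_faces)
{
    for (int r = 0; r + 1 < n_rows; r++) {
        int a     = row_start[r];                      /* current vertex, lower row */
        int b     = row_start[r + 1];                  /* current vertex, upper row */
        int a_end = row_start[r] + row_len[r] - 1;
        int b_end = row_start[r + 1] + row_len[r + 1] - 1;

        while (a < a_end && b < b_end) {
            if (v[a].x < v[b].x) {                     /* lower-row vertex is left-most */
                emit_triangle(faces, n_faces, a, a + 1, b);
                a++;
            } else {                                   /* upper-row vertex is left-most */
                emit_triangle(faces, n_faces, b, b + 1, a);
                b++;
            }
        }
        /* Remaining vertices in the longer row are connected to the last
         * vertex of the shorter row in a later step, as in Algorithm 1. */
    }
}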

Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's Delaunay triangulation implementation.

Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted. (a) Delaunay triangulation; (b) optimized triangulation.

Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

5.10 Modified decoding stage

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard math C library, namely ceil() and floor(), were replaced with faster implementations that use pre-processor directives to avoid the function call overhead. Moreover, the time spent in validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions was further simplified.
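A minimal sketch of such simplified replacements, assuming non-negative arguments, is shown below. The macro names are hypothetical; note that each argument is evaluated more than once, so it must be a side-effect-free expression.

/* For x >= 0, truncation towards zero equals floor(). */
#define FLOOR_IDX(x) ((int)(x))

/* For x >= 0, add 1 only when x has a fractional part. */
#define CEIL_IDX(x)  ((int)(x) + ((int)(x) < (x)))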

A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while ensuring that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since reserving the memory only requires adjusting the stack pointer.

The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. The reason why such tasks were present in the application is that several alternatives were implemented for achieving a common goal during the algorithmic design stage; however, after assessing and choosing the best option, the other ones were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.

Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frame (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, such recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
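The reuse can be sketched as follows. The buffer sizes, the search range and the shifting helper are assumptions of this example, and the indexing details of the real implementation (which estimates and applies the displacement in the y dimension only) are simplified; the point illustrated is that each column-sum vector is computed once and carried over to the next pair.

#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WIDTH 1024   /* assumed upper bound on the frame width   */
#define MAX_SHIFT 16     /* assumed search range for the displacement */

void apply_shift(unsigned char *frame, int width, int height, int shift); /* assumed helper */

/* Sum the pixels of every column of a frame into a vector of length 'width'. */
static void column_sums(const unsigned char *frame, int width, int height, long *sums)
{
    for (int x = 0; x < width; x++) {
        long s = 0;
        for (int y = 0; y < height; y++)
            s += frame[y * width + x];
        sums[x] = s;
    }
}

/* Displacement in [-MAX_SHIFT, MAX_SHIFT] minimizing the SAD of the two vectors. */
static int best_shift(const long *a, const long *b, int width)
{
    long best_sad = LONG_MAX;
    int  best = 0;
    for (int d = -MAX_SHIFT; d <= MAX_SHIFT; d++) {
        long sad = 0;
        for (int x = 0; x < width; x++) {
            int xb = x + d;
            if (xb >= 0 && xb < width)
                sad += labs(a[x] - b[xb]);
        }
        if (sad < best_sad) { best_sad = sad; best = d; }
    }
    return best;
}

/* The column-sum vector of frame n is computed once and reused as the
 * reference for frame n+1, instead of being recomputed for every pair. */
void gmc_sequence(unsigned char **frames, int n_frames, int width, int height)
{
    static long prev[MAX_WIDTH], cur[MAX_WIDTH];

    column_sums(frames[0], width, height, prev);
    for (int n = 1; n < n_frames; n++) {
        column_sums(frames[n], width, height, cur);
        apply_shift(frames[n], width, height, best_shift(prev, cur, width));
        memcpy(prev, cur, (size_t)width * sizeof prev[0]);  /* reuse for the next pair */
    }
}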

5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization and global motion compensation stages are data independent and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrate NEON technology into an existent application: (i) by using a vectorizing compiler that automatically translates C/C++ code into NEON instructions, (ii) by using existent C/C++ libraries based on NEON technology, (iii) by using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions but with the compiler doing some of the work associated with writing assembly instructions, and (iv) by directly writing NEON assembly instructions linked to the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and therefore it was this approach that was adopted.

Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums.

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.

Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8 or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers, where each of the 8 elements would have 16 bits, or as an operation on 2 D registers, where each of the 8 elements would be 8 bits wide.

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology.

An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating point value that ranges from 0 to 1.
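As an illustration of this data flow, the following sketch computes the texture 2 values for 8 pixels at a time using NEON C intrinsics. The application itself relies on hand-written NEON assembly, so this is only an illustration; the function name and the scaling constant (510 = 2 · 255, assumed for mapping the pixel sum to [0, 1]) are assumptions of the example.

#include <arm_neon.h>

/* Texture 2 for 8 pixels: load 8 pixels of each frame into D registers,
 * perform a widening add, convert to 32-bit floats and scale to [0, 1]. */
void texture2_block8(const uint8_t *frame_a, const uint8_t *frame_b, float *out)
{
    uint8x8_t  a   = vld1_u8(frame_a);          /* 8 pixels of frame A          */
    uint8x8_t  b   = vld1_u8(frame_b);          /* 8 pixels of frame B          */
    uint16x8_t sum = vaddl_u8(a, b);            /* widening add: a + b, 16 bits */

    /* Convert the low and high halves to single-precision and scale. */
    float32x4_t lo = vcvtq_f32_u32(vmovl_u16(vget_low_u16(sum)));
    float32x4_t hi = vcvtq_f32_u32(vmovl_u16(vget_high_u16(sum)));
    lo = vmulq_n_f32(lo, 1.0f / 510.0f);
    hi = vmulq_n_f32(hi, 1.0f / 510.0f);

    vst1q_f32(out,     lo);
    vst1q_f32(out + 4, hi);
}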

Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain such an increment. First, note that the stage of the application that most contributed to the increase in time was the reading of the binary file. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved with the NEON optimization also increased. This suggests that another process was indeed probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage. The first step of that process is to find the smallest and largest values for every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C language, the processor makes use of a branch prediction mechanism in order to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, thereby losing the benefit of the branch prediction mechanism.

5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for such a purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is $2^{n+1} - 1$, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
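A minimal sketch of such a LUT for the real 8-bit case might look as follows; the table size (2 · 255 + 1 entries), the normalization to [0, 1] and the function names are assumptions of this example, not the application's actual code.

#include <math.h>

#define MAX_PIX 255
static float gamma_lut[2 * MAX_PIX + 1];

/* Entry s holds the gamma-corrected average ((s/2)/255)^0.85, so that both
 * the averaging and the power computation reduce to a single table read. */
static void init_gamma_lut(void)
{
    for (int s = 0; s <= 2 * MAX_PIX; s++)
        gamma_lut[s] = powf((s * 0.5f) / MAX_PIX, 0.85f);
}

/* The pixel sum already computed for the texture 2 frame indexes the table. */
static inline float gamma_corrected_average(unsigned char a, unsigned char b)
{
    return gamma_lut[a + b];
}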

As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid such overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2 and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. The merging of these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. Moreover, the improvement of the execution time is shown in Figure 5.22.

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.

This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.

Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.

6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).

On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, (bottom) the C implementation on the BeagleBoard-xM.

6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1,229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model has approximately 2% of the faces contained in the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.

6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations described in Chapter 5 were used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times.

Figure 6.2: Example of the visualization module developed. (a) High-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model wireframe (63,743 faces); (c) low-resolution 3D model with texture (1,229 faces); (d) low-resolution 3D model wireframe (1,229 faces).

Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which provides an idea of the speedup achieved for each individual stage.

Figure 6.3: Performance evolution of the 3D face scanner's C implementation.

Figure 6.4: Execution time for each stage of the application before and after the complete optimization process. (a) Read binary file; (b) preprocessing; (c) normalization; (d) GMC; (e) decoding; (f) tessellation; (g) calibration; (h) vertex filtering; (i) hole filling.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding pair of models yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, presented an increase of the execution time by a factor of 27, i.e., an increase of approximately 14 seconds.

A wide range of optimizations were performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing such a process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment. Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5 × 5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates for implementation using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.

According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase of time is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program is exploiting spatial and temporal locality could reveal directions for further optimizations.

Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries that are used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that, as part of the current project, a few of the other options were tested, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.

Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: the Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, isbn: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, issn: 0146-664X. doi: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, issn: 0162-8828. doi: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. doi: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, issn: 0031-3203. doi: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, issn: 0162-8828. doi: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, issn: 0031-3203. doi: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, issn: 0734-189X. doi: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. doi: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, issn: 0932-8092. doi: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, issn: 0018-9162. doi: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pags, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. doi: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. doi: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. doi: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangu, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, issn: 0031-3203. doi: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. doi: 10.1364/OE.15.012318 (cit. on p. 14).

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14).

[27] G. Toscani, Systemes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, isbn: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and I. i. A. Universitat de Girona, Departament d'Electronica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, isbn: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf tv cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, issn: 0882-4967. doi: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, issn: 0162-8828. doi: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, isbn: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1–4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schroder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, isbn: 0-201-48560-5. doi: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, isbn: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "Beagleboard system reference manual," BeagleBoard.org, December, p. 81, 2009 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. Von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization: an example," Tech. Rep., 2009 (cit. on p. 54).

Page 2: Eindhoven University of Technology MASTER 3D face reconstruction using structured light on a

Eindhoven University of Technology

Master Graduation Project

3D Face Reconstruction usingStructured Light on a Hand-held Device

Author

Martin Roa Villescas

Supervisors

Dr Ir Frank van Heesch

Prof Dr Ir Gerard de Haan

A thesis submitted in fulfilment of the requirements

for the degree of Master of Embedded Systems

in the

Smart Sensors amp Analysis Research Group

Philips Research

August 2013

EINDHOVEN UNIVERSITY OF TECHNOLOGY

Abstract

Department of Mathematics and Computer Science

Master of Embedded Systems

3D Face Reconstruction using Structured Light on a Hand-held Device

by Martin Roa Villescas

A 3D hand-held scanner using the structured lighting technique has been developed by the Smart Sensors & Analysis research group (SSA) in Philips Research Eindhoven. This thesis presents an embedded implementation of such a scanner. A translation of the original MATLAB implementation into the C language yielded a speedup of approximately 15 times running on a desktop computer. However, running the new implementation on an embedded platform increased the time from 0.5 sec to more than 14 sec. A wide range of optimizations were proposed and applied to improve the performance of the application. A final execution time of 5.1 seconds was achieved. Moreover, a visualization module was developed to display the reconstructed 3D models by means of the projector contained in the embedded device.

Acknowledgements

I owe a debt of gratitude to the many people who helped me during my years at TU/e.

First, I would like to thank Frank van Heesch, my supervisor at Philips, an excellent professional and even better person, who showed me the way through this challenging project while encouraging me in every step of the way. He was always generous with his time and steered me in the right direction whenever I felt I needed help. He has deeply influenced every aspect of my work.

I would also like to express my sincerest gratitude to my professor Gerard de Haan, the person who was responsible for opening Philips' doors to my life. His achievements are a constant source of motivation. Gerard is a clear demonstration of how the collaboration between industry and academia can produce unprecedented and magnificent results.

My special thanks to all my fellow students at Philips Research, who made these eight months a wonderful time of my life. Their input and advice contributed significantly to the final result of my work. In particular, I would like to thank Koen de Laat for helping me set up an automated database system to keep track of the profiling results.

Furthermore, I would like to thank Catalina Suarez, my girlfriend, for her support during this year. Your company has translated into the happiness I need to perform well in the many aspects of my life.

Finally, I would like to thank my family for their permanent love and support. It is hard to find the right words to express the immense gratitude that I feel for those persons who have given me everything so that I could be standing where I am now. Mom and dad, my achievements are the result of the infinite love that you have given me throughout my life, and I will never stop feeling grateful for that.


Contents

Abstract ii

Acknowledgements iii

List of Figures ix

1 Introduction 1

11 3D Mask Sizing project 3

12 Objectives 3

13 Report organization 4

2 Literature study 5

21 Surface reconstruction 5

211 Stereo analysis 6

212 Structured lighting 9

2121 Triangulation technique 10

2122 Pattern coding strategies 11

2123 3D human face reconstruction 12

22 Camera calibration 13

221 Definition 14

222 Popular techniques 14

3 3D face scanner application 17

31 Read binary file 18

32 Preprocessing 18

321 Parse XML file 18

322 Discard frames 19

323 Crop frames 19

324 Scale 19

33 Normalization 19

331 Normalization 20

332 Texture 2 21

333 Modulation 22

334 Texture 1 22

34 Global motion compensation 23



35 Decoding 24

36 Tessellation 25

37 Calibration 26

371 Offline process 27

372 Online process 27

38 Vertex filtering 28

381 Filter vertices based on decoding constraints 28

382 Filter vertices outside the measurement range 29

383 Filter vertices based on a maximum edge length 29

39 Hole filling 29

310 Smoothing 30

4 Embedded system development 31

41 Development tools 31

411 Hardware 32

4111 Single-board computer survey 32

4112 BeagleBoard-xM features 34

412 Software 34

4121 Software libraries 35

4122 Software development tools 36

42 MATLAB to C code translation 37

421 Motivation for developing in C language 37

422 Translation approach 38

43 Visualization 39

5 Performance optimizations 43

51 Double to single-precision floating-point numbers 44

52 Tuned compiler flags 44

53 Modified memory layout 45

54 Reimplementation of C's standard power function 45

55 Reduced memory accesses 47

56 GMC in y dimension only 49

57 Error in Delaunay triangulation 50

58 Modified line shifting in GMC stage 50

59 New tessellation algorithm 51

510 Modified decoding stage 52

511 Avoiding redundant calculations of column-sum vectors in the GMC stage 53

512 NEON assembly optimization 1 54

513 NEON assembly optimization 2 57

6 Results 61

61 MATLAB to C code translation 61

62 Visualization 62

63 Performance optimizations 62

7 Conclusions 67

71 Future work 68


Bibliography 71

List of Figures

11 A subset of the CPAP masks offered by Philips 2

12 A 3D hand-held scanner developed in Philips Research 4

21 Standard stereo geometry 7

22 Assumed model for triangulation as proposed in [4] 10

23 Examples of pattern coding strategies 12

24 A reference framework assumed in [25] 14

31 General flow diagram of the 3D face scanner application 17

32 Example of the 16 frames that are captured by the hand-held scanner 18

33 Flow diagram of the preprocessing stage 18

34 Flow diagram of the normalization stage 20

35 Example of the 18 frames produced in the normalization stage 21

36 Camera frame sequence in a coordinate system 22

37 Flow diagram for the calculation of the texture 1 image 22

38 Flow diagram for the global motion compensation process 23

39 Difference between pixel-based and edge-based decoding 24

310 Vertices before and after the tessellation process 25

311 The Delaunay tessellation with all the circumcircles and their centers [33] 26

312 The calibration chart 27

313 The 3D model before and after the calibration process 28

314 3D resulting models after various filtering steps 29

315 Forehead of the 3D model before and after applying the smoothing process 30

41 The BeagleBoard-xM offered by Texas instruments 35

42 Simplified diagram of the 3D face scanner application 39

43 UV coordinate system 40

44 Diagram of the visualization module 41

51 Execution times of the MATLAB and C implementations after run on different platforms 44

53 Execution time before and after tuning GCC's compiler options 45

54 Modification of the memory layout of the camera frames 46

55 Execution time with a different memory layout 46

56 Execution time before and after reimplementing C's standard power function 47

57 Order of execution before and after the optimization 48

58 Difference in execution time before and after reordering the preprocessing stage 48

59 Flow diagram for the GMC process as implemented in the MATLAB code 49

510 Difference in execution time before and after modifying the GMC stage 49

511 Execution time of the application after fixing an error in the tessellation stage 50

512 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage 51

513 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted 52

514 Execution times of the application before and after replacing the Delaunay triangulation with the new approach 53

515 Execution time of the application before and after optimizing the decoding stage 54

516 Flow diagram for the optimized GMC process that avoids the recalculation of the image's columns sum 55

517 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage 55

518 NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology 56

519 Execution flow after first NEON assembly optimization 58

520 Execution times of the application before and after applying the first NEON assembly optimization 59

521 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels 59

522 Execution times of the application before and after applying the second NEON assembly optimization 59

523 Final execution flow after second NEON assembly optimization 60

61 Execution times of the MATLAB and C implementations after run on different platforms 62

62 Example of the visualization module developed 63

63 Performance evolution of the 3D face scanner's C implementation 64

64 Execution times for each stage of the application 65

Dedicated to my grandmother


Chapter 1

Introduction

The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.

Within the wide range of products offered by Philips, there is a specific group categorized under the name of sleep solutions that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]. It reported that sleep apnea was prevalent in more than 18 million Americans, i.e., 6.62% of the country's population.

While aiming to meet the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors, which include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, and diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia,


(a) Amara (b) ComfortClassic (c) ComfortGel Blue

(d) ComfortLite 2 (e) FitLife (f) GoLife

(g) ProfileLite Gel (h) Simplicity (i) ComfortGel

Figure 1.1: A subset of the CPAP masks offered by Philips

amongst others. A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.

Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.

1.1 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:

1 An accurate 3D model reconstruction of the patient's face dimensions and geometry

2 The extraction of facial landmarks from the reconstructed model by means of

computer vision algorithms

3 The actual fit quality assessment by virtually fitting a series of 3D mask models

to the reconstructed face

4 The creation of a custom cushion that optimizes for uniform pressure along the

cushion contour

The focus of this thesis project is on the first step.

As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique was already developed and is the base for the present project. Figure 1.2a shows the hardware setup of such a device. In short, this scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently being performed offline and are mostly implemented in MATLAB.

1.2 Objectives

The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a


(a) Hardware (b) 3D model example

Figure 1.2: A 3D hand-held scanner developed in Philips Research

way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application. Specifically, the 3D reconstruction should be running on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to an ARM Cortex-A8, the new implementation should be targeted to this processor in particular, by making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.

1.3 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.


Chapter 2

Literature study

This chapter presents a selective analysis of the state-of-the-art in the field of surface reconstruction, placing special emphasis on structured lighting techniques. A brief overview of the three main underlying technologies used for depth estimation is presented first. This is followed by an example of stereo analysis, which serves as the basis for the more specific structured lighting techniques. Moreover, this example helps to illustrate why stereo analysis is considered less preferable for 3D face reconstruction applications when compared with structured lighting techniques. Special emphasis is placed on the scientific principles underlying structured lighting techniques. Furthermore, a classification of the different types of pattern coding strategies available in the literature is given, along with an analysis of their suitability for our application. Finally, the chapter concludes with a brief discussion of camera calibration and its most representative techniques.

2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (like those found in areas such as architecture, mechanical engineering or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].

Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e., they are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in depth measurement. On the other hand, both reflecto-metric and stereo-metric techniques are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they require an estimation of the model of the environment. In the remainder of this section we will limit the discussion to the stereo-metric category and focus on structured lighting techniques.

2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis is presented in this section. This example, presented in [4], intends to show why the use of structured lighting becomes essential for our application.

Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point in the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies from a certain reference. Note, however, that in order to calculate the triangulation, more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (extrinsic parameter) or to the focal length of each of the cameras (intrinsic parameter).

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and oriented towards the scene objects. Also note that, because the x-axes of both images are identically oriented, rows with the same row number in the two different images lie on the same straight line.

Figure 2.1: Standard stereo geometry

In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

$$p_{left} = (x_{left}, y_{left}) \quad \text{and} \quad p_{right} = (x_{right}, y_{right})$$

in the left and right images, respectively, assuming that the scene point is visible from both camera viewpoints. The disparity with respect to $p_{left}$ is a vector given by

$$\Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\; y_{left} - y_{right})^{T} \qquad (2.1)$$

between two corresponding image points.

In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

$$p = (x, y) = \left(\frac{f \cdot X}{Z},\ \frac{f \cdot Y}{Z}\right) \qquad (2.2)$$

assuming that Z > f.

According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that $y = y_{left} = y_{right}$. Therefore, for the left camera the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera.

Furthermore, given the displacement of the right camera by b along the X-axis, the central projection equation is given by

$$(x_{right}, y) = \left(\frac{f \cdot (X - b)}{Z},\ \frac{f \cdot Y}{Z}\right)$$

Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points in each one of the images with respect to $p_{left}$ is given by

$$\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}$$

However, because rows with the same row numbers in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

$$\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \qquad (2.3)$$

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map ∆(x, y) is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.

Finally, in order to come up with the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

$$Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}$$

and therefore

$$X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}$$

Using the previous equation, it follows that

$$Z = \frac{b \cdot f}{x_{left} - x_{right}}$$

By substituting this result into the projection equation for y, it follows that

$$Y = \frac{b \cdot y}{x_{left} - x_{right}}$$

The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and b are known and that the disparity map ∆(x, y) was measured for each pair of corresponding points in the two images. Note that a variety of methods exist to calibrate different types of camera configuration systems, i.e., to determine their intrinsic and extrinsic parameters. More on these calibration procedures is discussed in Section 2.2.
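To make the preceding reconstruction step concrete, the following C fragment sketches how the last three equations could be applied to a measured disparity map under the assumptions of the standard stereo geometry. The function name, the row-major image layout and the use of NAN to mark undefined disparities are illustrative choices, not part of the scanner implementation.

```c
#include <math.h>

/* Sketch: recover (X, Y, Z) for every pixel of a disparity map measured
 * in the standard stereo geometry. f is the focal length and b the base
 * distance, both assumed to be known from calibration. Undefined
 * disparities (e.g. occluded points) are assumed to be stored as NAN. */
void reconstruct_from_disparity(const double *disparity, int width, int height,
                                double f, double b,
                                double *X, double *Y, double *Z)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int i = y * width + x;
            double d = disparity[i];        /* d = x_left - x_right    */
            if (isnan(d) || d <= 0.0) {     /* no correspondence found */
                X[i] = Y[i] = Z[i] = NAN;
                continue;
            }
            X[i] = b * (double)x / d;       /* X = b * x_left / d      */
            Y[i] = b * (double)y / d;       /* Y = b * y / d           */
            Z[i] = b * f / d;               /* Z = b * f / d           */
        }
    }
}
```

Note that in this sketch the pixel coordinates x and y are assumed to be expressed in the image plane of the left camera and in the same units as the focal length f (e.g. pixels).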

The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques are used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images. These features are typically corners or edges contained within the images. Although these techniques are found to be appropriate for a certain number of applications, it turns out that they present a number of drawbacks that make their applicability unfeasible for many others. The main drawbacks are (i) feature extraction and matching is generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. Such problems in solving the correspondence problem can generally be overcome by resorting to a different but similar type of technique known by the name of structured lighting. While structured lighting techniques involve a completely different methodology on how to solve the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.

2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source which projects a light pattern actively into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to simplify the complexity of the correspondence analysis by actively manipulating the scene.

It is important to note that stereoscopic-based systems do not assume complex requirements for image acquisition, since they mostly rely on theoretical, mathematical and algorithmic analyses to solve the reconstruction problem. On the other hand, the idea behind structured lighting methods is to shift this complexity to another level, such as the engineering prerequisites of the overall system [4].

A wide variety of light patterns have been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2 a classification of the available encoded patterns is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.

2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982; Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is explained.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation as proposed in [4]

Note that the

system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a consequence, the object, light source and camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known, and the origin of the coordinate system O coincides with the projection center of the camera.

The goal is to calculate the distance d between the origin O and the object point $P = (X_0, Z_0)$. This can be done using the law of sines as follows:

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}$$

From $\gamma = \pi - (\alpha + \beta)$ and $\sin(\pi - \gamma) = \sin(\gamma)$, it holds that

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}$$

Therefore, distance d is given by

$$d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}$$

which holds for any point P lying on the surface of the object.
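As a small illustration of this result, the function below evaluates the distance d from the two calibrated angles and the base distance; it is a sketch only, with an assumed name and signature, and is not taken from the scanner code.

```c
#include <math.h>

/* Sketch of the light-spot triangulation of Figure 2.2: alpha and beta are
 * the calibrated angles (in radians) at the camera and at the light source,
 * and b is the base distance between them. The returned value is the
 * distance d from the camera origin O to the illuminated point P. */
static double triangulate_distance(double alpha, double beta, double b)
{
    return b * sin(alpha) / sin(alpha + beta);
}
```

For example, with α = β = π/4 and b = 0.2 m, the function returns 0.2 · sin(π/4) / sin(π/2) ≈ 0.14 m.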

2.1.2.2 Pattern coding strategies

As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill the requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19]:

• Time-multiplexing. This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or objects in the scene during the time period in which the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a (a small sketch of how such a sequence of binary stripe patterns can be generated is given after this list).

• Spatial Neighborhood. In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12] or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow all coding information to be condensed into a single projection pattern, making them highly suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding. In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16] or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not only depend on the projected colors, but also on the intrinsic colors of the measuring surface and the light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].
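To illustrate the time-multiplexing idea in code, the fragment below sketches how a sequence of vertical Gray-coded stripe patterns, similar in spirit to the binary pattern of Figure 2.3a, could be generated: the sequence of black/white values that a projector column takes across the projected patterns forms its codeword. The function name, the 8-bit image layout and the choice of binary-reflected Gray code are illustrative assumptions and do not necessarily correspond to the patterns used by the scanner.

```c
#include <stdint.h>

/* Sketch: fill pattern number k (0 <= k < num_bits) of a time-multiplexed
 * sequence of vertical stripe patterns. Column x is lit in pattern k when
 * bit (num_bits - 1 - k) of the Gray code of x is set, so that the on/off
 * values observed across all num_bits patterns form a unique codeword
 * per projector column. */
void fill_gray_code_pattern(uint8_t *image, int width, int height,
                            int num_bits, int k)
{
    for (int x = 0; x < width; x++) {
        unsigned gray = (unsigned)x ^ ((unsigned)x >> 1); /* binary-reflected Gray code */
        uint8_t value = ((gray >> (num_bits - 1 - k)) & 1u) ? 255 : 0;
        for (int y = 0; y < height; y++)
            image[y * width + x] = value;                 /* vertical stripes */
    }
}
```

With num_bits chosen such that 2^num_bits ≥ width, decoding the captured sequence back to a column index amounts to reading the observed bit per pattern and inverting the Gray code.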

(a) Time-multiplexing (b) Spatial Neighborhood (c) Direct coding

Figure 2.3: Examples of pattern coding strategies

2123 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields such as security

forensics or even entertainment it is no surprise that special focus has been devoted

to this area by the research community over the last decades A comparative study

of three different 3D face reconstruction approaches is presented in [20] Here the

most representative techniques of three different domains are tested These domains are

binocular stereo structured lighting and photometric stereo The experimental results

show that active reconstruction techniques perform better than purely passive ones for

this application

The majority of analysis on vision based reconstruction has focused on general perfor-

mance for arbitrary scenes rather than on specific objects as reported in [20] Neverthe-

less some effort has been made on evaluating structured lighting techniques with special

focus on human face reconstruction In [21] a comparison is presented between three


structured lighting techniques (Gray Code Gray Code Shift and Stripe Boundary) to

assess 3D reconstruction for human faces by using mono and stereo systems The results

show that the Gray Code shift coding performs best given the high number of emitted

patterns it uses A further study on this topic was performed by the same author in

[22] Again it was found that time-multiplexing techniques such as binary encoding

using Gray Code provide the highest accuracy With a rather different objective than

that sought by Woodward et al in [21] and [22] Fechteler et al [23] also focus their

effort on presenting a framework that captures 3D models of faces in high resolutions

with low computational load Here the system uses a single colored stripe pattern for

the reconstruction purpose plus a picture of the face illuminated with regular white light

that is used as texture

Particular aspects of 3D human face reconstruction such as proximity size and texture

involved make structured lighting a suitable approach In contrast, other reconstruction techniques might be less suitable when dealing with these particular aspects

For example stereoscopic approaches fail to provide positive results when the textures

involved do not contain features that can be easily extracted and matched by means of

algorithms as in the case of the human face On the other hand the concepts behind

structured lighting make it very convenient to reconstruct these kind of surfaces given

the proximity involved and the size limits of the object in question (appropriate for

projecting encoded patterns)

With regard to the suitability of the different pattern coding strategies for our application

(3D human face reconstruction by means of a hand-held scanner) there are several

factors to consider Spatial neighborhood strategies do not offer high spatial resolution

which is needed by the algorithms that assess the fit quality of the various mask models

Direct coding strategies suffer from practical problems that affect their robustness to

different scenarios This centers the attention on the time-multiplexing techniques which

are known to provide high spatial resolution The problem with such techniques is

that they are highly sensitive to movement which is likely to be present on a hand-

held device Fortunately there are several approaches as to how such problem can be

solved Consequently it is a time-multiplexing technique which is being employed in

our application

22 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement

This section presents a review of some of the most popular techniques with special focus

on those that are regarded as adequate for our application


221 Definition

Camera calibration is the process of determining a mathematical approximation of the

physical and optical behavior of an imaging system by using a set of parameters These

parameters can be estimated by means of direct or iterative methods and they are divided

into two groups On the one hand intrinsic parameters determine how light is projected

through the lens onto the image plane of the sensor The focal length projection center

and lens distortion are all examples of intrinsic parameters On the other hand extrinsic

parameters measure the position and orientation of the camera with respect to a world

coordinate system as defined in [24] To better illustrate these ideas consider Figure

24 which corresponds to the optical system for the structured pattern projection and

triangulation considered in [25] The focal length fc and the projection center Oc are

examples of intrinsic parameters of the camera while the distance D between the camera

and the projector corresponds to an extrinsic parameter


Figure 24 A reference framework assumed in [25]

222 Popular techniques

In 1982 Hall et al [18] proposed a technique consisting of an implicit camera calibration

that uses a 3 × 4 transformation matrix which maps 3D object points to their respective

2D image projections Here the model of the camera does not consider any lens distor-

tion For a detailed description of this method refer to [18] Some years later in 1986

Faugeras improved Hall's work by proposing a technique that was based on extracting

the physical parameters of the camera from the transformation technique proposed in

[18] The description of this technique is given in [26] and [27] A non-linear explicit

camera calibration that included radial lens distortion was proposed by Salvi in his PhD


thesis [28] which as he mentions can be regarded as a simple adaptation of Faugeras' lin-

ear method However a method that would become much more popular and that is still

widely used was proposed by Tsai in 1987 [29] Here the author proposes a two-step

technique that models only radial lens distortion Also worth mentioning is the model

proposed by Weng [30] in 1992 which includes three different types of lens distortion

The calibration mechanism that is currently being used in our application is based on

the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although

this mechanism focuses on stereo camera calibration it was generalized for a system

with one camera and one projector It involves imaging a controlled scene from different

positions and orientations The controlled scene consists of a rigid calibration chart with

several markers The geometric and photometric properties of such markers are known

precisely so that they can be detected After corresponding markers in the different

images are found an algorithm searches the optimal set of camera parameters for which

triangulation of all corresponding marker-point pairs gives an accurate reconstruction of

the calibration chart This calibration mechanism is discussed further in Section 37

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed

by the Smart Sensing amp Analysis research group and provided as a starting point for the

current project Figure 31 presents the main steps involved in the 3D reconstruction

process

[Flow diagram: the binary and XML input files enter the pipeline Read binary file → Preprocessing → Normalization → Global motion compensation → Decoding → Tessellation → Calibration → Vertex filtering → Hole filling, which produces the final 3D model]

Figure 31 General flow diagram of the 3D face scanner application

The current scanner uses a total of 16 binary coded patterns that are sequentially pro-

jected onto the scene For each projection the scene is captured by means of the

embedded camera hence producing 16 different grayscale frames (Figure 32) that are

fed to the application in the form of a binary file This falls in line with the discussion

presented in Section 2123 of the literature study on why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face recon-

struction applications In Sections 31 to 39 each of the steps shown in Figure 31 is

described


Figure 32 Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns This frame

sequence is the input for the 3D face scanner application

31 Read binary file

The first step of the application is to read the binary file that contains the required

information for the 3D reconstruction The binary file is composed of two parts the

header and the actual data The header contains metadata of the acquired frames such

as the number of frames and the resolution of each one The second part contains the

actual data of the captured frames Figure 32 shows an example of such frame sequence

which from now on will be referred to as camera frames
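As an illustration, a minimal C sketch of this reading step is given below. The header layout (field names and order) and the function names are assumptions made for illustration only; the actual file format produced by the scanner may differ.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /* Hypothetical header layout: number of frames followed by the frame resolution */
    typedef struct {
        uint32_t num_frames;
        uint32_t width;
        uint32_t height;
    } scan_header_t;

    /* Read the metadata header and then the raw pixel data of all camera frames */
    uint8_t *read_camera_frames(const char *path, scan_header_t *hdr)
    {
        FILE *f = fopen(path, "rb");
        if (f == NULL)
            return NULL;
        if (fread(hdr, sizeof(*hdr), 1, f) != 1) {
            fclose(f);
            return NULL;
        }
        size_t n = (size_t)hdr->num_frames * hdr->width * hdr->height;
        uint8_t *frames = malloc(n);
        if (frames == NULL || fread(frames, 1, n, f) != n) {
            free(frames);
            frames = NULL;
        }
        fclose(f);
        return frames;
    }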

32 Preprocessing

The preprocessing stage comprises the four steps shown in figure 33 Each of these steps

is described in the following subsections

[Flow diagram: Preprocessing = Parse XML file → Discard frames → Crop frames → Scale (convert to float, range from 0 to 1)]

Figure 33 Flow diagram of the preprocessing stage

321 Parse XML file

In this stage the application first reads an XML file that is included for every scan

This file contains relevant information for the structured light reconstruction This


information includes (i) the type of structured light patterns that were projected when

acquiring the data (ii) the number of frames captured while structured light patterns

were being projected (iii) the image resolution of each frame to be considered and (iv)

the calibration data

322 Discard frames

Based on the number of frames value read from the XML file the application discards

extra frames that do not contain relevant information for the structured light approach

but that are provided as part of the input

323 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new more suitable resolution for the subsequent algorithms of the program (480 × 754) This is accomplished by cropping the pixels that are close to the top border

of the images Note that this operation does not imply a loss of information in this

application in particular This is because pixels near the frame borders do not contain

facial information and therefore can be safely removed

324 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is

represented by an 8-bit unsigned integer value that ranges from 0 to 255 In this stage

the data type is transformed from unsigned integer to floating point while dividing each

pixel value by 255 The new set of values ranges between 0 and 1
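A minimal sketch of this conversion in C could look as follows (function and parameter names are chosen for illustration only):

    #include <stdint.h>
    #include <stddef.h>

    /* Convert 8-bit pixel values to single-precision floats in the range [0, 1] */
    void scale_frames(const uint8_t *src, float *dst, size_t num_pixels)
    {
        for (size_t i = 0; i < num_pixels; i++)
            dst[i] = (float)src[i] / 255.0f;
    }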

33 Normalization

Even though this section is entitled Normalization a few more tasks are being performed

in this stage of the application as shown by the blue rectangles in Figure 34 Here wide

arrows represent flow of data whereas dashed lines represent the order of execution The

numbers inside the small data arrows pointing towards the different tasks represent the

number of frames used as input by each task The dashed line rectangle that encloses

the normalization and texture 2 tasks represents that there is not a clear sequential

execution between these two but rather that these are executed in an alternating fashion

This type of diagram will prove particularly useful in Chapter 5 in order to explain the


[Flow diagram: the 16 camera frames are input to the normalization task (8 frames out), the texture 2 task (8 frames out), the modulation task (1 frame out) and the texture 1 task (1 frame out); dashed arrows indicate the execution flow]

Figure 34 Flow diagram of the normalization stage

modifications that were made to the application to improve its performance Examples of the different frames that are produced in this stage are visualized in Figure 35 A

brief description of each of the tasks involved in this stage follows

331 Normalization

The purpose of this stage is to extract the reflectivity component (texture information)

from the camera frames while aiming at enhancing the deformed illumination patterns

in the resulting frame sequence Figure 35a illustrates the result of this process The

deformed patterns are essential for the 3D reconstruction process

In order to understand how this process takes place we need to look back at Figure

32 Here it is possible to observe that the projected patterns in the top row frames are

equal to their corresponding frame in the bottom row with the only difference being

that the values of the projected pattern are inverted For each corresponding pair a

new image frame is generated according to the following equation

Fnorm(x, y) = (Fcamera(x, y, a) - Fcamera(x, y, b)) / (Fcamera(x, y, a) + Fcamera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 32 respectively

An example of the resulting frame sequence is shown in Figure 35a
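The following C sketch illustrates the above equation for one frame pair; since the denominator is exactly the texture 2 frame described in the next subsection, both can be produced in the same loop. The small epsilon that guards against division by zero and all names are assumptions made for illustration.

    #include <stddef.h>

    /* Normalize one frame pair (a, b): difference over sum; the sum itself is the
       corresponding texture 2 frame */
    void normalize_pair(const float *frame_a, const float *frame_b,
                        float *norm, float *texture2, size_t num_pixels)
    {
        const float eps = 1e-6f;   /* guard against division by zero (assumption) */
        for (size_t i = 0; i < num_pixels; i++) {
            float sum = frame_a[i] + frame_b[i];
            texture2[i] = sum;
            norm[i] = (frame_a[i] - frame_b[i]) / (sum + eps);
        }
    }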


(a) Normalized frame sequence

(b) Texture 2 frame sequence

(c) Modulation frame (d) Texture 1 frame

Figure 35 Example of the 18 frames produced in the normalization stage

332 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one

used to calculate the normalized frame sequence In fact the output of this process is an

intermediate step in the calculation of the normalized frames which is the reason why

the two processes are said to be performed in an alternating fashion The mathematical

equation that describes the calculation of the texture 2 frame sequence is

Ftexture2(x, y) = Fcamera(x, y, a) + Fcamera(x, y, b)

The resulting frame sequence (Figure 35b) is used later in the global motion compen-

sation stage


333 Modulation

The purpose of this stage is to find the range of measured values for each (x y) pixel of

the camera frame sequence along the time dimension This is done in two steps First

two frames are generated by finding the maximum and minimum values along the time

(t) dimension (Figure 36) for every (x y) value in a frame

[Diagram: the camera frame sequence arranged along the spatial axes x, y and the time axis t]

Figure 36 Camera frame sequence in a coordinate system

Second a modulation frame is produced by finding the difference between the previously

generated frames ie

Fmod(x, y) = Fmax(x, y) - Fmin(x, y)

Such modulation frame (Figure 35c) is required later during the decoding stage
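A possible C sketch of this two-step calculation is shown below, assuming the camera frames are stored consecutively in row-major order (all names are illustrative only):

    #include <stddef.h>

    /* Per-pixel range (max - min) of the camera frame sequence along the time axis */
    void modulation_frame(const float *frames, int num_frames,
                          size_t frame_pixels, float *mod)
    {
        for (size_t i = 0; i < frame_pixels; i++) {
            float mn = frames[i];
            float mx = frames[i];
            for (int t = 1; t < num_frames; t++) {
                float v = frames[(size_t)t * frame_pixels + i];
                if (v < mn) mn = v;
                if (v > mx) mx = v;
            }
            mod[i] = mx - mn;
        }
    }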

334 Texture 1

Finally the last task in the Normalization stage corresponds to the generation of the

texture image that will be mapped onto the final 3D model In contrast to the previous

three tasks this subprocess does not take the complete set of 16 camera frames as input

but only the 2 with finest projection patterns Figure 37 shows the four processing

steps that are applied to the input in order to generate a texture image such as the one

presented in Figure 35d

[Flow diagram: Texture 1 = Average frames → Gamma correction → 5x5 mean filter → Histogram stretch]

Figure 37 Flow diagram for the calculation of the texture 1 image


34 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement

In fact if no measures are taken to correct the slight amount of movement of the scanner

or of the objects in the scene during the acquisition process the complete reconstruction

process fails Although the global motion compensation stage is only a minor part of

the mechanism that makes the entire application robust to motion it is not negligible

in the final result

Global motion compensation is an extensive field of research for which many different

approaches and methods have been proposed The approach used in this application is amongst the simplest in terms of complexity Nevertheless it suffices for the needs of the current application

Figure 38 presents an overview of the algorithm used to achieve the global motion

compensation This process takes as input the normalized frame sequence introduced in

the previous section As noted at the bottom of the figure these steps are repeated for

every pair of consecutive frames As a first step the pixels in each column are added for

both frames This results in two vectors that hold the column sums of each frame

The second step is to determine by how many pixels the second image is displaced with

respect to the first one In order to achieve this the sum of absolute differences between

elements of the two column-sum vectors is calculated while slowly displacing the two

vectors with respect to each other The result is a new vector containing the SAD value

for each displacement Subsequently the index of the smallest element in the SAD

values vector is searched in order to determine the number of pixels that the second

image needs to be shifted The process concludes by performing the actual shift of the

second frame
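A C sketch of the column-sum and SAD-minimization steps is given below. The search range for the displacement and all names are assumptions made for illustration; the actual implementation may handle border columns differently.

    #include <stddef.h>
    #include <math.h>
    #include <float.h>

    /* Sum the pixels of every column of one frame */
    void sum_columns(const float *frame, int width, int height, float *col_sum)
    {
        for (int x = 0; x < width; x++)
            col_sum[x] = 0.0f;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                col_sum[x] += frame[(size_t)y * width + x];
    }

    /* Find the displacement of frame B that minimizes the SAD between column sums */
    int estimate_shift(const float *col_sum_a, const float *col_sum_b,
                       int width, int max_shift)
    {
        int best_shift = 0;
        float best_sad = FLT_MAX;
        for (int s = -max_shift; s <= max_shift; s++) {
            float sad = 0.0f;
            for (int x = 0; x < width; x++) {
                int xs = x + s;
                if (xs < 0 || xs >= width)
                    continue;               /* ignore columns shifted out of range */
                sad += fabsf(col_sum_a[x] - col_sum_b[xs]);
            }
            if (sad < best_sad) {
                best_sad = sad;
                best_shift = s;             /* number of pixels to shift frame B */
            }
        }
        return best_shift;
    }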

[Flow diagram: for every pair of consecutive frames of the normalized frame sequence, the columns of frame A and frame B are summed, the SAD between the two column-sum vectors is minimized and frame B is shifted accordingly]

Figure 38 Flow diagram for the global motion compensation process


35 Decoding

In Section 211 of the literature study the correspondence problem was defined as the

process of determining corresponding point pairs between the captured images and the

projected patterns This is exactly what is being accomplished during the decoding

stage

A novel approach has been implemented in which the identification of the projector

stripes is based not on the values of the pixels themselves (as it is typically done) but

rather on the edges formed by the transitions of the projected patterns Figure 39

illustrates the different sets of decoded values that result with each of these methods

Here it is possible to observe that the pixel-based method produces a stair-casing effect

due to the decoding of neighboring pixels that lie on the same stripe of the projected

pattern On the other hand the edge-based method removes this undesirable effect by

decoding values for only parts of the image in which a transition occurs Furthermore

this approach enables sub-pixel accuracy for the determination of the positions where the

transitions occur meaning that the overall resolution of the 3D reconstruction increases

considerably
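The details of the decoder are not reproduced here, but the following sketch illustrates how the position of a single pattern transition can be located with sub-pixel accuracy by linear interpolation between two neighboring pixels; it is an illustrative simplification rather than the actual decoding algorithm, and all names are assumptions.

    /* Sub-pixel position of a transition between two neighboring pixels with
       normalized values v0 (at row y0) and v1 (at row y0 + 1) that lie on opposite
       sides of the given threshold */
    float subpixel_transition(float v0, float v1, float threshold, float y0)
    {
        float t = (threshold - v0) / (v1 - v0);  /* fraction between the two pixels */
        return y0 + t;
    }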

[Plot: decoded values versus pixels along the y dimension of the image, comparing edge-based and pixel-based decoding]

Figure 39 The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used

The decoding process results in a set of vertices each one associated with a depth code

Note however that the unit of measurement used to describe the position and depth of

each vertex is based on camera pixels and code values respectively meaning that these

vertices still do not represent the actual geometry of the face The calibration process

explained in a later section is the part of the application that translates the pixel and


code values to standard units (such as millimeters) thus recreating the actual shape of

the human face

36 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in

a manner such that no overlaps occur In computer graphics these geometric shapes

are generally chosen to be triangles also called "faces" The reason for using triangles is that, by definition, their vertices lie on the same plane This in turn avoids

the generation of non-simple convex polygons that are not guaranteed to be rendered

correctly A complete example illustrating this point can be found in [32]

A set of 3D vertices calculated in the decoding stage is the input to the tessellation

process Here however the third dimension does not play a role and hence the z

coordinate for each of the vertices can be thought of as being equal to 0 This implies

that the new set of vertices consist only of (x y) coordinates that lie on the same plane

as shown in Figure 310a This graph corresponds to a very close view of the nose area

in the reconstructed face example

(a) Vertices before applying the Delaunay triangulation (b) Result after applying the Delaunay triangulation

Figure 310 Close view of the vertices in the nose area before and after the tessellation process

The question that arises here is how to connect the vertices in such a way that the com-

plete surface is covered with triangles The answer is to use the Delaunay triangulation

which is probably the most common triangulation used in computer vision The main

advantages that it has over other methods is that the Delaunay triangulation avoids

"skinny" triangles reducing potential numerical precision problems [33] Moreover the

Delaunay triangulation is independent of the order in which the vertices are processed


Figure 310b shows the result of applying the Delaunay triangulation to the vertices

shown in Figure 310a

Although there exists a number of different algorithms used to achieve the Delaunay

triangulation the final outcome of each conforms to the following definition a Delaunay

triangulation for a set P of points in a plane is a triangulation DT(P) such that no

point in P is inside the circumcircle of any triangle in DT(P) [33] Such definition can

be understood by examining Figure 311
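The empty-circumcircle property from this definition can be tested with the standard determinant shown in the sketch below, assuming the triangle vertices a, b and c are given in counter-clockwise order; the names are illustrative only.

    typedef struct { float x, y; } point2d_t;

    /* Returns a positive value when p lies inside the circumcircle of triangle
       (a, b, c), a negative value when outside and zero when on the circle */
    float in_circumcircle(point2d_t a, point2d_t b, point2d_t c, point2d_t p)
    {
        float ax = a.x - p.x, ay = a.y - p.y;
        float bx = b.x - p.x, by = b.y - p.y;
        float cx = c.x - p.x, cy = c.y - p.y;
        return (ax * ax + ay * ay) * (bx * cy - cx * by)
             - (bx * bx + by * by) * (ax * cy - cx * ay)
             + (cx * cx + cy * cy) * (ax * by - bx * ay);
    }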


Figure 311 The Delaunay tessellation with all the circumcircles and their centers [33]

37 Calibration

The set of (x y) vertices with their corresponding depth code values that result from

the decoding process do not represent standard units of measure ie these still have to

be translated into standard units such as millimeters This is precisely the objective of

the calibration process

The calibration mechanism that is used in the application is based on the work of Peter-

Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts

an offline and an online process Moreover the offline process consists of two stages

the camera calibration and the system calibration It is important to clarify that while

the offline process is performed only once (camera properties and distances within the

system do not change with every scan) the online process is carried out for every scan

instance The calibration stage referred to in Figure 31 is the latter


371 Offline process

As already mentioned the offline process comprises the two stages described below

Camera calibration This part of the process is concerned with the calculation of the

intrinsic parameters of the camera as explained in Section 22 of the literature

study In short the objective is to precisely quantify the optical properties of the

camera The manner in which the current approach accomplishes this is by imag-

ing the special calibration chart shown in Figure 312 from different orientations

and distances After corresponding markers in the different images are found an

algorithm searches the optimal set of camera parameters for which triangulation

of all corresponding marker-point pairs gives an accurate reconstruction of the

calibration chart

Figure 312 The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system All absolute dimensions

and photometric properties of the round markers are known precisely

System calibration The second part of the calibration process refers to the camera-

projector system calibration ie the determination of the extrinsic parameters

of the system Again this part of the process images the calibration chart from

different distances However this time structured light patterns are emitted by

the projector while the acquisition process takes place The result is that each

projector code is associated with a known depth and camera position

372 Online process

The result of the offline calibration is a set of parameters that model the optical proper-

ties of the scanner system These are passed to the application inside the XML file for

every scan Such parameters represent the coefficients of a fifth-order polynomial used

for translating the set of (x y) vertices with their corresponding depth code values into


standard units of measure In other words the online process consists of evaluating a

polynomial with all the x y and depth code values calculated in the decoding stage in

order to reconstruct the geometry of the face Figure 313 shows the state of the 3D

model before and after the reconstruction process

(a) Before reconstruction (b) After reconstruction

Figure 313 The 3D model before and after the calibration process

38 Vertex filtering

As can be seen from Figure 313b there are a number of extra vertices (and faces)

that have not been correctly reconstructed and therefore should be removed from the

model Vertex filtering is applied to remove all these noisy vertices and faces based on

different criteria The process is divided in the following three steps

381 Filter vertices based on decoding constraints

First if the distance between consecutive decoded points is larger than a maximum

threshold in the (x) or (z) dimensions then these are removed Second in order to

avoid false decoded vertices due to camera noise (especially in the parts of the images

where light does not hit directly) a minimal modulation threshold needs to be exceeded

or else the associated decoded point is discarded Finally if the decoded vertices lie

outside a margin defined in accordance to the image dimensions then these are removed

as well


382 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum

and maximum values that each decoded point can have in the z dimension These values

are read from the XML file The long triangles shown in Figure 313b that either extend

far into the picture or on the other hand come close to the camera are all removed in

this stage The resulting 3D model after being filtered with the two previously described

criteria is shown in Figure 314a

383 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length

criterion Initially the length of every edge contained in the model is calculated This

is followed by determining a new set of edges L that contains the longest edge in each

face After this operation the mean length value for the longest edge set is calculated

Finally only faces whose longest edge is less than seven times the mean value ie L < 7 × mean(L) are kept Figure 314b shows the result after this operation
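A C sketch of this criterion is given below, assuming vertices are stored as (x, y, z) triplets and faces as triplets of vertex indices; all names are illustrative.

    #include <math.h>

    static float edge_length(const float *v, int i, int j)
    {
        float dx = v[3*i] - v[3*j], dy = v[3*i+1] - v[3*j+1], dz = v[3*i+2] - v[3*j+2];
        return sqrtf(dx*dx + dy*dy + dz*dz);
    }

    static float longest_edge(const float *v, const int *t)
    {
        float e0 = edge_length(v, t[0], t[1]);
        float e1 = edge_length(v, t[1], t[2]);
        float e2 = edge_length(v, t[2], t[0]);
        return fmaxf(e0, fmaxf(e1, e2));
    }

    /* Keep only the faces whose longest edge is shorter than 7 times the mean of
       the longest edges; returns the number of faces that are kept */
    int filter_long_faces(const float *vertices, const int *faces, int num_faces,
                          int *kept_faces)
    {
        float mean = 0.0f;
        for (int f = 0; f < num_faces; f++)
            mean += longest_edge(vertices, &faces[3*f]);
        mean /= (float)num_faces;

        int kept = 0;
        for (int f = 0; f < num_faces; f++)
            if (longest_edge(vertices, &faces[3*f]) < 7.0f * mean)
                kept_faces[kept++] = f;
        return kept;
    }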

(a) The 3D model after the filtering steps described in Subsections 381 and 382 (b) The 3D model after the filtering step described in Subsection 383 (c) The 3D model after the filtering step described in Section 39

Figure 314 3D resulting models after various filtering steps

39 Hole filling

In the last processing step of the 3D face scanner application two actions are performed

The first one is concerned with an algorithm that takes care of filling undesirable holes

that appear due to the removal of vertices and faces that were part of the face surface This

is accomplished by adding a vertex in the middle of the hole and then connecting every

surrounding edge with this point The second action refers to another filtering step of


vertices and faces In this last part of the application the program removes all but the

largest group of connected faces The final 3D model is shown in Figure 314c
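The hole-filling idea described above can be sketched as follows, assuming the hole boundary is available as an ordered loop of vertex indices and that the vertex and face arrays have room for the added elements (both assumptions are made for illustration):

    /* Add the centroid of the hole boundary as a new vertex and connect every
       boundary edge to it, creating a fan of triangles that closes the hole */
    void fill_hole(float *vertices, int *num_vertices, int *faces, int *num_faces,
                   const int *boundary, int boundary_len)
    {
        int c = (*num_vertices)++;
        float cx = 0.0f, cy = 0.0f, cz = 0.0f;
        for (int i = 0; i < boundary_len; i++) {
            cx += vertices[3*boundary[i]];
            cy += vertices[3*boundary[i]+1];
            cz += vertices[3*boundary[i]+2];
        }
        vertices[3*c]   = cx / boundary_len;
        vertices[3*c+1] = cy / boundary_len;
        vertices[3*c+2] = cz / boundary_len;

        for (int i = 0; i < boundary_len; i++) {
            int j = (i + 1) % boundary_len;
            faces[3*(*num_faces)]   = boundary[i];
            faces[3*(*num_faces)+1] = boundary[j];
            faces[3*(*num_faces)+2] = c;
            (*num_faces)++;
        }
    }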

310 Smoothing

Since the smoothing process is beneficial for visualization purposes but not for the overall goal of the 3D mask sizing project it was not included as part of the 3D face scanner application This is also the reason why it

is not included in Figure 31 Nevertheless this section provides a brief explanation of

the smoothing process that is currently used along with an example

A complete explanation of the algorithm that is being used to achieve the smoothing

effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian

operator that diffuses the vertices along the surface An example of the resulting model

before and after applying the smoothing process is shown in Figure 315

(a) The 3D model before smoothing (b) The 3D model after smoothing

Figure 315 Forehead of the 3D model before and after applying the smoothing process

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as

two different domains but rather as two complementary parts of a whole There are two

important trends that have made such unified view possible First integrated circuit

(IC) technology has evolved to the point where multiple processors of different types

coexist in a single IC Second the increasing complexity and average size of programs

together with the evolution of compiler technologies allowed C compilers (and even C++ or Java in some cases) to become commonplace in the development of embedded systems

[35]

This chapter discusses the embedded hardware and software implementation of the 3D

face scanner A brief account of the hardware and software tools that were used during

the development of the application is presented first Subsequently the first stage of the

development process is described which consists mainly of translating the algorithms

and methods described in Chapter 3 into a different programming language more suitable

for embedded systems Finally a preview of the developed visualization module that

displays the 3D reconstructed face is presented along with a brief description of its

functionality

41 Development tools

This section describes the set of tools used in the development of the embedded applica-

tion First an overview of the hardware is presented highlighting the most important

aspects that are of interest to the 3D face scanner application This is then followed by

a list of the software tools along with a short motivation for their selection A so called

remote development methodology was used for the compilation process The idea is to


run an integrated development environment (IDE) on a client system for the creation of

the project editing of the files and usage of code assistance features in the same manner

as done with local projects However when the project is built run or debugged the

process runs on a remote server with output and input transferred to the client system

411 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as

development platforms SBCs combine most features of a conventional desktop computer

into a single board which can be as small as a credit card One or more processors of

different types memory on-board peripherals for multiple USB devices single or dual

gigabit Ethernet connections integrated graphics and audio capabilities amongst others

are common features included in these devices But perhaps what is most interesting

for embedded developers is the availability of several SBCs that fall under the open source hardware category [36] Such SBCs are suitable for the implementation of a wide range

of applications on the basis of open operating systems

Two different hardware environments were used in the development of the current em-

bedded application a conventional desktop personal computer (PC) with an Intel x86

architecture and a SBC that was selected according to the following survey

4111 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention

of finding the most suitable model for our application Table 41 presents a subset of the

considered models highlighting the most relevant characteristics for the 3D face scanner

application Refer to [37] for the complete survey

The model to be chosen has to comply with several requirements imposed by the 3D

face scanner application First support for both a camera and a projector had to be

offered While all of the considered models showed special support for video output

not all of them provided suitable characteristics for camera signal acquisition In fact

most of them rely on USB or Ethernet connections for this purpose The problem of

using USB technology for camera acquisition is that it is highly resource demanding On

the other hand Ethernet connections imply streaming video in formats such as MPEG

which require additional computational resources and buffering for decoding the video

stream Explicit periphery support for camera acquisition was only offered by two of

the considered models the BeagleBoard-xM and the PandaBoard


Table 41 Single-board computer survey

BeagleBoard-xM

CPU ARM Cortex-A8 1000 MHz

RAM 512 MB

Video output DVI-D HDMI S-Video

GPU PowerVR SGX OpenGL ES 20

Camera port Yes

Raspberry Pi Model B

CPU ARM1176 700 MHz

RAM 256 MB

Video output Composite RCA HDMI DSI

GPU Broadcom VideoCore IV OpenGL ES 20

Camera port No

Cotton candy

CPU dual-core ARM Cortex-A9 1200 MHz

RAM 1 GB

Video output HDMI

GPU quad-core 200 MHz Mali-400 MP OpenGL ES 20

Camera port No

PandaBoard

CPU dual-core ARM Cortex-A9 1000 MHz

RAM 1 GB

Video output HDMI DVI-D LCD

GPU PowerVR SGX540 OpenGL ES 20

Camera port Yes

Via APC

CPU ARM11 800 MHz

RAM 512 MB

Video output HDMI VGA

GPU Built-in 2D3D Graphic OpenGL ES 20

Camera port No

MK802

CPU ARM Cortex-A8 1000 MHz

RAM 1 GB

Video output HDMI

GPU Mali-400 MP OpenGL ES 20

Camera port No

Snowball

CPU dual-core ARM Cortex-A9 1000 MHz

RAM 1 GB

Video output HDMI CVBS

GPU Mali-400 MP OpenGL ES 20

Camera port No


A second issue in the selection of the SBC was concerned with the project objective of

developing a module capable of visualizing the 3D reconstructed model by means of the

embedded projector It was considered that the achievement of this objective could be

greatly simplified by selecting an SBC model that offered support for rendering of 3D

computer graphics by means of an API preferably OpenGL ES Nevertheless all of the

SBC models considered in the survey featured a graphical processor unit (GPU) with

such support

Finally one last important motivation for the selection came from the experience gath-

ered through related projects The BeagleBoard-xM had been used as the embedded

computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-

able implementation effort could be saved if this option were adopted Consequently it

was the BeagleBoard-xM that was selected as the SBC model for the development of

the current project

4112 BeagleBoard-xM features

The BeagleBoard-xM (Figure 41) is an SBC produced by Texas Instruments It is

a low-power open-source hardware system that was designed specifically to address

the Open Source Community It measures 8255 by 8255 mm and offers most of the

functionality of a desktop computer It is based on Texas Instruments' DM3730 system

on chip (SoC) At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1

GHz and 512 MB of LPDDR RAM Several open operating systems have been made

compatible with such processor including Linux FreeBSD RISC OS Symbian and

Android Moreover the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated

video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to

provide accelerated 2D and 3D rendering that supports OpenGL ES 20 [38]

In addition to the previously mentioned characteristics the ARM Cortex-A8 processor

comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products as described in [39]

412 Software

The main factors involved in the selection of software tools were (i) available support by

a large development community and (ii) acquisition costs and licensing charges Open

source software was adopted where possible Moreover prior experience with the tools

was also taken into account The software can be divided in two categories (i) software


Figure 41 The BeagleBoard-xM offered by Texas Instruments

libraries that are used within the application and therefore are necessary for its execution

and (ii) software tools used specifically for the development of the application and hence

are not required for its execution In what follows each of these is briefly described

4121 Software libraries

The following software libraries are being used throughout the implementation of the

embedded application

libxml2 It is a software library used for parsing XML documents which was originally

developed for the Gnome project and was later made available for outside projects

as well The current application makes use of such tool for extracting the required

information from the XML file that is included for each scan

OpenCV Is an open source computer vision and machine learning software library

initiated by Intel It provides the necessary functionality to construct the Delaunay

triangulation described in Chapter 3 Though it was used in the initial versions of

the application later optimizations replaced OpenCV implementations

CGAL Consists of a software library that aims to provide access to algorithms in

computational geometry It is being used in the current application as a means

to simplify the resulting mesh surface ie to reduce the number of faces used to

represent the surface while keeping the overall shape of the reconstructed model

OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-

cally for embedded systems It consists of a cross-language multi-platform Appli-

cation Programming Interface (API) for rendering 2D and 3D computer graphics


It is used in the current application as the means to visualize the 3D reconstructed

model

GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL

used to create windows andor frame buffers It is being used in the visualization

module of the application as well

4122 Software development tools

The following list presents a description of the most important software tools used for

the development of the embedded application

GNU toolchain It refers to a collection of programming tools produced by the GNU

Project that provide developing facilities for applications and operating systems

Among the several projects that comprise the GNU toolchain the following were

used

GNU Make It is a utility that automates the building process of executable

programs by reading the so-called makefiles which specify how to create the

target program

GCC It is the official compiler of the GNU operating system and has been

adopted as standard by most modern Unix-like computer operating systems

GNU Binutils Involves a set of programming tools that are used in the develop-

ment process of creating and managing programs object files libraries profile

data and assembly source code The commands as (assembler) ld (linker)

and gprof (profiler) were used among the complete set of binutil commands

GNU Project debugger It is the standard debugger for the GNU operating

system which was made available for the development of applications outside

this project as well

Valgrind It is a programming tool that can automatically detect memory management

errors It also provides the functionality of a profiler

Ubuntu A Linux based operating system that is distributed as free and open source

software It was installed in both the desktop PC and the SBC


42 MATLAB to C code translation

This section describes the first stage of the embedded application development that

involves the translation of a series of algorithms originally written in MATLAB code to

C

Despite the fact that there are a number of available tools that automatically translate

MATLAB code to C language such as MATLAB Coder by MathWorks MATLAB-to-

C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx these have a number

of pitfalls that compromise their applicability specially when the performance aspect

is of ultimate importance Perhaps what is most concerning is that each one of these

tools only supports a subset of the MATLAB language and functions meaning that

the complete functionality of MATLAB is immediately constrained by this requirement

In many cases this would imply a modification to the MATLAB code prior to the

translation process in order to filter out any feature or function not included in the

subset which adds overhead to the development process Examples of features not

supported by automatic translation tools are amongst others objects cell arrays nested

functions visualization or try/catch statements The use of an automatic translation

tool was discarded for this project taking into account that several of these unsupported

features are present in the MATLAB code

421 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular pro-

gramming languages used for the development of embedded systems The first is that

C language lies at an intermediate point between higher and lower level languages pro-

viding suitable characteristics for embedded system development from both sides The

problem with higher level languages lies in the fact that they do not provide suitable

characteristics for optimizing performance of the applications such as low-level memory

manipulation Furthermore unlike many of these higher level programming languages

C provides deterministic resource use which is an important feature when the target de-

vices contain limited resources On the other hand C outperforms lower level languages

in a number of aspects such as scalability and maintainability Two final motivations

for using C are (i) C compilers are available for almost all embedded devices which are

supported by a large pool of experienced C programmers and (ii) the vast majority of

hardware APIdrivers are written in C


422 Translation approach

As mentioned earlier a manual translation approach of the code was chosen over the

use of automatic translation tools A key part in the process of manually translating

MATLAB to C code is the verification process There are two major techniques used

to achieve such verification The first one consists of a systematic method of converting

the translated C code into a compiled MEX-file that can be merged into the original

MATLAB project Then by comparing the results generated by the MATLAB project

containing the C implementation wrapped in a MEX-file with those generated by the

original MATLAB project one should be able to verify the correctness of the translation

The second approach consists of writing corresponding intermediate results of both the

MATLAB and C implementations to external files and then using a file comparison tool

such as diff for Linux environments in order to validate equality of both results It was

the latter approach that was chosen for the development of the current application for

the following reason The former approach requires the C implementation to be wrapped

in a so called MEX wrapper which takes care of the communication between MATLAB

and C This task is considered to be error prone since crashes segmentation violations

or incorrect results can easily occur if the MEX wrapper does not allocate and access

the data properly as reported by Marc Barberis in [40] from Catalytic Inc
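As an illustration of this approach, the C side can write every intermediate result to a text file with a fixed number of decimals, so that the corresponding MATLAB output (written with the same format) can be compared with diff; the file format and names below are assumptions.

    #include <stdio.h>

    /* Write an array of intermediate results to a text file, one value per line */
    int dump_intermediate(const char *path, const float *data, size_t count)
    {
        FILE *f = fopen(path, "w");
        if (f == NULL)
            return -1;
        for (size_t i = 0; i < count; i++)
            fprintf(f, "%.6f\n", data[i]);
        fclose(f);
        return 0;
    }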

A number of pitfalls that add complexity to the manual translation process were iden-

tified throughout the development of this stage The most important are

bull Array elements in MATLAB code are indexed starting with 1 whereas C indexing

starts with 0 Although this does not seem like a major difference it was found

that such a simple change could easily introduce errors

bull MATLAB uses column major ordering whereas C uses a row major approach

Special care must be taken to guarantee that spatial locality is maintained after

the translation process takes place ie the order in which data is processed should

correspond to the order in which it is laid out in memory Not complying with

this idea could induce a serious loss in performance of the resulting code (a short sketch of this point is given after this list)

bull MATLAB is an interpreted language ie data types and variable dimensions are

only known at run-time thus these cannot be easily deduced from analyzing the

source code

bull MATLAB supports dynamic sizing of arrays whereas such operations in C require

explicit allocationreallocationdeallocation of memory using constructs such as

malloc realloc or free


bull MATLAB features a rich set of libraries that are not available in C This can imply

a large overhead in the development process if many of these functions have to be

implemented

bull Many of the vector-based operations available in MATLAB translate into nontriv-

ial loop constructs in C language For example mapping MATLAB's easy-to-use

concatenation operation to C involves considerable effort

bull Last but not least MATLAB supports reusing the same variable for storing data

of different types dimensions and sizes In contrast C language requires all variables to be declared with a specific data type before they can be used Furthermore MATLAB uses a wide variety

of generic types that are not available in C and hence requires the programmer

to implement them while relying on structure constructs of primitive types
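The following short sketch illustrates the first two pitfalls: in C, indexing is 0-based and a row-major image should be traversed with the column index in the inner loop so that the processing order follows the memory layout (names are illustrative only).

    /* Scale a row-major image; the inner loop runs over columns so that memory is
       accessed contiguously, and indices start at 0 instead of 1 */
    void scale_image(float *img, int rows, int cols, float gain)
    {
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                img[r * cols + c] *= gain;
    }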

43 Visualization

This section describes the different steps involved in the visualization module developed

to display the reconstructed 3D models by means of the embedded projector contained

in the hand-held device Figure 42 extends the general overview of the application

presented in 31 by incorporating the visualization module This figure shows that a

resulting 3D model of the face reconstruction process consists of 4 different elements a

set of vertices a set of faces a set of UV coordinates and a texture image

[Diagram: the camera frame sequence and the XML file enter the 3D face reconstruction block, which outputs faces, vertices, UV coordinates and the texture 1 image to the visualization block]

Figure 42 Simplified diagram of the 3D face scanner application

Vertices and faces describe the geometry of the reconstructed model Each face consists

of three index values that determine the vertices that conform a triangle On the other

hand UV coordinates together with the texture image describe the texture of the model

Figure 43 shows how UV coordinates are used to map portions of the texture image


to individual parts of the model Each vertex is associated with an UV coordinate

When a triangle is rendered the corresponding UV coordinates of each vertex are used

to extract a portion of the texture image to place it on top of the triangle

[Diagram: the u-v coordinate system of the texture image, with corners (0,0), (0,1), (1,1) and (1,0)]

Figure 43 UV coordinate system

Figure 44 presents an overview of the visualization module The first step of the process

is to simplify the 3D model ie to reduce the number of triangles (and vertices) used

to represent the surface Note that while a high resolution is needed for the algorithms

that determine the fit quality of the different mask models a much lower resolution can

be used for visualization purposes In fact due to the limited available resources in

embedded systems such simplification becomes necessary to avoid lag when zooming

rotating or panning the model Edge collapse is a common term used for the simpli-

fication process which is shown in Figure 44 Input vertices and faces of this block

are converted into a smaller set denoted as New vertices and New faces on the diagram

However since the new set of vertices and faces do not have a one-to-one correspondence

to the original set of UV coordinates such coordinates have to be updated as well The

manner in which this is accomplished is by using the Nearest Neighbor algorithm Every

new vertex is assigned the UV coordinate of its closest original vertex

The next stage of the process is to format the new set of vertices faces and UV co-

ordinates together with the texture 1 image such that OpenGL can render the model


Subsequently normal vectors are calculated for every triangle which are mainly used

by OpenGL for lighting calculations Every vertex of the model has to be associated

with one normal vector To do this an average normal vector is calculated for each

vertex based on the normal vectors of the triangles that are connected to it Moreover

a cross-product multiplication is used to calculate the normal vector of each triangle

Once these four elements that characterize the 3D model are provided to OpenGL the

program enters in an infinite running state where the model is redrawn every time a

timer expires or when an interactive operation is sent to the program
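A C sketch of the normal calculation is given below: a face normal is obtained from the cross product of two triangle edges and every vertex normal is the normalized sum of the normals of its connected faces (the data layout and all names are assumptions made for illustration).

    #include <math.h>

    void compute_vertex_normals(const float *vertices, const int *faces,
                                int num_vertices, int num_faces, float *normals)
    {
        for (int i = 0; i < 3 * num_vertices; i++)
            normals[i] = 0.0f;

        for (int f = 0; f < num_faces; f++) {
            const int *t = &faces[3*f];
            const float *a = &vertices[3*t[0]];
            const float *b = &vertices[3*t[1]];
            const float *c = &vertices[3*t[2]];
            float u[3] = { b[0]-a[0], b[1]-a[1], b[2]-a[2] };
            float v[3] = { c[0]-a[0], c[1]-a[1], c[2]-a[2] };
            float n[3] = { u[1]*v[2] - u[2]*v[1],      /* cross product u x v */
                           u[2]*v[0] - u[0]*v[2],
                           u[0]*v[1] - u[1]*v[0] };
            for (int k = 0; k < 3; k++) {              /* accumulate on the 3 vertices */
                normals[3*t[k]]   += n[0];
                normals[3*t[k]+1] += n[1];
                normals[3*t[k]+2] += n[2];
            }
        }

        for (int i = 0; i < num_vertices; i++) {       /* normalize every vertex normal */
            float *n = &normals[3*i];
            float len = sqrtf(n[0]*n[0] + n[1]*n[1] + n[2]*n[2]);
            if (len > 0.0f) {
                n[0] /= len; n[1] /= len; n[2] /= len;
            }
        }
    }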

[Flow diagram: vertices, faces and UV coordinates enter the mesh simplification block, where an edge collapse produces new vertices and faces and a nearest-neighbor step assigns new UV coordinates; the results are changed to OpenGL format, normals are calculated and the GL vertices, faces, UV coordinates, normals and texture 1 image are passed to OpenGL]

Figure 44 Diagram of the visualization module

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner

application ranging from high-level optimizations such as modification of the algo-

rithms to low-level optimizations such as the implementation of time-consuming parts

in assembly language

In order to verify that the achieved optimizations were valid in general and not for

specific cases 10 scans of different persons were used for profiling the performance of the

application Every profile consisted of running the application 10 times for each scan and

then averaging the results in order to reduce the influence that external factors might

have in the measured times Figure 51 presents an example of the graphs that will be

used throughout this and the following chapters to represent the changes in performance

Here each bar is divided into different colors that represent the distribution of the total

execution time among the various stages of the application described in Chapter 3 and

summarized in Figure 31

The translation from MATLAB to C code corresponds to the first optimization per-

formed The top two bars in Figure 51 show that the C implementation resulted in

a speedup of approximately 15 times over the MATLAB implementation running on

a desktop computer On the other hand the bottom two bars reflect the difference

in execution time after running the C implementation in two different platforms The

much more limited resources available in the BeagleBoard-xM have a clear impact on

the execution time The C code was compiled with GCC's O2 optimization level

The bottom bar in Figure 51 represents the starting point for a set of optimization

procedures that will be described in the following sections The order in which these are

presented corresponds to the same order in which they were applied to the application


[Bar chart: total execution time in seconds of the three implementations, broken down into the stages Read binary file, Preprocessing, Normalization, Global motion compensation, Decoding, Tessellation, Calibration, Vertex filtering, Hole filling and Other]

Figure 51 Execution times of (Top) the MATLAB implementation on a desktop computer (Middle) the C implementation on a desktop computer (Bottom) the C implementation on the BeagleBoard-xM

51 Double to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C implementations was necessary to compare both results in each step of the translation

process The original C implementation was implemented using double-precision format

because this is the format used in the MATLAB code Taking into account that the

additional precision offered by double-precision format over single-precision was not

essential and that the ARM Cortex-A8 processor features a 32 bit architecture the

conversion from double to single-precision format was made Figure 52 shows that with

this modification the total execution time decreased from 14.53 to 12.52 sec

[Bar chart: execution time in seconds of the double-precision and single-precision versions, broken down into the application stages]

Figure 52 Difference in execution time when double-precision format is changed to single-precision

52 Tuned compiler flags

While the previous versions of the C code were compiled with O2 performance level

the goal of this step was to determine a combination of compiler options that would


translate into faster running code A full list of the options supported by GCC can be

found in [41] Figure 53 shows that the execution time decreased by approximately 3

seconds (24% of the total time of 12.5 sec) after tuning the compiler flags The list of

compiler flags that produced best performance at this stage of the optimization process

were

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution

-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp

[Bar chart: execution time in seconds with the O2 optimization level and with the tuned compiler flags, broken down into the application stages]

Figure 53 Execution time before and after tuning GCC's compiler options

53 Modified memory layout

A different memory layout for processing the camera frames was implemented to further

exploit the concept of spatial locality of the program As noted in Section 33 many of

the operations in the normalization stage involve pixels from pairs of consecutive frames

ie first and second third and fourth fifth and sixth and so on Data of the camera

frames were placed in memory in a manner such that corresponding pixels between frame

pairs laid next to each other in memory The procedure is shown in Figure 54
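A sketch of this idea for a single frame pair is given below: corresponding pixels of frames a and b are stored next to each other so that the normalization stage reads them from adjacent memory locations (the function and parameter names are illustrative only).

    #include <stddef.h>

    /* Interleave the pixels of one frame pair */
    void interleave_pair(const float *frame_a, const float *frame_b,
                         float *interleaved, size_t num_pixels)
    {
        for (size_t i = 0; i < num_pixels; i++) {
            interleaved[2*i]     = frame_a[i];   /* pixel i of the first frame  */
            interleaved[2*i + 1] = frame_b[i];   /* pixel i of the second frame */
        }
    }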

However this modification yielded no improvement on the execution time of the appli-

cation as can be seen from Figure 55

5.4 Reimplementation of C's standard power function

The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. Gamma correction in this application consists of raising each pixel to the power 0.85. After profiling the application, it was found that the power function from the standard C math library was taking most of the time spent in this process. Taking into account that the


Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green and purple circles represent pixels of the first, second, third and fourth frames, respectively.

Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.

high accuracy offered by this function was not required, and that the overhead involved in validating the input could be removed, a different implementation of the function was adopted.

A novel approach proposed by Ian Stephenson in [42] is explained as follows. The power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, log2(a) can be approximated based on the IEEE floating-point representation of a,

a = M * 2^E

where M is the mantissa and E is the exponent. Taking the base-2 logarithm of both sides gives

log2(a) = log2(M) + E

and, since M is normalized, log2(M) is always small; therefore

log2(a) ≈ E.

This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.

Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.
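A minimal C sketch of this idea is shown below, assuming IEEE-754 single-precision floats. It approximates log2 and pow2 from the bit pattern of the float, and is therefore only a coarse approximation of powf(); it illustrates the principle, not the exact code used in the application.

#include <stdint.h>
#include <string.h>

/* Coarse approximation of log2(a) for a > 0: reinterpret the float bits;
 * bits / 2^23 equals (E + 127) plus a fractional mantissa contribution. */
static float fast_log2(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof(bits));
    return (float)bits * (1.0f / 8388608.0f) - 127.0f;   /* 8388608 = 2^23 */
}

/* Coarse approximation of 2^x obtained by writing the exponent field back. */
static float fast_pow2(float x)
{
    uint32_t bits = (uint32_t)((x + 127.0f) * 8388608.0f);
    float result;
    memcpy(&result, &bits, sizeof(result));
    return result;
}

/* pow(a, b) = 2^(log2(a) * b), valid for a > 0. */
static float fast_pow(float a, float b)
{
    return fast_pow2(b * fast_log2(a));
}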

5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is the principle that recently referenced memory locations tend to be referenced again soon. Moreover, the reordering made it possible to replace floating-point calculations with integer calculations in the modulation stage, which typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modular frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping and scaling frames are now performed in an alternating fashion together with the calculation of the modular frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.


Figure 5.7: Order of execution before and after the optimization. (a) Original order of execution: the modulation step is part of the normalization stage. (b) Modified order of execution: the modulation step is moved into the preprocessing stage, before normalization.

Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
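The kind of loop merging described here can be sketched as follows; the per-pixel operations are simplified stand-ins for the real crop, scale and modulation computations, and the names are illustrative only.

#include <stdint.h>

#define NUM_FRAMES 16   /* illustrative; the scanner captures 16 camera frames */

/* Illustrative sketch of loop merging: instead of one pass per operation,
 * each pixel is read once and all per-pixel work is done in that single pass,
 * keeping the modulation computation in integer arithmetic. */
void preprocess_merged(uint8_t *frames[], int num_pixels,
                       uint8_t *scaled[], uint8_t *modulation)
{
    for (int i = 0; i < num_pixels; i++) {
        uint8_t min_val = 255, max_val = 0;
        for (int f = 0; f < NUM_FRAMES; f++) {
            uint8_t p = frames[f][i];   /* read each pixel only once        */
            scaled[f][i] = p;           /* stand-in for crop/scale work      */
            if (p < min_val) min_val = p;
            if (p > max_val) max_val = p;
        }
        modulation[i] = (uint8_t)(max_val - min_val);  /* stand-in modulation value */
    }
}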


5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3; Figure 3.8 shows the different stages of this process. However, this figure does not reflect the manner in which GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results were achieved when GMC is applied in the y direction only. This was implemented by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted of removing all unnecessary calculations related to the estimation of GMC in the x direction. It provides the improvement of the execution time shown in Figure 5.10.

Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code: for every pair of consecutive normalized frames, the row and column sums of both frames are computed, the SAD is minimized in x and y, and frame B is shifted in the y dimension only.

Figure 5.10: Difference in execution time before and after modifying the GMC stage.


5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation. A series of examples available in [43] were used as references for our implementation. Although OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not-so-straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation. It is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. An error was detected at this point of the optimization process in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.

Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.

5.8 Modified line shifting in GMC stage

A series of optimizations performed on the original line-shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later it was optimized even further by replacing the for loop with the more optimized memcpy function available in the standard C library, which in turn led to a faster execution time.

A further optimization was obtained in the GMC stage, yielding better memory usage and a faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included to determine whether the shift has to be performed in the positive or negative direction. If the shift is negative, i.e., upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. If the shift is positive, i.e., downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.

Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
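A minimal sketch of such an in-place vertical shift is shown below, assuming a row-major frame buffer; the function name and the policy for the vacated border rows are illustrative.

#include <string.h>
#include <stdint.h>

/* Illustrative in-place vertical shift of a row-major frame by 'shift' rows.
 * A negative shift moves the image up, a positive shift moves it down; rows
 * that fall outside the frame are discarded (no circular wrap-around). */
void shift_frame_rows(uint8_t *frame, int width, int height, int shift)
{
    if (shift < 0) {                        /* shift up: traverse top to bottom */
        for (int y = 0; y < height + shift; y++)
            memcpy(&frame[y * width], &frame[(y - shift) * width], width);
    } else if (shift > 0) {                 /* shift down: traverse bottom to top */
        for (int y = height - 1; y >= shift; y--)
            memcpy(&frame[y * width], &frame[(y - shift) * width], width);
    }
}

The traversal direction is chosen so that every source row is read before it is overwritten, which is what makes the second frame buffer unnecessary.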

5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improves the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic is the fact that the set of vertices resulting from the decoding stage is already sorted, which removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies the algorithm needed to connect the vertices into triangles even further.

The developed algorithm traverses the set of vertices row by row from bottom to top, creating triangles between every pair of consecutive rows. Each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.


The algorithm is presented in Algorithm 1. Note that, for each pair of rows, the algorithm connects vertices only until the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).

Algorithm 1 New tessellation algorithm
 1: for all pairs of rows do
 2:   find the left-most vertices in both rows and store them in vertex row A and vertex row B
 3:   while the last vertex in either row has not been reached do
 4:     if vertex row A is more to the left than vertex row B then
 5:       connect vertex row A with the next vertex on the same row and with vertex row B
 6:       change vertex row A to the next vertex on the same row
 7:     else
 8:       connect vertex row B with the next vertex on the same row and with vertex row A
 9:       change vertex row B to the next vertex on the same row
10:     end if
11:   end while
12: end for
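A compact C sketch of this traversal is given below, assuming the vertices of two consecutive rows are available as x-sorted arrays; the data types and the emit_triangle() callback are illustrative and not taken from the original code.

typedef struct { float x, y; int index; } Vertex;

/* Illustrative sketch of the row-by-row tessellation: connect two x-sorted
 * rows of vertices into a strip of triangles, always advancing in the row
 * whose current vertex is left-most. The leftover tail of the longer row is
 * handled separately, as in Algorithm 1. */
void connect_rows(const Vertex *row_a, int len_a,
                  const Vertex *row_b, int len_b,
                  void (*emit_triangle)(int, int, int))
{
    int a = 0, b = 0;
    while (a + 1 < len_a && b + 1 < len_b) {
        if (row_a[a].x < row_b[b].x) {
            emit_triangle(row_a[a].index, row_a[a + 1].index, row_b[b].index);
            a++;
        } else {
            emit_triangle(row_b[b].index, row_b[b + 1].index, row_a[a].index);
            b++;
        }
    }
}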

Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's Delaunay triangulation implementation.

Figure 5.13: The Delaunay triangulation (a) was replaced with a different algorithm (b) that takes advantage of the fact that the vertices are sorted.

5.10 Modified decoding stage

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard C math library, namely ceil() and floor(),


Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

were replaced with faster implementations that use preprocessor directives to avoid the function call overhead. Moreover, the time spent validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions could be simplified further.
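A minimal sketch of such simplified replacements is shown below, assuming the arguments are always non-negative, as is the case for index values; the macro names are illustrative.

/* Illustrative replacements for floor()/ceil(), valid only for non-negative
 * arguments within int range: truncation toward zero then equals flooring,
 * and the ceiling can be derived from it without any input validation or
 * function call overhead. Note that x is evaluated more than once. */
#define FAST_FLOOR(x) ((int)(x))
#define FAST_CEIL(x)  ((int)(x) + ((x) > (int)(x) ? 1 : 0))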

A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while ensuring that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since it avoids the overhead of the heap allocator and the memory can be addressed more directly.

The last optimization consisted of detecting and removing several tasks that were not contributing to the final result. Such tasks were present in the application because several alternatives for achieving a common goal were implemented during the algorithmic design stage; after assessing the alternatives and choosing the best one, however, the others were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the non-functional code.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive


Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for every image except the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, this recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
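The reuse of the previous column-sum vector can be sketched as follows; the frame dimensions are illustrative and the SAD-minimization and shifting steps are only indicated as placeholders.

#include <stdint.h>
#include <string.h>

#define WIDTH  640   /* illustrative frame dimensions */
#define HEIGHT 480

/* Illustrative sketch: compute the column sums of one frame. */
static void column_sums(const uint8_t *frame, uint32_t *sums)
{
    memset(sums, 0, WIDTH * sizeof(uint32_t));
    for (int y = 0; y < HEIGHT; y++)
        for (int x = 0; x < WIDTH; x++)
            sums[x] += frame[y * WIDTH + x];
}

/* For N frames, compute each column-sum vector only once and carry it over
 * to the next frame pair instead of recomputing it. */
void gmc_all_pairs(uint8_t *frames[], int num_frames)
{
    uint32_t sums_prev[WIDTH], sums_curr[WIDTH];

    column_sums(frames[0], sums_prev);
    for (int n = 1; n < num_frames; n++) {
        column_sums(frames[n], sums_curr);
        /* ... minimize SAD between sums_prev and sums_curr, shift frame n ... */
        memcpy(sums_prev, sums_curr, sizeof(sums_prev));
    }
}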

5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to identify the stages of the application that exhibit a rich amount of exploitable data operations to which the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization and global motion compensation stages are data independent and are therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrating NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology; (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions while the compiler does some of the work associated with writing assembly instructions; and (iv) directly writing NEON assembly instructions that are linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and therefore this approach was adopted.


Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums. For the first pair of normalized frames the column sums of both frames are computed, the SAD is minimized and frame 2 is shifted; for every remaining pair (from n = 3 to n = N) only the column sums of frame n are computed, while the column vector of frame n-1 is reused.

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.


Figure 5.18 presents the basic principle behind the SIMD architecture extension along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8 or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers, where each of the 8 elements is 16 bits wide, or as an operation on 2 D registers, where each of the 8 elements is 8 bits wide.

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology: an operation is applied lane-wise to the elements of two source registers and written to a destination register.

An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
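The application implements these kernels in hand-written NEON assembly; purely as an illustration of the data flow, the intrinsics sketch below loads 8 pixels from each of two frames, widens them, and computes the per-pixel sums and differences that form the building blocks of the texture 2 and normalization steps. The function name and buffer layout are assumptions for the sake of the example, and num_pixels is assumed to be a multiple of 8.

#include <arm_neon.h>
#include <stdint.h>

/* Illustrative NEON intrinsics sketch (the original code uses assembly):
 * process 8 pixels of two frames per iteration, producing the per-pixel
 * sum (v1 + v2) and difference (v1 - v2) as 16-bit values. */
void sum_diff_8px(const uint8_t *frame1, const uint8_t *frame2,
                  int16_t *sum, int16_t *diff, int num_pixels)
{
    for (int i = 0; i < num_pixels; i += 8) {
        uint8x8_t  v1 = vld1_u8(&frame1[i]);               /* load 8 pixels       */
        uint8x8_t  v2 = vld1_u8(&frame2[i]);
        uint16x8_t s  = vaddl_u8(v1, v2);                   /* widen and add       */
        int16x8_t  d  = vreinterpretq_s16_u16(vsubl_u8(v1, v2)); /* widen, subtract */
        vst1q_s16(&sum[i],  vreinterpretq_s16_u16(s));
        vst1q_s16(&diff[i], d);
    }
}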

Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that may explain this increase. First, note that the stage that contributed most to the increase in time was reading the binary file. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was indeed probably running in parallel, using resources of the board and hence affecting the performance of the application.

Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason can be found in the modulation stage. The first step of this process is to find the smallest and largest values for every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C, the processor makes use of its branch prediction mechanism to speed up the instruction pipeline. The use of NEON assembly instructions, however, forces the processor to perform the comparison for every single pack of 8 values, without any benefit from the branch prediction mechanism.

5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2 * 2^n - 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the power 0.85. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
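A minimal sketch of such a lookup table for the real 8-bit case is given below; the table is indexed by the pixel sum (0 to 510), which is already available from the texture 2 computation, and stores (sum/2)^0.85. The names and the output scaling are illustrative assumptions.

#include <math.h>

#define LUT_SIZE (2 * 255 + 1)   /* all possible sums of two 8-bit pixels */

static float gamma_lut[LUT_SIZE];

/* Build the table once: entry s holds the gamma-corrected average of two
 * pixels whose sum is s, i.e. (s / 2)^0.85. */
void build_gamma_lut(void)
{
    for (int s = 0; s < LUT_SIZE; s++)
        gamma_lut[s] = powf((float)s * 0.5f, 0.85f);
}

/* During texture 1 generation, the pixel sum indexes the table directly. */
static inline float gamma_corrected_average(int pixel_a, int pixel_b)
{
    return gamma_lut[pixel_a + pixel_b];
}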

As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two stages. The first one involves the calculation of the modulation, texture 2 and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. The merging of these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already


Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.


Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels. Pixel A and pixel B each take the values 0, 1, 2 or 3; their possible averages are 0, 0.5, 1, 1.5, 2, 2.5 and 3, and the corresponding LUT entries average^0.85 are 0, 0.555, 1, 1.411, 1.803, 2.179 and 2.544. The sum pixel A + pixel B is used to index the table.

being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.


Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.

6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people; moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).


On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model has approximately 2% of the faces contained in the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.

6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. The wide range of optimizations described in Chapter 5 reduced the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times. Furthermore,


Figure 6.2: Example of the visualization module developed: (a) high-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model wireframe (63,743 faces); (c) low-resolution 3D model with texture (1,229 faces); (d) low-resolution 3D model wireframe (1,229 faces).

Figure 6.4 presents individual graphs for each stage of the process, which gives an idea of the speedup achieved for each individual stage.


Figure 6.3: Performance evolution of the 3D face scanner's C implementation. The bars correspond to the unoptimized version and to each optimization step described in Chapter 5: doubles to floats, tuned compiler flags, modified memory layout, pow function reimplemented, reduced memory accesses, GMC in the y direction only, Delaunay bug fix, line shifting in GMC, new tessellation algorithm, modified decoding stage, no recalculations in GMC, and the two NEON assembly implementations.


Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each pair of corresponding models yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e., by approximately 14 seconds.

A wide range of optimizations was performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields such as security or entertainment.

Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5 x 5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.

According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase in time is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operation), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease in performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.

Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application; therefore, a new assessment of the compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of this configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast-running code.

Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829-1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1)

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1)

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204-211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1)

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10)

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1-17, 1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11)

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34)

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11)

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521-528, Aug. 1981 (cit. on pp. 9, 11)

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647-651, Jun. 1993, ISSN: 0162-8828. DOI: 10.1109/34.216735 (cit. on pp. 9, 11)

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874-877. DOI: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11)

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27-40, 1995, ISSN: 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11)

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14-28, Jan. 1987, ISSN: 0162-8828. DOI: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11)

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image and Vision Computing, pp. 445-456, 1997 (cit. on pp. 9, 11)

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609-616, 1992, ISSN: 0031-3203. DOI: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12)

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337-358, 1985, ISSN: 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12)

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309-313. DOI: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12)

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193-203, 1991, ISSN: 0932-8092. DOI: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12)

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42-54, Dec. 1982, ISSN: 0018-9162. DOI: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14)

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827-849, 2004 (cit. on pp. 11, 12)

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057-2060. DOI: 10.1109/ICME.2006.262619 (cit. on p. 12)

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: Application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135-141. DOI: 10.1109/ENC.2006.8 (cit. on pp. 12, 13)

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195-200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13)

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81 - III-84. DOI: 10.1109/ICIP.2007.4379251 (cit. on p. 13)

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617-1635, 2002, ISSN: 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14)

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. DOI: 10.1364/OE.15.012318 (cit. on p. 14)

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22-26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15-20 (cit. on p. 14)

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, ISBN: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14)

[28] J. Mas and Universitat de Girona, Departament d'Electrònica, Informàtica i Automàtica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, ISBN: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15)

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323-344, Aug. 1987, ISSN: 0882-4967. DOI: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15)

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965-980, Oct. 1992, ISSN: 0162-8828. DOI: 10.1109/34.159901 (cit. on p. 15)

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26)

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN: 0201604582 (cit. on p. 25)

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97-108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26)

[34] M. Desbrun, M. Meyer, P. Schroder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317-324, ISBN: 0-201-48560-5. DOI: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30)

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, ISBN: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31)

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32)

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32)

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, December, p. 81, 2009 (cit. on p. 34)

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34)

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38)

[41] W. Von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45)

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46)

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50)

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489-497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51)

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54)

[46] N. Pipenbrinck, "ARM NEON optimization: an example," Tech. Rep., 2009 (cit. on p. 54)


Acknowledgements

I owe a debt of gratitude to the many people who helped me during my years at TU/e.

First, I would like to thank Frank van Heesch, my supervisor at Philips, an excellent professional and even better person, who showed me the way through this challenging project while encouraging me at every step of the way. He was always generous with his time and steered me in the right direction whenever I felt I needed help. He has deeply influenced every aspect of my work.

I would also like to express my sincerest gratitude to my professor, Gerard de Haan, the person who was responsible for opening Philips' doors to my life. His achievements are a constant source of motivation. Gerard is a clear demonstration of how the collaboration between industry and academia can produce unprecedented and magnificent results.

My special thanks to all my fellow students at Philips Research, who made these eight months a wonderful time of my life. Their input and advice contributed significantly to the final result of my work. In particular, I would like to thank Koen de Laat for helping me set up an automated database system to keep track of the profiling results.

Furthermore, I would like to thank Catalina Suarez, my girlfriend, for her support during this year. Your company has translated into the happiness I need to perform well in the many aspects of my life.

Finally, I would like to thank my family for their permanent love and support. It is hard to find the right words to express the immense gratitude that I feel for those persons who have given me everything so that I could be standing where I am now. Mom and dad, my achievements are the result of the infinite love that you have given me throughout my life, and I will never stop feeling grateful for that.


Contents

Abstract
Acknowledgements
List of Figures

1 Introduction
  1.1 3D Mask Sizing project
  1.2 Objectives
  1.3 Report organization

2 Literature study
  2.1 Surface reconstruction
    2.1.1 Stereo analysis
    2.1.2 Structured lighting
      2.1.2.1 Triangulation technique
      2.1.2.2 Pattern coding strategies
      2.1.2.3 3D human face reconstruction
  2.2 Camera calibration
    2.2.1 Definition
    2.2.2 Popular techniques

3 3D face scanner application
  3.1 Read binary file
  3.2 Preprocessing
    3.2.1 Parse XML file
    3.2.2 Discard frames
    3.2.3 Crop frames
    3.2.4 Scale
  3.3 Normalization
    3.3.1 Normalization
    3.3.2 Texture 2
    3.3.3 Modulation
    3.3.4 Texture 1
  3.4 Global motion compensation
  3.5 Decoding
  3.6 Tessellation
  3.7 Calibration
    3.7.1 Offline process
    3.7.2 Online process
  3.8 Vertex filtering
    3.8.1 Filter vertices based on decoding constraints
    3.8.2 Filter vertices outside the measurement range
    3.8.3 Filter vertices based on a maximum edge length
  3.9 Hole filling
  3.10 Smoothing

4 Embedded system development
  4.1 Development tools
    4.1.1 Hardware
      4.1.1.1 Single-board computer survey
      4.1.1.2 BeagleBoard-xM features
    4.1.2 Software
      4.1.2.1 Software libraries
      4.1.2.2 Software development tools
  4.2 MATLAB to C code translation
    4.2.1 Motivation for developing in C language
    4.2.2 Translation approach
  4.3 Visualization

5 Performance optimizations
  5.1 Double to single-precision floating-point numbers
  5.2 Tuned compiler flags
  5.3 Modified memory layout
  5.4 Reimplementation of C's standard power function
  5.5 Reduced memory accesses
  5.6 GMC in y dimension only
  5.7 Error in Delaunay triangulation
  5.8 Modified line shifting in GMC stage
  5.9 New tessellation algorithm
  5.10 Modified decoding stage
  5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage
  5.12 NEON assembly optimization 1
  5.13 NEON assembly optimization 2

6 Results
  6.1 MATLAB to C code translation
  6.2 Visualization
  6.3 Performance optimizations

7 Conclusions
  7.1 Future work

Bibliography

List of Figures

1.1 A subset of the CPAP masks offered by Philips
1.2 A 3D hand-held scanner developed in Philips Research
2.1 Standard stereo geometry
2.2 Assumed model for triangulation as proposed in [4]
2.3 Examples of pattern coding strategies
2.4 A reference framework assumed in [25]
3.1 General flow diagram of the 3D face scanner application
3.2 Example of the 16 frames that are captured by the hand-held scanner
3.3 Flow diagram of the preprocessing stage
3.4 Flow diagram of the normalization stage
3.5 Example of the 18 frames produced in the normalization stage
3.6 Camera frame sequence in a coordinate system
3.7 Flow diagram for the calculation of the texture 1 image
3.8 Flow diagram for the global motion compensation process
3.9 Difference between pixel-based and edge-based decoding
3.10 Vertices before and after the tessellation process
3.11 The Delaunay tessellation with all the circumcircles and their centers [33]
3.12 The calibration chart
3.13 The 3D model before and after the calibration process
3.14 3D resulting models after various filtering steps
3.15 Forehead of the 3D model before and after applying the smoothing process
4.1 The BeagleBoard-xM offered by Texas Instruments
4.2 Simplified diagram of the 3D face scanner application
4.3 UV coordinate system
4.4 Diagram of the visualization module
5.1 Execution times of the MATLAB and C implementations after run on different platforms
5.3 Execution time before and after tuning GCC's compiler options
5.4 Modification of the memory layout of the camera frames
5.5 Execution time with a different memory layout
5.6 Execution time before and after reimplementing C's standard power function
5.7 Order of execution before and after the optimization
5.8 Difference in execution time before and after reordering the preprocessing stage
5.9 Flow diagram for the GMC process as implemented in the MATLAB code
5.10 Difference in execution time before and after modifying the GMC stage
5.11 Execution time of the application after fixing an error in the tessellation stage
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach
5.15 Execution time of the application before and after optimizing the decoding stage
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage
5.18 NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology
5.19 Execution flow after the first NEON assembly optimization
5.20 Execution times of the application before and after applying the first NEON assembly optimization
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels
5.22 Execution times of the application before and after applying the second NEON assembly optimization
5.23 Final execution flow after the second NEON assembly optimization
6.1 Execution times of the MATLAB and C implementations after run on different platforms
6.2 Example of the visualization module developed
6.3 Performance evolution of the 3D face scanner's C implementation
6.4 Execution times for each stage of the application

Dedicated to my grandmother


Chapter 1

Introduction

The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.

Within the wide range of products offered by Philips, there is a specific group, categorized under the name of sleep solutions, that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]. It reported that sleep apnea was prevalent in more than 18 million Americans, i.e., 6.62% of the country's population.

While aiming to attend to the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors, which include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia, amongst others. A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects for the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.

(a) Amara  (b) ComfortClassic  (c) ComfortGel Blue  (d) ComfortLite 2  (e) FitLife  (f) GoLife  (g) ProfileLite Gel  (h) Simplicity  (i) ComfortGel

Figure 1.1: A subset of the CPAP masks offered by Philips

Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.

1.1 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods, and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.

2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.

3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.

4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.

The focus of this thesis project is on the first step.

As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique has already been developed and is the base for the present project. Figure 1.2a shows the hardware setup of this device. In short, the scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. This picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently performed offline and are mostly implemented in MATLAB.

(a) Hardware  (b) 3D model example

Figure 1.2: A 3D hand-held scanner developed in Philips Research

1.2 Objectives

The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application: specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to an ARM Cortex-A8, the new implementation should be targeted to this processor in particular, making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.

1.3 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application is described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module, and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.


Chapter 2

Literature study

This chapter presents a selective analysis of the state of the art in the field of surface reconstruction, placing special emphasis on structured lighting techniques. A brief overview of the three main underlying technologies used for depth estimation is presented first. This is followed by an example of stereo analysis, which serves as the basis for the more specific structured lighting techniques. Moreover, this example helps to illustrate why stereo analysis is considered less preferable for 3D face reconstruction applications when compared with structured lighting techniques. Special emphasis is placed on the scientific principles underlying structured lighting techniques. Furthermore, a classification of the different types of pattern coding strategies available in the literature is given, along with an analysis of their suitability for our application. Finally, the chapter concludes with a brief discussion of camera calibration and its most representative techniques.

2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (as found in areas like architecture, mechanical engineering, or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].

Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e., they are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in the depth measurement. On the other hand, both reflecto-metric and stereo-metric techniques are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they require an estimation of the model of the environment. In the remainder of this section we limit the discussion to the stereo-metric category and focus on the structured lighting techniques.

2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis is presented in this section. This example intends to show why the use of structured lighting becomes essential for our application. The example is taken from [4].

Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point in the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies from a certain reference. Note, however, that in order to calculate the triangulation, more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (an extrinsic parameter) or to the focal length of each of the cameras (an intrinsic parameter).

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and oriented towards the scene objects. Also note that, because the x-axes of both images are identically oriented, rows with the same row number in the two different images lie on the same straight line.

Figure 2.1: Standard stereo geometry

In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

p_{left} = (x_{left}, y_{left}) \quad \text{and} \quad p_{right} = (x_{right}, y_{right})

in the left and right images, respectively, assuming that the scene point is visible from both camera viewpoints. The disparity with respect to p_{left} is a vector given by

\Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\; y_{left} - y_{right})^T \qquad (2.1)

between two corresponding image points.

In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

p = (x, y) = \left( \frac{f \cdot X}{Z},\; \frac{f \cdot Y}{Z} \right) \qquad (2.2)

assuming that Z > f.

According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that y = y_{left} = y_{right}. Therefore, for the left camera, the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X-axis, the central projection equation is given by

(x_{right},\; y) = \left( \frac{f \cdot (X - b)}{Z},\; \frac{f \cdot Y}{Z} \right)

Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points in each one of the images with respect to p_{left} is given by

\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}

However, because rows with the same row numbers in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \qquad (2.3)

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map \Delta(x, y) is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.

Finally, in order to derive the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}

and therefore

X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}

Using the previous equation, it follows that

Z = \frac{b \cdot f}{x_{left} - x_{right}}

By substituting this result into the projection equation for y, it follows that

Y = \frac{b \cdot y}{x_{left} - x_{right}}

The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ space, assuming that the parameters f and b are known and that the disparity map \Delta(x, y) was measured for each pair of corresponding points in the two images. Note that a variety of methods exist to calibrate different types of camera configuration systems, i.e., to determine their intrinsic and extrinsic parameters. More on these calibration procedures is discussed in Section 2.2.
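To make the reconstruction step concrete, a minimal sketch in C of how these equations can be evaluated for a single pair of corresponding image points is given below. The function and type names are illustrative assumptions and are not part of the scanner's code base; only the formulas above are taken from the text.

#include <math.h>
#include <stdbool.h>

/* 3D point reconstructed from a pair of corresponding image points. */
typedef struct {
    double X, Y, Z;
} Point3D;

/*
 * Evaluates the standard-stereo-geometry equations
 *   Z = (b * f) / (x_left - x_right)
 *   X = (b * x_left) / (x_left - x_right)
 *   Y = (b * y) / (x_left - x_right)
 * where b is the base distance between the cameras and f their common
 * focal length. Returns false when the disparity is undefined or
 * non-positive (no valid correspondence).
 */
static bool reconstruct_point(double x_left, double x_right, double y,
                              double b, double f, Point3D *out)
{
    double disparity = x_left - x_right;      /* Equation 2.3 */

    if (isnan(disparity) || disparity <= 0.0)
        return false;

    out->Z = (b * f) / disparity;
    out->X = (b * x_left) / disparity;
    out->Y = (b * y) / disparity;
    return true;
}

The guard against a non-positive disparity mirrors the remark above that the absolute value operator can only be dropped because of the chosen arrangement of the cameras.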

The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques is used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images. These features are typically corners or edges contained within the images. Although these techniques are found to be appropriate for a certain number of applications, they present a number of drawbacks that make their applicability unfeasible for many others. The main drawbacks are that (i) feature extraction and matching are generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. Such problems in solving the correspondence problem can generally be overcome by resorting to a different but similar type of technique, known by the name of structured lighting. While structured lighting techniques involve a completely different methodology for solving the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.

2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source which actively projects a light pattern into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to simplify the complexity of the correspondence analysis by actively manipulating the scene.

It is important to note that stereoscopic based systems do not assume complex requirements for image acquisition, since they mostly rely on theoretical, mathematical, and algorithmic analyses to solve the reconstruction problem. On the other hand, the idea behind structured lighting methods is to shift this complexity to another level, such as the engineering prerequisites of the overall system [4].

A wide variety of light patterns has been proposed by the research community [5], [7]-[17]. Their aim is to reduce the large number of images that would have to be captured when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2, a classification of the available encoded patterns is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.

2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring the angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is explained.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation as proposed in [4]

Note that the system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a consequence, the object, the light source, and the camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known, and the origin of the coordinate system O coincides with the projection center of the camera.

The goal is to calculate the distance d between the origin O and the object point P = (X_0, Z_0). This can be done using the law of sines as follows:

\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}

From \gamma = \pi - (\alpha + \beta) and \sin(\pi - \gamma) = \sin(\gamma), it holds that

\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}

Therefore, the distance d is given by

d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}

which holds for any point P lying on the surface of the object.
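As an illustration, the formula can be evaluated directly. The following small C function is a sketch; its name and the assumption that the calibrated angles are expressed in radians are mine, not Klette's.

#include <math.h>

/*
 * Distance d from the projection center O to the object point P,
 * following the law-of-sines triangulation of Figure 2.2:
 *   d = b * sin(alpha) / sin(alpha + beta)
 * b is the base distance between camera and light source; alpha and
 * beta are the calibrated angles, given in radians.
 */
static double triangulate_distance(double b, double alpha, double beta)
{
    return (b * sin(alpha)) / sin(alpha + beta);
}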

2.1.2.2 Pattern coding strategies

As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill the requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors, or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19]:

• Time-multiplexing: This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or objects in the scene during the time period in which the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a (a short code sketch of how such a pattern sequence can be generated and decoded is given after Figure 2.3).

• Spatial neighborhood: In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]-[11], color [12], or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow all coding information to be condensed into a single projection pattern, making them highly suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding: In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16], or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not only depend on the projected colors but also on the intrinsic colors of the measured surface and the light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].

(a) Time-multiplexing  (b) Spatial neighborhood  (c) Direct coding

Figure 2.3: Examples of pattern coding strategies
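To make the time-multiplexing idea more tangible, the following C sketch generates a sequence of column-coded binary bit-plane patterns and decodes the resulting per-pixel codeword. It uses a plain binary code rather than the Gray code that is common in practice, and the buffer layout and function names are illustrative assumptions, not the patterns actually projected by the scanner.

#include <stdint.h>

/*
 * Fills 'pattern' (width x height pixels, row-major, one byte per pixel)
 * with bit-plane 'bit' of a column-coded binary pattern: columns whose
 * index has that bit set are illuminated (255), all others are dark (0).
 * Projecting the planes from the most significant bit downwards gives
 * every pixel a temporal codeword equal to its projector column index.
 */
static void fill_binary_pattern(uint8_t *pattern, int width, int height, int bit)
{
    for (int x = 0; x < width; ++x) {
        uint8_t value = ((x >> bit) & 1) ? 255 : 0;
        for (int y = 0; y < height; ++y)
            pattern[y * width + x] = value;
    }
}

/*
 * Decoding side: 'observed[i]' is non-zero when the camera pixel was lit
 * in bit-plane i (i = 0 being the most significant plane). The returned
 * codeword identifies the projector column that illuminated the pixel.
 */
static int decode_codeword(const uint8_t *observed, int num_planes)
{
    int code = 0;
    for (int i = 0; i < num_planes; ++i)
        code = (code << 1) | (observed[i] ? 1 : 0);
    return code;
}

Replacing x >> bit by (x >> bit) ^ (x >> (bit + 1)) in the pattern generator would turn the plain binary code into a Gray code, which changes only one bit between adjacent columns and is therefore more robust to decoding errors at stripe boundaries.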

2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields, such as security, forensics, or even entertainment, it is no surprise that special focus has been devoted to this area by the research community over the last decades. A comparative study of three different 3D face reconstruction approaches is presented in [20]. Here, the most representative techniques of three different domains are tested: binocular stereo, structured lighting, and photometric stereo. The experimental results show that active reconstruction techniques perform better than purely passive ones for this application.

The majority of analyses on vision based reconstruction have focused on general performance for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless, some effort has been made on evaluating structured lighting techniques with special focus on human face reconstruction. In [21], a comparison is presented between three structured lighting techniques (Gray Code, Gray Code Shift, and Stripe Boundary) to assess 3D reconstruction of human faces using mono and stereo systems. The results show that the Gray Code Shift coding performs best, given the high number of emitted patterns it uses. A further study on this topic was performed by the same author in [22]. Again, it was found that time-multiplexing techniques such as binary encoding using Gray Code provide the highest accuracy. With a rather different objective than that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] also focus their effort on presenting a framework that captures 3D models of faces in high resolution with low computational load. Here, the system uses a single colored stripe pattern for the reconstruction purpose, plus a picture of the face illuminated with regular white light that is used as texture.

Particular aspects of 3D human face reconstruction such as proximity size and texture

involved make structured lighting a suitable approach On the contrary other recon-

struction techniques might be less suitable when dealing with these particular aspects

For example stereoscopic approaches fail to provide positive results when the textures

involved do not contain features that can be easily extracted and matched by means of

algorithms as in the case of the human face On the other hand the concepts behind

structured lighting make it very convenient to reconstruct this kind of surface given

the proximity involved and the size limits of the object in question (appropriate for

projecting encoded patterns)

With regard to the suitability of the different pattern coding strategies for our application

(3D human face reconstruction by means of a hand-held scanner) there are several

factors to consider Spatial neighborhood strategies do not offer high spatial resolution

which is needed by the algorithms that assess the fit quality of the various mask models

Direct coding strategies suffer from practical problems that affect their robustness to

different scenarios This centers the attention on the time-multiplexing techniques which

are known to provide high spatial resolution The problem with such techniques is

that they are highly sensitive to movement which is likely to be present on a hand-

held device Fortunately there are several approaches as to how such a problem can be

solved Consequently it is a time-multiplexing technique that is employed in

our application

22 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement

This section presents a review of some of the most popular techniques with special focus

on those that are regarded as adequate for our application


221 Definition

Camera calibration is the process of determining a mathematical approximation of the

physical and optical behavior of an imaging system by using a set of parameters These

parameters can be estimated by means of direct or iterative methods and they are divided

in two groups On the one hand intrinsic parameters determine how light is projected

through the lens onto the image plane of the sensor The focal length projection center

and lens distortion are all examples of intrinsic parameters On the other hand extrinsic

parameters measure the position and orientation of the camera with respect to a world

coordinate system as defined in [24] To better illustrate these ideas consider Figure

24 which corresponds to the optical system for the structured pattern projection and

triangulation considered in [25] The focal length fc and the projection center Oc are

examples of intrinsic parameters of the camera while the distance D between the camera

and the projector corresponds to an extrinsic parameter


Figure 24 A reference framework assumed in [25]

222 Popular techniques

In 1982 Hall et al [18] proposed a technique consisting of an implicit camera calibration

that uses a 3 × 4 transformation matrix which maps 3D object points to their respective

2D image projections Here the model of the camera does not consider any lens distor-

tion For a detailed description of this method refer to [18] Some years later in 1986

Faugeras improved Hall's work by proposing a technique that was based on extracting

the physical parameters of the camera from the transformation technique proposed in

[18] The description of this technique is given in [26] and [27] A non-linear explicit

camera calibration that included radial lens distortion was proposed by Salvi in his PhD


thesis [28] which as he mentions can be regarded as a simple adaptation of Faugeras' lin-

ear method However a method that would become much more popular and that is still

widely used was proposed by Tsai in 1987 [29] Here the author proposes a two-step

technique that models only radial lens distortion Also worth mentioning is the model

proposed by Weng [30] in 1992 which includes three different types of lens distortion

The calibration mechanism that is currently being used in our application is based on

the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although

this mechanism focuses on stereo camera calibration it was generalized for a system

with one camera and one projector It involves imaging a controlled scene from different

positions and orientations The controlled scene consists of a rigid calibration chart with

several markers The geometric and photometric properties of such markers are known

precisely so that they can be detected After corresponding markers in the different

images are found an algorithm searches the optimal set of camera parameters for which

triangulation of all corresponding marker-point pairs gives an accurate reconstruction of

the calibration chart This calibration mechanism is discussed further in Section 37

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed

by the Smart Sensing amp Analysis research group and provided as a starting point for the

current project Figure 31 presents the main steps involved in the 3D reconstruction

process

[Flow diagram: from the binary and XML input files to the final 3D model, through the steps read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering and hole filling, each described in Sections 31 to 39]

Figure 31 General flow diagram of the 3D face scanner application

The current scanner uses a total of 16 binary coded patterns that are sequentially pro-

jected onto the scene For each projection the scene is captured by means of the

embedded camera hence producing 16 different grayscale frames (Figure 32) that are

fed to the application in the form of a binary file This falls in line with the discussion

presented in Section 2123 of the literature study of why time-multiplexing strategies

result more suitable than spatial neighborhood or direct coding strategies for face recon-

struction applications In Sections 31 to 39 each of the steps shown in Figure 31 is

described


Figure 32 Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns This frame

sequence is the input for the 3D face scanner application

31 Read binary file

The first step of the application is to read the binary file that contains the required

information for the 3D reconstruction The binary file is composed of two parts the

header and the actual data The header contains metadata of the acquired frames such

as the number of frames and the resolution of each one The second part contains the

actual data of the captured frames Figure 32 shows an example of such frame sequence

which from now on will be referred to as camera frames

32 Preprocessing

The preprocessing stage comprises the four steps shown in figure 33 Each of these steps

is described in the following subsections

[Diagram: Parse XML file → Discard frames → Crop frames → Scale (convert to float, range from 0 to 1)]

Figure 33 Flow diagram of the preprocessing stage

321 Parse XML file

In this stage the application first reads an XML file that is included for every scan

This file contains relevant information for the structured light reconstruction This


information includes (i) the type of structured light patterns that were projected when

acquiring the data (ii) the number of frames captured while structured light patterns

were being projected (iii) the image resolution of each frame to be considered and (iv)

the calibration data

322 Discard frames

Based on the number of frames value read from the XML file the application discards

extra frames that do not contain relevant information for the structured light approach

but that are provided as part of the input

323 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain

a new more suitable resolution for the subsequent algorithms of the program (480 × 754) This is accomplished by cropping the pixels that are close to the top border

of the images Note that this operation does not imply a loss of information in this

application in particular This is because pixels near the frame borders do not contain

facial information and therefore can be safely removed

324 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is

represented by an 8-bit unsigned integer value that ranges from 0 to 255 In this stage

the data type is transformed from unsigned integer to floating point while dividing each

pixel value by 255 The new set of values ranges between 0 and 1
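A minimal sketch of this conversion is given below. It is illustrative only (function and parameter names are assumptions, not taken from the application code) and assumes the frame is stored as a flat array of 8-bit pixels.

#include <stdint.h>

/* Illustrative sketch: convert 8-bit pixel values (0-255) to
 * single-precision floats in the range [0, 1].                */
void scale_frame(const uint8_t *src, float *dst, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++)
        dst[i] = (float)src[i] / 255.0f;
}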

33 Normalization

Even though this section is entitled Normalization a few more tasks are being performed

in this stage of the application as shown by the blue rectangles in Figure 34 Here wide

arrows represent flow of data whereas dashed lines represent the order of execution The

numbers inside the small data arrows pointing towards the different tasks represent the

number of frames used as input by each task The dashed line rectangle that encloses

the normalization and texture 2 tasks represents that there is not a clear sequential

execution between these two but rather that these are executed in an alternating fashion

This type of diagram will prove particularly useful in Chapter 5 in order to explain the


[Diagram: the 16 camera frames are the input to four tasks: normalization (8 frames out), texture 2 (8 frames out), modulation (1 frame out) and texture 1 (1 frame out)]

Figure 34 Flow diagram of the normalization stage

modifications that were made to the application to improve its performance An example

of the different frames that are produced in this stage are visualized in Figure 35 A

brief description of each of the tasks involved in this stage follows

331 Normalization

The purpose of this stage is to extract the reflectivity component (texture information)

from the camera frames while aiming at enhancing the deformed illumination patterns

in the resulting frame sequence Figure 35a illustrates the result of this process The

deformed patterns are essential for the 3D reconstruction process

In order to understand how this process takes place we need to look back at Figure

32 Here it is possible to observe that the projected patterns in the top row frames are

equal to their corresponding frame in the bottom row with the only difference being

that the values of the projected pattern are inverted For each corresponding pair a

new image frame is generated according to the following equation

Fnorm(x, y) = (Fcamera(x, y, a) − Fcamera(x, y, b)) / (Fcamera(x, y, a) + Fcamera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 32 respectively

An example of the resulting frame sequence is shown in Figure 35a
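A minimal sketch of this computation is shown below. It is illustrative only (the function name and the zero-division guard are assumptions, not taken from the application code); frame_a and frame_b hold the aligned top and bottom frames of one pair as float arrays in [0, 1].

/* Illustrative sketch: compute one normalized frame from a pair of
 * camera frames with inverted projection patterns.                 */
void normalize_pair(const float *frame_a, const float *frame_b,
                    float *frame_norm, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++) {
        float sum = frame_a[i] + frame_b[i];
        /* Guard against division by zero in completely dark regions. */
        frame_norm[i] = (sum > 0.0f) ? (frame_a[i] - frame_b[i]) / sum : 0.0f;
    }
}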


(a) Normalized frame sequence

(b) Texture 2 frame sequence

(c) Modulation frame (d) Texture 1 frame

Figure 35 Example of the 18 frames produced in the normalization stage

332 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one

used to calculate the normalized frame sequence In fact the output of this process is an

intermediate step in the calculation of the normalized frames, which is the reason why

the two processes are said to be performed in an alternating fashion The mathematical

equation that describes the calculation of the texture 2 frame sequence is

Ftexture2(x, y) = Fcamera(x, y, a) + Fcamera(x, y, b)

The resulting frame sequence (Figure 35b) is used later in the global motion compen-

sation stage


333 Modulation

The purpose of this stage is to find the range of measured values for each (x y) pixel of

the camera frame sequence along the time dimension This is done in two steps First

two frames are generated by finding the maximum and minimum values along the time

(t) dimension (Figure 36) for every (x y) value in a frame


Figure 36 Camera frame sequence in a coordinate system

Second a modulation frame is produced by finding the difference between the previously

generated frames ie

Fmod(x, y) = Fmax(x, y) − Fmin(x, y)

Such modulation frame (Figure 35c) is required later during the decoding stage
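The following sketch illustrates the idea; it is not the application's exact code, and it assumes the 16 frames are available as an array of float buffers of equal size.

/* Illustrative sketch: per-pixel range of the camera frames along the
 * time dimension, i.e. Fmod = Fmax - Fmin.                            */
void modulation_frame(const float *const frames[], int num_frames,
                      float *frame_mod, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++) {
        float min = frames[0][i];
        float max = frames[0][i];
        for (int t = 1; t < num_frames; t++) {
            if (frames[t][i] < min) min = frames[t][i];
            if (frames[t][i] > max) max = frames[t][i];
        }
        frame_mod[i] = max - min;
    }
}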

334 Texture 1

Finally the last task in the Normalization stage corresponds to the generation of the

texture image that will be mapped onto the final 3D model In contrast to the previous

three tasks this subprocess does not take the complete set of 16 camera frames as input

but only the 2 with finest projection patterns Figure 37 shows the four processing

steps that are applied to the input in order to generate a texture image such as the one

presented in Figure 35d

[Diagram: Average frames → Gamma correction → 5x5 mean filter → Histogram stretch]

Figure 37 Flow diagram for the calculation of the texture 1 image


34 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement

In fact if no measures are taken to correct the slight amount of movement of the scanner

or of the objects in the scene during the acquisition process the complete reconstruction

process fails Although the global motion compensation stage is only a minor part of

the mechanism that makes the entire application robust to motion it is not negligible

in the final result

Global motion compensation is an extensive field of research for which many different

approaches and methods have been contributed The approach used in this application

is amongst the simplest in terms of complexity Nevertheless it suffices for the needs of the

current application

Figure 38 presents an overview of the algorithm used to achieve the global motion

compensation This process takes as input the normalized frame sequence introduced in

the previous section As noted at the bottom of the figure these steps are repeated for

every pair of consecutive frames As a first step the pixels in each column are added for

both frames This results in two vectors that hold the cumulative sums of each frame

The second step is to determine by how many pixels the second image is displaced with

respect to the first one In order to achieve this the sum of absolute differences between

elements of the two column-sum vectors is calculated while slowly displacing the two

vectors with respect to each other The result is a new vector containing the SAD value

for each displacement Subsequently the index of the smallest element in the SAD

values vector is searched in order to determine the number of pixels that the second

image needs to be shifted The process concludes by performing the actual shift of the

second frame
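The sketch below illustrates the SAD-minimization step on two 1-D projection vectors (the per-column sums of frames A and B). It is illustrative only; the function name, the max_shift parameter and the handling of non-overlapping elements are assumptions, not the application's exact code.

#include <math.h>
#include <float.h>

/* Illustrative sketch: find the relative displacement of two column-sum
 * vectors that minimizes the sum of absolute differences (SAD).          */
int estimate_shift(const float *col_sum_a, const float *col_sum_b,
                   int num_cols, int max_shift)
{
    int best_shift = 0;
    float best_sad = FLT_MAX;

    for (int s = -max_shift; s <= max_shift; s++) {
        float sad = 0.0f;
        for (int c = 0; c < num_cols; c++) {
            int cb = c + s;
            if (cb < 0 || cb >= num_cols)
                continue;                 /* ignore non-overlapping elements */
            sad += fabsf(col_sum_a[c] - col_sum_b[cb]);
        }
        if (sad < best_sad) {
            best_sad = sad;
            best_shift = s;               /* displacement that best aligns the vectors */
        }
    }
    return best_shift;
}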

[Diagram: for every pair of consecutive normalized frames A and B, the columns of each frame are summed, the SAD between the column-sum vectors is minimized, and frame B is shifted accordingly]

Figure 38 Flow diagram for the global motion compensation process


35 Decoding

In Section 211 of the literature study the correspondence problem was defined as the

process of determining corresponding point pairs between the captured images and the

projected patterns This is exactly what is being accomplished during the decoding

stage

A novel approach has been implemented in which the identification of the projector

stripes is based not on the values of the pixels themselves (as it is typically done) but

rather on the edges formed by the transitions of the projected patterns Figure 39

illustrates the different sets of decoded values that result with each of these methods

Here it is possible to observe that the pixel-based method produces a stair-casing effect

due to the decoding of neighboring pixels that lie on the same stripe of the projected

pattern On the other hand the edge-based method removes this undesirable effect by

decoding values for only parts of the image in which a transition occurs Furthermore

this approach enables sub-pixel accuracy for the determination of the positions where the

transitions occur meaning that the overall resolution of the 3D reconstruction increases

considerably

[Plot: decoded values versus pixels along the y dimension of the image, comparing edge-based and pixel-based decoding]

Figure 39 The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used

The decoding process results in a set of vertices each one associated with a depth code

Note however that the unit of measurement used to describe the position and depth of

each vertex is based on camera pixels and code values respectively meaning that these

vertices still do not represent the actual geometry of the face The calibration process

explained in a later section is the part of the application that translates the pixel and


code values to standard units (such as millimeters) thus recreating the actual shape of

the human face

36 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in

a manner such that no overlaps occur In computer graphics these geometric shapes

are generally chosen to be triangles also called "faces" The reason for using triangles

is that they have by definition their vertices on the same plane This in turn avoids

the generation of non-simple convex polygons that are not guaranteed to be rendered

correctly A complete example illustrating this point can be found in [32]

A set of 3D vertices calculated in the decoding stage is the input to the tessellation

process Here however the third dimension does not play a role and hence the z

coordinate for each of the vertices can be thought of as being equal to 0 This implies

that the new set of vertices consist only of (x y) coordinates that lie on the same plane

as shown in Figure 310a This graph corresponds to a very close view of the nose area

in the reconstructed face example

(a) Vertices before applying the Delaunay triangulation

(b) Result after applying the Delaunay triangulation

Figure 310 Close view of the vertices in the nose area before and after the tessellation process

The question that arises here is how to connect the vertices in such a way that the com-

plete surface is covered with triangles The answer is to use the Delaunay triangulation

which is probably the most common triangulation used in computer vision The main

advantages that it has over other methods is that the Delaunay triangulation avoids

"skinny" triangles reducing potential numerical precision problems [33] Moreover the

Delaunay triangulation is independent of the order in which the vertices are processed


Figure 310b shows the result of applying the Delaunay triangulation to the vertices

shown in Figure 310a

Although there exists a number of different algorithms used to achieve the Delaunay

triangulation the final outcome of each conforms to the following definition a Delaunay

triangulation for a set P of points in a plane is a triangulation DT(P) such that no

point in P is inside the circumcircle of any triangle in DT(P) [33] Such definition can

be understood by examining Figure 311


Figure 311 The Delaunay tessellation with all the circumcircles and their centers [33]
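The definition above is usually checked with the standard in-circle predicate, a sketch of which is given below for reference. It is generic computational-geometry code, not taken from the application (which relies on OpenCV for the triangulation itself).

typedef struct { double x, y; } point2d;

/* Illustrative sketch of the in-circle test behind the Delaunay criterion:
 * the result is positive when point d lies inside the circumcircle of the
 * triangle (a, b, c), assuming (a, b, c) is in counter-clockwise order.    */
double incircle(point2d a, point2d b, point2d c, point2d d)
{
    double ax = a.x - d.x, ay = a.y - d.y;
    double bx = b.x - d.x, by = b.y - d.y;
    double cx = c.x - d.x, cy = c.y - d.y;

    /* 3x3 determinant of the points lifted to (x, y, x^2 + y^2). */
    return (ax * ax + ay * ay) * (bx * cy - cx * by)
         - (bx * bx + by * by) * (ax * cy - cx * ay)
         + (cx * cx + cy * cy) * (ax * by - bx * ay);
}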

37 Calibration

The set of (x y) vertices with their corresponding depth code values that result from

the decoding process do not represent standard units of measure ie these still have to

be translated into standard units such as millimeters This is precisely the objective of

the calibration process

The calibration mechanism that is used in the application is based on the work of Peter-

Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts

an offline and an online process Moreover the offline process consists of two stages

the camera calibration and the system calibration It is important to clarify that while

the offline process is performed only once (camera properties and distances within the

system do not change with every scan) the online process is carried out for every scan

instance The calibration stage referred to in Figure 31 is the latter


371 Offline process

As already mentioned the offline process comprises the two stages described below

Camera calibration This part of the process is concerned with the calculation of the

intrinsic parameters of the camera as explained in Section 22 of the literature

study In short the objective is to precisely quantify the optical properties of the

camera The manner in which the current approach accomplishes this is by imag-

ing the special calibration chart shown in Figure 312 from different orientations

and distances After corresponding markers in the different images are found an

algorithm searches the optimal set of camera parameters for which triangulation

of all corresponding marker-point pairs gives an accurate reconstruction of the

calibration chart

Figure 312 The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system All absolute dimensions

and photometric properties of the round markers are known precisely

System calibration The second part of the calibration process refers to the camera-

projector system calibration ie the determination of the extrinsic parameters

of the system Again this part of the process images the calibration chart from

different distances However this time structured light patterns are emitted by

the projector while the acquisition process takes place The result is that each

projector code is associated with a known depth and camera position

372 Online process

The result of the offline calibration is a set of parameters that model the optical proper-

ties of the scanner system These are passed to the application inside the XML file for

every scan Such parameters represent the coefficients of a fifth-order polynomial used

for translating the set of (x y) vertices with their corresponding depth code values into


standard units of measure In other words the online process consists of evaluating a

polynomial with all the x y and depth code values calculated in the decoding stage in

order to reconstruct the geometry of the face Figure 313 shows the state of the 3D

model before and after the reconstruction process
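Purely as an illustration of the evaluation pattern, the sketch below applies Horner's scheme to a fifth-order polynomial in a single variable. The actual mapping combines the x, y and depth code values with the coefficients read from the XML file, and its exact form is not reproduced here, so the names and interface are assumptions.

/* Illustrative sketch: evaluate coeff[0] + coeff[1]*v + ... + coeff[5]*v^5
 * with Horner's scheme.                                                    */
float eval_poly5(const float coeff[6], float v)
{
    float r = coeff[5];
    for (int i = 4; i >= 0; i--)
        r = r * v + coeff[i];
    return r;
}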

(a) Before reconstruction (b) After reconstruction

Figure 313 The 3D model before and after the calibration process

38 Vertex filtering

As it can be seen from Figure 313b there are a number of extra vertices (and faces)

that have not been correctly reconstructed and therefore should be removed from the

model Vertex filtering is applied to remove all these noisy vertices and faces based on

different criteria The process is divided into the following three steps

381 Filter vertices based on decoding constraints

First if the distance between consecutive decoded points is larger than a maximum

threshold in the (x) or (z) dimensions then these are removed Second in order to

avoid false decoded vertices due to camera noise (especially in the parts of the images

where light does not hit directly) a minimal modulation threshold needs to be exceeded

or else the associated decoded point is discarded Finally if the decoded vertices lie

outside a margin defined in accordance to the image dimensions then these are removed

as well


382 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum

and maximum values that each decoded point can have in the z dimension These values

are read from the XML file The long triangles shown in Figure 313b that either extend

far into the picture or on the other hand come close to the camera are all removed in

this stage The resulting 3D model after being filtered with the two previously described

criteria is shown in Figure 314a

383 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length

criterion Initially the length of every edge contained in the model is calculated This

is followed by determining a new set of edges L that contains the longest edge in each

face After this operation the mean length value for the longest edge set is calculated

Finally only faces that have their longest edge value less than seven times the mean value

ie L < 7 × mean(L) are kept Figure 314b shows the result after this operation
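A sketch of this filtering criterion is given below. It is illustrative only (the names and the per-face keep-flag interface are assumptions), not the application's exact code.

#include <math.h>
#include <stdlib.h>

/* Illustrative sketch: a face is kept only if its longest edge is shorter
 * than seven times the mean of the per-face longest edges.                */
static float edge_len(const float a[3], const float b[3])
{
    float dx = b[0] - a[0], dy = b[1] - a[1], dz = b[2] - a[2];
    return sqrtf(dx * dx + dy * dy + dz * dz);
}

void filter_long_edges(const float (*verts)[3], const int (*faces)[3],
                       int num_faces, int *keep)
{
    float *longest = malloc(num_faces * sizeof *longest);
    float mean = 0.0f;
    if (!longest) return;

    for (int f = 0; f < num_faces; f++) {
        const float *a = verts[faces[f][0]];
        const float *b = verts[faces[f][1]];
        const float *c = verts[faces[f][2]];
        float l = edge_len(a, b);
        if (edge_len(b, c) > l) l = edge_len(b, c);
        if (edge_len(c, a) > l) l = edge_len(c, a);
        longest[f] = l;
        mean += l;
    }
    mean /= (float)num_faces;

    for (int f = 0; f < num_faces; f++)
        keep[f] = longest[f] < 7.0f * mean;   /* L < 7 * mean(L) */

    free(longest);
}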

(a) The 3D model after the filtering steps described in Subsections 381 and 382

(b) The 3D model after the filtering step described in Subsection 383

(c) The 3D model after the filtering step described in Section 39

Figure 314 3D resulting models after various filtering steps

39 Hole filling

In the last processing step of the 3D face scanner application two actions are performed

The first one is concerned with an algorithm that takes care of filling undesirable holes

that appear due to the removal of vertices and faces that were part of the face surface This

is accomplished by adding a vertex in the middle of the hole and then connecting every

surrounding edge with this point The second action refers to another filtering step of


vertices and faces In this last part of the application the program removes all but the

largest group of connected faces The final 3D model is shown in Figure 314c

310 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes

but not for the overall goal of the 3D mask sizing project this process was not taken

into account as part of the 3D face scanner application This is also the reason why it

is not included in Figure 31 Nevertheless this section provides a brief explanation of

the smoothing process that is currently used along with an example

A complete explanation of the algorithm that is being used to achieve the smoothing

effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian

operator that diffuses the vertices along the surface An example of the resulting model

before and after applying the smoothing process is shown in Figure 315

(a) The 3D model before smoothing (b) The 3D model after smoothing

Figure 315 Forehead of the 3D model before and after applying the smoothing process

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as

two different domains but rather as two complementary parts of a whole There are two

important trends that have made such unified view possible First integrated circuit

(IC) technology has evolved to the point where multiple processors of different types

coexist in a single IC Second the increasing complexity and average size of programs

added to the evolution of compiler technologies raised C compilers (and even C++ or

Java in some cases) to become commonplace in the development of embedded systems

[35]

This chapter discusses the embedded hardware and software implementation of the 3D

face scanner A brief account of the hardware and software tools that were used during

the development of the application is presented first Subsequently the first stage of the

development process is described which consists mainly of translating the algorithms

and methods described in Chapter 3 into a different programming language more suitable

for embedded systems Finally a preview of the developed visualization module that

displays the 3D reconstructed face is presented along with a brief description of its

functionality

41 Development tools

This section describes the set of tools used in the development of the embedded applica-

tion First an overview of the hardware is presented highlighting the most important

aspects that are of interest to the 3D face scanner application This is then followed by

a list of the software tools along with a short motivation for their selection A so called

remote development methodology was used for the compilation process The idea is to


run an integrated development environment (IDE) on a client system for the creation of

the project editing of the files and usage of code assistance features in the same manner

as done with local projects However when the project is built run or debugged the

process runs on a remote server with output and input transferred to the client system

411 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as

development platforms SBCs combine most features of a conventional desktop computer

into a single board which can be as small as a credit card One or more processors of

different types memory on-board peripherals for multiple USB devices single or dual

gigabit Ethernet connections integrated graphics and audio capabilities amongst others

are common features included in these devices But perhaps what is most interesting

for embedded developers is the availability of several SBCs that come under open source

hardware category [36] Such SBCs are suitable for the implementation of a wide range

of applications on the basis of open operating systems

Two different hardware environments were used in the development of the current em-

bedded application a conventional desktop personal computer (PC) with an Intel x86

architecture and a SBC that was selected according to the following survey

4111 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention

of finding the most suitable model for our application Table 41 presents a subset of the

considered models highlighting the most relevant characteristics for the 3D face scanner

application Refer to [37] for the complete survey

The model to be chosen has to comply with several requirements imposed by the 3D

face scanner application First support for both a camera and a projector had to be

offered While all of the considered models showed special support for video output

not all of them provided suitable characteristics for camera signal acquisition In fact

most of them rely on USB or Ethernet connections for this purpose The problem of

using USB technology for camera acquisition is that it is highly resource demanding On

the other hand Ethernet connections imply streaming video in formats such as MPEG

which require additional computational resources and buffering for decoding the video

stream Explicit periphery support for camera acquisition was only offered by two of

the considered models the BeagleBoard-xM and the PandaBoard


Table 41 Single-board computer survey

BeagleBoard-xM

CPU ARM Cortex-A8 1000 MHz

RAM 512 MB

Video output DVI-D HDMI S-Video

GPU PowerVR SGX OpenGL ES 20

Camera port Yes

Raspberry Pi Model B

CPU ARM1176 700 MHz

RAM 256 MB

Video output Composite RCA HDMI DSI

GPU Broadcom VideoCore IV OpenGL ES 20

Camera port No

Cotton candy

CPU dual-core ARM Cortex-A9 1200 MHz

RAM 1 GB

Video output HDMI

GPU quad-core 200 MHz Mali-400 MP OpenGL ES 20

Camera port No

PandaBoard

CPU dual-core ARM Cortex-A9 1000 MHz

RAM 1 GB

Video output HDMI DVI-D LCD

GPU PowerVR SGX540 OpenGL ES 20

Camera port Yes

Via APC

CPU ARM11 800 MHz

RAM 512 MB

Video output HDMI VGA

GPU Built-in 2D3D Graphic OpenGL ES 20

Camera port No

MK802

CPU ARM Cortex-A8 1000 MHz

RAM 1 GB

Video output HDMI

GPU Mali-400 MP OpenGL ES 20

Camera port No

Snowball

CPU dual-core ARM Cortex-A9 1000 MHz

RAM 1 GB

Video output HDMI CVBS

GPU Mali-400 MP OpenGL ES 20

Camera port No


A second issue in the selection of the SBC was concerned with the project objective of

developing a module capable of visualizing the 3D reconstructed model by means of the

embedded projector It was considered that the achievement of this objective could be

greatly simplified by selecting an SBC model that offered support for rendering of 3D

computer graphics by means of an API preferably OpenGL ES Nevertheless all of the

SBC models considered in the survey featured a graphical processor unit (GPU) with

such support

Finally one last important motivation for the selection came from the experience gath-

ered through related projects The BeagleBoard-xM had been used as the embedded

computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-

able implementation effort could be saved if this option were adopted Consequently it

was the BeagleBoard-xM that was selected as the SBC model for the development of

the current project

4112 BeagleBoard-xM features

The BeagleBoard-xM (Figure 41) is an SBC produced by Texas Instruments It is

a low-power open-source hardware system that was designed specifically to address

the Open Source Community It measures 8255 by 8255 mm and offers most of the

functionality of a desktop computer It is based on Texas Instruments' DM3730 system

on chip (SoC) At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1

GHz and 512 MB of LPDDR RAM Several open operating systems have been made

compatible with such processor including Linux FreeBSD RISC OS Symbian and

Android Moreover the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated

video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to

provide accelerated 2D and 3D rendering that supports OpenGL ES 20 [38]

In addition to the previously mentioned characteristics the ARM Cortex-A8 processor

comes with a general-purpose SIMD (Single Instruction Multiple Data) engine known as

NEON This technology is based on a 128-bit SIMD architecture extension that provides

flexible and powerful acceleration for consumer multimedia products as described in [39]

412 Software

The main factors involved in the selection of software tools were (i) available support by

a large development community and (ii) acquisition costs and licensing charges Open

source software was adopted where possible Moreover prior experience with the tools

was also taken into account The software can be divided in two categories (i) software


Figure 41 The BeagleBoard-xM offered by Texas Instruments

libraries that are used within the application and therefore are necessary for its execution

and (ii) software tools used specifically for the development of the application and hence

are not required for its execution In what follows each of these is briefly described

4121 Software libraries

The following software libraries are being used throughout the implementation of the

embedded application

libxml2 It is a software library used for parsing XML documents which was originally

developed for the Gnome project and was later made available for outside projects

as well The current application makes use of such tool for extracting the required

information from the XML file that is included for each scan

OpenCV Is an open source computer vision and machine learning software library

initiated by Intel It provides the necessary functionality to construct the Delaunay

triangulation described in Chapter 3 Though it was used in the initial versions of

the application later optimizations replaced OpenCV implementations

CGAL Consists of a software library that aims to provide access to algorithms in

computational geometry It is being used in the current application as a means

to simplify the resulting mesh surface ie to reduce the number of faces used to

represent the surface while keeping the overall shape of the reconstructed model

OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-

cally for embedded systems It consists of a cross-language multi-platform Appli-

cation Programming Interface (API) for rendering 2D and 3D computer graphics


It is used in the current application as the means to visualize the 3D reconstructed

model

GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL

used to create windows andor frame buffers It is being used in the visualization

module of the application as well

4122 Software development tools

The following list presents a description of the most important software tools used for

the development of the embedded application

GNU toolchain It refers to a collection of programming tools produced by the GNU

Project that provide developing facilities for applications and operating systems

Among the several projects that comprise the GNU toolchain the following were

used

GNU Make It is a utility that automates the building process of executable

programs by reading the so-called makefiles which specify how to create the

target program

GCC It is the official compiler of the GNU operating system and has been

adopted as standard by most modern Unix-like computer operating systems

GNU Binutils Involves a set of programming tools that are used in the develop-

ment process of creating and managing programs object files libraries profile

data and assembly source code The commands as (assembler) ld (linker)

and gprof (profiler) were used among the complete set of binutil commands

GNU Project debugger It is the standard debugger for the GNU operating

system which was made available for the development of applications outside

this project as well

Valgrind It is a programming tool that can automatically detect memory management

errors It also provides the functionality of a profiler

Ubuntu A Linux based operating system that is distributed as free and open source

software It was installed in both the desktop PC and the SBC


42 MATLAB to C code translation

This section describes the first stage of the embedded application development that

involves the translation of a series of algorithms originally written in MATLAB code to

C

Despite the fact that there are a number of available tools that automatically translate

MATLAB code to C language such as MATLAB Coder by MathWorks MATLAB-to-

C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx these have a number

of pitfalls that compromise their applicability specially when the performance aspect

is of ultimate importance Perhaps what is most concerning is that each one of these

tools only supports a subset of the MATLAB language and functions meaning that

the complete functionality of MATLAB is immediately constrained by this requirement

In many cases this would imply a modification to the MATLAB code prior to the

translation process in order to filter out any feature or function not included in the

subset which adds overhead to the development process Examples of features not

supported by automatic translation tools are amongst others objects cell arrays nested

functions visualization or try/catch statements The use of an automatic translation

tool was discarded for this project taking into account that several of these unsupported

features are present in the MATLAB code

421 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular pro-

gramming languages used for the development of embedded systems The first is that

C language lies at an intermediate point between higher and lower level languages pro-

viding suitable characteristics for embedded system development from both sides The

problem with higher level languages relies on the fact that they do not provide suitable

characteristics for optimizing performance of the applications such as low-level memory

manipulation Furthermore unlike many of these higher level programming languages

C provides deterministic resource use which is an important feature when the target de-

vices contain limited resources On the other hand C outperforms lower level languages

in a number of aspects such as scalability and maintainability Two final motivations

for using C are (i) C compilers are available for almost all embedded devices which are

supported by a large pool of experienced C programmers and (ii) the vast majority of

hardware APIdrivers are written in C


422 Translation approach

As mentioned earlier a manual translation approach of the code was chosen over the

use of automatic translation tools A key part in the process of manually translating

MATLAB to C code is the verification process There are two major techniques used

to achieve such verification The first one consists of a systematic method of converting

the translated C code into a compiled MEX-file that can be merged into the original

MATLAB project Then by comparing the results generated by the MATLAB project

containing the C implementation wrapped in a MEX-file with those generated by the

original MATLAB project one should be able to verify the correctness of the translation

The second approach consists of writing corresponding intermediate results of both the

MATLAB and C implementations to external files and then using a file comparison tool

such as diff for Linux environments in order to validate equality of both results It was

the latter approach that was chosen for the development of the current application for

the following reason The former approach requires the C implementation to be wrapped

in a so called MEX wrapper which takes care of the communication between MATLAB

and C This task is considered to be error prone since crashes segmentation violations

or incorrect results can easily occur if the MEX wrapper does not allocate and access

the data properly as reported by Marc Barberis in [40] from Catalytic Inc

A number of pitfalls that add complexity to the manual translation process were iden-

tified throughout the development of this stage The most important are

bull Array elements in MATLAB code are indexed starting with 1 whereas C indexing

starts with 0 Although this does not seem like a major difference it was found

that such simple change could easily introduce errors

bull MATLAB uses column major ordering whereas C uses a row major approach

Special care must be taken to guarantee that spatial locality is maintained after

the translation process takes place ie the order in which data is processed should

correspond to the order in which it is laid out in memory Not complying with

this idea could induce a serious loss in performance of the resulting code (a short illustration of this pitfall is given after this list)

bull MATLAB is an interpreted language ie data types and variable dimensions are

only known at run-time thus these cannot be easily deduced from analyzing the

source code

bull MATLAB supports dynamic sizing of arrays whereas such operations in C require

explicit allocationreallocationdeallocation of memory using constructs such as

malloc realloc or free


bull MATLAB features a rich set of libraries that are not available in C This can imply

a large overhead in the development process if many of these functions have to be

implemented

bull Many of the vector-based operations available in MATLAB translate into nontriv-

ial loop constructs in C language For example mapping MATLAB's easy-to-use

concatenation operation to C involves considerable effort

bull Last but not least MATLAB supports reusing the same variable for storing data

of different types dimensions and sizes On the contrary C language requires all

variables to be cast to a specific data type (or declared as known in the program-

ming field) before they can be used Furthermore MATLAB uses a wide variety

of generic types that are not available in C and hence requires the programmer

to implement them while relying on structure constructs of primitive types
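As an illustration of the row-major pitfall mentioned in the list above, the following sketch shows the cache-friendly traversal order for a 2-D image stored as a flat C array. It is a generic example, not code from the application.

/* Illustrative sketch: a 2-D image stored as rows*cols floats should be
 * traversed row by row in C so that consecutive iterations touch
 * consecutive memory addresses.                                         */
void scale_image(float *img, int rows, int cols, float factor)
{
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            img[r * cols + c] *= factor;

    /* A direct transliteration of column-major MATLAB code would instead
     * put the column index in the outer position of the address
     * computation, striding through memory and hurting spatial locality. */
}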

43 Visualization

This section describes the different steps involved in the visualization module developed

to display the reconstructed 3D models by means of the embedded projector contained

in the hand-held device Figure 42 extends the general overview of the application

presented in 31 by incorporating the visualization module This figure shows that a

resulting 3D model of the face reconstruction process consists of 4 different elements a

set of vertices a set of faces a set of UV coordinates and a texture image

[Diagram: the camera frame sequence and the XML file are the input to the 3D face reconstruction, whose outputs (faces, vertices, UV coordinates and the texture 1 image) are passed to the visualization module]

Figure 42 Simplified diagram of the 3D face scanner application

Vertices and faces describe the geometry of the reconstructed model Each face consists

of three index values that determine the vertices that conform a triangle On the other

hand UV coordinates together with the texture image describe the texture of the model

Figure 43 shows how UV coordinates are used to map portions of the texture image


to individual parts of the model Each vertex is associated with an UV coordinate

When a triangle is rendered the corresponding UV coordinates of each vertex are used

to extract a portion of the texture image to place it on top of the triangle


Figure 43 UV coordinate system

Figure 44 presents an overview of the visualization module The first step of the process

is to simplify the 3D model ie to reduce the number of triangles (and vertices) used

to represent the surface Note that while a high resolution is needed for the algorithms

that determine the fit quality of the different mask models a much lower resolution can

be used for visualization purposes In fact due to the limited available resources in

embedded systems such simplification becomes necessary to avoid lag when zooming

rotating or panning the model Edge collapse is a common term used for the simpli-

fication process which is shown in Figure 44 Input vertices and faces of this block

are converted into a smaller set denoted as New vertices and New faces on the diagram

However since the new set of vertices and faces do not have a one-to-one correspondence

to the original set of UV coordinates such coordinates have to be updated as well The

manner in which this is accomplished is by using the Nearest Neighbor algorithm Every

new vertex is assigned the UV coordinate of its closest original vertex
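A brute-force sketch of this nearest-neighbor reassignment is shown below. It is illustrative only (the application may use a more efficient search), and all names are assumptions.

#include <float.h>

/* Illustrative sketch: give every vertex of the simplified mesh the UV
 * coordinate of the nearest vertex of the original mesh.                */
void reassign_uv(const float (*old_verts)[3], const float (*old_uv)[2],
                 int num_old, const float (*new_verts)[3],
                 float (*new_uv)[2], int num_new)
{
    for (int i = 0; i < num_new; i++) {
        float best = FLT_MAX;
        int best_j = 0;
        for (int j = 0; j < num_old; j++) {
            float dx = new_verts[i][0] - old_verts[j][0];
            float dy = new_verts[i][1] - old_verts[j][1];
            float dz = new_verts[i][2] - old_verts[j][2];
            float d2 = dx * dx + dy * dy + dz * dz;   /* squared distance */
            if (d2 < best) { best = d2; best_j = j; }
        }
        new_uv[i][0] = old_uv[best_j][0];
        new_uv[i][1] = old_uv[best_j][1];
    }
}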

The next stage of the process is to format the new set of vertices faces and UV co-

ordinates together with the texture 1 image such that OpenGL can render the model


Subsequently normal vectors are calculated for every triangle which are mainly used

by OpenGL for lighting calculations Every vertex of the model has to be associated

with one normal vector To do this an average normal vector is calculated for each

vertex based on the normal vectors of the triangles that are connected to it Moreover

a cross-product multiplication is used to calculate the normal vector of each triangle
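The following sketch illustrates this two-step procedure (per-face normals from a cross product, then averaged and normalized per vertex). It is illustrative only and not the application's exact code.

#include <math.h>
#include <string.h>

/* Illustrative sketch: per-triangle normals via a cross product, then
 * per-vertex normals as the normalized sum of the normals of all
 * triangles connected to each vertex.                                  */
void compute_vertex_normals(const float (*verts)[3], const int (*faces)[3],
                            int num_verts, int num_faces, float (*normals)[3])
{
    memset(normals, 0, num_verts * sizeof *normals);

    for (int f = 0; f < num_faces; f++) {
        const float *a = verts[faces[f][0]];
        const float *b = verts[faces[f][1]];
        const float *c = verts[faces[f][2]];
        float u[3] = { b[0]-a[0], b[1]-a[1], b[2]-a[2] };
        float v[3] = { c[0]-a[0], c[1]-a[1], c[2]-a[2] };
        float n[3] = { u[1]*v[2] - u[2]*v[1],          /* cross product u x v */
                       u[2]*v[0] - u[0]*v[2],
                       u[0]*v[1] - u[1]*v[0] };
        for (int k = 0; k < 3; k++)                    /* accumulate on the three vertices */
            for (int j = 0; j < 3; j++)
                normals[faces[f][k]][j] += n[j];
    }

    for (int i = 0; i < num_verts; i++) {              /* normalize the accumulated normals */
        float len = sqrtf(normals[i][0]*normals[i][0] +
                          normals[i][1]*normals[i][1] +
                          normals[i][2]*normals[i][2]);
        if (len > 0.0f)
            for (int j = 0; j < 3; j++)
                normals[i][j] /= len;
    }
}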

Once these four elements that characterize the 3D model are provided to OpenGL the

program enters an infinite running state where the model is redrawn every time a

timer expires or when an interactive operation is sent to the program

[Diagram of the visualization module: the vertices, faces and UV coordinates are simplified through edge collapse and nearest-neighbor reassignment of the UV coordinates, converted to OpenGL format, complemented with calculated normals, and rendered by OpenGL together with the texture 1 image]

Figure 44 Diagram of the visualization module

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner

application ranging from high-level optimizations such as modification of the algo-

rithms to low-level optimizations such as the implementation of time-consuming parts

in assembly language

In order to verify that the achieved optimizations were valid in general and not for

specific cases 10 scans of different persons were used for profiling the performance of the

application Every profile consisted of running the application 10 times for each scan and

then averaging the results in order to reduce the influence that external factors might

have in the measured times Figure 51 presents an example of the graphs that will be

used throughout this and the following chapters to represent the changes in performance

Here each bar is divided into different colors that represent the distribution of the total

execution time among the various stages of the application described in Chapter 3 and

summarized in Figure 31

The translation from MATLAB to C code corresponds to the first optimization per-

formed The top two bars in Figure 51 show that the C implementation resulted in

a speedup of approximately 15 times over the MATLAB implementation running on

a desktop computer On the other hand the bottom two bars reflect the difference

in execution time after running the C implementation in two different platforms The

much more limited resources available in the BeagleBoard-xM have a clear impact on

the execution time The C code was compiled with GCC's O2 optimization level

The bottom bar in Figure 51 represents the starting point for a set of optimization

procedures that will be described in the following sections The order in which these are

presented corresponds to the same order in which they were applied to the application



Figure 51 Execution times of (Top) the MATLAB implementation on a desktop computer (Middle) the C implementation on a desktop computer (Bottom) the C implementation on the BeagleBoard-xM

51 Double to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C

implementations was necessary to compare both results in each step of the translation

process The original C implementation was implemented using double-precision format

because this is the format used in the MATLAB code Taking into account that the

additional precision offered by double-precision format over single-precision was not

essential and that the ARM Cortex-A8 processor features a 32 bit architecture the

conversion from double to single-precision format was made Figure 52 shows that with

this modification the total execution time decreased from 14.53 to 12.52 sec


Figure 52 Difference in execution time when double-precision format is changed to single-precision

52 Tuned compiler flags

While the previous versions of the C code were compiled with O2 performance level

the goal of this step was to determine a combination of compiler options that would


translate into faster running code A full list of the options supported by GCC can be

found in [41] Figure 53 shows that the execution time decreased by approximately 3

seconds (24% of the total time of 12.5 sec) after tuning the compiler flags The list of

compiler flags that produced best performance at this stage of the optimization process

were

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution

-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp


Figure 53 Execution time before and after tuning GCC's compiler options

53 Modified memory layout

A different memory layout for processing the camera frames was implemented to further

exploit the concept of spatial locality of the program As noted in Section 33 many of

the operations in the normalization stage involve pixels from pairs of consecutive frames

ie first and second third and fourth fifth and sixth and so on Data of the camera

frames were placed in memory in a manner such that corresponding pixels between frame

pairs lay next to each other in memory The procedure is shown in Figure 54

However this modification yielded no improvement on the execution time of the appli-

cation as can be seen from Figure 55

54 Reimplementation of C's standard power function

The generation of Texture 1 frame in the normalization stage starts by averaging the last

two camera frames followed by a gamma correction procedure The process of gamma

correction in this application consists of elevating each pixel to the 0.85 power After

profiling the application it was found that the power function from the standard math

C library was taking most of the time inside this process Taking into account that the


Figure 54 Modification of the memory layout of the camera frames The blue red green and purple circles represent pixels of the first second third and fourth frames

respectively


Figure 55 The execution time of the program did not change with a different memory layout for the camera frames

high accuracy offered by such function was not required and that the overhead involved

in validating the input could be removed a different implementation of such function

was adopted

A novel approach was proposed by Ian Stephenson in [42] explained as follows The

power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value By choosing x = 2 the process of calculating the

power function reduces to finding fast pow2() and log2() functions Such functions can

be approximated with a few instructions For example the implementation of log2(a)

can be approximated based on the IEEE floating point representation of a


a = M * 2^E

where M is the mantissa and E is the exponent Taking log of both sides gives

log2(a) = log2(M) + E

and since M is normalized log2(M) is always small therefore

log2(a) ≈ E

This new implementation of the power function provides the improvement of the execu-

tion time shown in Figure 56


Figure 56 Difference in execution time before and after reimplementing C's standard power function
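A minimal sketch of this kind of approximation is given below. It follows the log2/pow2 idea described above by exploiting the IEEE-754 single-precision bit layout (1 sign bit, 8 exponent bits, 23 mantissa bits), but it is illustrative only and not the exact code used in the application; it is valid for positive arguments.

#include <stdint.h>

/* Illustrative sketch: approximate pow(a, b) as 2^(b * log2(a)) using
 * fast bit-level approximations of log2 and pow2.                       */
static inline float fast_log2(float a)
{
    union { float f; uint32_t i; } v = { a };
    /* The integer view of the float encodes the (biased) exponent plus a
     * linear approximation of the fractional part.                       */
    return (float)v.i / (1 << 23) - 127.0f;
}

static inline float fast_pow2(float p)
{
    union { float f; uint32_t i; } v;
    /* Inverse of fast_log2: rebuild the bit pattern from the biased exponent. */
    v.i = (uint32_t)((p + 127.0f) * (1 << 23));
    return v.f;
}

static inline float fast_pow(float a, float b)
{
    return fast_pow2(b * fast_log2(a));   /* e.g. fast_pow(pixel, 0.85f) */
}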

5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that referenced memory locations tend to be referenced again soon. Moreover, the reordering allowed floating-point calculations to be replaced with integer calculations in the modulation stage, which are known to typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modulation frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping and scaling frames are now performed in an alternating fashion together with the calculation of the modulation frame. This loop merging improves the locality of data and reduces loop overhead (a sketch of the idea is given below). Figure 5.8 shows the change in execution time of the application for this optimization step.
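The following simplified sketch illustrates the kind of loop merging described above. The buffer names, dimensions and per-pixel operations are placeholders; the point is that a single pass over each camera frame row performs the cropping together with the running min/max tracking needed for the modulation frame, instead of separate passes that each re-read the same data.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical merged preprocessing loop: one pass per row instead of
 * separate crop and modulation passes over the same pixels. */
static void preprocess_row(const uint8_t *frame, int width, int row,
                           int crop_x0, int crop_x1,
                           uint8_t *cropped_row,
                           uint8_t *min_row, uint8_t *max_row)
{
    const uint8_t *src = frame + (size_t)row * width;
    for (int x = crop_x0; x < crop_x1; x++) {
        uint8_t p = src[x];
        int i = x - crop_x0;
        cropped_row[i] = p;                    /* cropped copy (scaling omitted) */
        if (p < min_row[i]) min_row[i] = p;    /* running minimum (integer)      */
        if (p > max_row[i]) max_row[i] = p;    /* running maximum (integer)      */
    }
}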

Figure 5.7: Order of execution before and after the optimization. (a) Original order of execution: preprocessing (parse XML file, discard frames, crop frames, scale), then normalization (texture 1, modulation, texture 2, normalize), then the rest of the program. (b) Modified order of execution: preprocessing (parse XML file, discard frames, crop frames, scale, modulation), then normalization (texture 1, texture 2, normalize), then the rest of the program.

Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.

5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, this figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results were achieved when GMC is applied in the y direction only. This was implemented by estimating the GMC for both directions but only performing the shift in the y direction. The optimization consisted in removing all unnecessary calculations related to the estimation of the GMC in the x direction. This optimization provides the improvement of the execution time shown in Figure 5.10.

Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code. For every pair of consecutive frames in the normalized frame sequence, the rows and columns of frames A and B are summed, the SAD is minimized in x and y, and frame B is shifted in the y dimension only.

Figure 5.10: Difference in execution time before and after modifying the GMC stage.

5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation. A series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation. It is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. At this point of the optimization process, an error was detected in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.

Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.

5.8 Modified line shifting in GMC stage

A series of optimizations performed on the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later, the for loop was replaced with the more optimized memcpy function available in the standard C library, which in turn led to a faster execution time.

A further optimization was obtained in the GMC stage, yielding better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted.

The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included in order to determine whether the shift has to be performed in the positive or the negative direction. If the shift is negative, i.e. upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. If the shift is positive, i.e. downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12, and a small sketch of the in-place shift is shown below.
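The following sketch illustrates the in-place vertical shift under the assumptions described above (8-bit frame stored row by row; names are illustrative). The traversal direction depends on the sign of the shift so that no row is overwritten before it has been copied.

#include <stdint.h>
#include <string.h>

/* Shift a frame vertically by 'shift' rows, in place.
 * Rows at the border are simply left unchanged (no circular wrap). */
static void shift_frame_y(uint8_t *frame, int width, int height, int shift)
{
    if (shift < 0) {                            /* shift upwards */
        for (int y = 0; y < height + shift; y++)
            memcpy(frame + (size_t)y * width,
                   frame + (size_t)(y - shift) * width, (size_t)width);
    } else if (shift > 0) {                     /* shift downwards */
        for (int y = height - 1; y >= shift; y--)
            memcpy(frame + (size_t)y * width,
                   frame + (size_t)(y - shift) * width, (size_t)width);
    }
}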

Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.

5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic refers to the fact that the set of vertices resulting from the decoding stage is sorted in an increasing manner, which removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered in increasing order from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies the algorithm needed to connect the vertices into triangles even further.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.

The algorithm is presented in Algorithm 1. Note that, for each pair of rows, the algorithm describes the connection of vertices only until the moment in which the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).

Algorithm 1 New tessellation algorithm

1:  for all pairs of rows do
2:    find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
3:    while the last vertex in either row has not been reached do
4:      if vertex_row_A is more to the left than vertex_row_B then
5:        connect vertex_row_A with the next vertex on the same row and with vertex_row_B
6:        change vertex_row_A to the next vertex on the same row
7:      else
8:        connect vertex_row_B with the next vertex on the same row and with vertex_row_A
9:        change vertex_row_B to the next vertex on the same row
10:     end if
11:   end while
12: end for
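A compact C sketch of the inner loop of Algorithm 1 could look as follows; the vertex arrays, index rows and the emit_triangle() callback are placeholders, and the fan-out of the remaining vertices of the longer row is omitted, as in Algorithm 1.

/* Triangulate one pair of consecutive rows. 'rowA' and 'rowB' hold the
 * indices of the vertices in each row, sorted from left to right; 'x' holds
 * the x coordinate of every vertex. */
static void triangulate_row_pair(const float *x,
                                 const int *rowA, int lenA,
                                 const int *rowB, int lenB,
                                 void (*emit_triangle)(int, int, int))
{
    int a = 0, b = 0;
    while (a + 1 < lenA && b + 1 < lenB) {
        if (x[rowA[a]] < x[rowB[b]]) {
            emit_triangle(rowA[a], rowA[a + 1], rowB[b]); /* advance in row A */
            a++;
        } else {
            emit_triangle(rowB[b], rowB[b + 1], rowA[a]); /* advance in row B */
            b++;
        }
    }
    /* remaining vertices of the longer row are connected to the last vertex
     * of the shorter row in a separate step (omitted here) */
}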

Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's Delaunay triangulation implementation.

Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that the vertices are sorted. (a) Delaunay triangulation; (b) optimized triangulation (both plotted over the same small region of the x-y vertex grid).

5.10 Modified decoding stage

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage.

Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

As a first step, two frequently called functions of the standard math C library, namely ceil() and floor(), were replaced with faster implementations that use preprocessor directives to avoid the function call overhead. Moreover, the time spent in validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of ceil() and floor() to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions could be further simplified.
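A minimal sketch of what such simplified replacements can look like is given below, assuming the arguments are always non-negative (as is the case for index values); the macro names are illustrative and not necessarily those used in the project.

/* Truncation is enough for floor() when x >= 0. */
#define FAST_FLOOR(x) ((int)(x))

/* For x >= 0, ceil(x) is the truncated value plus one unless x is integral.
 * Note that the argument is evaluated more than once. */
#define FAST_CEIL(x)  ((int)(x) + ((x) > (int)(x) ? 1 : 0))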

A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while checking that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster since that memory can be addressed more directly.

The last optimization consisted in the detection and removal of several tasks that were not contributing to the final result. Such tasks were present in the application because several alternatives for achieving a common goal had been implemented during the algorithmic design stage; after assessing and choosing the best option, the other ones were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the non-functional code.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage.

Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, this recalculation can be avoided; a sketch of the reuse is shown below. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
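The following sketch illustrates the reuse of the previous column-sum vector. The sum_columns() helper is written out, while the SAD minimization and frame alignment are passed in as placeholders; all names, and the MAX_WIDTH bound, are illustrative assumptions rather than the application's actual interfaces.

#include <stdint.h>
#include <string.h>

#define MAX_WIDTH 1024  /* illustrative upper bound on the frame width */

/* Sum the pixels of each column of an 8-bit frame into 'sums'. */
static void sum_columns(const uint8_t *frame, int width, int height,
                        unsigned int *sums)
{
    memset(sums, 0, sizeof(unsigned int) * (size_t)width);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            sums[x] += frame[(size_t)y * width + x];
}

/* GMC outline: each column-sum vector is computed once and carried over to
 * the next frame pair instead of being recalculated. */
static void gmc_all_frames(uint8_t **frames, int num_frames,
                           int width, int height,
                           int (*estimate_shift)(const unsigned int *,
                                                 const unsigned int *, int),
                           void (*apply_shift)(uint8_t *, int, int, int))
{
    unsigned int prev[MAX_WIDTH], curr[MAX_WIDTH];

    sum_columns(frames[0], width, height, prev);        /* first frame, once  */
    for (int n = 1; n < num_frames; n++) {
        sum_columns(frames[n], width, height, curr);    /* current frame only */
        int shift = estimate_shift(prev, curr, width);  /* motion estimate    */
        apply_shift(frames[n], width, height, shift);   /* align frame n      */
        memcpy(prev, curr, sizeof(unsigned int) * (size_t)width);
    }
}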

5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to identify the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization and global motion compensation stages are data independent and are therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrate NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology; (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions but with the compiler doing some of the work associated with writing assembly instructions; and (iv) directly writing NEON assembly instructions linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and it was therefore this approach that was adopted.

Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums. For the first pair of consecutive frames, the columns of frames 1 and 2 are summed, the SAD is minimized and frame 2 is shifted; for every remaining pair of consecutive frames (from n = 3 to n = N), the column vector of frame n-1 is reused, only the columns of frame n are summed, the SAD is minimized and frame n is shifted.

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.

Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8 or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may therefore be interpreted either as an operation on two Q registers, where each of the 8 elements is 16 bits wide, or as an operation on two D registers, where each of the 8 elements is 8 bits wide.

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology: the elements in the lanes of two source registers are combined by an operation into the corresponding lanes of a destination register.

An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.

Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain this increment. First, note that the stage that contributed most to the increase in time was the reading of the binary file; the execution time of that process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was indeed probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage. The first step of that process is to find the smallest and largest values for every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C language, the processor makes use of its branch prediction mechanism to speed up the instruction pipeline; the use of NEON instructions, however, forces the processor to perform the comparison for every single pack of 8 values, without any benefit from the branch prediction mechanism.
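For illustration, the data-parallel min/max tracking described above can be expressed with NEON intrinsics as follows. The application itself used hand-written NEON assembly; this intrinsics version is only a sketch of the same idea, with illustrative names, and it assumes the number of pixels is a multiple of 8.

#include <arm_neon.h>
#include <stdint.h>

/* Update running per-pixel minima and maxima over the frame sequence,
 * 8 pixels at a time (first step of the modulation stage). */
static void update_min_max(const uint8_t *frame, uint8_t *min_buf,
                           uint8_t *max_buf, int num_pixels)
{
    for (int i = 0; i < num_pixels; i += 8) {
        uint8x8_t p    = vld1_u8(frame + i);     /* load 8 pixels       */
        uint8x8_t pmin = vld1_u8(min_buf + i);   /* current minima      */
        uint8x8_t pmax = vld1_u8(max_buf + i);   /* current maxima      */
        vst1_u8(min_buf + i, vmin_u8(pmin, p));  /* branch-free minimum */
        vst1_u8(max_buf + i, vmax_u8(pmax, p));  /* branch-free maximum */
    }
}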

5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2 * 2^n - 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
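A sketch of this idea for the real 8-bit case is shown below: the table is indexed directly by the sum of the two pixels (0 to 510), so the averaging and the 0.85 power are both folded into a single lookup. Names and the output scaling are illustrative assumptions.

#include <math.h>
#include <stdint.h>

#define LUT_SIZE (2 * 255 + 1)   /* all possible sums of two 8-bit pixels */

static float gamma_lut[LUT_SIZE];

/* Precompute (sum / 2)^0.85 for every possible pixel sum (done once). */
static void init_gamma_lut(void)
{
    for (int sum = 0; sum < LUT_SIZE; sum++)
        gamma_lut[sum] = powf(sum / 2.0f, 0.85f);
}

/* Texture 1: gamma-corrected average of the last two camera frames, indexed
 * by the per-pixel sums already computed in the texture 2 stage. */
static void texture1_from_sums(const uint16_t *pixel_sums, float *texture1,
                               int num_pixels)
{
    for (int i = 0; i < num_pixels; i++)
        texture1[i] = gamma_lut[pixel_sums[i]];
}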

As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages: the first one involves the calculation of the modulation, texture 2 and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames.

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code. After parsing the XML file, camera frames 1 to 16 are processed row by row and vector by vector (crop row, modulation step 1, scale, texture 2 as v1 + v2, scale, normalize as (v1 - v2)/(v1 + v2)); the texture 1 calculation then re-reads camera frames 15 and 16 (modulation step 2, scale) before the rest of the program runs.

Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.

pixel A, pixel B ∈ {0, 1, 2, 3}
pixel A + pixel B:     0      1      2      3      4      5      6
average:               0      0.5    1      1.5    2      2.5    3
average^0.85 (LUT):    0      0.555  1      1.411  1.803  2.179  2.544

Merging these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.

Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code. After parsing the XML file, camera frames 1 to 14 are processed row by row and vector by vector (crop row, 5x5 mean filter, modulation step 1, scale, texture 2 as v1 + v2, scale, normalize as (v1 - v2)/(v1 + v2)); camera frames 15 and 16 go through the same steps plus the averaging and gamma correction, followed by modulation step 2 and scaling, before the rest of the program runs.

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.

6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people; moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).

On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's -O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered on an embedded device. It is interesting to note that even though the lower resolution model contains approximately 2% of the faces of the high resolution model, the quality degradation is hardly visible when comparing the two textured models.

6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. The wide range of optimizations described in Chapter 5 was used to reduce the execution time of the application from 14.5 to 5.1 seconds, which translates into a speedup of approximately 2.85 times.

Figure 6.2: Example of the visualization module developed. (a) High-resolution 3D model with texture (63743 faces); (b) high-resolution 3D model wireframe (63743 faces); (c) low-resolution 3D model with texture (1229 faces); (d) low-resolution 3D model wireframe (1229 faces).

Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which gives an idea of the speedup achieved for each individual stage.

Figure 6.3: Performance evolution of the 3D face scanner's C implementation. The bars correspond, from top to bottom, to: no optimizations, doubles to floats, tuned compiler flags, modified memory layout, pow function reimplemented, reduced memory accesses, GMC in y direction only, Delaunay bug, line shifting in GMC, new tessellation algorithm, modified decoding stage, no recalculations in GMC, ASM + NEON implementation 1, and ASM + NEON implementation 2.

Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e. by approximately 14 seconds.

A wide range of optimizations was performed to reduce the execution time of the application. These include high-level optimizations such as modifications to the algorithms and reordering of the execution flow, middle-level optimizations such as avoiding redundant calculations and function call overhead, and low-level optimizations such as reimplementing sections of code with NEON assembly instructions.

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields such as security or entertainment.

Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5 x 5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.

According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to lose performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e. changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.

Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process; it is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application, so a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of this configuration was not possible in the current project, because part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Although a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.

Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: httpjamaama-assnorgcontent283141829short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: httpwwwutwentenlenarchive201303large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_itdocx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: httponlinelibrarywileycomdoi101111j1540-8159200400411xfull (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN 9789813083714. [Online]. Available: httpbooksgooglenlbooksid=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, ISSN 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: httpwwwsciencedirectcomsciencearticlepii0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2, 5612 AZ Eindhoven, The Netherlands, 2011. [Online]. Available: httpalexandriatuenlextra1afstverslwsk-irocque2011pdf (cit. on pp. 6, 34).


[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, ISSN 0162-8828. DOI: 10.1109/34216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. DOI: 10.1109/CCECE1998685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, ISSN 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available: httpwwwsciencedirectcomsciencearticlepii0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, ISSN 0162-8828. DOI: 10.1109/TPAMI19874767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, ISSN 0031-3203. DOI: 10.1016/0031-3203(92)90078-W. [Online]. Available: httpwwwsciencedirectcomsciencearticlepii003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, ISSN 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available: httpwwwsciencedirectcomsciencearticlepii0734189X85900568 (cit. on pp. 9, 12).


[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. DOI: 10.1109/ICPR1990118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, ISSN 0932-8092. DOI: 10.1007/BF01230201. [Online]. Available: httpdxdoiorg101007BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, ISSN 0018-9162. DOI: 10.1109/MC19821653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. DOI: 10.1109/ICME2006262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimel'farb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. DOI: 10.1109/ENC20068 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: httppixelotagoacnzipapers35pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81 – III-84. DOI: 10.1109/ICIP20074379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, ISSN 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online]. Available: httpwwwsciencedirectcomsciencearticlepiiS0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. DOI: 10.1364/OE15012318 (cit. on p. 14).


[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14).

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, ISBN 9782726105726. [Online]. Available: httpbooksgooglenlbooksid=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and Universitat de Girona, Departament d'Electronica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals, Universitat de Girona, 1998, ISBN 9788495138118. [Online]. Available: httpbooksgooglenlbooksid=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, ISSN 0882-4967. DOI: 10.1109/JRA19871087109. [Online]. Available: httpdxdoiorg101109JRA19871087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, ISSN 0162-8828. DOI: 10.1109/34159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1, 2628 CN Delft, The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97–108, 1989. [Online]. Available: httplinkspringercomarticle101007BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, ISBN 0-201-48560-5. DOI: 10.1145/311535311576. [Online]. Available: httpdxdoiorg101145311535311576 (cit. on p. 30).


[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, ISBN 9788126508372. [Online]. Available: httpbooksgooglenlbooksid=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: httpwwwefymagonlinecompdfsingle-board-computers_aug10pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, December 2009, p. 81 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: httpwwwsciencedirectcomsciencearticlepii016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A Series Programmer's Guide, version 3.0," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization: an example," Tech. Rep., 2009 (cit. on p. 54).


Acknowledgements

I owe a debt of gratitude to the many people who helped me during my years at TU/e. First, I would like to thank Frank van Heesch, my supervisor at Philips, an excellent professional and even better person, who showed me the way through this challenging project while encouraging me in every step of the way. He was always generous with his time and steered me in the right direction whenever I felt I needed help. He has deeply influenced every aspect of my work.

I would also like to express my sincerest gratitude to my professor, Gerard de Haan, the person who was responsible for opening Philips' doors to my life. His achievements are a constant source of motivation. Gerard is a clear demonstration of how the collaboration between industry and academia can produce unprecedented and magnificent results.

My special thanks to all my fellow students at Philips Research, who made these eight months a wonderful time of my life. Their input and advice contributed significantly to the final result of my work. In particular, I would like to thank Koen de Laat for helping me set up an automated database system to keep track of the profiling results.

Furthermore, I would like to thank Catalina Suarez, my girlfriend, for her support during this year. Your company has translated into the happiness I need to perform well in the many aspects of my life.

Finally, I would like to thank my family for their permanent love and support. It is hard to find the right words to express the immense gratitude that I feel for those persons who have given me everything so that I could be standing where I am now. Mom and dad, my achievements are the result of the infinite love that you have given me throughout my life, and I will never stop feeling grateful for that.

Contents

Abstract ii
Acknowledgements iii
List of Figures ix

1 Introduction 1
1.1 3D Mask Sizing project 3
1.2 Objectives 3
1.3 Report organization 4

2 Literature study 5
2.1 Surface reconstruction 5
2.1.1 Stereo analysis 6
2.1.2 Structured lighting 9
2.1.2.1 Triangulation technique 10
2.1.2.2 Pattern coding strategies 11
2.1.2.3 3D human face reconstruction 12
2.2 Camera calibration 13
2.2.1 Definition 14
2.2.2 Popular techniques 14

3 3D face scanner application 17
3.1 Read binary file 18
3.2 Preprocessing 18
3.2.1 Parse XML file 18
3.2.2 Discard frames 19
3.2.3 Crop frames 19
3.2.4 Scale 19
3.3 Normalization 19
3.3.1 Normalization 20
3.3.2 Texture 2 21
3.3.3 Modulation 22
3.3.4 Texture 1 22
3.4 Global motion compensation 23
3.5 Decoding 24
3.6 Tessellation 25
3.7 Calibration 26
3.7.1 Offline process 27
3.7.2 Online process 27
3.8 Vertex filtering 28
3.8.1 Filter vertices based on decoding constraints 28
3.8.2 Filter vertices outside the measurement range 29
3.8.3 Filter vertices based on a maximum edge length 29
3.9 Hole filling 29
3.10 Smoothing 30

4 Embedded system development 31
4.1 Development tools 31
4.1.1 Hardware 32
4.1.1.1 Single-board computer survey 32
4.1.1.2 BeagleBoard-xM features 34
4.1.2 Software 34
4.1.2.1 Software libraries 35
4.1.2.2 Software development tools 36
4.2 MATLAB to C code translation 37
4.2.1 Motivation for developing in C language 37
4.2.2 Translation approach 38
4.3 Visualization 39

5 Performance optimizations 43
5.1 Double to single-precision floating-point numbers 44
5.2 Tuned compiler flags 44
5.3 Modified memory layout 45
5.4 Reimplementation of C's standard power function 45
5.5 Reduced memory accesses 47
5.6 GMC in y dimension only 49
5.7 Error in Delaunay triangulation 50
5.8 Modified line shifting in GMC stage 50
5.9 New tessellation algorithm 51
5.10 Modified decoding stage 52
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage 53
5.12 NEON assembly optimization 1 54
5.13 NEON assembly optimization 2 57

6 Results 61
6.1 MATLAB to C code translation 61
6.2 Visualization 62
6.3 Performance optimizations 62

7 Conclusions 67
7.1 Future work 68

Bibliography 71

List of Figures

1.1 A subset of the CPAP masks offered by Philips 2
1.2 A 3D hand-held scanner developed in Philips Research 4
2.1 Standard stereo geometry 7
2.2 Assumed model for triangulation as proposed in [4] 10
2.3 Examples of pattern coding strategies 12
2.4 A reference framework assumed in [25] 14
3.1 General flow diagram of the 3D face scanner application 17
3.2 Example of the 16 frames that are captured by the hand-held scanner 18
3.3 Flow diagram of the preprocessing stage 18
3.4 Flow diagram of the normalization stage 20
3.5 Example of the 18 frames produced in the normalization stage 21
3.6 Camera frame sequence in a coordinate system 22
3.7 Flow diagram for the calculation of the texture 1 image 22
3.8 Flow diagram for the global motion compensation process 23
3.9 Difference between pixel-based and edge-based decoding 24
3.10 Vertices before and after the tessellation process 25
3.11 The Delaunay tessellation with all the circumcircles and their centers [33] 26
3.12 The calibration chart 27
3.13 The 3D model before and after the calibration process 28
3.14 3D resulting models after various filtering steps 29
3.15 Forehead of the 3D model before and after applying the smoothing process 30
4.1 The BeagleBoard-xM offered by Texas Instruments 35
4.2 Simplified diagram of the 3D face scanner application 39
4.3 UV coordinate system 40
4.4 Diagram of the visualization module 41
5.1 Execution times of the MATLAB and C implementations after run on different platforms 44
5.3 Execution time before and after tuning GCC's compiler options 45
5.4 Modification of the memory layout of the camera frames 46
5.5 Execution time with a different memory layout 46
5.6 Execution time before and after reimplementing C's standard power function 47
5.7 Order of execution before and after the optimization 48
5.8 Difference in execution time before and after reordering the preprocessing stage 48
5.9 Flow diagram for the GMC process as implemented in the MATLAB code 49
5.10 Difference in execution time before and after modifying the GMC stage 49
5.11 Execution time of the application after fixing an error in the tessellation stage 50
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage 51
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted 52
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach 53
5.15 Execution time of the application before and after optimizing the decoding stage 54
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's columns sum 55
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage 55
5.18 NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology 56
5.19 Execution flow after first NEON assembly optimization 58
5.20 Execution times of the application before and after applying the first NEON assembly optimization 59
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels 59
5.22 Execution times of the application before and after applying the second NEON assembly optimization 59
5.23 Final execution flow after second NEON assembly optimization 60
6.1 Execution times of the MATLAB and C implementations after run on different platforms 62
6.2 Example of the visualization module developed 63
6.3 Performance evolution of the 3D face scanner's C implementation 64
6.4 Execution times for each stage of the application 65

Dedicated to my grandmother


Chapter 1

Introduction

The potential of science and technology to improve every aspect of life seems to be

boundless or at least this is what the innovations of the previous centuries suggest

Among the many different interests that advocate the development of science and tech-

nology human healthcare has always been an important stimulant New technologies

are constantly being developed by leading companies all around the world to improve the

quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.

Within the wide range of products offered by Philips there is a specific group cate-

gorized under the name of sleep solutions that aims at improving the sleep quality of

people A well-known family of products contained within this category are the so called

CPAP (Continuous Positive Airway Pressure) masks Such masks are used primarily

in the treatment of sleep apnea a sleep disorder characterized by pauses in breathing

or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]. It reported that sleep apnea was prevalent in more than 18 million Americans, i.e. 6.62% of the country's population.

While aiming to attend the large demand for CPAP masks Philips has designed and

introduced a wide variety of mask models that seek to fulfill the different needs and

constraints that arise due to several factors which include the large diversity of size

and shape of human faces inclination towards breathing through the mouth or nose

diagnosis of diseases such as sinusitis or dermatitis or disorders such as claustrophobia



(a) Amara (b) ComfortClassic (c) ComfortGel Blue

(d) ComfortLite 2 (e) FitLife (f) GoLife

(g) ProfileLite Gel (h) Simplicity (i) ComfortGel

Figure 1.1: A subset of the CPAP masks offered by Philips

amongst others. A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.

Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.

11 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop some techno-

logical means that can assist sleep technicians in the selection of a proper CPAP mask

model for each patient. A series of algorithms, methods, and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.

2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.

3. The actual fit quality assessment by virtually fitting a series of 3D mask models to the reconstructed face.

4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.

The focus of this thesis project is on the first step.

As part of the progress made in the 3D Mask Sizing project at Philips Research Eind-

hoven a first prototype of a 3D hand-held scanner using the structured lighting technique

was already developed and is the basis for the present project. Figure 1.2a shows the hardware setup of such a device. In short, this scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently being performed offline and are mostly implemented in MATLAB.

12 Objectives

The main objective of this thesis project is to extend the functionality of the mentioned

scanner such that the 3D reconstruction is computed locally on the embedded platform

This implies transforming the already developed methods and algorithms in such a


(a) Hardware (b) 3D model example

Figure 1.2: A 3D hand-held scanner developed in Philips Research

way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application. Specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to an ARM Cortex-A8, the new implementation should be targeted to this processor in particular, by making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.

13 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module, and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.


Chapter 2

Literature study

This chapter presents a selective analysis of the state-of-the-art in the field of surface

reconstruction placing special emphasis on structured lighting techniques A brief

overview of the three main underlying technologies used for depth estimation is pre-

sented first This is followed by an example of stereo analysis which serves as the basis

for the more specific structured lighting techniques Moreover this example helps to

illustrate why stereo analysis is considered less preferable for 3D face reconstruction

applications when compared with the structured lighting techniques Special emphasis

is placed on the scientific principles underlying structured lighting techniques Further-

more a classification of the different types of pattern coding strategies available in the

literature is given along with an analysis of their suitability for our application Fi-

nally the chapter concludes with a brief discussion of camera calibration and its most

representative techniques

2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications such as computer mod-

eling of 3D objects (such as those found in areas like architecture mechanical engi-

neering or surgery) distance measurements for vehicle control surface inspections for

quality control approximate or exact estimates of the location of 3D objects for auto-

mated assembly and fast location of obstacles for efficient navigation [4]

Technologies for surface reconstruction include contact and non-contact techniques the

latter being our principal interest Non-contact techniques may be further categorized

as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e. they are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface

by triangulating each point with its corresponding projections in two or more images

Echo-metric techniques suffer from a number of drawbacks Systems employing such

techniques are heavily affected by environmental parameters such as temperature and

humidity [6] These parameters affect the velocity at which waves travels through a

given medium thus introducing errors in depth measurement On the other hand

both reflecto-metric and stereo-metric techniques are less affected by environmental

parameters However reflecto-metric techniques entail a major difficulty ie they

require an estimation of the model of the environment In the remaining of this section

we will limit the discussion to the stereo-metric category and focus on the structured

lighting techniques

2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded

as an extension of the more general stereo-vision technique an introductory example of

stereo analysis is presented in this section This example intends to show why the use

of structured lighting becomes essential for our application This example is presented

in [4]

Surface reconstruction can be achieved by means of the visual disparity that results

when an object is observed from different camera viewpoints In its simplest form two

cameras can be used for this purpose Triangulation between a point in the object and

its respective projection in each of the camera projection planes can be used to calculate

the depth at which this point lies from a certain reference Note however that in order

to calculate the triangulation more parameters are required These parameters refer for

example to the distance at which the cameras are located from one another (extrinsic

parameter) or to the focal length of each of the cameras (intrinsic parameter)

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this

model the origin of the XYZ-coordinate system O = (0 0 0) is located at the focal

point of the left camera The focal point of the right camera lies at a distance b along

the X-axis from the left camera ie at the point (b 0 0) Both cameras are assumed

to have the same focal length f As a consequence the images of both cameras are

located in the same image plane The Z-axis coincides with the optical axis of the

left camera Moreover the optical axes of both cameras are parallel to each other and


oriented towards the scene objects Also note that because the x-axes of both images

are identically oriented rows with same row-number in the two different images lie on

the same straight line


Figure 2.1: Standard stereo geometry

In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

p_left = (x_left, y_left) and p_right = (x_right, y_right)

in the left and right images, respectively, assuming that the scene point is visible from both camera viewpoints. The disparity with respect to p_left is a vector given by

Δ(x_left, y_left) = (x_left − x_right, y_left − y_right)^T    (2.1)

between two corresponding image points.

In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

p = (x, y) = (f·X/Z, f·Y/Z)    (2.2)

assuming that Z > f.

According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that y = y_left = y_right. Therefore, for the left camera, the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X axis, the central projection equation is given by

(x_right, y) = (f·(X − b)/Z, f·Y/Z)

Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points in each one of the images with respect to p_left is given by

Δ_ssg(x_left, y_left) = √((x_left − x_right)² + (y_left − y_right)²)

However, because rows with the same row numbers in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

Δ_ssg(x_left, y_left) = |x_left − x_right| = x_left − x_right    (2.3)

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map Δ(x, y) is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example because of occlusion), the value "undefined" is recorded.

Finally, in order to come up with the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

Z = f·X/x_left = f·(X − b)/x_right

and therefore

X = b·x_left/(x_left − x_right)

Using the previous equation, it follows that

Z = b·f/(x_left − x_right)

By substituting this result into the projection equation for y, it follows that

Y = b·y/(x_left − x_right)

The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and b are known and that the disparity map Δ(x, y) was measured for each pair of corresponding points in the two images. Note that a variety of methods exists to calibrate different types of camera configuration systems, i.e. to determine their intrinsic and extrinsic parameters. More on these calibration procedures is discussed in Section 2.2.
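To make these reconstruction equations concrete, the following C sketch applies them to a single pixel of the left image. The function name, the argument layout, and the convention that a non-positive disparity marks an undefined pixel are illustrative assumptions, not part of the original implementation.

/* Reconstructs the 3D coordinates (X, Y, Z) of a scene point from its
 * position (x_left, y) in the left image (measured relative to the optical
 * axis) and its scalar disparity d = x_left - x_right, using
 *   X = b * x_left / d,   Y = b * y / d,   Z = b * f / d
 * where f is the focal length and b the base distance between the cameras.
 * Returns 0 on success and -1 when the disparity is undefined. */
int reconstruct_point(float x_left, float y, float disparity,
                      float f, float b,
                      float *X, float *Y, float *Z)
{
    if (disparity <= 0.0f)   /* occluded or unmatched pixel */
        return -1;

    *X = (b * x_left) / disparity;
    *Y = (b * y) / disparity;
    *Z = (b * f) / disparity;
    return 0;
}

Applying this function to every defined entry of the disparity map yields the full point cloud of the scene.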

The process of determining corresponding point pairs is known as the correspondence

problem A wide variety of techniques are used to solve the correspondence problem in

stereo image analysis Such techniques generally involve the extraction and matching

of features between two or more images These features are typically corners or edges

contained within the images Although these techniques are found to be appropriate for

a certain number of applications it turns out that they present a number of drawbacks

that make their applicability unfeasible for many others The main drawbacks are (i)

feature extraction and matching is generally computationally expensive (ii) features

might not be available depending on the nature of the environment or the placement

of the cameras and (iii) low lighting conditions generally increase the complexity of the

matching procedure thus making the system more error prone Such problems in solving

the correspondence problem can generally be overcome by resorting to a different but

similar type of techniques known by the name of structured lighting techniques While

structured lighting techniques involve a complete different methodology on how to solve

the correspondence problem they share large part of the theory presented in this section

regarding the depth reconstruction process

2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously de-

scribed stereo analysis approach where one of the cameras is replaced by a light source

which projects a light pattern actively into the scene The location of an object in space

can then be determined by analyzing the deformation of the projected light pattern

The idea behind this modification is to simplify the complexity of the correspondence

analysis by actively manipulating the scene

It is important to note that stereoscopic based systems do not assume complex require-

ments for image acquisition since they mostly rely on theoretical mathematical and

algorithmic analyses to solve the reconstruction problem On the other hand the idea

behind structured lighting methods is to shift this complexity to another level such as

the engineering prerequisites of the overall system [4]

A wide variety of light patterns have been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured when using the most basic of all approaches, i.e. a light spot. In Section 2.1.2.2, a classification of the encoded patterns available is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.

2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring

angles formed from it to points at either end of a fixed baseline Various approaches

have been proposed for accomplishing this task An early analysis was described by Hall

et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is explained.

Figure 2.2 shows the simplified model that Klette assumes in his analysis. Note that the


Figure 2.2: Assumed model for triangulation as proposed in [4]

system can be thought of as a 2D object scene, i.e. it has no vertical dimension. As a

consequence the object light source and camera all lie in the same plane The angles

α and β are given by the calibration As in the previous example the base distance b

is assumed to be known and the origin of the coordinate system O coincides with the

projection center of the camera


The goal is to calculate the distance d between the origin O and the object point P = (X0, Z0). This can be done using the law of sines as follows:

d/sin(α) = b/sin(γ)

From γ = π − (α + β) and sin(π − γ) = sin(γ), it holds that

d/sin(α) = b/sin(π − γ) = b/sin(α + β)

Therefore, distance d is given by

d = b·sin(α)/sin(α + β)

which holds for any point P lying on the surface of the object.
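As a small illustration, the distance computation reduces to a one-line C function. The code below is a sketch that assumes the calibrated angles are already expressed in radians.

#include <math.h>

/* Distance d from the camera's projection center O to the object point P,
 * following d = b * sin(alpha) / sin(alpha + beta), where alpha and beta
 * are the calibrated angles at the camera and the light source, and b is
 * the base distance between them. */
double triangulate_distance(double alpha, double beta, double b)
{
    return b * sin(alpha) / sin(alpha + beta);
}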

2.1.2.2 Pattern coding strategies

As stated earlier there is a wide variety of pattern coding strategies available in the lit-

erature that aim to fulfill all requirements found in different scenarios and applications

In coded structure light systems every coded pixel in the pattern has its own codeword

that allows direct mapping ie every codeword is mapped to the corresponding coordi-

nates of a given pixel or group of pixels in the pattern A codeword can be represented

using grey levels colors or even geometrical characteristics The following classification

of pattern coding strategies was proposed by Salvi et al in [19]

• Time-multiplexing. This is one of the most commonly used strategies. The

idea is to project a set of patterns onto the scene one after the other The

sequence of illuminated values determines the codeword for each pixel The main

advantage of this kind of pattern is that it can achieve high spatial resolution in

the measurements However its accuracy is highly sensible to movement of either

the structured light system or objects in the scene during the time period when the

acquisition process takes place Previous research in this area includes the work of

[5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a.

• Spatial Neighborhood. In this strategy, the codeword that is assigned to a given

pixel depends on its neighborhood Codification is done on the basis of intensity

[9]ndash[11] color [12] or a unique structure of the neighborhood [13] In contrast with

time-multiplexing strategies spatial neighborhood strategies allow for all coding

information to be condensed into a single projection pattern making them highly


suitable for applications that involve timing constraints such as autonomous nav-

igation. The compromise, however, is deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding. In direct coding strategies, every pixel in the pattern is labeled

by the information it represents In other words the entire codeword for a given

point is contained in a unique pixel as explained in [19] Basically there are two

ways to achieve this either by using a large range of color values [15] [16] or

by introducing periodicity [17] Although in theory this group of strategies can

be used to reconstruct objects with high resolution a major problem occurs in

practice the colors imaged by camera(s) of the system do not only depend on the

projected colors but also on the intrinsic colors of the measuring surface and light

source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].

(a) Time-multiplexing (b) Spatial Neighborhood (c) Direct coding

Figure 2.3: Examples of pattern coding strategies

2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields such as security

forensics or even entertainment it is no surprise that special focus has been devoted

to this area by the research community over the last decades A comparative study

of three different 3D face reconstruction approaches is presented in [20] Here the

most representative techniques of three different domains are tested These domains are

binocular stereo structured lighting and photometric stereo The experimental results

show that active reconstruction techniques perform better than purely passive ones for

this application

The majority of analysis on vision based reconstruction has focused on general perfor-

mance for arbitrary scenes rather than on specific objects as reported in [20] Neverthe-

less some effort has been made on evaluating structured lighting techniques with special

focus on human face reconstruction In [21] a comparison is presented between three


structured lighting techniques (Gray Code Gray Code Shift and Stripe Boundary) to

assess 3D reconstruction for human faces by using mono and stereo systems The results

show that the Gray Code shift coding performs best given the high number of emitted

patterns it uses A further study on this topic was performed by the same author in

[22] Again it was found that time-multiplexing techniques such as binary encoding

using Gray Code provide the highest accuracy With a rather different objective than

that sought by Woodward et al in [21] and [22] Fechteler et al [23] also focus their

effort on presenting a framework that captures 3D models of faces in high resolutions

with low computational load Here the system uses a single colored stripe pattern for

the reconstruction purpose plus a picture of the face illuminated with regular white light

that is used as texture

Particular aspects of 3D human face reconstruction such as proximity size and texture

involved make structured lighting a suitable approach On the contrary other recon-

struction techniques might be less suitable when dealing with these particular aspects

For example stereoscopic approaches fail to provide positive results when the textures

involved do not contain features that can be easily extracted and matched by means of

algorithms as in the case of the human face On the other hand the concepts behind

structured lighting make it very convenient to reconstruct these kind of surfaces given

the proximity involved and the size limits of the object in question (appropriate for

projecting encoded patterns)

With regard to the suitability of the different pattern coding strategies for our application

(3D human face reconstruction by means of a hand-held scanner) there are several

factors to consider Spatial neighborhood strategies do not offer high spatial resolution

which is needed by the algorithms that assess the fit quality of the various mask models

Direct coding strategies suffer from practical problems that affect their robustness to

different scenarios This centers the attention on the time-multiplexing techniques which

are known to provide high spatial resolution The problem with such techniques is

that they are highly sensible to movement which is likely to be present on a hand-

held device Fortunately there are several approaches as to how such problem can be

solved Consequently it is a time-multiplexing technique which is being employed in

our application

2.2 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement

This section presents a review of some of the most popular techniques with special focus

on those that are regarded as adequate for our application


2.2.1 Definition

Camera calibration is the process of determining a mathematical approximation of the

physical and optical behavior of an imaging system by using a set of parameters These

parameters can be estimated by means of direct or iterative methods and they are divided

in two groups On the one hand intrinsic parameters determine how light is projected

through the lens onto the image plane of the sensor The focal length projection center

and lens distortion are all examples of intrinsic parameters On the other hand extrinsic

parameters measure the position and orientation of the camera with respect to a world

coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for the structured pattern projection and

triangulation considered in [25] The focal length fc and the projection center Oc are

examples of intrinsic parameters of the camera while the distance D between the camera

and the projector corresponds to an explicit parameter


Figure 2.4: A reference framework assumed in [25]

2.2.2 Popular techniques

In 1982, Hall et al [18] proposed a technique consisting of an implicit camera calibration that uses a 3 × 4 transformation matrix which maps 3D object points to their respective 2D image projections. Here the model of the camera does not consider any lens distortion. For a detailed description of this method, refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique that was based on extracting the physical parameters of the camera from the transformation technique proposed in [18]. The description of this technique is given in [26] and [27]. A non-linear explicit camera calibration that included radial lens distortion was proposed by Salvi in his PhD


thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular and that is still widely used was proposed by Tsai in 1987 [29]. Here the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.

The calibration mechanism that is currently being used in our application is based on

the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although

this mechanism focuses on stereo camera calibration it was generalized for a system

with one camera and one projector It involves imaging a controlled scene from different

positions and orientations The controlled scene consists of a rigid calibration chart with

several markers The geometric and photometric properties of such markers are known

precisely so that they can be detected After corresponding markers in the different

images are found an algorithm searches the optimal set of camera parameters for which

triangulation of all corresponding marker-point pairs gives an accurate reconstruction of

the calibration chart This calibration mechanism is discussed further in Section 37

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed

by the Smart Sensing & Analysis research group and provided as a starting point for the

current project Figure 31 presents the main steps involved in the 3D reconstruction

process

[Figure 31 diagram: Start (binary and XML files) → Read binary file (31) → Preprocessing (32) → Normalization (33) → Global motion compensation (36) → Decoding (35) → Tessellation (34) → Calibration (37) → Vertex filtering (38) → Hole filling (39) → 3D Model (End)]

Figure 31 General flow diagram of the 3D face scanner application

The current scanner uses a total of 16 binary coded patterns that are sequentially pro-

jected onto the scene For each projection the scene is captured by means of the

embedded camera hence producing 16 different grayscale frames (Figure 32) that are

fed to the application in the form of a binary file This falls in line with the discussion

presented in Section 2123 of the literature study of why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 31 to 39, each of the steps shown in Figure 31 is described.



Figure 32 Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application

31 Read binary file

The first step of the application is to read the binary file that contains the required

information for the 3D reconstruction The binary file is composed of two parts the

header and the actual data The header contains metadata of the acquired frames such

as the number of frames and the resolution of each one The second part contains the

actual data of the captured frames Figure 32 shows an example of such frame sequence

which from now on will be referred to as camera frames
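A minimal sketch of this reading step is given below. The header layout assumed here (three 32-bit fields holding the number of frames and the frame resolution, followed by raw 8-bit pixel data) is an illustration only; the actual field order and widths in the binary file may differ.

/* Sketch of the binary-file reading step; the ScanHeader layout is an assumption. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct {
    uint32_t num_frames;  /* number of captured frames           */
    uint32_t width;       /* horizontal resolution of each frame */
    uint32_t height;      /* vertical resolution of each frame   */
} ScanHeader;

static uint8_t *read_scan(const char *path, ScanHeader *hdr)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    /* Read the metadata that precedes the pixel data. */
    if (fread(hdr, sizeof(ScanHeader), 1, f) != 1) { fclose(f); return NULL; }

    /* Read all frames as raw 8-bit grayscale pixels. */
    size_t n = (size_t)hdr->num_frames * hdr->width * hdr->height;
    uint8_t *frames = malloc(n);
    if (frames && fread(frames, 1, n, f) != n) { free(frames); frames = NULL; }

    fclose(f);
    return frames;  /* caller is responsible for freeing the buffer */
}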

32 Preprocessing

The preprocessing stage comprises the four steps shown in Figure 33. Each of these steps

is described in the following subsections

[Figure 33 diagram: Preprocessing = Parse XML file → Discard frames → Crop frames → Scale (convert to float, range 0-1)]

Figure 33 Flow diagram of the preprocessing stage

321 Parse XML file

In this stage the application first reads an XML file that is included for every scan

This file contains relevant information for the structured light reconstruction This


information includes (i) the type of structured light patterns that were projected when

acquiring the data (ii) the number of frames captured while structured light patterns

were being projected (iii) the image resolution of each frame to be considered and (iv)

the calibration data
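A sketch of how such a file could be parsed with libxml2 is shown below. The element names used here (nrOfFrames, width, height) are hypothetical and only illustrate the mechanism; the real scan files may use different tags, and the calibration data is omitted.

/* Sketch of the XML parsing step using libxml2; tag names are hypothetical. */
#include <libxml/parser.h>
#include <libxml/tree.h>
#include <stdlib.h>

static int parse_scan_xml(const char *path, int *num_frames, int *width, int *height)
{
    xmlDocPtr doc = xmlReadFile(path, NULL, 0);
    if (!doc) return -1;

    xmlNodePtr root = xmlDocGetRootElement(doc);
    for (xmlNodePtr cur = root ? root->children : NULL; cur; cur = cur->next) {
        if (cur->type != XML_ELEMENT_NODE) continue;
        xmlChar *text = xmlNodeGetContent(cur);
        if (!text) continue;
        if (!xmlStrcmp(cur->name, (const xmlChar *)"nrOfFrames"))
            *num_frames = atoi((const char *)text);
        else if (!xmlStrcmp(cur->name, (const xmlChar *)"width"))
            *width = atoi((const char *)text);
        else if (!xmlStrcmp(cur->name, (const xmlChar *)"height"))
            *height = atoi((const char *)text);
        xmlFree(text);
    }
    xmlFreeDoc(doc);
    return 0;
}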

322 Discard frames

Based on the number of frames value read from the XML file the application discards

extra frames that do not contain relevant information for the structured light approach

but that are provided as part of the input

323 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this particular application. This is because pixels near the frame borders do not contain facial information and therefore can be safely removed.

324 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is

represented by an 8-bit unsigned integer value that ranges from 0 to 255 In this stage

the data type is transformed from unsigned integer to floating point while dividing each

pixel value by 255. The new set of values ranges between 0 and 1.
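A minimal sketch of this conversion is shown below; multiplying by the reciprocal of 255 instead of dividing per pixel is an implementation choice for illustration, not something prescribed by the application.

/* Sketch of the scaling step: 8-bit pixels become floats in the range [0, 1]. */
#include <stdint.h>
#include <stddef.h>

static void scale_frames(const uint8_t *in, float *out, size_t num_pixels)
{
    const float inv = 1.0f / 255.0f;   /* one multiplication per pixel */
    for (size_t i = 0; i < num_pixels; i++)
        out[i] = in[i] * inv;
}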

33 Normalization

Even though this section is entitled Normalization a few more tasks are being performed

in this stage of the application as shown by the blue rectangles in Figure 34 Here wide

arrows represent flow of data whereas dashed lines represent the order of execution The

numbers inside the small data arrows pointing towards the different tasks represent the

number of frames used as input by each task The dashed line rectangle that encloses

the normalization and texture 2 tasks indicates that there is no strict sequential execution between these two, but rather that they are executed in an alternating fashion.

This type of diagram will prove particularly useful in Chapter 5 in order to explain the


[Figure 34 diagram: the 16 camera frames feed the Normalization, Texture 2, Modulation and Texture 1 tasks, producing 8, 8, 1 and 1 output frames respectively; dashed lines indicate the execution flow]

Figure 34 Flow diagram of the normalization stage

modifications that were made to the application to improve its performance. An example of the different frames that are produced in this stage is visualized in Figure 35. A brief description of each of the tasks involved in this stage follows.

331 Normalization

The purpose of this stage is to extract the reflectivity component (texture information)

from the camera frames while aiming at enhancing the deformed illumination patterns

in the resulting frame sequence Figure 35a illustrates the result of this process The

deformed patterns are essential for the 3D reconstruction process

In order to understand how this process takes place we need to look back at Figure

32 Here it is possible to observe that the projected patterns in the top row frames are

equal to their corresponding frame in the bottom row with the only difference being

that the values of the projected pattern are inverted For each corresponding pair a

new image frame is generated according to the following equation

F_norm(x, y) = (F_camera(x, y, a) - F_camera(x, y, b)) / (F_camera(x, y, a) + F_camera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 32 respectively

An example of the resulting frame sequence is shown in Figure 35a
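The per-pixel computation for one frame pair could look as sketched below; the small epsilon guard against a zero denominator is an added assumption, not part of the original formula.

/* Sketch of the normalization of one frame pair (a, b): (a - b) / (a + b). */
#include <stddef.h>

static void normalize_pair(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float sum = a[i] + b[i];
        out[i] = (sum > 1e-6f) ? (a[i] - b[i]) / sum : 0.0f;  /* guard dark pixels */
    }
}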


(a) Normalized frame sequence

(b) Texture 2 frame sequence

(c) Modulation frame (d) Texture 1 frame

Figure 35 Example of the 18 frames produced in the normalization stage

332 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one

used to calculate the normalized frame sequence In fact the output of this process is an

intermediate step in the calculation of the normalized frames, which is the reason why

the two processes are said to be performed in an alternating fashion The mathematical

equation that describes the calculation of the texture 2 frame sequence is

F_texture2(x, y) = F_camera(x, y, a) + F_camera(x, y, b)

The resulting frame sequence (Figure 35b) is used later in the global motion compen-

sation stage


333 Modulation

The purpose of this stage is to find the range of measured values for each (x y) pixel of

the camera frame sequence along the time dimension This is done in two steps First

two frames are generated by finding the maximum and minimum values along the time

(t) dimension (Figure 36) for every (x y) value in a frame

[Figure 36 diagram: the camera frame sequence arranged along the x, y and t axes]

Figure 36 Camera frame sequence in a coordinate system

Second a modulation frame is produced by finding the difference between the previously

generated frames ie

F_mod(x, y) = F_max(x, y) - F_min(x, y)

Such modulation frame (Figure 35c) is required later during the decoding stage
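A sketch of the two steps combined into a single pass is given below, assuming the 16 frames are stored consecutively in row-major order.

/* Sketch of the modulation computation: per-pixel range (max - min) over time. */
#include <stddef.h>

static void modulation(const float *frames, size_t num_frames,
                       size_t num_pixels, float *mod)
{
    for (size_t p = 0; p < num_pixels; p++) {
        float mn = frames[p], mx = frames[p];
        for (size_t t = 1; t < num_frames; t++) {
            float v = frames[t * num_pixels + p];
            if (v < mn) mn = v;
            if (v > mx) mx = v;
        }
        mod[p] = mx - mn;   /* range of measured values for this pixel */
    }
}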

334 Texture 1

Finally the last task in the Normalization stage corresponds to the generation of the

texture image that will be mapped onto the final 3D model In contrast to the previous

three tasks this subprocess does not take the complete set of 16 camera frames as input

but only the 2 with finest projection patterns Figure 37 shows the four processing

steps that are applied to the input in order to generate a texture image such as the one

presented in Figure 35d

[Figure 37 diagram: Texture 1 = Average frames → Gamma correction → 5×5 mean filter → Histogram stretch]

Figure 37 Flow diagram for the calculation of the texture 1 image


34 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement.

In fact if no measures are taken to correct the slight amount of movement of the scanner

or of the objects in the scene during the acquisition process the complete reconstruction

process fails Although the global motion compensation stage is only a minor part of

the mechanism that makes the entire application robust to motion it is not negligible

in the final result

Global motion compensation is an extensive field of research for which many different

approaches and methods have been contributed. The approach used in this application is amongst the simplest in terms of complexity. Nevertheless, it suffices for the needs of the current application.

Figure 38 presents an overview of the algorithm used to achieve the global motion

compensation This process takes as input the normalized frame sequence introduced in

the previous section As noted at the bottom of the figure these steps are repeated for

every pair of consecutive frames As a first step the pixels in each column are added for

both frames This results in two vectors that hold the cumulative sums of each frame

The second step is to determine by how many pixels the second image is displaced with

respect to the first one In order to achieve this the sum of absolute differences between

elements of the two column-sum vectors is calculated while slowly displacing the two

vectors with respect to each other The result is a new vector containing the SAD value

for each displacement Subsequently the index of the smallest element in the SAD

values vector is searched in order to determine the number of pixels that the second

image needs to be shifted The process concludes by performing the actual shift of the

second frame
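A sketch of the column-sum and SAD-minimization steps is given below; the maximum search range and the normalization of the SAD by the overlap length are added assumptions made only for the sake of a self-contained example.

/* Sketch of the shift estimation used for global motion compensation. */
#include <math.h>

static void column_sums(const float *img, int w, int h, float *sums)
{
    for (int x = 0; x < w; x++) sums[x] = 0.0f;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sums[x] += img[y * w + x];     /* row-major friendly traversal */
}

static int estimate_shift(const float *sa, const float *sb, int n, int max_shift)
{
    int best = 0;
    float best_sad = INFINITY;
    for (int d = -max_shift; d <= max_shift; d++) {
        float sad = 0.0f;
        int cnt = 0;
        for (int i = 0; i < n; i++) {
            int j = i + d;
            if (j < 0 || j >= n) continue;   /* compare the overlapping part only */
            sad += fabsf(sa[i] - sb[j]);
            cnt++;
        }
        if (cnt > 0) sad /= (float)cnt;      /* normalize so displacements compare fairly */
        if (sad < best_sad) { best_sad = sad; best = d; }
    }
    return best;   /* displacement that the second frame has to be shifted by */
}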

[Figure 38 diagram: for every pair of consecutive normalized frames, the columns of Frame A and Frame B are summed, the SAD between the two sum vectors is minimized, and Frame B is shifted accordingly]

Figure 38 Flow diagram for the global motion compensation process


35 Decoding

In Section 211 of the literature study the correspondence problem was defined as the

process of determining corresponding point pairs between the captured images and the

projected patterns This is exactly what is being accomplished during the decoding

stage

A novel approach has been implemented in which the identification of the projector

stripes is based not on the values of the pixels themselves (as it is typically done) but

rather on the edges formed by the transitions of the projected patterns Figure 39

illustrates the different sets of decoded values that result with each of these methods

Here it is possible to observe that the pixel-based method produces a stair-casing effect

due to the decoding of neighboring pixels that lie on the same stripe of the projected

pattern On the other hand the edge-based method removes this undesirable effect by

decoding values for only parts of the image in which a transition occurs Furthermore

this approach enables sub-pixel accuracy for the determination of the positions where the

transitions occur meaning that the overall resolution of the 3D reconstruction increases

considerably

[Figure 39 plot: decoded values versus pixels along the y dimension of the image, comparing edge-based and pixel-based decoding]

Figure 39 The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used

The decoding process results in a set of vertices each one associated with a depth code

Note however that the unit of measurement used to describe the position and depth of

each vertex is based on camera pixels and code values respectively meaning that these

vertices still do not represent the actual geometry of the face The calibration process

explained in a later section is the part of the application that translates the pixel and


code values to standard units (such as millimeters) thus recreating the actual shape of

the human face

36 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in

a manner such that no overlaps occur In computer graphics these geometric shapes

are generally chosen to be triangles, also called "faces". The reason for using triangles is that, by definition, their vertices lie on the same plane. This in turn avoids the generation of non-simple convex polygons, which are not guaranteed to be rendered correctly. A complete example illustrating this point can be found in [32].

A set of 3D vertices calculated in the decoding stage is the input to the tessellation

process Here however the third dimension does not play a role and hence the z

coordinate for each of the vertices can be thought of as being equal to 0 This implies

that the new set of vertices consists only of (x, y) coordinates that lie on the same plane,

as shown in Figure 310a This graph corresponds to a very close view of the nose area

in the reconstructed face example

[Figure 310 plots: (a) vertices of the zoomed-in model before tessellation, i.e. before applying the Delaunay triangulation; (b) the same zoomed-in model after tessellation, i.e. the result after applying the Delaunay triangulation]

Figure 310 Close view of the vertices in the nose area before and after the tessellation process

The question that arises here is how to connect the vertices in such a way that the com-

plete surface is covered with triangles The answer is to use the Delaunay triangulation

which is probably the most common triangulation used in computer vision. The main advantage that it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the

Delaunay triangulation is independent of the order in which the vertices are processed


Figure 310b shows the result of applying the Delaunay triangulation to the vertices

shown in Figure 310a

Although there exists a number of different algorithms used to achieve the Delaunay

triangulation the final outcome of each conforms to the following definition a Delaunay

triangulation for a set P of points in a plane is a triangulation DT(P) such that no

point in P is inside the circumcircle of any triangle in DT(P) [33] Such definition can

be understood by examining Figure 311
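In sketch form, the circumcircle condition in this definition can be checked with the standard 3 × 3 determinant test shown below, written for a counter-clockwise triangle (a, b, c) and a query point d; this is only an illustration of the predicate, not the algorithm used by the application.

/* Sketch of the circumcircle test: for a counter-clockwise triangle (a, b, c),
 * the determinant below is positive exactly when d lies inside its circumcircle. */
typedef struct { double x, y; } Point2D;

static double in_circumcircle(Point2D a, Point2D b, Point2D c, Point2D d)
{
    double ax = a.x - d.x, ay = a.y - d.y;
    double bx = b.x - d.x, by = b.y - d.y;
    double cx = c.x - d.x, cy = c.y - d.y;
    double a2 = ax * ax + ay * ay;
    double b2 = bx * bx + by * by;
    double c2 = cx * cx + cy * cy;

    /* 3x3 determinant expanded along the first row; > 0 means d is inside. */
    return ax * (by * c2 - b2 * cy)
         - ay * (bx * c2 - b2 * cx)
         + a2 * (bx * cy - by * cx);
}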


Figure 311 The Delaunay tessellation with all the circumcircles and their centers [33]

37 Calibration

The set of (x y) vertices with their corresponding depth code values that result from

the decoding process do not represent standard units of measure ie these still have to

be translated into standard units such as millimeters This is precisely the objective of

the calibration process

The calibration mechanism that is used in the application is based on the work of Peter-

Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts

an offline and an online process Moreover the offline process consists of two stages

the camera calibration and the system calibration It is important to clarify that while

the offline process is performed only once (camera properties and distances within the

system do not change with every scan) the online process is carried out for every scan

instance The calibration stage referred to in Figure 31 is the latter


371 Offline process

As already mentioned the offline process comprises the two stages described below

Camera calibration This part of the process is concerned with the calculation of the

intrinsic parameters of the camera as explained in Section 22 of the literature

study In short the objective is to precisely quantify the optical properties of the

camera The manner in which the current approach accomplishes this is by imag-

ing the special calibration chart shown in Figure 312 from different orientations

and distances After corresponding markers in the different images are found an

algorithm searches the optimal set of camera parameters for which triangulation

of all corresponding marker-point pairs gives an accurate reconstruction of the

calibration chart

Figure 312 The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely

System calibration The second part of the calibration process refers to the camera-

projector system calibration ie the determination of the extrinsic parameters

of the system Again this part of the process images the calibration chart from

different distances However this time structured light patterns are emitted by

the projector while the acquisition process takes place The result is that each

projector code is associated with a known depth and camera position

372 Online process

The result of the offline calibration is a set of parameters that model the optical proper-

ties of the scanner system These are passed to the application inside the XML file for

every scan Such parameters represent the coefficients of a fifth-order polynomial used

for translating the set of (x y) vertices with their corresponding depth code values into


standard units of measure In other words the online process consists of evaluating a

polynomial with all the x y and depth code values calculated in the decoding stage in

order to reconstruct the geometry of the face Figure 313 shows the state of the 3D

model before and after the reconstruction process
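The sketch below only illustrates the mechanical part of this step, namely the evaluation of a fifth-order polynomial with Horner's scheme; the way the actual calibration model combines the x, y and depth-code values into such polynomials is not reproduced here.

/* Sketch of evaluating one fifth-order polynomial (Horner's rule). */
static float eval_poly5(const float c[6], float v)
{
    /* c[0] + c[1]*v + c[2]*v^2 + ... + c[5]*v^5 */
    float r = c[5];
    for (int i = 4; i >= 0; i--)
        r = r * v + c[i];
    return r;
}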

(a) Before reconstruction (b) After reconstruction

Figure 313 The 3D model before and after the calibration process

38 Vertex filtering

As can be seen from Figure 313b, there are a number of extra vertices (and faces) that have not been correctly reconstructed and therefore should be removed from the model. Vertex filtering is applied to remove all these noisy vertices and faces based on different criteria. The process is divided into the following three steps.

381 Filter vertices based on decoding constraints

First, if the distance between consecutive decoded points is larger than a maximum threshold in the (x) or (z) dimension, then these are removed. Second, in order to avoid falsely decoded vertices due to camera noise (especially in the parts of the images where light does not hit directly), a minimal modulation threshold needs to be exceeded or else the associated decoded point is discarded. Finally, if the decoded vertices lie outside a margin defined in accordance with the image dimensions, then these are removed as well.


382 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum

and maximum values that each decoded point can have in the z dimension These values

are read from the XML file The long triangles shown in Figure 313b that either extend

far into the picture or on the other hand come close to the camera are all removed in

this stage The resulting 3D model after being filtered with the two previously described

criteria is shown in Figure 314a

383 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length

criterion Initially the length of every edge contained in the model is calculated This

is followed by determining a new set of edges L that contains the longest edge in each

face After this operation the mean length value for the longest edge set is calculated

Finally, only faces whose longest edge is less than seven times the mean value, i.e. L < 7 × mean(L), are kept. Figure 314b shows the result after this operation.
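A sketch of this filter is given below; the vertex and face types are simplified assumptions used only for illustration.

/* Sketch of the maximum-edge-length filter: keep faces whose longest edge is
 * below 7 times the mean of the per-face longest edges. */
#include <math.h>
#include <stdlib.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vertex;
typedef struct { int v[3]; } Face;

static float edge_len(Vertex a, Vertex b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return sqrtf(dx * dx + dy * dy + dz * dz);
}

static size_t filter_long_edges(const Vertex *verts, Face *faces, size_t n_faces)
{
    if (n_faces == 0) return 0;
    float *longest = malloc(n_faces * sizeof(float));
    if (!longest) return n_faces;

    float mean = 0.0f;
    for (size_t i = 0; i < n_faces; i++) {
        float e0 = edge_len(verts[faces[i].v[0]], verts[faces[i].v[1]]);
        float e1 = edge_len(verts[faces[i].v[1]], verts[faces[i].v[2]]);
        float e2 = edge_len(verts[faces[i].v[2]], verts[faces[i].v[0]]);
        longest[i] = fmaxf(e0, fmaxf(e1, e2));   /* longest edge of this face */
        mean += longest[i];
    }
    mean /= (float)n_faces;

    size_t kept = 0;                             /* compact the face list in place */
    for (size_t i = 0; i < n_faces; i++)
        if (longest[i] < 7.0f * mean)
            faces[kept++] = faces[i];

    free(longest);
    return kept;                                 /* new number of faces */
}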

(a) The 3D model after the filtering steps described in Subsections 381 and 382
(b) The 3D model after the filtering step described in Subsection 383
(c) The 3D model after the filtering step described in Section 39

Figure 314 3D resulting models after various filtering steps

39 Hole filling

In the last processing step of the 3D face scanner application two actions are performed

The first one is concerned with an algorithm that takes care of filling undesirable holes

that appear due to the removal of vertices and faces that were part of the face surface. This

is accomplished by adding a vertex in the middle of the hole and then connecting every

surrounding edge with this point The second action refers to another filtering step of


vertices and faces In this last part of the application the program removes all but the

largest group of connected faces The final 3D model is shown in Figure 314c

310 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes

but not for the overall goal of the 3D mask sizing project, this process was not included as part of the 3D face scanner application. This is also the reason why it

is not included in Figure 31 Nevertheless this section provides a brief explanation of

the smoothing process that is currently used along with an example

A complete explanation of the algorithm that is being used to achieve the smoothing

effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian

operator that diffuses the vertices along the surface An example of the resulting model

before and after applying the smoothing process is shown in Figure 315

(a) The 3D model before smoothing (b) The 3D model after smoothing

Figure 315 Forehead of the 3D model before and after applying the smoothing process

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as

two different domains but rather as two complementary parts of a whole. There are two important trends that have made such a unified view possible. First, integrated circuit (IC) technology has evolved to the point where multiple processors of different types coexist in a single IC. Second, the increasing complexity and average size of programs, added to the evolution of compiler technologies, have made C compilers (and even C++ or Java in some cases) commonplace in the development of embedded systems

[35]

This chapter discusses the embedded hardware and software implementation of the 3D

face scanner A brief account of the hardware and software tools that were used during

the development of the application is presented first Subsequently the first stage of the

development process is described which consists mainly of translating the algorithms

and methods described in Chapter 3 into a different programming language more suitable

for embedded systems Finally a preview of the developed visualization module that

displays the 3D reconstructed face is presented along with a brief description of its

functionality

41 Development tools

This section describes the set of tools used in the development of the embedded applica-

tion First an overview of the hardware is presented highlighting the most important

aspects that are of interest to the 3D face scanner application This is then followed by

a list of the software tools along with a short motivation for their selection. A so-called remote development methodology was used for the compilation process. The idea is to



run an integrated development environment (IDE) on a client system for the creation of

the project editing of the files and usage of code assistance features in the same manner

as done with local projects However when the project is built run or debugged the

process runs on a remote server with output and input transferred to the client system

411 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as

development platforms SBCs combine most features of a conventional desktop computer

into a single board which can be as small as a credit card One or more processors of

different types memory on-board peripherals for multiple USB devices single or dual

gigabit Ethernet connections integrated graphics and audio capabilities amongst others

are common features included in these devices. But perhaps what is most interesting for embedded developers is the availability of several SBCs that fall under the open source hardware category [36]. Such SBCs are suitable for the implementation of a wide range

of applications on the basis of open operating systems

Two different hardware environments were used in the development of the current em-

bedded application a conventional desktop personal computer (PC) with an Intel x86

architecture and a SBC that was selected according to the following survey

4111 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention

of finding the most suitable model for our application Table 41 presents a subset of the

considered models highlighting the most relevant characteristics for the 3D face scanner

application Refer to [37] for the complete survey

The model to be chosen has to comply with several requirements imposed by the 3D

face scanner application First support for both a camera and a projector had to be

offered While all of the considered models showed special support for video output

not all of them provided suitable characteristics for camera signal acquisition In fact

most of them rely on USB or Ethernet connections for this purpose The problem of

using USB technology for camera acquisition is that it is highly resource demanding On

the other hand Ethernet connections imply streaming video in formats such as MPEG

which require additional computational resources and buffering for decoding the video

stream Explicit periphery support for camera acquisition was only offered by two of

the considered models the BeagleBoard-xM and the PandaBoard


Table 41 Single-board computer survey

BeagleBoard-xM
CPU: ARM Cortex-A8, 1000 MHz
RAM: 512 MB
Video output: DVI-D, HDMI, S-Video
GPU: PowerVR SGX, OpenGL ES 2.0
Camera port: Yes

Raspberry Pi Model B
CPU: ARM1176, 700 MHz
RAM: 256 MB
Video output: Composite RCA, HDMI, DSI
GPU: Broadcom VideoCore IV, OpenGL ES 2.0
Camera port: No

Cotton Candy
CPU: dual-core ARM Cortex-A9, 1200 MHz
RAM: 1 GB
Video output: HDMI
GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
Camera port: No

PandaBoard
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, DVI-D, LCD
GPU: PowerVR SGX540, OpenGL ES 2.0
Camera port: Yes

Via APC
CPU: ARM11, 800 MHz
RAM: 512 MB
Video output: HDMI, VGA
GPU: built-in 2D/3D graphics, OpenGL ES 2.0
Camera port: No

MK802
CPU: ARM Cortex-A8, 1000 MHz
RAM: 1 GB
Video output: HDMI
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No

Snowball
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, CVBS
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No


A second issue in the selection of the SBC was concerned with the project objective of

developing a module capable of visualizing the 3D reconstructed model by means of the

embedded projector It was considered that the achievement of this objective could be

greatly simplified by selecting an SBC model that offered support for rendering of 3D

computer graphics by means of an API preferably OpenGL ES Nevertheless all of the

SBC models considered in the survey featured a graphical processor unit (GPU) with

such support

Finally one last important motivation for the selection came from the experience gath-

ered through related projects The BeagleBoard-xM had been used as the embedded

computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-

able implementation effort could be saved if this option were adopted Consequently it

was the BeagleBoard-xM that was selected as the SBC model for the development of

the current project

4112 BeagleBoard-xM features

The BeagleBoard-xM (Figure 41) is an SBC produced by Texas Instruments. It is a low-power open-source hardware system that was designed specifically to address the Open Source Community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz and 512 MB of LPDDR RAM. Several open operating systems have been made compatible with such a processor, including Linux, FreeBSD, RISC OS, Symbian and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38]

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39]

412 Software

The main factors involved in the selection of software tools were (i) available support by

a large development community and (ii) acquisition costs and licensing charges Open

source software was adopted where possible Moreover prior experience with the tools

was also taken into account. The software can be divided into two categories: (i) software


Figure 41 The BeagleBoard-xM offered by Texas Instruments

libraries that are used within the application and therefore are necessary for its execution

and (ii) software tools used specifically for the development of the application and hence

are not required for its execution In what follows each of these is briefly described

4121 Software libraries

The following software libraries are being used throughout the implementation of the

embedded application

libxml2 It is a software library used for parsing XML documents which was originally

developed for the Gnome project and was later made available for outside projects

as well The current application makes use of such tool for extracting the required

information from the XML file that is included for each scan

OpenCV It is an open source computer vision and machine learning software library

initiated by Intel It provides the necessary functionality to construct the Delaunay

triangulation described in Chapter 3 Though it was used in the initial versions of

the application later optimizations replaced OpenCV implementations

CGAL Consists of a software library that aims to provide access to algorithms in

computational geometry It is being used in the current application as a means

to simplify the resulting mesh surface ie to reduce the number of faces used to

represent the surface while keeping the overall shape of the reconstructed model

OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-

cally for embedded systems It consists of a cross-language multi-platform Appli-

cation Programming Interface (API) for rendering 2D and 3D computer graphics


It is used in the current application as the means to visualize the 3D reconstructed

model

GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL

used to create windows andor frame buffers It is being used in the visualization

module of the application as well

4122 Software development tools

The following list presents a description of the most important software tools used for

the development of the embedded application

GNU toolchain It refers to a collection of programming tools produced by the GNU

Project that provide developing facilities for applications and operating systems

Among the several projects that comprise the GNU toolchain the following were

used

GNU Make It is a utility that automates the building process of executable

programs by reading the so-called makefiles which specify how to create the

target program

GCC It is the official compiler of the GNU operating system and has been

adopted as standard by most modern Unix-like computer operating systems

GNU Binutils Involves a set of programming tools that are used in the develop-

ment process of creating and managing programs object files libraries profile

data and assembly source code The commands as (assembler) ld (linker)

and gprof (profiler) were used among the complete set of binutil commands

GNU Project debugger It is the standard debugger for the GNU operating

system which was made available for the development of applications outside

this project as well

Valgrind It is a programming tool that can automatically detect memory management

errors It also provides the functionality of a profiler

Ubuntu A Linux based operating system that is distributed as free and open source

software It was installed in both the desktop PC and the SBC


42 MATLAB to C code translation

This section describes the first stage of the embedded application development that

involves the translation of a series of algorithms originally written in MATLAB code to

C

Despite the fact that there are a number of available tools that automatically translate

MATLAB code to C language such as MATLAB Coder by MathWorks MATLAB-to-

C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx these have a number

of pitfalls that compromise their applicability, especially when the performance aspect

is of ultimate importance Perhaps what is most concerning is that each one of these

tools only supports a subset of the MATLAB language and functions meaning that

the complete functionality of MATLAB is immediately constrained by this requirement

In many cases this would imply a modification to the MATLAB code prior to the

translation process in order to filter out any feature or function not included in the

subset which adds overhead to the development process Examples of features not

supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization or try/catch statements. The use of an automatic translation

tool was discarded for this project taking into account that several of these unsupported

features are present in the MATLAB code

421 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular pro-

gramming languages used for the development of embedded systems The first is that

C language lies at an intermediate point between higher and lower level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher level languages lies in the fact that they do not provide suitable characteristics for optimizing the performance of applications, such as low-level memory

manipulation Furthermore unlike many of these higher level programming languages

C provides deterministic resource use which is an important feature when the target de-

vices contain limited resources On the other hand C outperforms lower level languages

in a number of aspects such as scalability and maintainability Two final motivations

for using C are (i) C compilers are available for almost all embedded devices which are

supported by a large pool of experienced C programmers and (ii) the vast majority of

hardware APIs and drivers are written in C.


422 Translation approach

As mentioned earlier a manual translation approach of the code was chosen over the

use of automatic translation tools A key part in the process of manually translating

MATLAB to C code is the verification process There are two major techniques used

to achieve such verification The first one consists of a systematic method of converting

the translated C code into a compiled MEX-file that can be merged into the original

MATLAB project Then by comparing the results generated by the MATLAB project

containing the C implementation wrapped in a MEX-file with those generated by the

original MATLAB project one should be able to verify the correctness of the translation

The second approach consists of writing corresponding intermediate results of both the

MATLAB and C implementations to external files and then using a file comparison tool

such as diff for Linux environments in order to validate equality of both results It was

the latter approach that was chosen for the development of the current application for

the following reason The former approach requires the C implementation to be wrapped

in a so called MEX wrapper which takes care of the communication between MATLAB

and C This task is considered to be error prone since crashes segmentation violations

or incorrect results can easily occur if the MEX wrapper does not allocate and access

the data properly as reported by Marc Barberis in [40] from Catalytic Inc

A number of pitfalls that add complexity to the manual translation process were iden-

tified throughout the development of this stage The most important are

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e. the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in performance of the resulting code (a short sketch illustrating this point is given after this list).

• MATLAB is an interpreted language, i.e. data types and variable dimensions are only known at run-time, thus these cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc or free.


• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions and sizes. On the contrary, C language requires all variables to be cast to a specific data type (or declared, as known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C and hence requires the programmer to implement them while relying on structure constructs of primitive types.
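As an illustration of the row-major pitfall mentioned in the list above, the sketch below shows the loop ordering that preserves spatial locality in C; the function itself is a made-up example.

/* Sketch of a row-major traversal: the inner loop runs over columns so that
 * consecutive iterations touch consecutive memory locations. A direct
 * transcription of column-major MATLAB code would swap the two loops and
 * lose spatial locality. */
static void add_offset(float *img, int width, int height, float offset)
{
    for (int y = 0; y < height; y++)          /* rows in the outer loop    */
        for (int x = 0; x < width; x++)       /* columns in the inner loop */
            img[y * width + x] += offset;     /* contiguous accesses in C  */
}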

43 Visualization

This section describes the different steps involved in the visualization module developed

to display the reconstructed 3D models by means of the embedded projector contained

in the hand-held device Figure 42 extends the general overview of the application

presented in Figure 31 by incorporating the visualization module. This figure shows that a

resulting 3D model of the face reconstruction process consists of 4 different elements a

set of vertices a set of faces a set of UV coordinates and a texture image

[Figure 42 diagram: the camera frame sequence and the XML file enter the 3D face reconstruction, which passes faces, vertices, UV coordinates and the texture 1 image to the visualization module]

Figure 42 Simplified diagram of the 3D face scanner application

Vertices and faces describe the geometry of the reconstructed model Each face consists

of three index values that determine the vertices that make up a triangle. On the other

hand UV coordinates together with the texture image describe the texture of the model

Figure 43 shows how UV coordinates are used to map portions of the texture image


to individual parts of the model. Each vertex is associated with a UV coordinate.

When a triangle is rendered the corresponding UV coordinates of each vertex are used

to extract a portion of the texture image to place it on top of the triangle

[Figure 43 diagram: the UV coordinate system, with u and v axes and corners (0,0), (1,0), (0,1) and (1,1)]

Figure 43 UV coordinate system

Figure 44 presents an overview of the visualization module The first step of the process

is to simplify the 3D model ie to reduce the number of triangles (and vertices) used

to represent the surface Note that while a high resolution is needed for the algorithms

that determine the fit quality of the different mask models a much lower resolution can

be used for visualization purposes In fact due to the limited available resources in

embedded systems such simplification becomes necessary to avoid lag when zooming

rotating or panning the model Edge collapse is a common term used for the simpli-

fication process which is shown in Figure 44 Input vertices and faces of this block

are converted into a smaller set denoted as New vertices and New faces on the diagram

However since the new set of vertices and faces do not have a one-to-one correspondence

to the original set of UV coordinates such coordinates have to be updated as well The

manner in which this is accomplished is by using the Nearest Neighbor algorithm Every

new vertex is assigned the UV coordinate of its closest original vertex

The next stage of the process is to format the new set of vertices faces and UV co-

ordinates together with the texture 1 image such that OpenGL can render the model


Subsequently normal vectors are calculated for every triangle which are mainly used

by OpenGL for lighting calculations Every vertex of the model has to be associated

with one normal vector To do this an average normal vector is calculated for each

vertex based on the normal vectors of the triangles that are connected to it Moreover

a cross-product multiplication is used to calculate the normal vector of each triangle

Once these four elements that characterize the 3D model are provided to OpenGL the

program enters in an infinite running state where the model is redrawn every time a

timer expires or when an interactive operation is sent to the program
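A sketch of this normal computation is given below, using simplified vertex and face types; it is meant only to illustrate the cross-product and averaging steps described above.

/* Sketch of the normal-vector computation: a face normal is the cross product
 * of two triangle edges, and each vertex normal is the normalized sum of the
 * normals of the faces that share it. */
#include <math.h>
#include <string.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 c = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return c;
}

static void compute_vertex_normals(const Vec3 *verts, const int (*faces)[3],
                                   size_t n_faces, size_t n_verts, Vec3 *normals)
{
    memset(normals, 0, n_verts * sizeof(Vec3));
    for (size_t i = 0; i < n_faces; i++) {
        Vec3 a = verts[faces[i][0]], b = verts[faces[i][1]], c = verts[faces[i][2]];
        Vec3 e1 = { b.x - a.x, b.y - a.y, b.z - a.z };
        Vec3 e2 = { c.x - a.x, c.y - a.y, c.z - a.z };
        Vec3 n = cross(e1, e2);                    /* face normal */
        for (int k = 0; k < 3; k++) {              /* accumulate on each vertex */
            normals[faces[i][k]].x += n.x;
            normals[faces[i][k]].y += n.y;
            normals[faces[i][k]].z += n.z;
        }
    }
    for (size_t v = 0; v < n_verts; v++) {         /* normalize the averages */
        float len = sqrtf(normals[v].x * normals[v].x +
                          normals[v].y * normals[v].y +
                          normals[v].z * normals[v].z);
        if (len > 0.0f) {
            normals[v].x /= len; normals[v].y /= len; normals[v].z /= len;
        }
    }
}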

[Figure 44 diagram: mesh simplification takes the faces, vertices and UV coordinates; edge collapse produces new vertices and faces, and nearest-neighbor assignment produces new UV coordinates; these are converted to OpenGL format, normals are calculated, and OpenGL renders the model together with the texture 1 image]

Figure 44 Diagram of the visualization module

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner

application ranging from high-level optimizations such as modification of the algo-

rithms to low-level optimizations such as the implementation of time-consuming parts

in assembly language

In order to verify that the achieved optimizations were valid in general and not for

specific cases 10 scans of different persons were used for profiling the performance of the

application Every profile consisted of running the application 10 times for each scan and

then averaging the results in order to reduce the influence that external factors might

have on the measured times. Figure 51 presents an example of the graphs that will be

used throughout this and the following chapters to represent the changes in performance

Here each bar is divided into different colors that represent the distribution of the total

execution time among the various stages of the application described in Chapter 3 and

summarized in Figure 31

The translation from MATLAB to C code corresponds to the first optimization per-

formed The top two bars in Figure 51 show that the C implementation resulted in

a speedup of approximately 15 times over the MATLAB implementation running on

a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available in the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's O2 optimization level.

The bottom bar in Figure 51 represents the starting point for a set of optimization

procedures that will be described in the following sections The order in which these are

presented corresponds to the same order in which they were applied to the application



[Figure 51 bar chart: execution time in seconds, broken down into the Read binary file, Preprocessing, Normalization, Global motion compensation, Decoding, Tessellation, Calibration, Vertex filtering, Hole filling and Other stages]

Figure 51 Execution times of (Top) the MATLAB implementation on a desktop computer, (Middle) the C implementation on a desktop computer, (Bottom) the C implementation on the BeagleBoard-xM

51 Double to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C implementations was necessary to compare both results in each step of the translation process. The original C implementation used double-precision format because this is the format used in the MATLAB code. Taking into account that the additional precision offered by double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double to single-precision format was made. Figure 52 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec.


Figure 52 Difference in execution time when double-precision format is changed to single-precision

52 Tuned compiler flags

While the previous versions of the C code were compiled with the O2 optimization level,

the goal of this step was to determine a combination of compiler options that would


translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 53 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution

-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp


Figure 53 Execution time before and after tuning GCC's compiler options

53 Modified memory layout

A different memory layout for processing the camera frames was implemented to further

exploit the concept of spatial locality of the program As noted in Section 33 many of

the operations in the normalization stage involve pixels from pairs of consecutive frames

ie first and second third and fourth fifth and sixth and so on Data of the camera

frames were placed in memory in a manner such that corresponding pixels between frame

pairs lay next to each other in memory. The procedure is shown in Figure 54.
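A sketch of how one frame pair could be interleaved is shown below; the exact layout used in the application may differ in detail.

/* Sketch of the modified memory layout: pixels of a frame pair are interleaved
 * so that the two values combined by the normalization step are adjacent. */
#include <stdint.h>
#include <stddef.h>

static void interleave_pair(const uint8_t *frame_a, const uint8_t *frame_b,
                            uint8_t *out, size_t num_pixels)
{
    for (size_t i = 0; i < num_pixels; i++) {
        out[2 * i]     = frame_a[i];   /* pixel i of the first frame  */
        out[2 * i + 1] = frame_b[i];   /* pixel i of the second frame */
    }
}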

However this modification yielded no improvement in the execution time of the appli-

cation as can be seen from Figure 55

54 Reimplementation of C's standard power function

The generation of Texture 1 frame in the normalization stage starts by averaging the last

two camera frames followed by a gamma correction procedure The process of gamma

correction in this application consists of raising each pixel to the power of 0.85. After

profiling the application it was found that the power function from the standard math

C library was taking most of the time inside this process Taking into account that the


Figure 54 Modification of the memory layout of the camera frames. The blue, red, green and purple circles represent pixels of the first, second, third and fourth frames, respectively


Figure 55 The execution time of the program did not change with a different memory layout for the camera frames

high accuracy offered by such function was not required and that the overhead involved

in validating the input could be removed a different implementation of such function

was adopted

A novel approach was proposed by Ian Stephenson in [42] explained as follows The

power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value By choosing x = 2 the process of calculating the

power function reduces to finding fast pow2() and log2() functions Such functions can

be approximated with a few instructions For example the implementation of log2(a)

can be approximated based on the IEEE floating point representation of a


a = M * 2^E

where M is the mantissa and E is the exponent Taking log of both sides gives

log2(a) = log2(M) + E

and since M is normalized log2(M) is always small therefore

log2(a) ≈ E

This new implementation of the power function provides the improvement of the execu-

tion time shown in Figure 56
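A sketch of this idea for single-precision floats is given below. The bit-manipulation constants are the standard ones implied by the IEEE 754 format (2^23 for the mantissa width and 127 for the exponent bias); the exact implementation used in the application may differ, and the result is only an approximation, which is acceptable for the gamma correction step.

/* Sketch of an approximate power function: pow(a, b) = 2^(log2(a) * b), with
 * log2() and 2^() approximated directly from the IEEE 754 bit pattern.
 * Valid only for positive a; accuracy is deliberately traded for speed. */
#include <stdint.h>
#include <string.h>

static float fast_pow(float a, float b)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof(bits));                   /* reinterpret float as bits */

    /* Approximate log2(a): exponent plus a linearized mantissa (2^23 = 8388608). */
    float log2a = (float)bits / 8388608.0f - 127.0f;

    /* Approximate 2^(log2a * b) by rebuilding a float from the scaled value. */
    float e = log2a * b;
    uint32_t out_bits = (uint32_t)((e + 127.0f) * 8388608.0f);
    float result;
    memcpy(&result, &out_bits, sizeof(result));
    return result;
}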


Figure 56 Difference in execution time before and after reimplementing C's standard power function

55 Reduced memory accesses

The original order of execution was modified to reduce the amount of memory access and

to increase the temporal locality of the program Temporal locality is a principle stating

that referenced memory locations will tend to be referenced again soon Moreover

the reordering allowed floating-point calculations to be replaced with integer calculations in the modulation stage, which typically execute faster on ARM processors.

Figure 57 shows the order in which the algorithms are executed before and after this

optimization By moving the calculation of the modular frame to the preprocessing

stage the values of the camera frames do not have to be re-read Moreover the processes

of discarding cropping and scaling frames are now being performed in an alternating

fashion together with the calculation of the modular frame This loop merging improves

the locality of data and reduces loop overhead Figure 58 shows the change in execution

time of the application for this optimization step


[Figure 57 diagrams: (a) original order of execution, with the Modulation task performed inside the normalization stage; (b) modified order of execution, with the Modulation task moved into the preprocessing stage alongside the discard, crop and scale steps]

Figure 57 Order of execution before and after the optimization


Figure 58 Difference in execution time before and after reordering the preprocessing stage


56 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the applica-

tion was presented in Chapter 3 Figure 38 shows the different stages of this process

However this figure does not reflect the manner in which the GMC was initially imple-

mented in the MATLAB code In fact this figure describes the GMC implementation

after being modified with the optimization described in this section A more detailed

picture of the original GMC implementation is given in Figure 59 Previous research

found that optimal results were achieved when GMC is applied in the y direction only

The manner in which this was implemented was by estimating GMC for both directions

but only performing the shift in the y direction. The optimization consisted of removing

all unnecessary calculations related to the estimation of GMC in the x direction This

optimization provides the improvement of the execution time shown in Figure 510

[Figure 59 diagram: for every pair of consecutive normalized frames, the rows and columns of Frame A and Frame B are summed, the SAD is minimized in x and y, and Frame B is shifted in the y dimension only]

Figure 59 Flow diagram for the GMC process as implemented in the MATLAB code


Figure 510 Difference in execution time before and after modifying the GMC stage


57 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation A series of examples available

in [43] were used as references for our implementation Despite the fact that OpenCV

constructs the triangulation while abstracting the complete algorithm from the pro-

grammer, a not-so-straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to nav-

igate through the edges that form the triangulation It is therefore the responsibility

of the programmer to extract each of the triangles while stepping through these edges

Moreover care must be taken to avoid repeated triangles in the final set An error was

detected at this point of the optimization process in the mechanism that was being used

to avoid repeated triangles Figure 511 shows the increase in execution time after this

bug was resolved
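As an illustration of the kind of mechanism involved, the sketch below shows one common way of avoiding duplicate triangles while stepping through the edges of a subdivision: each triangle is stored with its vertex indices sorted and compared against the triangles collected so far. This is a generic example only; it does not reproduce the (originally faulty) mechanism used in the application.

    #include <stdbool.h>

    typedef struct { int v[3]; } Triangle;

    static void sort3(int v[3])
    {
        int t;
        if (v[0] > v[1]) { t = v[0]; v[0] = v[1]; v[1] = t; }
        if (v[1] > v[2]) { t = v[1]; v[1] = v[2]; v[2] = t; }
        if (v[0] > v[1]) { t = v[0]; v[0] = v[1]; v[1] = t; }
    }

    /* Returns true if the triangle was added, false if it was already present. */
    static bool add_unique_triangle(Triangle *set, int *count, int a, int b, int c)
    {
        Triangle t = { { a, b, c } };
        sort3(t.v);                       /* canonical order makes duplicates comparable */
        for (int i = 0; i < *count; ++i)
            if (set[i].v[0] == t.v[0] && set[i].v[1] == t.v[1] && set[i].v[2] == t.v[2])
                return false;
        set[(*count)++] = t;
        return true;
    }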

Figure 5.11: The execution time of the application increased after fixing an error in the tessellation stage.

5.8 Modified line shifting in GMC stage

A series of optimizations performed on the original line-shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later, it was optimized further by replacing the for loop with the memcpy function available in the standard C library, which in turn led to a faster execution time.

A further optimization was obtained in the GMC stage which yielded better memory usage and a faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included in order to determine whether the shift has to be performed in the positive or the negative direction. If the shift is negative, i.e. upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. If the shift is positive, i.e. downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
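The following is a minimal sketch of such an in-place vertical shift, assuming an 8-bit image stored row by row; the function name and signature are illustrative and not taken from the application's code.

    #include <stdlib.h>
    #include <string.h>

    static void shift_rows_in_place(unsigned char *img, int width, int height, int shift)
    {
        if (shift == 0 || abs(shift) >= height)
            return;

        if (shift < 0) {
            /* Upward shift: walk top to bottom, copying each source row to a
               destination |shift| rows above it, so no source row is overwritten
               before it has been read. Rows shifted out of the frame are dropped. */
            for (int row = -shift; row < height; ++row)
                memcpy(img + (row + shift) * width, img + row * width, width);
        } else {
            /* Downward shift: walk bottom to top, copying each source row to a
               destination `shift` rows below it. */
            for (int row = height - 1 - shift; row >= 0; --row)
                memcpy(img + (row + shift) * width, img + row * width, width);
        }
    }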

Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.

5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improves the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic is the fact that the set of vertices resulting from the decoding stage is sorted in an increasing manner, which removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered in increasing order from left to right and from bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies even further the algorithm needed to connect the vertices into triangles.

The developed algorithm traverses the set of vertices row by row from bottom to top, creating triangles between every pair of consecutive rows. Each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles. The algorithm is presented in Algorithm 1. Note that, for each pair of rows, the algorithm describes the connection of vertices only up to the moment in which the last vertex of either row is reached; the unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).

Algorithm 1 New tessellation algorithm

 1: for all pairs of rows do
 2:   find the left-most vertices in both rows and store them in vertex row A and vertex row B
 3:   while the last vertex in either row has not been reached do
 4:     if vertex row A is more to the left than vertex row B then
 5:       connect vertex row A with the next vertex on the same row and with vertex row B
 6:       change vertex row A to the next vertex on the same row
 7:     else
 8:       connect vertex row B with the next vertex on the same row and with vertex row A
 9:       change vertex row B to the next vertex on the same row
10:     end if
11:   end while
12: end for
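A possible C sketch of the core of this algorithm is given below, assuming that the vertices of each row are already sorted from left to right; the types and names are assumptions, and the handling of the vertices left over in the longer row is omitted, as in Algorithm 1.

    typedef struct { float x; int index; } Vertex;     /* decoded vertex: x position and global index */
    typedef struct { int v0, v1, v2; } Triangle;

    /* rowA and rowB hold two consecutive rows of vertices, each sorted by x.
       Returns the number of triangles written to out. */
    static int triangulate_row_pair(const Vertex *rowA, int lenA,
                                    const Vertex *rowB, int lenB,
                                    Triangle *out)
    {
        int a = 0, b = 0, count = 0;

        /* Stop as soon as the last vertex of either row is reached; the remaining
           vertices of the longer row are connected in a separate step. */
        while (a + 1 < lenA && b + 1 < lenB) {
            if (rowA[a].x < rowB[b].x) {
                /* Connect the current vertex of row A with its right neighbour
                   and with the current vertex of row B, then advance in row A. */
                out[count++] = (Triangle){ rowA[a].index, rowA[a + 1].index, rowB[b].index };
                ++a;
            } else {
                out[count++] = (Triangle){ rowB[b].index, rowB[b + 1].index, rowA[a].index };
                ++b;
            }
        }
        return count;
    }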

Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 12.5 times over OpenCV's Delaunay triangulation implementation.

[(a) Delaunay triangulation; (b) optimized triangulation: the same small neighbourhood of vertices triangulated with both methods.]

Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that the vertices are sorted.

5.10 Modified decoding stage

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage.


Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

As a first step, two frequently called functions of the standard C math library, namely ceil() and floor(), were replaced with faster implementations that use preprocessor directives to avoid the function-call overhead. Moreover, the time spent validating the input was also avoided, since such validation was not required. However, the property that allowed the new implementations of ceil() and floor() to increase the performance to a greater extent is the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions could be simplified further.
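A minimal sketch of such simplified replacements is shown below, assuming the argument is a non-negative index value; the macro names are illustrative, not the identifiers used in the application, and note that the argument is evaluated more than once.

    /* Valid only for x >= 0, which holds for index values in the decoding stage. */
    #define FLOOR_IDX(x) ((int)(x))
    #define CEIL_IDX(x)  ((int)(x) == (x) ? (int)(x) : (int)(x) + 1)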

A second optimization applied to the decoding stage was to replace memory that was dynamically allocated on the heap with memory allocated on the stack, while ensuring that the amount of memory involved would not cause a stack overflow. Stack allocation is usually faster, since it only requires adjusting the stack pointer and the allocated memory is more likely to already reside in the cache.

The last optimization consisted of detecting and removing several tasks that did not contribute to the final result. Such tasks were present in the application because several alternatives for achieving a common goal were implemented during the algorithmic design stage; after assessing them and choosing the best option, however, the others were never completely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the non-functional code.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming.


Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

For every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for every image except the first and the last one (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, this recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
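The fragment below sketches the reuse scheme, assuming 8-bit frames stored row by row; the names are illustrative and the SAD minimization and shifting steps are left as a stub, since only the reuse of the column-sum vectors is of interest here.

    #include <string.h>

    static void column_sums(const unsigned char *frame, int width, int height,
                            unsigned int *sums)
    {
        memset(sums, 0, width * sizeof(unsigned int));
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                sums[x] += frame[y * width + x];
    }

    static void gmc_all_frames(unsigned char **frames, int num_frames,
                               int width, int height,
                               unsigned int *sums_prev, unsigned int *sums_cur)
    {
        /* Compute the column sums of the first frame only once. */
        column_sums(frames[0], width, height, sums_prev);

        for (int n = 1; n < num_frames; ++n) {
            column_sums(frames[n], width, height, sums_cur);

            /* ... minimize the SAD between sums_prev and sums_cur and shift
                   frames[n] accordingly (omitted in this sketch) ... */

            /* Reuse: the sums of frame n become the "previous" sums of the next
               iteration instead of being recomputed. */
            unsigned int *tmp = sums_prev;
            sums_prev = sums_cur;
            sums_cur = tmp;
        }
    }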

5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations, where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization and global motion compensation stages are data independent and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrating NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology; (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions while the compiler does some of the work associated with writing assembly instructions; and (iv) directly writing NEON assembly instructions that are linked to the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and it was therefore this approach that was adopted.


[Flow diagram: for the first pair of frames of the normalized sequence, the column sums of Frame 1 and Frame 2 are computed, the SAD is minimized and Frame 2 is shifted; for every remaining pair of consecutive frames (from n = 3 to n = N), only the column sums of Frame n are computed, the SAD is minimized against the stored column vector of Frame n-1, and Frame n is shifted.]

Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums.

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.


Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8 or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on two Q registers, where each of the 8 elements is 16 bits wide, or as an operation on two D registers, where each of the 8 elements is 8 bits wide.

[Diagram of a SIMD operation: two source registers and a destination register divided into elements (lanes), with the same operation applied to every lane.]

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology.

An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
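The fragment below illustrates this kind of 8-wide processing. It uses NEON C intrinsics for readability, whereas the implementation described in this section was written directly in NEON assembly; the variable names are assumptions. It computes the per-pixel sum (texture 2) and the normalized difference of two 8-pixel vectors and converts the results to 32-bit floats.

    #include <arm_neon.h>

    static void process_8_pixels(const uint8_t *p1, const uint8_t *p2,
                                 float *texture2, float *normalized)
    {
        uint8x8_t v1 = vld1_u8(p1);               /* load 8 pixels of frame 1 */
        uint8x8_t v2 = vld1_u8(p2);               /* load 8 pixels of frame 2 */

        uint16x8_t sum  = vaddl_u8(v1, v2);       /* widen to 16 bit: v1 + v2 */
        int16x8_t  diff = vsubq_s16(vreinterpretq_s16_u16(vmovl_u8(v1)),
                                    vreinterpretq_s16_u16(vmovl_u8(v2)));

        /* Convert the low and high halves to 32-bit floats. */
        float32x4_t sum_lo  = vcvtq_f32_u32(vmovl_u16(vget_low_u16(sum)));
        float32x4_t sum_hi  = vcvtq_f32_u32(vmovl_u16(vget_high_u16(sum)));
        float32x4_t diff_lo = vcvtq_f32_s32(vmovl_s16(vget_low_s16(diff)));
        float32x4_t diff_hi = vcvtq_f32_s32(vmovl_s16(vget_high_s16(diff)));

        /* Normalized difference (v1 - v2) / (v1 + v2), using NEON's reciprocal
           estimate plus one Newton-Raphson refinement step instead of a divide.
           A real implementation must additionally guard against sum == 0. */
        float32x4_t rcp_lo = vrecpeq_f32(sum_lo);
        rcp_lo = vmulq_f32(rcp_lo, vrecpsq_f32(sum_lo, rcp_lo));
        float32x4_t rcp_hi = vrecpeq_f32(sum_hi);
        rcp_hi = vmulq_f32(rcp_hi, vrecpsq_f32(sum_hi, rcp_hi));

        vst1q_f32(texture2,       sum_lo);
        vst1q_f32(texture2 + 4,   sum_hi);
        vst1q_f32(normalized,     vmulq_f32(diff_lo, rcp_lo));
        vst1q_f32(normalized + 4, vmulq_f32(diff_hi, rcp_hi));
    }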

Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that may explain this increment. First, note that the stage that contributed most to the increase in time was reading the binary file; the execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage. The first step of that process is to find the smallest and largest values of every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C code, the processor makes use of its branch prediction mechanism to speed up the instruction pipeline; the use of NEON assembly instructions, however, forces the processor to perform the comparison for every single pack of 8 values, without any benefit from the branch prediction mechanism.

5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels; the number of possible values for the general case is 2 * 2^n - 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the power 0.85. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
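A minimal sketch of such a LUT for 8-bit pixels is given below; it is indexed by the sum of two corresponding pixels, which is already available from the texture 2 computation. The identifiers are illustrative, not those of the application.

    #include <math.h>

    #define MAX_PIXEL_SUM (2 * 255)

    static float gamma_lut[MAX_PIXEL_SUM + 1];

    static void init_gamma_lut(void)
    {
        for (int sum = 0; sum <= MAX_PIXEL_SUM; ++sum)
            gamma_lut[sum] = powf(sum / 2.0f, 0.85f);   /* average^0.85 */
    }

    /* Texture 1 value for one pixel pair: a LUT lookup replaces powf() at run time. */
    static inline float texture1_pixel(unsigned char a, unsigned char b)
    {
        return gamma_lut[a + b];
    }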

As a final step in the optimization process, a further improvement was made to the execution flow presented in Figure 5.19. From that diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2 and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. Merging these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes.


[Flow diagram: for camera frames 1-16, each row is processed vector by vector with modulation (step 1), scaling, texture 2 (v1 + v2), normalization (v1 - v2)/(v1 + v2) and cropping; camera frames 15 and 16 are then re-read to compute modulation (step 2) and texture 1, after which the rest of the program executes.]

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.


Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

    pixel A:              0      1      2      3
    pixel B:              0      1      2      3
    average:              0    0.5      1    1.5      2    2.5      3
    average^0.85 (LUT):   0  0.555      1  1.411  1.803  2.179  2.544
    (the LUT is indexed by pixel A + pixel B)

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.

These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. Moreover, the improvement of the execution time is shown in Figure 5.22. This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.


[Flow diagram: camera frames 1-14 are processed row by row and vector by vector with modulation (step 1), scaling, texture 2, normalization and cropping; camera frames 15 and 16 are processed in a second pass that additionally performs the averaging and gamma correction (texture 1) together with the 5x5 mean filter, before the rest of the program executes.]

Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.

6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people; moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when run on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).


On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when run on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's -O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces) suitable for being rendered by an embedded device. It is interesting to note that even though the lower resolution model contains only approximately 2% of the faces of the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.

6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations, described in Chapter 5, were used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times.


[(a) High-resolution 3D model with texture (63743 faces); (b) high-resolution 3D model wireframe (63743 faces); (c) low-resolution 3D model with texture (1229 faces); (d) low-resolution 3D model wireframe (1229 faces).]

Figure 6.2: Example of the visualization module developed.

Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which gives an idea of the speedup achieved for each individual stage.


[Bar chart: execution time (sec) after each successive optimization step: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2.]

Figure 6.3: Performance evolution of the 3D face scanner's C implementation.


[Nine bar charts comparing the execution time (sec) of each stage before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.]

Figure 6.4: Execution time for each stage of the application before and after the complete optimization process.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding pair of models yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, presented an increase of the execution time by a factor of 27 times, i.e. an increase of approximately 14 seconds.

A wide range of optimizations were performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment.

Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, these optimizations open new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5 x 5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates for implementation using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.

According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e. changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.

Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process; it is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use; the permissible values are soft, softfp and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support.
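For illustration, a hard-float build could be selected with an invocation along the following lines; the cross-compiler name and source file are assumptions, but the flags themselves are standard GCC options.

    arm-linux-gnueabihf-gcc -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard \
        -c decoding.c -o decoding.o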

Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Although a few of the other options were tested as part of the current project, GCC's results were always superior. Even so, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.

Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study", JAMA: the journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it", University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea", Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems", Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance", Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition", in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement", Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts", Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, ISSN: 0162-8828. DOI: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk", in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. DOI: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement", Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, ISSN: 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging", Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, ISSN: 0162-8828. DOI: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision", Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition", Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, ISSN: 0031-3203. DOI: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor", Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, ISSN: 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder", in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. i, Jun. 1990, pp. 309–313. DOI: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection", Machine Vision and Applications, vol. 4, pp. 193–203, 3, 1991, ISSN: 0932-8092. DOI: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision", Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, ISSN: 0018-9162. DOI: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems", Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches", in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. DOI: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition", in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. DOI: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction", in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning", in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. DOI: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation", Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, ISSN: 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing", Optics Express, vol. 15, p. 12318, 2007. DOI: 10.1364/OE.15.012318 (cit. on p. 14).

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo", in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14).

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, ISBN: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and Universitat de Girona, Departament d'Electrònica, Informàtica i Automàtica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, ISBN: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses", Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, ISSN: 0882-4967. DOI: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation", Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, ISSN: 0162-8828. DOI: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication", Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations", Algorithmica, vol. 4, no. 1-4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schroder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow", in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, ISBN: 0-201-48560-5. DOI: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, ISBN: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications", Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation", Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual", BeagleBoard.org, December, p. 81, 2009 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction", ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation", Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation", Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series version 3.0 programmer's guide", Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization: an example", Tech. Rep., 2009 (cit. on p. 54).


Contents

Abstract ii

Acknowledgements iii

List of Figures ix

1 Introduction 1
1.1 3D Mask Sizing project 3
1.2 Objectives 3
1.3 Report organization 4

2 Literature study 5
2.1 Surface reconstruction 5
2.1.1 Stereo analysis 6
2.1.2 Structured lighting 9
2.1.2.1 Triangulation technique 10
2.1.2.2 Pattern coding strategies 11
2.1.2.3 3D human face reconstruction 12
2.2 Camera calibration 13
2.2.1 Definition 14
2.2.2 Popular techniques 14

3 3D face scanner application 17
3.1 Read binary file 18
3.2 Preprocessing 18
3.2.1 Parse XML file 18
3.2.2 Discard frames 19
3.2.3 Crop frames 19
3.2.4 Scale 19
3.3 Normalization 19
3.3.1 Normalization 20
3.3.2 Texture 2 21
3.3.3 Modulation 22
3.3.4 Texture 1 22
3.4 Global motion compensation 23
3.5 Decoding 24
3.6 Tessellation 25
3.7 Calibration 26
3.7.1 Offline process 27
3.7.2 Online process 27
3.8 Vertex filtering 28
3.8.1 Filter vertices based on decoding constraints 28
3.8.2 Filter vertices outside the measurement range 29
3.8.3 Filter vertices based on a maximum edge length 29
3.9 Hole filling 29
3.10 Smoothing 30

4 Embedded system development 31
4.1 Development tools 31
4.1.1 Hardware 32
4.1.1.1 Single-board computer survey 32
4.1.1.2 BeagleBoard-xM features 34
4.1.2 Software 34
4.1.2.1 Software libraries 35
4.1.2.2 Software development tools 36
4.2 MATLAB to C code translation 37
4.2.1 Motivation for developing in C language 37
4.2.2 Translation approach 38
4.3 Visualization 39

5 Performance optimizations 43
5.1 Double to single-precision floating-point numbers 44
5.2 Tuned compiler flags 44
5.3 Modified memory layout 45
5.4 Reimplementation of C's standard power function 45
5.5 Reduced memory accesses 47
5.6 GMC in y dimension only 49
5.7 Error in Delaunay triangulation 50
5.8 Modified line shifting in GMC stage 50
5.9 New tessellation algorithm 51
5.10 Modified decoding stage 52
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage 53
5.12 NEON assembly optimization 1 54
5.13 NEON assembly optimization 2 57

6 Results 61
6.1 MATLAB to C code translation 61
6.2 Visualization 62
6.3 Performance optimizations 62

7 Conclusions 67
7.1 Future work 68

Bibliography 71

List of Figures

1.1 A subset of the CPAP masks offered by Philips 2
1.2 A 3D hand-held scanner developed in Philips Research 4
2.1 Standard stereo geometry 7
2.2 Assumed model for triangulation as proposed in [4] 10
2.3 Examples of pattern coding strategies 12
2.4 A reference framework assumed in [25] 14
3.1 General flow diagram of the 3D face scanner application 17
3.2 Example of the 16 frames that are captured by the hand-held scanner 18
3.3 Flow diagram of the preprocessing stage 18
3.4 Flow diagram of the normalization stage 20
3.5 Example of the 18 frames produced in the normalization stage 21
3.6 Camera frame sequence in a coordinate system 22
3.7 Flow diagram for the calculation of the texture 1 image 22
3.8 Flow diagram for the global motion compensation process 23
3.9 Difference between pixel-based and edge-based decoding 24
3.10 Vertices before and after the tessellation process 25
3.11 The Delaunay tessellation with all the circumcircles and their centers [33] 26
3.12 The calibration chart 27
3.13 The 3D model before and after the calibration process 28
3.14 3D resulting models after various filtering steps 29
3.15 Forehead of the 3D model before and after applying the smoothing process 30
4.1 The BeagleBoard-xM offered by Texas Instruments 35
4.2 Simplified diagram of the 3D face scanner application 39
4.3 UV coordinate system 40
4.4 Diagram of the visualization module 41
5.1 Execution times of the MATLAB and C implementations after run on different platforms 44
5.3 Execution time before and after tuning GCC's compiler options 45
5.4 Modification of the memory layout of the camera frames 46
5.5 Execution time with a different memory layout 46
5.6 Execution time before and after reimplementing C's standard power function 47
5.7 Order of execution before and after the optimization 48
5.8 Difference in execution time before and after reordering the preprocessing stage 48
5.9 Flow diagram for the GMC process as implemented in the MATLAB code 49
5.10 Difference in execution time before and after modifying the GMC stage 49
5.11 Execution time of the application after fixing an error in the tessellation stage 50
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage 51
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted 52
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach 53
5.15 Execution time of the application before and after optimizing the decoding stage 54
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's columns sum 55
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage 55
5.18 NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology 56
5.19 Execution flow after first NEON assembly optimization 58
5.20 Execution times of the application before and after applying the first NEON assembly optimization 59
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels 59
5.22 Execution times of the application before and after applying the second NEON assembly optimization 59
5.23 Final execution flow after second NEON assembly optimization 60
6.1 Execution times of the MATLAB and C implementations after run on different platforms 62
6.2 Example of the visualization module developed 63
6.3 Performance evolution of the 3D face scanner's C implementation 64
6.4 Execution times for each stage of the application 65

Dedicated to my grandmother


Chapter 1

Introduction

The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.

Within the wide range of products offered by Philips there is a specific group, categorized under the name of sleep solutions, that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]; it reported that sleep apnea was prevalent in more than 18 million Americans, i.e. 6.62% of the country's population.

While aiming to attend to the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors. These include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, and diagnosis of diseases such as sinusitis or dermatitis or disorders such as claustrophobia, amongst others.


[(a) Amara; (b) ComfortClassic; (c) ComfortGel Blue; (d) ComfortLite 2; (e) FitLife; (f) GoLife; (g) ProfileLite Gel; (h) Simplicity; (i) ComfortGel.]

Figure 1.1: A subset of the CPAP masks offered by Philips.

A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.

Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.

1.1 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop some techno-

logical means that can assist sleep technicians in the selection of a proper CPAP mask

model for each patient. A series of algorithms, methods, and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research

group in Philips Research Eindhoven The resulting automated mask advising system

comprises four main parts

1 An accurate 3D model reconstruction of the patient's face dimensions and geometry

2 The extraction of facial landmarks from the reconstructed model by means of

computer vision algorithms

3 The actual fit quality assessment by virtually fitting a series of 3D mask models

to the reconstructed face

4 The creation of a custom cushion that optimizes for uniform pressure along the

cushion contour

The focus of this thesis project is on the first step.

As part of the progress made in the 3D Mask Sizing project at Philips Research Eind-

hoven a first prototype of a 3D hand-held scanner using the structured lighting technique

was already developed and is the basis for the present project. Figure 1.2a shows the hardware setup of such a device. In short, this scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. This picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently being performed offline and are mostly implemented in MATLAB.

1.2 Objectives

The main objective of this thesis project is to extend the functionality of the mentioned

scanner such that the 3D reconstruction is computed locally on the embedded platform

This implies transforming the already developed methods and algorithms in such a


(a) Hardware (b) 3D model example

Figure 1.2: A 3D hand-held scanner developed in Philips Research

way that extra-functional requirements are taken into account These extra-functional

requirements involve an optimal use of the available computational resources Highest

priority should be given to the execution time of the application Specifically the 3D

reconstruction should be running on the embedded device in less than 5 seconds on

average. Because the embedded processor contained in the final product will be similar to an ARM Cortex-A8, the new implementation should be targeted to this processor

in particular by making proper use of the specific features it provides Moreover the

visualization of the reconstructed face model should be made possible by means of the

embedded projector contained in the device

1.3 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie

different technologies for surface reconstruction placing special emphasis on structured

lighting techniques In Chapter 3 an overview of the 3D face scanner application is

provided which functions as the starting point for the current project Chapter 4

details the most relevant aspects that pertain to the implementation of the 3D face

scanner application on an embedded device In Chapter 5 a series of optimizations

used to reduce the execution time of the application are described Chapter 6 highlights

the most important results of the development process namely the MATLAB to C

translation the visualization module and the set of optimizations Finally Chapter 7

concludes the thesis while delineating paths for further improvements of the presented

work


Chapter 2

Literature study

This chapter presents a selective analysis of the state-of-the-art in the field of surface

reconstruction placing special emphasis on structured lighting techniques A brief

overview of the three main underlying technologies used for depth estimation is pre-

sented first This is followed by an example of stereo analysis which serves as the basis

for the more specific structured lighting techniques Moreover this example helps to

illustrate why stereo analysis is considered less preferable for 3D face reconstruction

applications when compared with the structured lighting techniques Special emphasis

is placed on the scientific principles underlying structured lighting techniques Further-

more a classification of the different types of pattern coding strategies available in the

literature is given along with an analysis of their suitability for our application Fi-

nally the chapter concludes with a brief discussion of camera calibration and its most

representative techniques

2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (such as those found in areas like architecture, mechanical engineering, or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].

Technologies for surface reconstruction include contact and non-contact techniques the

latter being our principal interest Non-contact techniques may be further categorized

as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e. they



are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process

one or more images of the object to determine its surface orientation and consequently

its shape. Finally, stereo-metric techniques determine the location of the object's surface

by triangulating each point with its corresponding projections in two or more images

Echo-metric techniques suffer from a number of drawbacks Systems employing such

techniques are heavily affected by environmental parameters such as temperature and

humidity [6]. These parameters affect the velocity at which waves travel through a

given medium thus introducing errors in depth measurement On the other hand

both reflecto-metric and stereo-metric techniques are less affected by environmental

parameters. However, reflecto-metric techniques entail a major difficulty, i.e. they require an estimation of the model of the environment. In the remainder of this section

we will limit the discussion to the stereo-metric category and focus on the structured

lighting techniques

2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded

as an extension of the more general stereo-vision technique an introductory example of

stereo analysis is presented in this section This example intends to show why the use

of structured lighting becomes essential for our application This example is presented

in [4]

Surface reconstruction can be achieved by means of the visual disparity that results

when an object is observed from different camera viewpoints In its simplest form two

cameras can be used for this purpose Triangulation between a point in the object and

its respective projection in each of the camera projection planes can be used to calculate

the depth at which this point lies from a certain reference Note however that in order

to calculate the triangulation more parameters are required These parameters refer for

example to the distance at which the cameras are located from one another (extrinsic

parameter) or to the focal length of each of the cameras (intrinsic parameter)

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e. at the point (b, 0, 0). Both cameras are assumed

to have the same focal length f As a consequence the images of both cameras are

located in the same image plane The Z-axis coincides with the optical axis of the

left camera Moreover the optical axes of both cameras are parallel to each other and


oriented towards the scene objects Also note that because the x-axes of both images

are identically oriented rows with same row-number in the two different images lie on

the same straight line

Figure 2.1: Standard stereo geometry

In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

p_left = (x_left, y_left) and p_right = (x_right, y_right)

in the left and right images, respectively, assuming that the scene point is visible from both camera viewpoints. The disparity between two corresponding image points, with respect to p_left, is the vector

Δ(x_left, y_left) = (x_left − x_right, y_left − y_right)^T    (2.1)

In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

p = (x, y) = (f·X / Z, f·Y / Z)    (2.2)

assuming that Z > f.

According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that y = y_left = y_right. Therefore, for the left camera, the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes that the Z-axis is identified with the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X-axis, the central projection equation is given by

(x_right, y) = (f·(X − b) / Z, f·Y / Z)

Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points in the two images, with respect to p_left, is given by

Δ_ssg(x_left, y_left) = √((x_left − x_right)² + (y_left − y_right)²)

However, because rows with the same row number in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

Δ_ssg(x_left, y_left) = |x_left − x_right| = x_left − x_right    (2.3)

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map Δ(x, y) is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.

Finally, in order to come up with the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

Z = f·X / x_left = f·(X − b) / x_right

and therefore

X = b·x_left / (x_left − x_right)

Using the previous equation, it follows that

Z = b·f / (x_left − x_right)

By substituting this result into the projection equation for y, it follows that

Y = b·y / (x_left − x_right)

The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and b are known and that the disparity map Δ(x, y) was measured for each pair of corresponding points in the two images. Note that a variety of methods exist to calibrate different types of camera configuration systems, i.e. to determine their intrinsic and extrinsic parameters. More on these calibration procedures is discussed in Section 2.2.
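As an illustration of how these equations translate into code, a minimal C sketch of the reconstruction step is given below; the function and type names are illustrative and not part of the scanner code base, and the disparity is assumed to be defined (non-zero) for the given pixel.

typedef struct { double X, Y, Z; } Point3D;

/* Reconstruct the 3D coordinates of a scene point from its left-image
 * coordinates (x_left, y) and the scalar disparity d = x_left - x_right,
 * given the focal length f and the base distance b of the stereo rig. */
static Point3D reconstruct_point(double x_left, double y, double disparity,
                                 double f, double b)
{
    Point3D p;
    p.X = (b * x_left) / disparity;
    p.Y = (b * y) / disparity;
    p.Z = (b * f) / disparity;
    return p;
}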

The process of determining corresponding point pairs is known as the correspondence

problem A wide variety of techniques are used to solve the correspondence problem in

stereo image analysis Such techniques generally involve the extraction and matching

of features between two or more images These features are typically corners or edges

contained within the images Although these techniques are found to be appropriate for

a certain number of applications it turns out that they present a number of drawbacks

that make their applicability unfeasible for many others The main drawbacks are (i)

feature extraction and matching is generally computationally expensive (ii) features

might not be available depending on the nature of the environment or the placement

of the cameras and (iii) low lighting conditions generally increase the complexity of the

matching procedure thus making the system more error prone Such problems in solving

the correspondence problem can generally be overcome by resorting to a different but

similar type of techniques known by the name of structured lighting techniques While

structured lighting techniques involve a completely different methodology on how to solve the correspondence problem, they share a large part of the theory presented in this section

regarding the depth reconstruction process

2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously de-

scribed stereo analysis approach where one of the cameras is replaced by a light source

which projects a light pattern actively into the scene The location of an object in space

can then be determined by analyzing the deformation of the projected light pattern

The idea behind this modification is to simplify the complexity of the correspondence

analysis by actively manipulating the scene

It is important to note that stereoscopic based systems do not assume complex require-

ments for image acquisition since they mostly rely on theoretical mathematical and

algorithmic analyses to solve the reconstruction problem On the other hand the idea

behind structured lighting methods is to shift this complexity to another level such as

the engineering prerequisites of the overall system [4]

A wide variety of light patterns have been proposed by the research community [5] [7]ndash

[17] Their aim is to reduce the large number of images that would have to be captured


when using the most basic of all approaches, i.e. a light spot. In Section 2.1.2.2, a classification of the encoded patterns available is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation

technique

2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring

angles formed from it to points at either end of a fixed baseline Various approaches

have been proposed for accomplishing this task An early analysis was described by Hall

et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is explained.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation as proposed in [4]

Note that the

system can be thought of as a 2D object scene, i.e. it has no vertical dimension. As a consequence, the object, light source, and camera all lie in the same plane. The angles

α and β are given by the calibration As in the previous example the base distance b

is assumed to be known and the origin of the coordinate system O coincides with the

projection center of the camera


The goal is to calculate the distance d between the origin O and the object point P = (X_0, Z_0). This can be done using the law of sines as follows:

d / sin(α) = b / sin(γ)

From γ = π − (α + β) and sin(π − γ) = sin(γ) it holds that

d / sin(α) = b / sin(π − γ) = b / sin(α + β)

Therefore, distance d is given by

d = b · sin(α) / sin(α + β)

which holds for any point P lying on the surface of the object.
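A minimal C sketch of this computation is shown below (angles in radians; the function name is illustrative):

#include <math.h>

/* Distance d from the camera origin O to the object point P, given the
 * calibrated angles alpha and beta and the base distance b between the
 * camera and the light source: d = b * sin(alpha) / sin(alpha + beta). */
static double triangulate_distance(double alpha, double beta, double b)
{
    return (b * sin(alpha)) / sin(alpha + beta);
}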

2.1.2.2 Pattern coding strategies

As stated earlier there is a wide variety of pattern coding strategies available in the lit-

erature that aim to fulfill all requirements found in different scenarios and applications

In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e. every codeword is mapped to the corresponding coordi-

nates of a given pixel or group of pixels in the pattern A codeword can be represented

using grey levels colors or even geometrical characteristics The following classification

of pattern coding strategies was proposed by Salvi et al in [19]

• Time-multiplexing: This is one of the most commonly used strategies. The

idea is to project a set of patterns onto the scene one after the other The

sequence of illuminated values determines the codeword for each pixel The main

advantage of this kind of pattern is that it can achieve high spatial resolution in

the measurements. However, its accuracy is highly sensitive to movement of either

the structured light system or objects in the scene during the time period when the

acquisition process takes place Previous research in this area includes the work of

[5] [7] [8] An example of this coding strategy is the binary coded pattern shown

in Figure 23a

• Spatial Neighborhood: In this strategy, the codeword that is assigned to a given

pixel depends on its neighborhood Codification is done on the basis of intensity

[9]ndash[11] color [12] or a unique structure of the neighborhood [13] In contrast with

time-multiplexing strategies spatial neighborhood strategies allow for all coding

information to be condensed into a single projection pattern making them highly


suitable for applications that involve timing constraints such as autonomous nav-

igation The compromise however is deterioration in spatial resolution Figure

23b is an example of this strategy proposed by Griffin et al [14]

• Direct coding: In direct coding strategies, every pixel in the pattern is labeled

by the information it represents In other words the entire codeword for a given

point is contained in a unique pixel as explained in [19] Basically there are two

ways to achieve this either by using a large range of color values [15] [16] or

by introducing periodicity [17] Although in theory this group of strategies can

be used to reconstruct objects with high resolution a major problem occurs in

practice the colors imaged by camera(s) of the system do not only depend on the

projected colors but also on the intrinsic colors of the measuring surface and light

source The consequence is that reference images become necessary Figure 23c

shows an example of a direct coding strategy proposed in [16]

(a) Time-multiplexing  (b) Spatial Neighborhood  (c) Direct coding

Figure 2.3: Examples of pattern coding strategies

2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields such as security

forensics or even entertainment it is no surprise that special focus has been devoted

to this area by the research community over the last decades A comparative study

of three different 3D face reconstruction approaches is presented in [20] Here the

most representative techniques of three different domains are tested These domains are

binocular stereo structured lighting and photometric stereo The experimental results

show that active reconstruction techniques perform better than purely passive ones for

this application

The majority of analysis on vision based reconstruction has focused on general perfor-

mance for arbitrary scenes rather than on specific objects as reported in [20] Neverthe-

less some effort has been made on evaluating structured lighting techniques with special

focus on human face reconstruction In [21] a comparison is presented between three


structured lighting techniques (Gray Code, Gray Code Shift, and Stripe Boundary) to

assess 3D reconstruction for human faces by using mono and stereo systems The results

show that the Gray Code shift coding performs best given the high number of emitted

patterns it uses A further study on this topic was performed by the same author in

[22] Again it was found that time-multiplexing techniques such as binary encoding

using Gray Code provide the highest accuracy With a rather different objective than

that sought by Woodward et al in [21] and [22] Fechteler et al [23] also focus their

effort on presenting a framework that captures 3D models of faces in high resolutions

with low computational load Here the system uses a single colored stripe pattern for

the reconstruction purpose plus a picture of the face illuminated with regular white light

that is used as texture

Particular aspects of 3D human face reconstruction, such as the proximity, size, and texture involved, make structured lighting a suitable approach. On the contrary, other recon-

struction techniques might be less suitable when dealing with these particular aspects

For example stereoscopic approaches fail to provide positive results when the textures

involved do not contain features that can be easily extracted and matched by means of

algorithms as in the case of the human face On the other hand the concepts behind

structured lighting make it very convenient to reconstruct these kinds of surfaces, given

the proximity involved and the size limits of the object in question (appropriate for

projecting encoded patterns)

With regard to the suitability of the different pattern coding strategies for our application

(3D human face reconstruction by means of a hand-held scanner) there are several

factors to consider Spatial neighborhood strategies do not offer high spatial resolution

which is needed by the algorithms that assess the fit quality of the various mask models

Direct coding strategies suffer from practical problems that affect their robustness to

different scenarios This centers the attention on the time-multiplexing techniques which

are known to provide high spatial resolution The problem with such techniques is

that they are highly sensitive to movement, which is likely to be present on a hand-held device. Fortunately, there are several approaches as to how such a problem can be

solved Consequently it is a time-multiplexing technique which is being employed in

our application

2.2 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement

This section presents a review of some of the most popular techniques with special focus

on those that are regarded as adequate for our application


2.2.1 Definition

Camera calibration is the process of determining a mathematical approximation of the

physical and optical behavior of an imaging system by using a set of parameters These

parameters can be estimated by means of direct or iterative methods and they are divided

into two groups. On the one hand, intrinsic parameters determine how light is projected

through the lens onto the image plane of the sensor The focal length projection center

and lens distortion are all examples of intrinsic parameters On the other hand extrinsic

parameters measure the position and orientation of the camera with respect to a world

coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for the structured pattern projection and triangulation considered in [25]. The focal length f_c and the projection center O_c are examples of intrinsic parameters of the camera, while the distance D between the camera and the projector corresponds to an extrinsic parameter.

Figure 2.4: A reference framework assumed in [25]

2.2.2 Popular techniques

In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration that uses a 3 × 4 transformation matrix which maps 3D object points to their respective 2D image projections. Here, the model of the camera does not consider any lens distortion. For a detailed description of this method, refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique that was based on extracting the physical parameters of the camera from the transformation technique proposed in [18]. The description of this technique is given in [26] and [27]. A non-linear explicit camera calibration that included radial lens distortion was proposed by Salvi in his PhD thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular and that is still widely used was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.
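To make the underlying idea of such calibration models concrete, the following hedged C sketch applies a 3 × 4 projection matrix, as used in Hall's implicit calibration, to a 3D point in homogeneous coordinates; the matrix entries themselves would be estimated by one of the calibration procedures discussed above, and lens distortion is ignored.

/* Project a 3D point (X, Y, Z) to 2D image coordinates using a 3x4
 * projection matrix P (row-major). The homogeneous result (u, v, w) is
 * dehomogenized by dividing by w. Returns 0 if w is zero. */
static int project_point(const double P[3][4], double X, double Y, double Z,
                         double *x_img, double *y_img)
{
    double u = P[0][0]*X + P[0][1]*Y + P[0][2]*Z + P[0][3];
    double v = P[1][0]*X + P[1][1]*Y + P[1][2]*Z + P[1][3];
    double w = P[2][0]*X + P[2][1]*Y + P[2][2]*Z + P[2][3];
    if (w == 0.0) return 0;
    *x_img = u / w;
    *y_img = v / w;
    return 1;
}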

The calibration mechanism that is currently being used in our application is based on

the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although

this mechanism focuses on stereo camera calibration it was generalized for a system

with one camera and one projector It involves imaging a controlled scene from different

positions and orientations The controlled scene consists of a rigid calibration chart with

several markers The geometric and photometric properties of such markers are known

precisely so that they can be detected After corresponding markers in the different

images are found an algorithm searches the optimal set of camera parameters for which

triangulation of all corresponding marker-point pairs gives an accurate reconstruction of

the calibration chart. This calibration mechanism is discussed further in Section 3.7.

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed

by the Smart Sensing & Analysis research group and provided as a starting point for the current project. Figure 3.1 presents the main steps involved in the 3D reconstruction

process

Figure 3.1: General flow diagram of the 3D face scanner application. The input consists of a binary file and an XML file, and the output is a 3D model; the pipeline comprises the following stages: read binary file (3.1), preprocessing (3.2), normalization (3.3), tessellation (3.4), decoding (3.5), global motion compensation (3.6), calibration (3.7), vertex filtering (3.8), and hole filling (3.9).

The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene. For each projection, the scene is captured by means of the embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study on why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is described.
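To illustrate what a binary coded (time-multiplexed) pattern set may look like, the following hedged C sketch generates column-stripe patterns in which pattern k encodes bit k of the projector column index; the actual patterns used by the scanner, and their inverted counterparts, may differ in detail.

/* Fill 'patterns' with num_bits binary column-coded stripe images of size
 * width x height, stored row-major. Pattern k is white (255) where bit k
 * of the column index is 1, and black (0) elsewhere. */
static void generate_binary_patterns(unsigned char **patterns, int num_bits,
                                     int width, int height)
{
    for (int k = 0; k < num_bits; k++) {
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int bit = (x >> k) & 1;
                patterns[k][y * width + x] = (unsigned char)(bit ? 255 : 0);
            }
        }
    }
}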


Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.

3.1 Read binary file

The first step of the application is to read the binary file that contains the required information for the 3D reconstruction. The binary file is composed of two parts: the header and the actual data. The header contains metadata of the acquired frames, such as the number of frames and the resolution of each one. The second part contains the actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence, which from now on will be referred to as camera frames.
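As an illustration, a hedged C sketch of such a reader is shown below; the exact header layout (field order and sizes) is an assumption made for the example and does not necessarily correspond to the scanner's actual file format.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical header layout: number of frames, frame width and frame
 * height, each assumed to be stored as a 32-bit integer. */
typedef struct {
    int num_frames;
    int width;
    int height;
} ScanHeader;

/* Read the header and the raw 8-bit frame data from a scan file.
 * Returns a newly allocated buffer of num_frames * width * height bytes,
 * or NULL on failure. */
static unsigned char *read_scan_file(const char *path, ScanHeader *hdr)
{
    FILE *fp = fopen(path, "rb");
    unsigned char *data = NULL;
    if (!fp) return NULL;
    if (fread(hdr, sizeof(ScanHeader), 1, fp) == 1) {
        size_t size = (size_t)hdr->num_frames * hdr->width * hdr->height;
        data = malloc(size);
        if (data && fread(data, 1, size, fp) != size) {
            free(data);
            data = NULL;
        }
    }
    fclose(fp);
    return data;
}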

3.2 Preprocessing

The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps is described in the following subsections.

Figure 3.3: Flow diagram of the preprocessing stage: parse XML file, discard frames, crop frames, and scale (convert to float, range from 0 to 1)

3.2.1 Parse XML file

In this stage, the application first reads an XML file that is included for every scan. This file contains relevant information for the structured light reconstruction. This information includes (i) the type of structured light patterns that were projected when acquiring the data, (ii) the number of frames captured while structured light patterns were being projected, (iii) the image resolution of each frame to be considered, and (iv) the calibration data.

3.2.2 Discard frames

Based on the number of frames value read from the XML file, the application discards extra frames that do not contain relevant information for the structured light approach but that are provided as part of the input.

3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this particular application. This is because pixels near the frame borders do not contain facial information and can therefore be safely removed.

3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage, the data type is transformed from unsigned integer to floating point while dividing each pixel value by 255. The new set of values ranges between 0 and 1.
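A minimal C sketch of the crop and scale steps might look as follows; the row-major buffer layout and the parameterization of the number of rows removed from the top are assumptions made for the example.

/* Crop 'crop_rows' rows from the top of an 8-bit frame of size
 * src_w x src_h (row-major) and convert the remaining pixels to floats
 * in the range [0, 1]. 'dst' must hold src_w * (src_h - crop_rows) floats. */
static void crop_and_scale(const unsigned char *src, float *dst,
                           int src_w, int src_h, int crop_rows)
{
    int dst_h = src_h - crop_rows;
    for (int y = 0; y < dst_h; y++) {
        for (int x = 0; x < src_w; x++) {
            unsigned char pixel = src[(y + crop_rows) * src_w + x];
            dst[y * src_w + x] = (float)pixel / 255.0f;
        }
    }
}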

33 Normalization

Even though this section is entitled Normalization a few more tasks are being performed

in this stage of the application as shown by the blue rectangles in Figure 34 Here wide

arrows represent flow of data whereas dashed lines represent the order of execution The

numbers inside the small data arrows pointing towards the different tasks represent the

number of frames used as input by each task The dashed line rectangle that encloses

the normalization and texture 2 tasks represents that there is not a clear sequential

execution between these two but rather that these are executed in an alternating fashion

This type of diagram will prove particularly useful in Chapter 5 in order to explain the


Figure 34 Flow diagram of the normalization stage (the 16 camera frames are used to produce the normalized frame sequence and the texture 2 frame sequence, 8 frames each, as well as the modulation frame and the texture 1 frame, 1 frame each)

modifications that were made to the application to improve its performance An example

of the different frames that are produced in this stage is visualized in Figure 35 A

brief description of each of the tasks involved in this stage follows

331 Normalization

The purpose of this stage is to extract the reflectivity component (texture information)

from the camera frames while aiming at enhancing the deformed illumination patterns

in the resulting frame sequence Figure 35a illustrates the result of this process The

deformed patterns are essential for the 3D reconstruction process

In order to understand how this process takes place we need to look back at Figure

32 Here it is possible to observe that the projected patterns in the top row frames are

equal to their corresponding frame in the bottom row with the only difference being

that the values of the projected pattern are inverted For each corresponding pair a

new image frame is generated according to the following equation

Fnorm(x, y) = (Fcamera(x, y, a) − Fcamera(x, y, b)) / (Fcamera(x, y, a) + Fcamera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 32 respectively

An example of the resulting frame sequence is shown in Figure 35a
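A minimal sketch of this computation in C is shown below, assuming the frames are stored as float arrays of width × height pixels; the function and variable names are hypothetical and do not come from the application code.

/* Normalization of one aligned frame pair (a, b); a zero denominator is
   guarded to avoid division by zero in dark regions. */
void normalize_pair(const float *frame_a, const float *frame_b,
                    float *norm, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++) {
        float sum = frame_a[i] + frame_b[i];
        norm[i] = (sum > 0.0f) ? (frame_a[i] - frame_b[i]) / sum : 0.0f;
    }
}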


(a) Normalized frame sequence

(b) Texture 2 frame sequence

(c) Modulation frame (d) Texture 1 frame

Figure 35 Example of the 18 frames produced in the normalization stage

332 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one

used to calculate the normalized frame sequence In fact the output of this process is an

intermediate step in the calculation of the normalized frames, which is the reason why

the two processes are said to be performed in an alternating fashion The mathematical

equation that describes the calculation of the texture 2 frame sequence is

Ftexture2(x, y) = Fcamera(x, y, a) + Fcamera(x, y, b)

The resulting frame sequence (Figure 35b) is used later in the global motion compen-

sation stage


333 Modulation

The purpose of this stage is to find the range of measured values for each (x y) pixel of

the camera frame sequence along the time dimension This is done in two steps First

two frames are generated by finding the maximum and minimum values along the time

(t) dimension (Figure 36) for every (x y) value in a frame

Figure 36 Camera frame sequence in a coordinate system (x, y, t)

Second a modulation frame is produced by finding the difference between the previously

generated frames ie

Fmod(x, y) = Fmax(x, y) − Fmin(x, y)

Such modulation frame (Figure 35c) is required later during the decoding stage
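A sketch of this per-pixel computation is given below, assuming the 16 camera frames are accessible as an array of float frame pointers; the names are hypothetical.

/* For every pixel, find the maximum and minimum value along the time
   dimension and store their difference in the modulation frame. */
void modulation_frame(const float *frames[], int num_frames,
                      float *mod, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++) {
        float fmin = frames[0][i];
        float fmax = frames[0][i];
        for (int t = 1; t < num_frames; t++) {
            if (frames[t][i] < fmin) fmin = frames[t][i];
            if (frames[t][i] > fmax) fmax = frames[t][i];
        }
        mod[i] = fmax - fmin;
    }
}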

334 Texture 1

Finally the last task in the Normalization stage corresponds to the generation of the

texture image that will be mapped onto the final 3D model In contrast to the previous

three tasks this subprocess does not take the complete set of 16 camera frames as input

but only the two with the finest projection patterns Figure 37 shows the four processing

steps that are applied to the input in order to generate a texture image such as the one

presented in Figure 35d

Figure 37 Flow diagram for the calculation of the texture 1 image (average frames, gamma correction, 5×5 mean filter, histogram stretch)


34 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement

In fact if no measures are taken to correct the slight amount of movement of the scanner

or of the objects in the scene during the acquisition process the complete reconstruction

process fails Although the global motion compensation stage is only a minor part of

the mechanism that makes the entire application robust to motion it is not negligible

in the final result

Global motion compensation is an extensive field of research for which many different

approaches and methods have been contributed The approach used in this application

is amongst the simplest in terms of complexity Nevertheless, it suffices for the needs of the

current application

Figure 38 presents an overview of the algorithm used to achieve the global motion

compensation This process takes as input the normalized frame sequence introduced in

the previous section As noted at the bottom of the figure these steps are repeated for

every pair of consecutive frames As a first step the pixels in each column are added for

both frames This results in two vectors that hold the cumulative sums of each frame

The second step is to determine by how many pixels the second image is displaced with

respect to the first one In order to achieve this the sum of absolute differences between

elements of the two column-sum vectors is calculated while slowly displacing the two

vectors with respect to each other The result is a new vector containing the SAD value

for each displacement Subsequently the index of the smallest element in the SAD

values vector is searched in order to determine the number of pixels that the second

image needs to be shifted The process concludes by performing the actual shift of the

second frame

Figure 38 Flow diagram for the global motion compensation process (for every pair of consecutive normalized frames: sum the columns of frame A and frame B, minimize the SAD between the column-sum vectors, and shift frame B accordingly)
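A simplified sketch of the displacement estimation between two frames is given below; it assumes width × height float images and hypothetical names, and omits details of the actual implementation.

#include <math.h>
#include <stdlib.h>

/* Returns the displacement (in pixels) that minimizes the SAD between the
   column-sum vectors of frames a and b. */
int estimate_shift(const float *a, const float *b,
                   int width, int height, int max_disp)
{
    float *col_a = calloc(width, sizeof(float));
    float *col_b = calloc(width, sizeof(float));
    /* Step 1: column-wise sums of both frames. */
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            col_a[x] += a[y * width + x];
            col_b[x] += b[y * width + x];
        }
    /* Step 2: SAD between the column-sum vectors for each displacement. */
    int best_d = 0;
    float best_sad = INFINITY;
    for (int d = -max_disp; d <= max_disp; d++) {
        float sad = 0.0f;
        for (int x = 0; x < width; x++) {
            int xs = x + d;
            if (xs >= 0 && xs < width)
                sad += fabsf(col_a[x] - col_b[xs]);
        }
        if (sad < best_sad) { best_sad = sad; best_d = d; }
    }
    free(col_a);
    free(col_b);
    return best_d;
}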


35 Decoding

In Section 211 of the literature study the correspondence problem was defined as the

process of determining corresponding point pairs between the captured images and the

projected patterns This is exactly what is being accomplished during the decoding

stage

A novel approach has been implemented in which the identification of the projector

stripes is based not on the values of the pixels themselves (as it is typically done) but

rather on the edges formed by the transitions of the projected patterns Figure 39

illustrates the different sets of decoded values that result with each of these methods

Here it is possible to observe that the pixel-based method produces a stair-casing effect

due to the decoding of neighboring pixels that lie on the same stripe of the projected

pattern On the other hand the edge-based method removes this undesirable effect by

decoding values for only parts of the image in which a transition occurs Furthermore

this approach enables sub-pixel accuracy for the determination of the positions where the

transitions occur meaning that the overall resolution of the 3D reconstruction increases

considerably

Figure 39 Decoded values along the y dimension of the image for edge-based and pixel-based decoding The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used

The decoding process results in a set of vertices each one associated with a depth code

Note however that the unit of measurement used to describe the position and depth of

each vertex is based on camera pixels and code values respectively meaning that these

vertices still do not represent the actual geometry of the face The calibration process

explained in a later section is the part of the application that translates the pixel and


code values to standard units (such as millimeters) thus recreating the actual shape of

the human face

36 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in

a manner such that no overlaps occur In computer graphics these geometric shapes

are generally chosen to be triangles, also called "faces" The reason for using triangles
is that they have, by definition, their vertices on the same plane This, in turn, avoids

the generation of non-simple convex polygons that are not guaranteed to be rendered

correctly A complete example illustrating this point can be found in [32]

A set of 3D vertices calculated in the decoding stage is the input to the tessellation

process Here however the third dimension does not play a role and hence the z

coordinate for each of the vertices can be thought of as being equal to 0 This implies

that the new set of vertices consists only of (x, y) coordinates that lie on the same plane

as shown in Figure 310a This graph corresponds to a very close view of the nose area

in the reconstructed face example

(a) Vertices before applying the Delaunay triangulation (b) Result after applying the Delaunay triangulation

Figure 310 Close view of the vertices in the nose area before and after the tessellation process

The question that arises here is how to connect the vertices in such a way that the com-

plete surface is covered with triangles The answer is to use the Delaunay triangulation

which is probably the most common triangulation used in computer vision The main

advantages that it has over other methods is that the Delaunay triangulation avoids

"skinny" triangles, reducing potential numerical precision problems [33] Moreover the

Delaunay triangulation is independent of the order in which the vertices are processed


Figure 310b shows the result of applying the Delaunay triangulation to the vertices

shown in Figure 310a

Although there exists a number of different algorithms used to achieve the Delaunay

triangulation the final outcome of each conforms to the following definition a Delaunay

triangulation for a set P of points in a plane is a triangulation DT(P) such that no

point in P is inside the circumcircle of any triangle in DT(P) [33] Such definition can

be understood by examining Figure 311


Figure 311 The Delaunay tessellation with all the circumcircles and their centers [33]

37 Calibration

The set of (x y) vertices with their corresponding depth code values that result from

the decoding process do not represent standard units of measure ie these still have to

be translated into standard units such as millimeters This is precisely the objective of

the calibration process

The calibration mechanism that is used in the application is based on the work of Peter-

Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts

an offline and an online process Moreover the offline process consists of two stages

the camera calibration and the system calibration It is important to clarify that while

the offline process is performed only once (camera properties and distances within the

system do not change with every scan) the online process is carried out for every scan

instance The calibration stage referred to in Figure 31 is the latter


371 Offline process

As already mentioned the offline process comprises the two stages described below

Camera calibration This part of the process is concerned with the calculation of the

intrinsic parameters of the camera as explained in Section 22 of the literature

study In short the objective is to precisely quantify the optical properties of the

camera The manner in which the current approach accomplishes this is by imag-

ing the special calibration chart shown in Figure 312 from different orientations

and distances After corresponding markers in the different images are found an

algorithm searches the optimal set of camera parameters for which triangulation

of all corresponding marker-point pairs gives an accurate reconstruction of the

calibration chart

Figure 312 The calibration chart used to determine the intrinsic parameters of a cam-era and the extrinsic parameters of a projector-scanner system All absolute dimensions

and photometric properties of the round markers are known precisely

System calibration The second part of the calibration process refers to the camera-

projector system calibration ie the determination of the extrinsic parameters

of the system Again this part of the process images the calibration chart from

different distances However this time structured light patterns are emitted by

the projector while the acquisition process takes place The result is that each

projector code is associated with a known depth and camera position

372 Online process

The result of the offline calibration is a set of parameters that model the optical proper-

ties of the scanner system These are passed to the application inside the XML file for

every scan Such parameters represent the coefficients of a fifth-order polynomial used

for translating the set of (x y) vertices with their corresponding depth code values into


standard units of measure In other words the online process consists of evaluating a

polynomial with all the x y and depth code values calculated in the decoding stage in

order to reconstruct the geometry of the face Figure 313 shows the state of the 3D

model before and after the reconstruction process

(a) Before reconstruction (b) After reconstruction

Figure 313 The 3D model before and after the calibration process

38 Vertex filtering

As it can be seen from Figure 313b there are a number of extra vertices (and faces)

that have not been correctly reconstructed and therefore should be removed from the

model Vertex filtering is applied to remove all these noisy vertices and faces based on

different criteria The process is divided in the following three steps

381 Filter vertices based on decoding constraints

First if the distance between consecutive decoded points is larger than a maximum

threshold in the (x) or (z) dimensions then these are removed Second in order to

avoid false decoded vertices due to camera noise (especially in the parts of the images

where light does not hit directly) a minimal modulation threshold needs to be exceeded

or else the associated decoded point is discarded Finally if the decoded vertices lie

outside a margin defined in accordance to the image dimensions then these are removed

as well


382 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum

and maximum values that each decoded point can have in the z dimension These values

are read from the XML file The long triangles shown in Figure 313b that either extend

far into the picture or on the other hand come close to the camera are all removed in

this stage The resulting 3D model after being filtered with the two previously described

criteria is shown in Figure 314a

383 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length

criterion Initially the length of every edge contained in the model is calculated This

is followed by determining a new set of edges L that contains the longest edge in each

face After this operation the mean length value for the longest edge set is calculated

Finally, only faces whose longest edge is less than seven times the mean value,
i.e. L < 7 × mean(L), are kept Figure 314b shows the result after this operation
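The following sketch outlines this filter, assuming simple vertex and face structures; the types and names are hypothetical and not taken from the application.

#include <math.h>
#include <stdlib.h>

typedef struct { float x, y, z; } Vertex;
typedef struct { int v[3]; } Face;

static float edge_len(Vertex a, Vertex b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return sqrtf(dx * dx + dy * dy + dz * dz);
}

/* Keeps only faces whose longest edge is below 7 times the mean longest
   edge; returns the number of faces kept (compacted in place). */
int filter_long_edges(const Vertex *verts, Face *faces, int num_faces)
{
    float *longest = malloc(num_faces * sizeof(float));
    float mean = 0.0f;
    for (int i = 0; i < num_faces; i++) {
        float e0 = edge_len(verts[faces[i].v[0]], verts[faces[i].v[1]]);
        float e1 = edge_len(verts[faces[i].v[1]], verts[faces[i].v[2]]);
        float e2 = edge_len(verts[faces[i].v[2]], verts[faces[i].v[0]]);
        longest[i] = fmaxf(e0, fmaxf(e1, e2));
        mean += longest[i];
    }
    mean /= num_faces;
    int kept = 0;
    for (int i = 0; i < num_faces; i++)
        if (longest[i] < 7.0f * mean)
            faces[kept++] = faces[i];
    free(longest);
    return kept;
}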

(a) The 3D model after the filtering steps described in Subsections 381 and 382 (b) The 3D model after the filtering step described in Subsection 383 (c) The 3D model after the filtering step described in Section 39

Figure 314 3D resulting models after various filtering steps

39 Hole filling

In the last processing step of the 3D face scanner application two actions are performed

The first one is concerned with an algorithm that takes care of filling undesirable holes

that appear due to the removal of vertices and faces that were part of the face surface This

is accomplished by adding a vertex in the middle of the hole and then connecting every

surrounding edge with this point The second action refers to another filtering step of


vertices and faces In this last part of the application the program removes all but the

largest group of connected faces The final 3D model is shown in Figure 314c

310 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes

but not for the overall goal of the 3D mask sizing project this process was not taken

into account as part of the 3D face scanner application This is also the reason why it

is not included in Figure 31 Nevertheless this section provides a brief explanation of

the smoothing process that is currently used along with an example

A complete explanation of the algorithm that is being used to achieve the smoothing

effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian

operator that diffuses the vertices along the surface An example of the resulting model

before and after applying the smoothing process is shown in Figure 315

(a) The 3D model before smoothing (b) The 3D model after smoothing

Figure 315 Forehead of the 3D model before and after applying the smoothing process

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as

two different domains but rather as two complementary parts of a whole There are two

important trends that have made such unified view possible First integrated circuit

(IC) technology has evolved to the point where multiple processors of different types

coexist in a single IC Second the increasing complexity and average size of programs

together with the evolution of compiler technologies, have made C compilers (and even C++ or
Java in some cases) commonplace in the development of embedded systems

[35]

This chapter discusses the embedded hardware and software implementation of the 3D

face scanner A brief account of the hardware and software tools that were used during

the development of the application is presented first Subsequently the first stage of the

development process is described which consists mainly of translating the algorithms

and methods described in Chapter 3 into a different programming language more suitable

for embedded systems Finally a preview of the developed visualization module that

displays the 3D reconstructed face is presented along with a brief description of its

functionality

41 Development tools

This section describes the set of tools used in the development of the embedded applica-

tion First an overview of the hardware is presented highlighting the most important

aspects that are of interest to the 3D face scanner application This is then followed by

a list of the software tools along with a short motivation for their selection A so called

remote development methodology was used for the compilation process The idea is to

31

32 Chapter 4 Embedded system development

run an integrated development environment (IDE) on a client system for the creation of

the project editing of the files and usage of code assistance features in the same manner

as done with local projects However when the project is built run or debugged the

process runs on a remote server with output and input transferred to the client system

411 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as

development platforms SBCs combine most features of a conventional desktop computer

into a single board which can be as small as a credit card One or more processors of

different types memory on-board peripherals for multiple USB devices single or dual

gigabit Ethernet connections integrated graphics and audio capabilities amongst others

are common features included in these devices But perhaps what is most interesting

for embedded developers is the availability of several SBCs that come under open source

hardware category [36] Such SBCs are suitable for the implementation of a wide range

of applications on the basis of open operating systems

Two different hardware environments were used in the development of the current em-

bedded application a conventional desktop personal computer (PC) with an Intel x86

architecture and a SBC that was selected according to the following survey

4111 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention

of finding the most suitable model for our application Table 41 presents a subset of the

considered models highlighting the most relevant characteristics for the 3D face scanner

application Refer to [37] for the complete survey

The model to be chosen has to comply with several requirements imposed by the 3D

face scanner application First support for both a camera and a projector had to be

offered While all of the considered models showed special support for video output

not all of them provided suitable characteristics for camera signal acquisition In fact

most of them rely on USB or Ethernet connections for this purpose The problem of

using USB technology for camera acquisition is that it is highly resource demanding On

the other hand Ethernet connections imply streaming video in formats such as MPEG

which require additional computational resources and buffering for decoding the video

stream Explicit periphery support for camera acquisition was only offered by two of

the considered models the BeagleBoard-xM and the PandaBoard


Table 41 Single-board computer survey

BeagleBoard-xM
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 512 MB
  Video output: DVI-D, HDMI, S-Video
  GPU: PowerVR SGX, OpenGL ES 2.0
  Camera port: Yes

Raspberry Pi Model B
  CPU: ARM1176, 700 MHz
  RAM: 256 MB
  Video output: Composite RCA, HDMI, DSI
  GPU: Broadcom VideoCore IV, OpenGL ES 2.0
  Camera port: No

Cotton Candy
  CPU: dual-core ARM Cortex-A9, 1200 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
  Camera port: No

PandaBoard
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, DVI-D, LCD
  GPU: PowerVR SGX540, OpenGL ES 2.0
  Camera port: Yes

Via APC
  CPU: ARM11, 800 MHz
  RAM: 512 MB
  Video output: HDMI, VGA
  GPU: built-in 2D/3D graphics, OpenGL ES 2.0
  Camera port: No

MK802
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No

Snowball
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, CVBS
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No


A second issue in the selection of the SBC was concerned with the project objective of

developing a module capable of visualizing the 3D reconstructed model by means of the

embedded projector It was considered that the achievement of this objective could be

greatly simplified by selecting an SBC model that offered support for rendering of 3D

computer graphics by means of an API preferably OpenGL ES Nevertheless all of the

SBC models considered in the survey featured a graphical processor unit (GPU) with

such support

Finally one last important motivation for the selection came from the experience gath-

ered through related projects The BeagleBoard-xM had been used as the embedded

computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-

able implementation effort could be saved if this option were adopted Consequently it

was the BeagleBoard-xM that was selected as the SBC model for the development of

the current project

4112 BeagleBoard-xM features

The BeagleBoard-xM (Figure 41) is an SBC produced by Texas Instruments It is
a low-power open-source hardware system that was designed specifically to address
the Open Source Community It measures 82.55 by 82.55 mm and offers most of the
functionality of a desktop computer It is based on Texas Instruments' DM3730 system
on chip (SoC) At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1

GHz and 512 MB of LPDDR RAM Several open operating systems have been made

compatible with such processor including Linux FreeBSD RISC OS Symbian and

Android Moreover the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated

video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to

provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38]

In addition to the previously mentioned characteristics the ARM Cortex-A8 processor

comes with a general-purpose SIMD (Single instruction Multiple data) engine known as

NEON This technology is based on a 128-bit SIMD architecture extension that provides

flexible and powerful acceleration for consumer multimedia products, as described in [39]

412 Software

The main factors involved in the selection of software tools were (i) available support by

a large development community and (ii) acquisition costs and licensing charges Open

source software was adopted where possible Moreover prior experience with the tools

was also taken into account The software can be divided in two categories (i) software


Figure 41 The BeagleBoard-xM offered by Texas Instruments

libraries that are used within the application and therefore are necessary for its execution

and (ii) software tools used specifically for the development of the application and hence

are not required for its execution In what follows each of these is briefly described

4121 Software libraries

The following software libraries are being used throughout the implementation of the

embedded application

libxml2 It is a software library used for parsing XML documents which was originally

developed for the Gnome project and was later made available for outside projects

as well The current application makes use of such tool for extracting the required

information from the XML file that is included for each scan

OpenCV It is an open source computer vision and machine learning software library

initiated by Intel It provides the necessary functionality to construct the Delaunay

triangulation described in Chapter 3 Though it was used in the initial versions of

the application later optimizations replaced OpenCV implementations

CGAL Consists of a software library that aims to provide access to algorithms in

computational geometry It is being used in the current application as a means

to simplify the resulting mesh surface ie to reduce the number of faces used to

represent the surface while keeping the overall shape of the reconstructed model

OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-

cally for embedded systems It consists of a cross-language multi-platform Appli-

cation Programming Interface (API) for rendering 2D and 3D computer graphics


It is used in the current application as the means to visualize the 3D reconstructed

model

GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL

used to create windows and/or frame buffers It is being used in the visualization

module of the application as well

4122 Software development tools

The following list presents a description of the most important software tools used for

the development of the embedded application

GNU toolchain It refers to a collection of programming tools produced by the GNU

Project that provide developing facilities for applications and operating systems

Among the several projects that comprise the GNU toolchain the following were

used

GNU Make It is a utility that automates the building process of executable

programs by reading the so-called makefiles which specify how to create the

target program

GCC It is the official compiler of the GNU operating system and has been

adopted as standard by most modern Unix-like computer operating systems

GNU Binutils Involves a set of programming tools that are used in the develop-

ment process of creating and managing programs object files libraries profile

data and assembly source code The commands as (assembler) ld (linker)

and gprof (profiler) were used among the complete set of binutil commands

GNU Project debugger It is the standard debugger for the GNU operating

system which was made available for the development of applications outside

this project as well

Valgrind It is a programming tool that can automatically detect memory management

errors It also provides the functionality of a profiler

Ubuntu A Linux based operating system that is distributed as free and open source

software It was installed in both the desktop PC and the SBC


42 MATLAB to C code translation

This section describes the first stage of the embedded application development that

involves the translation of a series of algorithms originally written in MATLAB code to

C

Despite the fact that there are a number of available tools that automatically translate

MATLAB code to C language such as MATLAB Coder by MathWorks, MATLAB-to-
C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx, these have a number
of pitfalls that compromise their applicability, especially when the performance aspect

is of ultimate importance Perhaps what is most concerning is that each one of these

tools only supports a subset of the MATLAB language and functions meaning that

the complete functionality of MATLAB is immediately constrained by this requirement

In many cases this would imply a modification to the MATLAB code prior to the

translation process in order to filter out any feature or function not included in the

subset which adds overhead to the development process Examples of features not

supported by automatic translation tools are amongst others objects cell arrays nested

functions, visualization or try/catch statements The use of an automatic translation

tool was discarded for this project taking into account that several of these unsupported

features are present in the MATLAB code

421 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular pro-

gramming languages used for the development of embedded systems The first is that

C language lies in an intermediate point between higher and lower level languages pro-

viding suitable characteristics for embedded system development from both sides The

problem with higher level languages lies in the fact that they do not provide suitable

characteristics for optimizing performance of the applications such as low-level memory

manipulation Furthermore unlike many of these higher level programming languages

C provides deterministic resource use which is an important feature when the target de-

vices contain limited resources On the other hand C outperforms lower level languages

in a number of aspects such as scalability and maintainability Two final motivations

for using C are (i) C compilers are available for almost all embedded devices which are

supported by a large pool of experienced C programmers and (ii) the vast majority of

hardware APIdrivers are written in C


422 Translation approach

As mentioned earlier a manual translation approach of the code was chosen over the

use of automatic translation tools A key part in the process of manually translating

MATLAB to C code is the verification process There are two major techniques used

to achieve such verification The first one consists of a systematic method of converting

the translated C code into a compiled MEX-file that can be merged into the original

MATLAB project Then by comparing the results generated by the MATLAB project

containing the C implementation wrapped in a MEX-file with those generated by the

original MATLAB project one should be able to verify the correctness of the translation

The second approach consists of writing corresponding intermediate results of both the

MATLAB and C implementations to external files and then using a file comparison tool

such as diff for Linux environments in order to validate equality of both results It was

the latter approach that was chosen for the development of the current application for

the following reason The former approach requires the C implementation to be wrapped

in a so called MEX wrapper which takes care of the communication between MATLAB

and C This task is considered to be error prone since crashes segmentation violations

or incorrect results can easily occur if the MEX wrapper does not allocate and access

the data properly as reported by Marc Barberis in [40] from Catalytic Inc

A number of pitfalls that add complexity to the manual translation process were iden-

tified throughout the development of this stage The most important are

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e. the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in performance of the resulting code (see the sketch after this list).

• MATLAB is an interpreted language, i.e. data types and variable dimensions are only known at run-time, thus these cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions and sizes. On the contrary, C language requires all variables to be cast to a specific data type (or declared, as it is known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C and hence requires the programmer to implement them while relying on structure constructs of primitive types.
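The following small example (hypothetical, not code from the application) illustrates the ordering point made above: in C the inner loop should run over the fastest-varying index so that consecutive accesses touch adjacent memory locations.

/* Row-major traversal in C: rows in the outer loop, columns in the inner
   loop, so that img[y * width + x] is accessed contiguously. In MATLAB,
   which stores arrays column-major, the natural order is the opposite. */
void scale_image(float *img, int height, int width, float factor)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            img[y * width + x] *= factor;
}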

43 Visualization

This section describes the different steps involved in the visualization module developed

to display the reconstructed 3D models by means of the embedded projector contained

in the hand-held device Figure 42 extends the general overview of the application

presented in Figure 31 by incorporating the visualization module This figure shows that a

resulting 3D model of the face reconstruction process consists of 4 different elements a

set of vertices a set of faces a set of UV coordinates and a texture image

Figure 42 Simplified diagram of the 3D face scanner application (the camera frame sequence and XML file are input to the 3D face reconstruction, which produces the faces, vertices, UV coordinates and texture 1 image used by the visualization module)

Vertices and faces describe the geometry of the reconstructed model Each face consists

of three index values that determine the vertices that conform a triangle On the other

hand UV coordinates together with the texture image describe the texture of the model

Figure 43 shows how UV coordinates are used to map portions of the texture image


to individual parts of the model Each vertex is associated with an UV coordinate

When a triangle is rendered the corresponding UV coordinates of each vertex are used

to extract a portion of the texture image to place it on top of the triangle

Figure 43 UV coordinate system

Figure 44 presents an overview of the visualization module The first step of the process

is to simplify the 3D model ie to reduce the number of triangles (and vertices) used

to represent the surface Note that while a high resolution is needed for the algorithms

that determine the fit quality of the different mask models a much lower resolution can

be used for visualization purposes In fact due to the limited available resources in

embedded systems such simplification becomes necessary to avoid lag when zooming

rotating or panning the model Edge collapse is a common term used for the simpli-

fication process which is shown in Figure 44 Input vertices and faces of this block

are converted into a smaller set denoted as New vertices and New faces on the diagram

However since the new set of vertices and faces do not have a one-to-one correspondence

to the original set of UV coordinates such coordinates have to be updated as well The

manner in which this is accomplished is by using the Nearest Neighbor algorithm Every

new vertex is assigned the UV coordinate of its closest original vertex

The next stage of the process is to format the new set of vertices faces and UV co-

ordinates together with the texture 1 image such that OpenGL can render the model


Subsequently normal vectors are calculated for every triangle which are mainly used

by OpenGL for lighting calculations Every vertex of the model has to be associated

with one normal vector To do this an average normal vector is calculated for each

vertex based on the normal vectors of the triangles that are connected to it Moreover

a cross-product multiplication is used to calculate the normal vector of each triangle

Once these four elements that characterize the 3D model are provided to OpenGL the

program enters in an infinite running state where the model is redrawn every time a

timer expires or when an interactive operation is sent to the program
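A minimal sketch of the per-face normal calculation via the cross product is shown below; the vector type and function name are hypothetical. Per-vertex normals are then obtained by averaging the normals of all faces that share the vertex.

#include <math.h>

typedef struct { float x, y, z; } Vec3;

/* Normal of the triangle (a, b, c), normalized to unit length. */
static Vec3 face_normal(Vec3 a, Vec3 b, Vec3 c)
{
    Vec3 u = { b.x - a.x, b.y - a.y, b.z - a.z };
    Vec3 v = { c.x - a.x, c.y - a.y, c.z - a.z };
    Vec3 n = { u.y * v.z - u.z * v.y,
               u.z * v.x - u.x * v.z,
               u.x * v.y - u.y * v.x };
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len > 0.0f) { n.x /= len; n.y /= len; n.z /= len; }
    return n;
}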

Figure 44 Diagram of the visualization module (edge collapse and nearest-neighbor mapping produce the new vertices, faces and UV coordinates, which are converted to OpenGL format together with the calculated normals and the texture 1 image)

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner

application ranging from high-level optimizations such as modification of the algo-

rithms to low-level optimizations such as the implementation of time-consuming parts

in assembly language

In order to verify that the achieved optimizations were valid in general and not for

specific cases 10 scans of different persons were used for profiling the performance of the

application Every profile consisted of running the application 10 times for each scan and

then averaging the results in order to reduce the influence that external factors might

have in the measured times Figure 51 presents an example of the graphs that will be

used throughout this and the following chapters to represent the changes in performance

Here each bar is divided into different colors that represent the distribution of the total

execution time among the various stages of the application described in Chapter 3 and

summarized in Figure 31

The translation from MATLAB to C code corresponds to the first optimization per-

formed The top two bars in Figure 51 show that the C implementation resulted in

a speedup of approximately 15 times over the MATLAB implementation running on

a desktop computer On the other hand the bottom two bars reflect the difference

in execution time after running the C implementation in two different platforms The

much more limited resources available in the BeagleBoard-xM have a clear impact on

the execution time The C code was compiled with GCC's O2 optimization level

The bottom bar in Figure 51 represents the starting point for a set of optimization

procedures that will be described in the following sections The order in which these are

presented corresponds to the same order in which they were applied to the application

Figure 51 Execution times of (Top) the MATLAB implementation on a desktop computer (Middle) the C implementation on a desktop computer (Bottom) the C implementation on the BeagleBoard-xM

51 Double to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C

implementations was necessary to compare both results in each step of the translation

process The original C implementation was implemented using double-precision format

because this is the format used in the MATLAB code Taking into account that the

additional precision offered by double-precision format over single-precision was not

essential and that the ARM Cortex-A8 processor features a 32 bit architecture the

conversion from double to single-precision format was made Figure 52 shows that with

this modification the total execution time decreased from 1453 to 1252 sec

Figure 52 Difference in execution time when double-precision format is changed to single-precision

52 Tuned compiler flags

While the previous versions of the C code were compiled with O2 performance level

the goal of this step was to determine a combination of compiler options that would


translate into faster running code A full list of the options supported by GCC can be

found in [41] Figure 53 shows that the execution time decreased by approximately 3

seconds (24 of the total time 125 sec) after tuning the compiler flags The list of

compiler flags that produced best performance at this stage of the optimization process

were

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution

-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp

Figure 53 Execution time before and after tuning GCC's compiler options

53 Modified memory layout

A different memory layout for processing the camera frames was implemented to further

exploit the concept of spatial locality of the program As noted in Section 33 many of

the operations in the normalization stage involve pixels from pairs of consecutive frames

ie first and second third and fourth fifth and sixth and so on Data of the camera

frames were placed in memory in a manner such that corresponding pixels between frame

pairs lay next to each other in memory The procedure is shown in Figure 54

However this modification yielded no improvement on the execution time of the appli-

cation as can be seen from Figure 55

54 Reimplementation of Crsquos standard power function

The generation of Texture 1 frame in the normalization stage starts by averaging the last

two camera frames followed by a gamma correction procedure The process of gamma

correction in this application consists of raising each pixel to the power of 0.85 After

profiling the application it was found that the power function from the standard math

C library was taking most of the time inside this process Taking into account that the


Figure 54 Modification of the memory layout of the camera frames The blue, red, green and purple circles represent pixels of the first, second, third and fourth frames

respectively

Figure 55 The execution time of the program did not change with a different memory layout for the camera frames

high accuracy offered by such function was not required and that the overhead involved

in validating the input could be removed a different implementation of such function

was adopted

A novel approach was proposed by Ian Stephenson in [42] explained as follows The

power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value By choosing x = 2 the process of calculating the

power function reduces to finding fast pow2() and log2() functions Such functions can

be approximated with a few instructions For example the implementation of log2(a)

can be approximated based on the IEEE floating point representation of a


a = M * 2^E

where M is the mantissa and E is the exponent Taking log2 of both sides gives

log2(a) = log2(M) + E

and since M is normalized, log2(M) is always small, therefore

log2(a) ≈ E

This new implementation of the power function provides the improvement of the execu-

tion time shown in Figure 56
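A minimal sketch of this idea in C is shown below. It treats the bit pattern of an IEEE-754 single-precision float as a crude fixed-point approximation of log2, which is the same principle described above; the constant and function name are illustrative, and the sketch is not necessarily identical to the code used in the application.

#include <stdint.h>

/* Approximates pow(a, b) = 2^(b * log2(a)) for a > 0 by scaling the float
   bit pattern: (i - 127*2^23) / 2^23 is roughly log2(a). */
static inline float fast_pow(float a, float b)
{
    union { float f; int32_t i; } u = { a };
    u.i = (int32_t)(b * (float)(u.i - 1065353216) + 1065353216.0f);
    return u.f;
}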

Figure 56 Difference in execution time before and after reimplementing C's standard power function

55 Reduced memory accesses

The original order of execution was modified to reduce the amount of memory access and

to increase the temporal locality of the program Temporal locality is a principle stating

that referenced memory locations will tend to be referenced again soon Moreover

the reordering allowed to replace floating-point calculations with integer calculations in

the modulation stage which are known to typically execute faster in ARM processors

Figure 57 shows the order in which the algorithms are executed before and after this

optimization By moving the calculation of the modular frame to the preprocessing

stage the values of the camera frames do not have to be re-read Moreover the processes

of discarding cropping and scaling frames are now being performed in an alternating

fashion together with the calculation of the modular frame This loop merging improves

the locality of data and reduces loop overhead Figure 58 shows the change in execution

time of the application for this optimization step


(a) Original order of execution (b) Modified order of execution

Figure 57 Order of execution before and after the optimization

Figure 58 Difference in execution time before and after reordering the preprocessing stage


56 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the applica-

tion was presented in Chapter 3 Figure 38 shows the different stages of this process

However this figure does not reflect the manner in which the GMC was initially imple-

mented in the MATLAB code In fact this figure describes the GMC implementation

after being modified with the optimization described in this section A more detailed

picture of the original GMC implementation is given in Figure 59 Previous research

found that optimal results were achieved when GMC is applied in the y direction only

The manner in which this was implemented was by estimating GMC for both directions

but only performing the shift in the y direction The optimization consisted in removing

all unnecessary calculations related to the estimation of GMC in the x direction This

optimization provides the improvement of the execution time shown in Figure 510

Figure 59 Flow diagram for the GMC process as implemented in the MATLAB code (for every pair of consecutive normalized frames: sum the rows and columns of frame A and frame B, minimize the SAD in x and y, and shift frame B in the y dimension only)

Figure 510 Difference in execution time before and after modifying the GMC stage


57 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation A series of examples available

in [43] were used as references for our implementation Despite the fact that OpenCV

constructs the triangulation while abstracting the complete algorithm from the pro-

grammer a not so straightforward approach is required to extract the triangles from

a so called subdivision OpenCV offers a series of functions that can be used to nav-

igate through the edges that form the triangulation It is therefore the responsibility

of the programmer to extract each of the triangles while stepping through these edges

Moreover care must be taken to avoid repeated triangles in the final set An error was

detected at this point of the optimization process in the mechanism that was being used

to avoid repeated triangles Figure 511 shows the increase in execution time after this

bug was resolved

Figure 511 Execution time of the application increased after fixing an error in the tessellation stage

58 Modified line shifting in GMC stage

A series of optimizations performed to the original line shifting mechanism in the GMC

stage are explained in this section The MATLAB implementation uses the circular shift

function to perform the alignment of the frames (last step in Figure 38) Given that

there is no justification for applying a circular shift a regular shift was implemented

instead in which the last line of a frame is discarded rather than copied to the opposite

border Initially this was implemented using a for loop Later this was optimized even

further by replacing such for loop with the more optimized memcpy function available

in the standard C library This in turn led to a faster execution time

A further optimization was obtained in the GMC stage which yielded better memory

usage and faster execution time The original shifting approach used two equally sized

portions of memory in order to avoid overwriting the frame that was being shifted The


need for a second portion of memory was removed by adding some extra logic to the

shifting process A conditional statement was included in order to determine if the shift

has to be performed in the positive or negative direction In case the shift is negative ie

upwards the shifting operation traverses the image from top to bottom while copying

each line a certain number of rows above it In case the shift is positive ie downwards

the shifting operation traverses the image from bottom to top while copying each line a

certain number of rows below it The result of this set of optimizations is presented in

Figure 512
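A simplified sketch of this in-place shifting logic is shown below, for a single width × height float frame; the names are hypothetical and boundary handling is reduced to discarding the rows that fall outside the frame.

#include <string.h>

/* Shifts the frame vertically by 'shift' rows in place. A negative shift
   moves the content up (traversal top to bottom), a positive shift moves
   it down (traversal bottom to top), so no second buffer is needed. */
void shift_frame_y(float *frame, int width, int height, int shift)
{
    size_t row_bytes = (size_t)width * sizeof(float);
    if (shift < 0) {
        for (int y = 0; y < height + shift; y++)
            memcpy(&frame[y * width], &frame[(y - shift) * width], row_bytes);
    } else if (shift > 0) {
        for (int y = height - 1; y >= shift; y--)
            memcpy(&frame[y * width], &frame[(y - shift) * width], row_bytes);
    }
}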

Figure 512 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage

59 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is

presented by Rippa [44] who proves that such triangulation minimizes the roughness of

the resulting model Nevertheless an important characteristic of the decoding process

used in our application allows the adoption of a different triangulation mechanism that

improved the execution time significantly while sacrificing smoothness in a very small

amount This characteristic refers to the fact that the resulting set of vertices from

the decoding stage are sorted in an increasing manner This in turn removes the need

to search for the nearest vertices and therefore allows the triangulation to be greatly

simplified More specifically the vertices are ordered in increasing order from left to

right and bottom to top in the plane Moreover they are equally spaced along the y

dimension which simplifies even further the algorithm needed to connect such vertices

into triangles

The developed algorithm traverses the set of vertices row by row from bottom to top

creating triangles between every pair of consecutive rows Moreover each pair of con-

secutive rows is traversed from left to right while connecting the vertices into triangles


The algorithm is presented in Algorithm 1 Note that for each pair of rows this algo-

rithm describes the connection of vertices until the moment in which the last vertex of

either row is reached The unconnected vertices that remain in the other longer row

are connected with the last vertex of the shorter row in a later step (not included in

Algorithm 1)

Algorithm 1 New tessellation algorithm

1: for all pairs of rows do
2:   find the left-most vertices in both rows and store them in vertex row A and vertex row B
3:   while the last vertex in either row has not been reached do
4:     if vertex row A is more to the left than vertex row B then
5:       connect vertex row A with the next vertex on the same row and with vertex row B
6:       change vertex row A to the next vertex on the same row
7:     else
8:       connect vertex row B with the next vertex on the same row and with vertex row A
9:       change vertex row B to the next vertex on the same row
10:    end if
11:  end while
12: end for

Figure 513 shows the result of applying the two described triangulation methods to the

same set of vertices The execution time of the application was reduced by approximately

1.4 seconds with this optimization as shown in Figure 514 Furthermore the new
triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's

Delaunay triangulation implementation

(a) Delaunay triangulation (b) Optimized triangulation

Figure 513 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted

510 Modified decoding stage

Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard C math library, namely ceil() and floor(), were replaced with faster implementations that use preprocessor directives to avoid the function call overhead. Moreover, the time spent validating the input was also eliminated, since such validation was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions could be simplified even further.
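The exact replacement code is not reproduced in this document, but a minimal sketch of what such simplified macros could look like, assuming they are only ever applied to non-negative index values, is the following:

/* Hypothetical preprocessor replacements for ceil()/floor(), valid only for
 * non-negative arguments (truncation equals floor for x >= 0). Note that x
 * is evaluated more than once, which is acceptable for plain index values. */
#define FLOOR_IDX(x)  ((int)(x))
#define CEIL_IDX(x)   ((int)(x) + ((double)(int)(x) < (x)))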

A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while making sure that the amount of memory involved would not cause a stack overflow. Stack allocation is usually faster, since no allocator has to be invoked and the memory can be addressed more efficiently.
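As a schematic illustration of the kind of change involved (the buffer name and size below are purely illustrative and not taken from the application):

#include <stdlib.h>

#define MAX_ITEMS 4096            /* illustrative bound, small enough for the stack */

static void decode_with_heap(void)
{
    float *buf = malloc(MAX_ITEMS * sizeof *buf);   /* before: heap allocation */
    if (buf == NULL)
        return;
    /* ... decoding work using buf ... */
    free(buf);
}

static void decode_with_stack(void)
{
    float buf[MAX_ITEMS];                           /* after: stack allocation */
    /* ... decoding work using buf ... */
    (void)buf;
}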

The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. Such tasks were present in the application because several alternatives had been implemented to achieve a common goal during the algorithmic design stage; after assessing and choosing the best option, the remaining alternatives were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for every image except the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, such recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
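A minimal C sketch of the reuse idea is shown below. The helper functions sum_columns() and minimize_sad_and_shift(), the MAX_WIDTH bound, and the buffer layout are assumptions made for this illustration, not the actual interfaces of the application.

#include <stdint.h>
#include <string.h>

#define MAX_WIDTH 1024                 /* illustrative upper bound on frame width */

void sum_columns(const uint8_t *frame, int width, int height, float *col_sum);
void minimize_sad_and_shift(const float *prev, const float *curr,
                            uint8_t *frame, int width, int height);

void gmc_all_frames(uint8_t **frames, int n_frames, int width, int height)
{
    float prev_sum[MAX_WIDTH], curr_sum[MAX_WIDTH];

    sum_columns(frames[0], width, height, prev_sum);           /* frame 1: summed once */
    for (int n = 1; n < n_frames; n++) {
        sum_columns(frames[n], width, height, curr_sum);        /* frame n: summed once */
        minimize_sad_and_shift(prev_sum, curr_sum, frames[n], width, height);
        memcpy(prev_sum, curr_sum, (size_t)width * sizeof(float));  /* reuse in next iteration */
    }
}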

5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a large amount of exploitable data-level parallelism, where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization, and global motion compensation stages are data independent and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrate NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology; (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions but with the compiler doing some of the work associated with writing assembly instructions; and (iv) directly writing NEON assembly instructions linked to the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and it was therefore this approach that was adopted.


Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums. The column-sum vector of frame n-1 is passed on to the iteration that processes frame n, so that each frame's columns are summed only once.

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.


Figure 5.18 presents the basic principle behind the SIMD architecture extension along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on two Q registers, where each of the 8 elements is 16 bits wide, or as an operation on two D registers, where each of the 8 elements is 8 bits wide.

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology (elements, lanes, operation, source registers, and destination register).

An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
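The thesis implements these steps in hand-written NEON assembly; purely as an illustration of the data flow, the following sketch uses the NEON C intrinsics (approach (iii) above) to process one group of 8 pixels. The buffer names, the scaling constants, and the handling of a zero denominator are assumptions made for this example.

#include <arm_neon.h>
#include <stdint.h>

/* Illustrative only: compute texture2 = v1 + v2 and normalized = (v1 - v2) / (v1 + v2)
 * for 8 pixels at a time, producing 32-bit floats. */
void process_vector_of_8(const uint8_t *v1, const uint8_t *v2,
                         float *texture2, float *normalized)
{
    uint8x8_t a = vld1_u8(v1);                        /* 8 pixels of frame v1 */
    uint8x8_t b = vld1_u8(v2);                        /* 8 pixels of frame v2 */

    uint16x8_t sum  = vaddl_u8(a, b);                 /* widened sum v1 + v2 */
    int16x8_t  diff = vreinterpretq_s16_u16(vsubl_u8(a, b)); /* v1 - v2 (wrap-around reinterpreted as signed) */

    float32x4_t sum_lo  = vcvtq_f32_u32(vmovl_u16(vget_low_u16(sum)));
    float32x4_t sum_hi  = vcvtq_f32_u32(vmovl_u16(vget_high_u16(sum)));
    float32x4_t diff_lo = vcvtq_f32_s32(vmovl_s16(vget_low_s16(diff)));
    float32x4_t diff_hi = vcvtq_f32_s32(vmovl_s16(vget_high_s16(diff)));

    /* texture 2, scaled to [0, 1] (510 is the maximum possible sum of two 8-bit pixels) */
    vst1q_f32(texture2,     vmulq_n_f32(sum_lo, 1.0f / 510.0f));
    vst1q_f32(texture2 + 4, vmulq_n_f32(sum_hi, 1.0f / 510.0f));

    /* NEON has no divide instruction: use a reciprocal estimate refined with
     * one Newton-Raphson step to compute (v1 - v2) / (v1 + v2) */
    float32x4_t rec_lo = vrecpeq_f32(sum_lo);
    rec_lo = vmulq_f32(rec_lo, vrecpsq_f32(sum_lo, rec_lo));
    float32x4_t rec_hi = vrecpeq_f32(sum_hi);
    rec_hi = vmulq_f32(rec_hi, vrecpsq_f32(sum_hi, rec_hi));

    vst1q_f32(normalized,     vmulq_f32(diff_lo, rec_lo));
    vst1q_f32(normalized + 4, vmulq_f32(diff_hi, rec_hi));
}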

Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain this increase. First, note that the stage of the application that most contributed to the increase in time is the reading of the binary file. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was indeed probably running in parallel, using resources of the board and hence affecting the performance of the application.

Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage. The first step of this process is to find the smallest and largest values of every camera frame pixel in the time dimension by means of if statements. When this task is implemented in conventional C, the processor makes use of its branch prediction mechanism in order to speed up the instruction pipeline. With NEON assembly instructions, however, the comparison is performed unconditionally for every single pack of 8 values, so no benefit is obtained from the branch prediction mechanism.
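For reference, the comparison itself maps naturally onto NEON's element-wise minimum and maximum instructions; a sketch of how the running minimum and maximum could be updated for one group of 8 pixels (with assumed buffer names) is shown below. Such an update is always executed in full, which is precisely why the branch prediction advantage of the scalar version is lost.

#include <arm_neon.h>
#include <stdint.h>

/* Update the running per-pixel minimum and maximum (as needed by the
 * modulation stage) for 8 pixels at a time, entirely branch-free. */
void update_min_max_8(const uint8_t *frame_pixels, uint8_t *min_buf, uint8_t *max_buf)
{
    uint8x8_t p    = vld1_u8(frame_pixels);
    uint8x8_t pmin = vld1_u8(min_buf);
    uint8x8_t pmax = vld1_u8(max_buf);

    vst1_u8(min_buf, vmin_u8(pmin, p));   /* element-wise minimum */
    vst1_u8(max_buf, vmax_u8(pmax, p));   /* element-wise maximum */
}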

5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2 · 2^n − 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the power 0.85. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
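A minimal sketch of how such a LUT could be built and used for the real 8-bit case is given below; the table and function names are illustrative, and the exponent 0.85 is the one mentioned above.

#include <math.h>
#include <stdint.h>

/* The LUT is indexed by the sum pixel_A + pixel_B (0..510), which is already
 * available from the texture 2 computation, and stores (sum / 2)^0.85. */
#define LUT_SIZE (2 * 255 + 1)            /* 2 * 2^n - 1 entries for n = 8 */
static float gamma_lut[LUT_SIZE];

static void build_gamma_lut(void)
{
    for (int sum = 0; sum < LUT_SIZE; sum++)
        gamma_lut[sum] = powf(sum / 2.0f, 0.85f);
}

/* Usage: texture1_pixel = gamma_lut[pixel_a + pixel_b]; */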

As a final step in the optimization process, a further improvement was made to the execution flow presented in Figure 5.19. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames in order to calculate the texture 1 frame. To avoid this overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2, and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames.

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.


Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.

pixel A:       0, 1, 2, 3
pixel B:       0, 1, 2, 3
average:       0, 0.5, 1, 1.5, 2, 2.5, 3
average^0.85:  0, 0.555, 1, 1.411, 1.803, 2.179, 2.544
(the table is indexed by pixel A + pixel B)

Merging these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The improvement in execution time is shown in Figure 5.22.
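In outline, the reordered frame loop could look as follows; the function names are assumptions used only to convey the structure, not the actual routines of the application.

#include <stdint.h>

void process_frame(const uint8_t *frame);                    /* modulation, texture 2, normalization */
void process_last_two_frames(const uint8_t *frame15,
                             const uint8_t *frame16);        /* same, plus averaging and gamma correction */

void preprocess_and_normalize(const uint8_t *frames[16])
{
    /* Stage 1: frames 1..14 */
    for (int n = 0; n < 14; n++)
        process_frame(frames[n]);

    /* Stage 2: frames 15 and 16 are handled together so that their pixel sums,
     * already needed for texture 2, also drive the texture 1 LUT lookup. */
    process_last_two_frames(frames[14], frames[15]);
}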

This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.


Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.

6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and the results were averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).


On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processing on a PC. The C code was compiled with GCC's -O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that, even though the lower resolution model contains only approximately 2% of the faces of the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.

6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations, described in Chapter 5, were used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times.

Figure 6.2: Example of the visualization module developed. (a) High-resolution 3D model with texture (63743 faces); (b) high-resolution 3D model wireframe (63743 faces); (c) low-resolution 3D model with texture (1229 faces); (d) low-resolution 3D model wireframe (1229 faces).

Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which provides an idea of the speedup achieved for each individual stage.


Figure 6.3: Performance evolution of the 3D face scanner's C implementation. The bars correspond to the successive optimization steps: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2.


Figure 6.4: Execution time for each stage of the application before and after the complete optimization process. (a) Read binary file; (b) preprocessing; (c) normalization; (d) GMC; (e) decoding; (f) tessellation; (g) calibration; (h) vertex filtering; (i) hole filling.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding pair of models yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e., by approximately 14 seconds.

A wide range of optimizations were performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment. Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5 × 5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration, and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.

According to several observations, there is a high probability that the calibration stage can be optimized significantly. First, note the large increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.

Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp, and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries that are used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.
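Purely as an illustration of the flags involved (the exact set used in this project is not reproduced here), a hard-float build targeting the Cortex-A8 with NEON could be requested from GCC along the following lines:

gcc -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard -o scanner main.c

This only works once the operating system and all linked libraries have themselves been rebuilt with the hard-float ABI, for the link-compatibility reason mentioned above.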

Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829-1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204-211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, isbn: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1-17, 1982, issn: 0146-664X. doi: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521-528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647-651, Jun. 1993, issn: 0162-8828. doi: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874-877 vol. 2. doi: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27-40, 1995, issn: 0031-3203. doi: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14-28, Jan. 1987, issn: 0162-8828. doi: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445-456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609-616, 1992, issn: 0031-3203. doi: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337-358, 1985, issn: 0734-189X. doi: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings., 10th International Conference on, vol. i, Jun. 1990, pp. 309-313 vol. 1. doi: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193-203, 1991, issn: 0932-8092. doi: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42-54, Dec. 1982, issn: 0018-9162. doi: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pags, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827-849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057-2060. doi: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135-141. doi: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195-200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81-III-84. doi: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangu, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617-1635, 2002, issn: 0031-3203. doi: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. doi: 10.1364/OE.15.012318 (cit. on p. 14).

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22-26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15-20 (cit. on p. 14).

[27] G. Toscani, Systemes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, isbn: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and I. i. A. Universitat de Girona. Departament d'Electronica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals Universitat de Girona. Universitat de Girona, 1998, isbn: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323-344, Aug. 1987, issn: 0882-4967. doi: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965-980, Oct. 1992, issn: 0162-8828. doi: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, isbn: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97-108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schroder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th annual conference on Computer graphics and interactive techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317-324, isbn: 0-201-48560-5. doi: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, isbn: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, December 2009, p. 81 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. Von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489-497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization. An example," Tech. Rep., 2009 (cit. on p. 54).



Chapter 2

Literature study

This chapter presents a selective analysis of the state-of-the-art in the field of surface

reconstruction placing special emphasis on structured lighting techniques A brief

overview of the three main underlying technologies used for depth estimation is pre-

sented first This is followed by an example of stereo analysis which serves as the basis

for the more specific structured lighting techniques Moreover this example helps to

illustrate why stereo analysis is considered less preferable for 3D face reconstruction

applications when compared with the structured lighting techniques Special emphasis

is placed on the scientific principles underlying structured lighting techniques Further-

more a classification of the different types of pattern coding strategies available in the

literature is given along with an analysis of their suitability for our application Fi-

nally the chapter concludes with a brief discussion of camera calibration and its most

representative techniques

2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications such as computer mod-

eling of 3D objects (such as those found in areas like architecture mechanical engi-

neering or surgery) distance measurements for vehicle control surface inspections for

quality control approximate or exact estimates of the location of 3D objects for auto-

mated assembly and fast location of obstacles for efficient navigation [4]

Technologies for surface reconstruction include contact and non-contact techniques the

latter being our principal interest Non-contact techniques may be further categorized

as echo-metric reflecto-metric and stereo-metric as proposed in [5] Echo-metric tech-

niques use time-of-flight measurements to determine the distance to an object ie they


are based on the time it takes for a wave (acoustic micro electromagnetic) to reflect

from an object's surface through a given medium Reflecto-metric techniques process

one or more images of the object to determine its surface orientation and consequently

its shape Finally stereo-metric techniques determine the location of the object's surface

by triangulating each point with its corresponding projections in two or more images

Echo-metric techniques suffer from a number of drawbacks Systems employing such

techniques are heavily affected by environmental parameters such as temperature and

humidity [6] These parameters affect the velocity at which waves travel through a

given medium thus introducing errors in depth measurement On the other hand

both reflecto-metric and stereo-metric techniques are less affected by environmental

parameters However reflecto-metric techniques entail a major difficulty ie they

require an estimation of the model of the environment In the remainder of this section

we will limit the discussion to the stereo-metric category and focus on the structured

lighting techniques

2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded

as an extension of the more general stereo-vision technique an introductory example of

stereo analysis is presented in this section This example intends to show why the use

of structured lighting becomes essential for our application This example is presented

in [4]

Surface reconstruction can be achieved by means of the visual disparity that results

when an object is observed from different camera viewpoints In its simplest form two

cameras can be used for this purpose Triangulation between a point in the object and

its respective projection in each of the camera projection planes can be used to calculate

the depth at which this point lies from a certain reference Note however that in order

to calculate the triangulation more parameters are required These parameters refer for

example to the distance at which the cameras are located from one another (extrinsic

parameter) or to the focal length of each of the cameras (intrinsic parameter)

Figure 21 illustrates the so-called standard stereo geometry [4] of two cameras In this

model the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal

point of the left camera The focal point of the right camera lies at a distance b along

the X-axis from the left camera ie at the point (b, 0, 0) Both cameras are assumed

to have the same focal length f As a consequence the images of both cameras are

located in the same image plane The Z-axis coincides with the optical axis of the

left camera Moreover the optical axes of both cameras are parallel to each other and


oriented towards the scene objects Also note that because the x-axes of both images

are identically oriented rows with same row-number in the two different images lie on

the same straight line

Figure 2.1: Standard stereo geometry.

In this model a scene point P = (X, Y, Z) is projected onto two corresponding image
points

p_left = (x_left, y_left) and p_right = (x_right, y_right)

in the left and right images respectively, assuming that the scene point is visible from
both camera viewpoints. The disparity with respect to p_left is a vector given by

\Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\; y_{left} - y_{right})^T \qquad (2.1)

between two corresponding image points

In the standard stereo geometry pinhole camera models are used to represent the con-

sidered cameras The basic idea of a pinhole camera is that it projects scene points P

onto image points p according to a central projection given by

p = (x, y) = \left( \frac{f \cdot X}{Z},\; \frac{f \cdot Y}{Z} \right) \qquad (2.2)

assuming that Z > f

According to the ideal assumptions considered in the standard stereo geometry of the

two cameras it holds that y = y_left = y_right Therefore for the left camera the cen-

tral projection equation is given directly by Equation 22 considering that the pinhole

camera model assumes that the Z-axis is identified to be the optical axis of the camera

Furthermore given the displacement of the right camera by b along the X axis the


central projection equation is given by

(x_{right}, y) = \left( \frac{f \cdot (X - b)}{Z},\; \frac{f \cdot Y}{Z} \right)

Rather than calculating a disparity vector given by Equation 2.1 for all corresponding
pairs of points in the different images, the scalar disparity proves to be sufficient under
the assumptions made in the standard stereo geometry. The scalar disparity of two
corresponding points in each one of the images with respect to p_left is given by

\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}

However, because rows with the same row number in the two images have the same y value,
the scalar disparity of a pair of corresponding points reduces to

\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \qquad (2.3)

Note that it is valid to remove the absolute value operator because of the chosen arrange-

ment of the cameras A disparity map ∆(x y) is defined by applying equation 23 to all

corresponding points in the two images For those points that could not be associated

with a correspondent point in the other image (for example because of occlusion) the

value "undefined" is recorded

Finally in order to come up with the equations that determine the 3D location of each

point in the scene note that from the two central projection equations of the two cameras

it follows that

Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}

and therefore

X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}

Using the previous equation it follows that

Z = \frac{b \cdot f}{x_{left} - x_{right}}

By substituting this result into the projection equation for y it follows that

Y = \frac{b \cdot y}{x_{left} - x_{right}}

The last three equations allow the reconstruction of the coordinates of the projected

points P within the three-dimensional XYZ-space assuming that the parameters f and


b are known and that the disparity map ∆(x y) was measured for each pair of corre-

sponding points in the two images Note that a variety of methods exists to calibrate

different types of camera configuration systems ie to determine their intrinsic and ex-

trinsic parameters More on these calibration procedures is further discussed in Section

2.2
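As an illustration of how these equations translate into code, the following C fragment is a minimal sketch (not taken from the scanner application) that recovers the 3D coordinates of a point from a pair of corresponding image points, assuming that the base distance b and the focal length f are known and that the disparity x_left − x_right is non-zero.

/* Illustrative sketch: 3D reconstruction of a scene point from a pair of
 * corresponding image points in the standard stereo geometry. */
typedef struct { double X, Y, Z; } Point3D;

/* x_left, x_right and y are image coordinates of corresponding points
 * (same row y in both images); b is the base distance and f the focal
 * length, both in consistent units; the disparity must be non-zero. */
Point3D reconstruct_point(double x_left, double x_right, double y,
                          double b, double f)
{
    double disparity = x_left - x_right;
    Point3D p;
    p.X = (b * x_left) / disparity;
    p.Y = (b * y) / disparity;
    p.Z = (b * f) / disparity;
    return p;
}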

The process of determining corresponding point pairs is known as the correspondence

problem A wide variety of techniques are used to solve the correspondence problem in

stereo image analysis Such techniques generally involve the extraction and matching

of features between two or more images These features are typically corners or edges

contained within the images Although these techniques are found to be appropriate for

a certain number of applications it turns out that they present a number of drawbacks

that make their applicability unfeasible for many others The main drawbacks are (i)

feature extraction and matching is generally computationally expensive (ii) features

might not be available depending on the nature of the environment or the placement

of the cameras and (iii) low lighting conditions generally increase the complexity of the

matching procedure thus making the system more error prone Such problems in solving

the correspondence problem can generally be overcome by resorting to a different but

similar type of techniques known by the name of structured lighting techniques While

structured lighting techniques involve a complete different methodology on how to solve

the correspondence problem they share large part of the theory presented in this section

regarding the depth reconstruction process

2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously de-

scribed stereo analysis approach where one of the cameras is replaced by a light source

which projects a light pattern actively into the scene The location of an object in space

can then be determined by analyzing the deformation of the projected light pattern

The idea behind this modification is to simplify the complexity of the correspondence

analysis by actively manipulating the scene

It is important to note that stereoscopic based systems do not assume complex require-

ments for image acquisition since they mostly rely on theoretical mathematical and

algorithmic analyses to solve the reconstruction problem On the other hand the idea

behind structured lighting methods is to shift this complexity to another level such as

the engineering prerequisites of the overall system [4]

A wide variety of light patterns have been proposed by the research community [5], [7]–

[17] Their aim is to reduce the large number of images that would have to be captured


when using the most basic of all approaches ie a light spot In Section 2122 a

classification of the encoded patterns available is presented Nevertheless the light spot

projection technique serves as a solid starting point to introduce the main principle

underlying the depth recovery of most other encoded light patterns the triangulation

technique

2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring

angles formed from it to points at either end of a fixed baseline Various approaches

have been proposed for accomplishing this task An early analysis was described by Hall

et al [18] in 1982 Klette also presented his own analysis in [4] In the following an

overview of Klette's triangulation approach is explained

Figure 2.2 shows the simplified model that Klette assumes in his analysis

Figure 2.2: Assumed model for triangulation, as proposed in [4].

Note that the

system can be thought of as a 2D object scene ie it has no vertical dimension As a

consequence the object light source and camera all lie in the same plane The angles

α and β are given by the calibration As in the previous example the base distance b

is assumed to be known and the origin of the coordinate system O coincides with the

projection center of the camera


The goal is to calculate the distance d between the origin O and the object point

P = (X_0, Z_0) This can be done using the law of sines as follows

\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}

From γ = π − (α + β) and sin(π − γ) = sin(γ) it holds that

\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}

Therefore distance d is given by

d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}

which holds for any point P lying on the surface of the object
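A direct implementation of this result is straightforward; the following C sketch (illustrative only, with angles expressed in radians) computes the distance d from the calibrated angles α and β and the base distance b.

#include <math.h>

/* Illustrative sketch: distance from the camera origin O to an object
 * point, following the law-of-sines derivation above.  alpha and beta
 * come from the calibration; alpha + beta must not be a multiple of pi. */
double triangulate_distance(double alpha, double beta, double b)
{
    return (b * sin(alpha)) / sin(alpha + beta);
}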

2.1.2.2 Pattern coding strategies

As stated earlier there is a wide variety of pattern coding strategies available in the lit-

erature that aim to fulfill all requirements found in different scenarios and applications

In coded structure light systems every coded pixel in the pattern has its own codeword

that allows direct mapping ie every codeword is mapped to the corresponding coordi-

nates of a given pixel or group of pixels in the pattern A codeword can be represented

using grey levels colors or even geometrical characteristics The following classification

of pattern coding strategies was proposed by Salvi et al in [19]

• Time-multiplexing: This is one of the most commonly used strategies The

idea is to project a set of patterns onto the scene one after the other The

sequence of illuminated values determines the codeword for each pixel The main

advantage of this kind of pattern is that it can achieve high spatial resolution in

the measurements However its accuracy is highly sensitive to movement of either

the structured light system or objects in the scene during the time period when the

acquisition process takes place Previous research in this area includes the work of

[5] [7] [8] An example of this coding strategy is the binary coded pattern shown

in Figure 23a

• Spatial Neighborhood: In this strategy the codeword that is assigned to a given

pixel depends on its neighborhood Codification is done on the basis of intensity

[9]–[11] color [12] or a unique structure of the neighborhood [13] In contrast with

time-multiplexing strategies spatial neighborhood strategies allow for all coding

information to be condensed into a single projection pattern making them highly


suitable for applications that involve timing constraints such as autonomous nav-

igation The compromise however is deterioration in spatial resolution Figure

23b is an example of this strategy proposed by Griffin et al [14]

• Direct coding: In direct coding strategies every pixel in the pattern is labeled

by the information it represents In other words the entire codeword for a given

point is contained in a unique pixel as explained in [19] Basically there are two

ways to achieve this either by using a large range of color values [15] [16] or

by introducing periodicity [17] Although in theory this group of strategies can

be used to reconstruct objects with high resolution a major problem occurs in

practice the colors imaged by camera(s) of the system do not only depend on the

projected colors but also on the intrinsic colors of the measuring surface and light

source The consequence is that reference images become necessary Figure 23c

shows an example of a direct coding strategy proposed in [16]

Figure 2.3: Examples of pattern coding strategies: (a) time-multiplexing, (b) spatial neighborhood, (c) direct coding.

2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields such as security

forensics or even entertainment it is no surprise that special focus has been devoted

to this area by the research community over the last decades A comparative study

of three different 3D face reconstruction approaches is presented in [20] Here the

most representative techniques of three different domains are tested These domains are

binocular stereo structured lighting and photometric stereo The experimental results

show that active reconstruction techniques perform better than purely passive ones for

this application

The majority of analysis on vision based reconstruction has focused on general perfor-

mance for arbitrary scenes rather than on specific objects as reported in [20] Neverthe-

less some effort has been made on evaluating structured lighting techniques with special

focus on human face reconstruction In [21] a comparison is presented between three


structured lighting techniques (Gray Code Gray Code Shift and Stripe Boundary) to

assess 3D reconstruction for human faces by using mono and stereo systems The results

show that the Gray Code shift coding performs best given the high number of emitted

patterns it uses A further study on this topic was performed by the same author in

[22] Again it was found that time-multiplexing techniques such as binary encoding

using Gray Code provide the highest accuracy With a rather different objective than

that sought by Woodward et al in [21] and [22] Fechteler et al [23] also focus their

effort on presenting a framework that captures 3D models of faces in high resolutions

with low computational load Here the system uses a single colored stripe pattern for

the reconstruction purpose plus a picture of the face illuminated with regular white light

that is used as texture

Particular aspects of 3D human face reconstruction such as proximity size and texture

involved make structured lighting a suitable approach On the contrary other recon-

struction techniques might be less suitable when dealing with these particular aspects

For example stereoscopic approaches fail to provide positive results when the textures

involved do not contain features that can be easily extracted and matched by means of

algorithms as in the case of the human face On the other hand the concepts behind

structured lighting make it very convenient to reconstruct these kind of surfaces given

the proximity involved and the size limits of the object in question (appropriate for

projecting encoded patterns)

With regard to the suitability of the different pattern coding strategies for our application

(3D human face reconstruction by means of a hand-held scanner) there are several

factors to consider Spatial neighborhood strategies do not offer high spatial resolution

which is needed by the algorithms that assess the fit quality of the various mask models

Direct coding strategies suffer from practical problems that affect their robustness to

different scenarios This centers the attention on the time-multiplexing techniques which

are known to provide high spatial resolution The problem with such techniques is

that they are highly sensitive to movement which is likely to be present on a hand-

held device Fortunately there are several approaches as to how such problem can be

solved Consequently it is a time-multiplexing technique which is being employed in

our application

2.2 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement

This section presents a review of some of the most popular techniques with special focus

on those that are regarded as adequate for our application


2.2.1 Definition

Camera calibration is the process of determining a mathematical approximation of the

physical and optical behavior of an imaging system by using a set of parameters These

parameters can be estimated by means of direct or iterative methods and they are divided

into two groups On the one hand intrinsic parameters determine how light is projected

through the lens onto the image plane of the sensor The focal length projection center

and lens distortion are all examples of intrinsic parameters On the other hand extrinsic

parameters measure the position and orientation of the camera with respect to a world

coordinate system as defined in [24] To better illustrate these ideas consider Figure

24 which corresponds to the optical system for the structured pattern projection and

triangulation considered in [25] The focal length f_c and the projection center O_c are

examples of intrinsic parameters of the camera while the distance D between the camera

and the projector corresponds to an extrinsic parameter

Figure 2.4: A reference framework assumed in [25].

2.2.2 Popular techniques

In 1982 Hall et al [18] proposed a technique consisting of an implicit camera calibration

that uses a 3 × 4 transformation matrix which maps 3D object points to their respective

2D image projections Here the model of the camera does not consider any lens distor-

tion For a detailed description of this method refer to [18] Some years later in 1986

Faugeras improved Hall's work by proposing a technique that was based on extracting

the physical parameters of the camera from the transformation technique proposed in

[18] The description of this technique is given in [26] and [27] A non-linear explicit

camera calibration that included radial lens distortion was proposed by Salvi in his PhD


thesis [28] which as he mentions can be regarded as a simple adaptation of Faugeras' lin-

ear method However a method that would become much more popular and that is still

widely used was proposed by Tsai in 1987 [29] Here the author proposes a two-step

technique that models only radial lens distortion Also worth mentioning is the model

proposed by Weng [30] in 1992 which includes three different types of lens distortion

The calibration mechanism that is currently being used in our application is based on

the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although

this mechanism focuses on stereo camera calibration it was generalized for a system

with one camera and one projector It involves imaging a controlled scene from different

positions and orientations The controlled scene consists of a rigid calibration chart with

several markers The geometric and photometric properties of such markers are known

precisely so that they can be detected After corresponding markers in the different

images are found an algorithm searches the optimal set of camera parameters for which

triangulation of all corresponding marker-point pairs gives an accurate reconstruction of

the calibration chart This calibration mechanism is discussed further in Section 37

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed

by the Smart Sensing amp Analysis research group and provided as a starting point for the

current project Figure 31 presents the main steps involved in the 3D reconstruction

process

Figure 3.1: General flow diagram of the 3D face scanner application (read binary file 3.1, preprocessing 3.2, normalization 3.3, tessellation 3.4, decoding 3.5, global motion compensation 3.6, calibration 3.7, vertex filtering 3.8, hole filling 3.9; input: binary and XML files, output: 3D model).

The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene. For each projection the scene is captured by means of the embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study of why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9 each of the steps shown in Figure 3.1 is described.
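To give an impression of what such a time-multiplexed pattern set looks like in code, the following C sketch generates one plain binary stripe pattern together with its inverse; it is only an illustration of the binary coding idea and does not reproduce the exact pattern set used by the scanner.

#include <stdint.h>

/* Illustrative sketch of binary stripe pattern generation: pattern `bit`
 * sets every projector column according to bit `bit` of its column index;
 * the complementary (inverted) pattern is produced at the same time. */
void make_binary_pattern(uint8_t *img, uint8_t *img_inv,
                         int width, int height, int bit)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            uint8_t on = (uint8_t)(((x >> bit) & 1) ? 255 : 0);
            img[y * width + x]     = on;
            img_inv[y * width + x] = (uint8_t)(255 - on);
        }
    }
}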


Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.

3.1 Read binary file

The first step of the application is to read the binary file that contains the required

information for the 3D reconstruction The binary file is composed of two parts the

header and the actual data The header contains metadata of the acquired frames such

as the number of frames and the resolution of each one The second part contains the

actual data of the captured frames Figure 32 shows an example of such frame sequence

which from now on will be referred to as camera frames

3.2 Preprocessing

The preprocessing stage comprises the four steps shown in figure 33 Each of these steps

is described in the following subsections

Figure 3.3: Flow diagram of the preprocessing stage (parse XML file, discard frames, crop frames, scale: convert to float, range from 0 to 1).

3.2.1 Parse XML file

In this stage the application first reads an XML file that is included for every scan

This file contains relevant information for the structured light reconstruction This


information includes (i) the type of structured light patterns that were projected when

acquiring the data (ii) the number of frames captured while structured light patterns

were being projected (iii) the image resolution of each frame to be considered and (iv)

the calibration data

3.2.2 Discard frames

Based on the number of frames value read from the XML file the application discards

extra frames that do not contain relevant information for the structured light approach

but that are provided as part of the input

3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain
a new, more suitable resolution for the subsequent algorithms of the program (480 × 754) This is accomplished by cropping the pixels that are close to the top border

of the images Note that this operation does not imply a loss of information in this

application in particular This is because pixels near the frame borders do not contain

facial information and therefore can be safely removed

3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is

represented by an 8-bit unsigned integer value that ranges from 0 to 255 In this stage

the data type is transformed from unsigned integer to floating point while dividing each

pixel value by 255 The new set of values range between 0 and 1
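A minimal sketch of this conversion in C could look as follows (the function name and signature are illustrative, not those of the actual implementation).

#include <stddef.h>
#include <stdint.h>

/* Convert 8-bit pixel values to floating-point values in the range [0, 1]. */
void scale_frame(const uint8_t *src, float *dst, size_t num_pixels)
{
    for (size_t i = 0; i < num_pixels; i++)
        dst[i] = (float)src[i] / 255.0f;
}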

3.3 Normalization

Even though this section is entitled Normalization a few more tasks are being performed

in this stage of the application as shown by the blue rectangles in Figure 34 Here wide

arrows represent flow of data whereas dashed lines represent the order of execution The

numbers inside the small data arrows pointing towards the different tasks represent the

number of frames used as input by each task The dashed line rectangle that encloses

the normalization and texture 2 tasks represents that there is not a clear sequential

execution between these two but rather that these are executed in an alternating fashion

This type of diagram will prove particularly useful in Chapter 5 in order to explain the


Figure 3.4: Flow diagram of the normalization stage. The 16 camera frames are the input to the normalization, texture 2, modulation and texture 1 tasks, which produce 8, 8, 1 and 1 output frames respectively.

modifications that were made to the application to improve its performance Examples
of the different frames that are produced in this stage are visualized in Figure 3.5 A

brief description of each of the tasks involved in this stage follows

3.3.1 Normalization

The purpose of this stage is to extract the reflectivity component (texture information)

from the camera frames while aiming at enhancing the deformed illumination patterns

in the resulting frame sequence Figure 35a illustrates the result of this process The

deformed patterns are essential for the 3D reconstruction process

In order to understand how this process takes place we need to look back at Figure

32 Here it is possible to observe that the projected patterns in the top row frames are

equal to their corresponding frame in the bottom row with the only difference being

that the values of the projected pattern are inverted For each corresponding pair a

new image frame is generated according to the following equation

F_{norm}(x, y) = \frac{F_{camera}(x, y, a) - F_{camera}(x, y, b)}{F_{camera}(x, y, a) + F_{camera}(x, y, b)}

where a and b correspond to aligned top and bottom frames in Figure 3.2 respectively

An example of the resulting frame sequence is shown in Figure 35a
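A sketch of this computation for a single complementary frame pair is shown below; the small constant eps is an assumption added here to guard against division by zero in unlit regions and is not necessarily present in the original code.

/* Illustrative sketch: per-pixel normalization of one complementary frame
 * pair, where frame_a holds the projected pattern and frame_b its inverse. */
void normalize_pair(const float *frame_a, const float *frame_b,
                    float *out, int num_pixels)
{
    const float eps = 1e-6f;  /* assumption: avoids division by zero */
    for (int i = 0; i < num_pixels; i++) {
        float sum  = frame_a[i] + frame_b[i];
        float diff = frame_a[i] - frame_b[i];
        out[i] = diff / (sum + eps);
    }
}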


Figure 3.5: Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence, (b) texture 2 frame sequence, (c) modulation frame, (d) texture 1 frame.

3.3.2 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one

used to calculate the normalized frame sequence In fact the output of this process is an

intermediate step in the calculation of the normalized frames being this the reason why

the two processes are said to be performed in an alternating fashion The mathematical

equation that describes the calculation of the texture 2 frame sequence is

F_{texture2}(x, y) = F_{camera}(x, y, a) + F_{camera}(x, y, b)

The resulting frame sequence (Figure 35b) is used later in the global motion compen-

sation stage


3.3.3 Modulation

The purpose of this stage is to find the range of measured values for each (x y) pixel of

the camera frame sequence along the time dimension This is done in two steps First

two frames are generated by finding the maximum and minimum values along the time

(t) dimension (Figure 36) for every (x y) value in a frame

Figure 3.6: Camera frame sequence in an (x, y, t) coordinate system.

Second a modulation frame is produced by finding the difference between the previously

generated frames ie

F_{mod}(x, y) = F_{max}(x, y) - F_{min}(x, y)

Such modulation frame (Figure 35c) is required later during the decoding stage
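The following C sketch (illustrative only) computes the modulation frame directly from the camera frame sequence, assuming the frames are stored one after the other in a single buffer.

/* Illustrative sketch: for every pixel, find the maximum and minimum value
 * over all frames and store their difference in the modulation frame.
 * `frames` holds num_frames frames of num_pixels pixels each, frame-major. */
void modulation_frame(const float *frames, int num_frames,
                      int num_pixels, float *mod)
{
    for (int i = 0; i < num_pixels; i++) {
        float vmin = frames[i];
        float vmax = frames[i];
        for (int t = 1; t < num_frames; t++) {
            float v = frames[t * num_pixels + i];
            if (v < vmin) vmin = v;
            if (v > vmax) vmax = v;
        }
        mod[i] = vmax - vmin;
    }
}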

3.3.4 Texture 1

Finally the last task in the Normalization stage corresponds to the generation of the

texture image that will be mapped onto the final 3D model In contrast to the previous

three tasks this subprocess does not take the complete set of 16 camera frames as input

but only the 2 with finest projection patterns Figure 37 shows the four processing

steps that are applied to the input in order to generate a texture image such as the one

presented in Figure 35d

Figure 3.7: Flow diagram for the calculation of the texture 1 image (average frames, gamma correction, 5×5 mean filter, histogram stretch).
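The chain of operations can be sketched in C as follows; the gamma value of 0.5 and the handling of the image borders are assumptions made for the sake of illustration and are not taken from the original application.

#include <math.h>
#include <stdlib.h>

/* Illustrative sketch of the texture 1 computation: average the two frames
 * with the finest patterns, apply gamma correction, smooth with a 5x5 mean
 * filter and stretch the result to the full [0, 1] range. */
void texture1(const float *f1, const float *f2, float *tex,
              int width, int height)
{
    int n = width * height;
    float *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return;

    /* average the two frames and apply gamma correction (gamma = 0.5) */
    for (int i = 0; i < n; i++)
        tmp[i] = powf(0.5f * (f1[i] + f2[i]), 0.5f);

    /* 5x5 mean filter; border pixels are simply copied for brevity */
    for (int i = 0; i < n; i++)
        tex[i] = tmp[i];
    for (int y = 2; y < height - 2; y++)
        for (int x = 2; x < width - 2; x++) {
            float s = 0.0f;
            for (int dy = -2; dy <= 2; dy++)
                for (int dx = -2; dx <= 2; dx++)
                    s += tmp[(y + dy) * width + (x + dx)];
            tex[y * width + x] = s / 25.0f;
        }

    /* histogram stretch to [0, 1] */
    float lo = tex[0], hi = tex[0];
    for (int i = 0; i < n; i++) {
        if (tex[i] < lo) lo = tex[i];
        if (tex[i] > hi) hi = tex[i];
    }
    if (hi > lo)
        for (int i = 0; i < n; i++)
            tex[i] = (tex[i] - lo) / (hi - lo);

    free(tmp);
}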


3.4 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement

In fact if no measures are taken to correct the slight amount of movement of the scanner

or of the objects in the scene during the acquisition process the complete reconstruction

process fails Although the global motion compensation stage is only a minor part of

the mechanism that makes the entire application robust to motion it is not negligible

in the final result

Global motion compensation is an extensive field of research for which many different

approaches and methods have been contributed The approach used in this application

is amongst the simplest in level of complexity Nevertheless it suffices for the needs of the

current application

Figure 38 presents an overview of the algorithm used to achieve the global motion

compensation This process takes as input the normalized frame sequence introduced in

the previous section As noted at the bottom of the figure these steps are repeated for

every pair of consecutive frames As a first step the pixels in each column are added for

both frames This results in two vectors that hold the cumulative sums of each frame

The second step is to determine by how many pixels the second image is displaced with

respect to the first one In order to achieve this the sum of absolute differences between

elements of the two column-sum vectors is calculated while slowly displacing the two

vectors with respect to each other The result is a new vector containing the SAD value

for each displacement Subsequently the index of the smallest element in the SAD

values vector is searched in order to determine the number of pixels that the second

image needs to be shifted The process concludes by performing the actual shift of the

second frame

Figure 3.8: Flow diagram for the global motion compensation process (for every pair of consecutive normalized frames: sum columns, minimize SAD, shift frame B).
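The core of this procedure can be sketched in C as follows; the search range max_shift and the normalization of the SAD by the overlap length are illustrative choices, not necessarily those of the original implementation.

#include <math.h>

/* Sum the pixel values of every column of a frame. */
static void column_sums(const float *frame, int width, int height,
                        float *sums)
{
    for (int x = 0; x < width; x++) {
        float s = 0.0f;
        for (int y = 0; y < height; y++)
            s += frame[y * width + x];
        sums[x] = s;
    }
}

/* Find the horizontal displacement of frame B with respect to frame A that
 * minimizes the sum of absolute differences (SAD) between their column-sum
 * vectors.  max_shift must be smaller than width. */
static int best_column_shift(const float *sum_a, const float *sum_b,
                             int width, int max_shift)
{
    int best = 0;
    double best_sad = -1.0;
    for (int s = -max_shift; s <= max_shift; s++) {
        double sad = 0.0;
        int overlap = 0;
        for (int x = 0; x < width; x++) {
            int xs = x + s;
            if (xs < 0 || xs >= width)
                continue;
            sad += fabs((double)sum_a[x] - (double)sum_b[xs]);
            overlap++;
        }
        sad /= overlap;  /* normalize by the number of overlapping columns */
        if (best_sad < 0.0 || sad < best_sad) {
            best_sad = sad;
            best = s;
        }
    }
    return best;
}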


3.5 Decoding

In Section 211 of the literature study the correspondence problem was defined as the

process of determining corresponding point pairs between the captured images and the

projected patterns This is exactly what is being accomplished during the decoding

stage

A novel approach has been implemented in which the identification of the projector

stripes is based not on the values of the pixels themselves (as it is typically done) but

rather on the edges formed by the transitions of the projected patterns Figure 39

illustrates the different sets of decoded values that result with each of these methods

Here it is possible to observe that the pixel-based method produces a stair-casing effect

due to the decoding of neighboring pixels that lie on the same stripe of the projected

pattern On the other hand the edge-based method removes this undesirable effect by

decoding values for only parts of the image in which a transition occurs Furthermore

this approach enables sub-pixel accuracy for the determination of the positions where the

transitions occur meaning that the overall resolution of the 3D reconstruction increases

considerably
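As a small illustration of how sub-pixel accuracy can be obtained, the following sketch assumes that a transition has been detected between two neighboring pixels whose normalized values straddle some threshold, and estimates the crossing position by linear interpolation. The function name and the use of a fixed threshold are assumptions and do not necessarily match the decoder implemented in the application.

/* Estimate, with sub-pixel accuracy, where the projected pattern crosses the
 * threshold between pixel y_prev and pixel y_prev+1. The caller only invokes
 * this where (prev - threshold) and (next - threshold) have opposite signs. */
static float subpixel_transition(float prev, float next, int y_prev, float threshold)
{
    float t = (threshold - prev) / (next - prev);   /* fraction within [0,1) */
    return (float)y_prev + t;                       /* sub-pixel y position  */
}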

Figure 3.9: The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used. (Plot of decoded values against pixels along the y dimension of the image, comparing edge-based and pixel-based decoding.)

The decoding process results in a set of vertices each one associated with a depth code

Note however that the unit of measurement used to describe the position and depth of

each vertex is based on camera pixels and code values respectively meaning that these

vertices still do not represent the actual geometry of the face The calibration process

explained in a later section is the part of the application that translates the pixel and


code values to standard units (such as millimeters) thus recreating the actual shape of

the human face

3.6 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in

a manner such that no overlaps occur In computer graphics these geometric shapes

are generally chosen to be triangles, also called "faces". The reason for using triangles is that, by definition, their vertices lie on the same plane. This in turn avoids

the generation of non-simple convex polygons that are not guaranteed to be rendered

correctly A complete example illustrating this point can be found in [32]

A set of 3D vertices calculated in the decoding stage is the input to the tessellation

process Here however the third dimension does not play a role and hence the z

coordinate for each of the vertices can be thought of as being equal to 0 This implies

that the new set of vertices consist only of (x y) coordinates that lie on the same plane

as shown in Figure 310a This graph corresponds to a very close view of the nose area

in the reconstructed face example

(a) Vertices before applying the Delaunay triangulation. (b) Result after applying the Delaunay triangulation.

Figure 3.10: Close view of the vertices in the nose area before and after the tessellation process.

The question that arises here is how to connect the vertices in such a way that the com-

plete surface is covered with triangles The answer is to use the Delaunay triangulation

which is probably the most common triangulation used in computer vision. The main advantage that it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the

Delaunay triangulation is independent of the order in which the vertices are processed


Figure 310b shows the result of applying the Delaunay triangulation to the vertices

shown in Figure 310a

Although there exists a number of different algorithms used to achieve the Delaunay

triangulation the final outcome of each conforms to the following definition a Delaunay

triangulation for a set P of points in a plane is a triangulation DT(P) such that no

point in P is inside the circumcircle of any triangle in DT(P) [33] Such definition can

be understood by examining Figure 311

Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33].

3.7 Calibration

The set of (x y) vertices with their corresponding depth code values that result from

the decoding process do not represent standard units of measure ie these still have to

be translated into standard units such as millimeters This is precisely the objective of

the calibration process

The calibration mechanism that is used in the application is based on the work of Peter-

Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts

an offline and an online process Moreover the offline process consists of two stages

the camera calibration and the system calibration It is important to clarify that while

the offline process is performed only once (camera properties and distances within the

system do not change with every scan) the online process is carried out for every scan

instance The calibration stage referred to in Figure 31 is the latter


3.7.1 Offline process

As already mentioned the offline process comprises the two stages described below

Camera calibration This part of the process is concerned with the calculation of the

intrinsic parameters of the camera as explained in Section 22 of the literature

study In short the objective is to precisely quantify the optical properties of the

camera The manner in which the current approach accomplishes this is by imag-

ing the special calibration chart shown in Figure 312 from different orientations

and distances After corresponding markers in the different images are found an

algorithm searches the optimal set of camera parameters for which triangulation

of all corresponding marker-point pairs gives an accurate reconstruction of the

calibration chart

Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.

System calibration The second part of the calibration process refers to the camera-

projector system calibration ie the determination of the extrinsic parameters

of the system Again this part of the process images the calibration chart from

different distances However this time structured light patterns are emitted by

the projector while the acquisition process takes place The result is that each

projector code is associated with a known depth and camera position

3.7.2 Online process

The result of the offline calibration is a set of parameters that model the optical proper-

ties of the scanner system These are passed to the application inside the XML file for

every scan Such parameters represent the coefficients of a fifth-order polynomial used

for translating the set of (x y) vertices with their corresponding depth code values into


standard units of measure In other words the online process consists of evaluating a

polynomial with all the x y and depth code values calculated in the decoding stage in

order to reconstruct the geometry of the face Figure 313 shows the state of the 3D

model before and after the reconstruction process
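Since the exact form of the fifth-order polynomial is defined by the offline calibration and is not reproduced here, the following sketch only illustrates the evaluation mechanism (Horner's rule) that such an online step can use; the function name and the single-variable form are assumptions for illustration, whereas the real mapping combines the x, y and depth code values with coefficients read from the XML file.

/* Evaluate c[0] + c[1]*v + c[2]*v^2 + ... + c[5]*v^5 with Horner's rule. */
static float eval_poly5(const float c[6], float v)
{
    float r = c[5];
    for (int i = 4; i >= 0; i--)
        r = r * v + c[i];       /* one multiply and one add per coefficient */
    return r;
}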

(a) Before reconstruction. (b) After reconstruction.

Figure 3.13: The 3D model before and after the calibration process.

3.8 Vertex filtering

As it can be seen from Figure 313b there are a number of extra vertices (and faces)

that have not been correctly reconstructed and therefore should be removed from the

model Vertex filtering is applied to remove all these noisy vertices and faces based on

different criteria The process is divided in the following three steps

3.8.1 Filter vertices based on decoding constraints

First if the distance between consecutive decoded points is larger than a maximum

threshold in the x or z dimensions, then these are removed. Second, in order to avoid false decoded vertices due to camera noise (especially in the parts of the images

where light does not hit directly) a minimal modulation threshold needs to be exceeded

or else the associated decoded point is discarded Finally if the decoded vertices lie

outside a margin defined in accordance to the image dimensions then these are removed

as well


3.8.2 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum

and maximum values that each decoded point can have in the z dimension These values

are read from the XML file The long triangles shown in Figure 313b that either extend

far into the picture or on the other hand come close to the camera are all removed in

this stage The resulting 3D model after being filtered with the two previously described

criteria is shown in Figure 314a

3.8.3 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length

criterion Initially the length of every edge contained in the model is calculated This

is followed by determining a new set of edges L that contains the longest edge in each

face After this operation the mean length value for the longest edge set is calculated

Finally, only faces whose longest edge is less than seven times the mean value, i.e. L < 7 x mean(L), are kept. Figure 3.14b shows the result after this operation.
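A minimal C sketch of this maximum-edge-length criterion is given below; the Vertex and Face structures and the function names are illustrative assumptions rather than the application's actual data types.

#include <math.h>
#include <stdlib.h>

typedef struct { float x, y, z; } Vertex;
typedef struct { int v[3]; } Face;

static float edge_len(const Vertex *a, const Vertex *b)
{
    float dx = a->x - b->x, dy = a->y - b->y, dz = a->z - b->z;
    return sqrtf(dx * dx + dy * dy + dz * dz);
}

/* Keep only faces whose longest edge is shorter than 7 times the mean of the
 * longest-edge set; returns the number of faces that survive. */
static int filter_long_edges(const Vertex *verts, Face *faces, int nfaces)
{
    float *longest = malloc(nfaces * sizeof(float));
    float mean = 0.0f;
    for (int i = 0; i < nfaces; i++) {
        float e0 = edge_len(&verts[faces[i].v[0]], &verts[faces[i].v[1]]);
        float e1 = edge_len(&verts[faces[i].v[1]], &verts[faces[i].v[2]]);
        float e2 = edge_len(&verts[faces[i].v[2]], &verts[faces[i].v[0]]);
        longest[i] = fmaxf(e0, fmaxf(e1, e2));
        mean += longest[i];
    }
    mean /= (float)nfaces;

    int kept = 0;
    for (int i = 0; i < nfaces; i++)
        if (longest[i] < 7.0f * mean)
            faces[kept++] = faces[i];   /* compact the face list in place */
    free(longest);
    return kept;
}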

(a) The 3D model after the filtering steps described in Subsections 3.8.1 and 3.8.2. (b) The 3D model after the filtering step described in Subsection 3.8.3. (c) The 3D model after the filtering step described in Section 3.9.

Figure 3.14: Resulting 3D models after various filtering steps.

3.9 Hole filling

In the last processing step of the 3D face scanner application two actions are performed

The first one is concerned with an algorithm that takes care of filling undesirable holes

that appear due to the removal of vertices and faces that were part of the face surface. This

is accomplished by adding a vertex in the middle of the hole and then connecting every

surrounding edge with this point The second action refers to another filtering step of


vertices and faces In this last part of the application the program removes all but the

largest group of connected faces The final 3D model is shown in Figure 314c

3.10 Smoothing

Because the smoothing process is beneficial for visualization purposes but not for the overall goal of the 3D mask sizing project, it was not included as part of the 3D face scanner application. This is also the reason why it

is not included in Figure 31 Nevertheless this section provides a brief explanation of

the smoothing process that is currently used along with an example

A complete explanation of the algorithm that is being used to achieve the smoothing

effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian

operator that diffuses the vertices along the surface An example of the resulting model

before and after applying the smoothing process is shown in Figure 315

(a) The 3D model before smoothing. (b) The 3D model after smoothing.

Figure 3.15: Forehead of the 3D model before and after applying the smoothing process.

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as

two different domains but rather as two complementary parts of a whole There are two

important trends that have made such unified view possible First integrated circuit

(IC) technology has evolved to the point where multiple processors of different types

coexist in a single IC. Second, the increasing complexity and average size of programs, together with the evolution of compiler technologies, have made C compilers (and even C++ or Java compilers in some cases) commonplace in the development of embedded systems [35].

This chapter discusses the embedded hardware and software implementation of the 3D

face scanner A brief account of the hardware and software tools that were used during

the development of the application is presented first Subsequently the first stage of the

development process is described which consists mainly of translating the algorithms

and methods described in Chapter 3 into a different programming language more suitable

for embedded systems Finally a preview of the developed visualization module that

displays the 3D reconstructed face is presented along with a brief description of its

functionality

4.1 Development tools

This section describes the set of tools used in the development of the embedded applica-

tion First an overview of the hardware is presented highlighting the most important

aspects that are of interest to the 3D face scanner application This is then followed by

a list of the software tools along with a short motivation for their selection A so called

remote development methodology was used for the compilation process The idea is to


run an integrated development environment (IDE) on a client system for the creation of

the project editing of the files and usage of code assistance features in the same manner

as done with local projects However when the project is built run or debugged the

process runs on a remote server with output and input transferred to the client system

4.1.1 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as

development platforms SBCs combine most features of a conventional desktop computer

into a single board which can be as small as a credit card One or more processors of

different types memory on-board peripherals for multiple USB devices single or dual

gigabit Ethernet connections integrated graphics and audio capabilities amongst others

are common features included in these devices But perhaps what is most interesting

for embedded developers is the availability of several SBCs that come under open source

hardware category [36] Such SBCs are suitable for the implementation of a wide range

of applications on the basis of open operating systems

Two different hardware environments were used in the development of the current em-

bedded application a conventional desktop personal computer (PC) with an Intel x86

architecture and a SBC that was selected according to the following survey

4.1.1.1 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention

of finding the most suitable model for our application Table 41 presents a subset of the

considered models highlighting the most relevant characteristics for the 3D face scanner

application Refer to [37] for the complete survey

The model to be chosen has to comply with several requirements imposed by the 3D

face scanner application First support for both a camera and a projector had to be

offered While all of the considered models showed special support for video output

not all of them provided suitable characteristics for camera signal acquisition In fact

most of them rely on USB or Ethernet connections for this purpose The problem of

using USB technology for camera acquisition is that it is highly resource demanding On

the other hand Ethernet connections imply streaming video in formats such as MPEG

which require additional computational resources and buffering for decoding the video

stream Explicit periphery support for camera acquisition was only offered by two of

the considered models the BeagleBoard-xM and the PandaBoard


Table 4.1: Single-board computer survey

BeagleBoard-xM
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 512 MB
  Video output: DVI-D, HDMI, S-Video
  GPU: PowerVR SGX, OpenGL ES 2.0
  Camera port: Yes

Raspberry Pi Model B
  CPU: ARM1176, 700 MHz
  RAM: 256 MB
  Video output: Composite RCA, HDMI, DSI
  GPU: Broadcom VideoCore IV, OpenGL ES 2.0
  Camera port: No

Cotton Candy
  CPU: dual-core ARM Cortex-A9, 1200 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
  Camera port: No

PandaBoard
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, DVI-D, LCD
  GPU: PowerVR SGX540, OpenGL ES 2.0
  Camera port: Yes

VIA APC
  CPU: ARM11, 800 MHz
  RAM: 512 MB
  Video output: HDMI, VGA
  GPU: built-in 2D/3D graphics, OpenGL ES 2.0
  Camera port: No

MK802
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No

Snowball
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, CVBS
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No


A second issue in the selection of the SBC was concerned with the project objective of

developing a module capable of visualizing the 3D reconstructed model by means of the

embedded projector It was considered that the achievement of this objective could be

greatly simplified by selecting an SBC model that offered support for rendering of 3D

computer graphics by means of an API preferably OpenGL ES Nevertheless all of the

SBC models considered in the survey featured a graphics processing unit (GPU) with

such support

Finally one last important motivation for the selection came from the experience gath-

ered through related projects The BeagleBoard-xM had been used as the embedded

computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-

able implementation effort could be saved if this option were adopted Consequently it

was the BeagleBoard-xM that was selected as the SBC model for the development of

the current project

4.1.1.2 BeagleBoard-xM features

The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power, open-source hardware system that was designed specifically to address the open source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz and 512 MB of LPDDR RAM. Several open operating systems have been made

compatible with such processor including Linux FreeBSD RISC OS Symbian and

Android Moreover the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated

video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to

provide accelerated 2D and 3D rendering that supports OpenGL ES 20 [38]

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].

4.1.2 Software

The main factors involved in the selection of software tools were (i) available support by

a large development community and (ii) acquisition costs and licensing charges Open

source software was adopted where possible Moreover prior experience with the tools

was also taken into account The software can be divided in two categories (i) software


Figure 4.1: The BeagleBoard-xM offered by Texas Instruments.

libraries that are used within the application and therefore are necessary for its execution

and (ii) software tools used specifically for the development of the application and hence

are not required for its execution In what follows each of these is briefly described

4.1.2.1 Software libraries

The following software libraries are being used throughout the implementation of the

embedded application

libxml2 It is a software library used for parsing XML documents which was originally

developed for the Gnome project and was later made available for outside projects

as well The current application makes use of such tool for extracting the required

information from the XML file that is included for each scan

OpenCV Is an open source computer vision and machine learning software library

initiated by Intel It provides the necessary functionality to construct the Delaunay

triangulation described in Chapter 3 Though it was used in the initial versions of

the application later optimizations replaced OpenCV implementations

CGAL Consists of a software library that aims to provide access to algorithms in

computational geometry It is being used in the current application as a means

to simplify the resulting mesh surface ie to reduce the number of faces used to

represent the surface while keeping the overall shape of the reconstructed model

OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-

cally for embedded systems It consists of a cross-language multi-platform Appli-

cation Programming Interface (API) for rendering 2D and 3D computer graphics


It is used in the current application as the means to visualize the 3D reconstructed

model

GLUT The OpenGL Utility Toolkit consists of a system-independent API for OpenGL used to create windows and/or frame buffers. It is used in the visualization module of the application as well.

4.1.2.2 Software development tools

The following list presents a description of the most important software tools used for

the development of the embedded application

GNU toolchain It refers to a collection of programming tools produced by the GNU

Project that provide developing facilities for applications and operating systems

Among the several projects that comprise the GNU toolchain the following were

used

GNU Make It is a utility that automates the building process of executable

programs by reading the so-called makefiles which specify how to create the

target program

GCC It is the official compiler of the GNU operating system and has been

adopted as standard by most modern Unix-like computer operating systems

GNU Binutils Involves a set of programming tools that are used in the develop-

ment process of creating and managing programs object files libraries profile

data and assembly source code The commands as (assembler) ld (linker)

and gprof (profiler) were used among the complete set of binutil commands

GNU Project debugger It is the standard debugger for the GNU operating

system which was made available for the development of applications outside

this project as well

Valgrind It is a programming tool that can automatically detect memory management

errors It also provides the functionality of a profiler

Ubuntu A Linux based operating system that is distributed as free and open source

software It was installed in both the desktop PC and the SBC


4.2 MATLAB to C code translation

This section describes the first stage of the embedded application development that

involves the translation of a series of algorithms originally written in MATLAB code to

C

Despite the fact that there are a number of available tools that automatically translate

MATLAB code to C language such as MATLAB Coder by MathWorks MATLAB-to-

C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx these have a number

of pitfalls that compromise their applicability specially when the performance aspect

is of ultimate importance Perhaps what is most concerning is that each one of these

tools only supports a subset of the MATLAB language and functions meaning that

the complete functionality of MATLAB is immediately constrained by this requirement

In many cases this would imply a modification to the MATLAB code prior to the

translation process in order to filter out any feature or function not included in the

subset which adds overhead to the development process Examples of features not

supported by automatic translation tools are amongst others objects cell arrays nested

functions, visualization, or try/catch statements. The use of an automatic translation

tool was discarded for this project taking into account that several of these unsupported

features are present in the MATLAB code

4.2.1 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular pro-

gramming languages used for the development of embedded systems The first is that

C language lies in an intermediate point between higher and lower level languages pro-

viding suitable characteristics for embedded system development from both sides The

problem with higher-level languages lies in the fact that they do not provide suitable

characteristics for optimizing performance of the applications such as low-level memory

manipulation Furthermore unlike many of these higher level programming languages

C provides deterministic resource use which is an important feature when the target de-

vices contain limited resources On the other hand C outperforms lower level languages

in a number of aspects such as scalability and maintainability Two final motivations

for using C are (i) C compilers are available for almost all embedded devices which are

supported by a large pool of experienced C programmers and (ii) the vast majority of

hardware APIdrivers are written in C


4.2.2 Translation approach

As mentioned earlier a manual translation approach of the code was chosen over the

use of automatic translation tools A key part in the process of manually translating

MATLAB to C code is the verification process There are two major techniques used

to achieve such verification The first one consists of a systematic method of converting

the translated C code into a compiled MEX-file that can be merged into the original

MATLAB project Then by comparing the results generated by the MATLAB project

containing the C implementation wrapped in a MEX-file with those generated by the

original MATLAB project one should be able to verify the correctness of the translation

The second approach consists of writing corresponding intermediate results of both the

MATLAB and C implementations to external files and then using a file comparison tool

such as diff for Linux environments in order to validate equality of both results It was

the latter approach that was chosen for the development of the current application for

the following reason The former approach requires the C implementation to be wrapped

in a so called MEX wrapper which takes care of the communication between MATLAB

and C This task is considered to be error prone since crashes segmentation violations

or incorrect results can easily occur if the MEX wrapper does not allocate and access

the data properly as reported by Marc Barberis in [40] from Catalytic Inc

A number of pitfalls that add complexity to the manual translation process were iden-

tified throughout the development of this stage The most important are

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e. the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in performance of the resulting code (a short illustration is given after this list).

• MATLAB is an interpreted language, i.e. data types and variable dimensions are only known at run-time, and thus cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions and sizes. On the contrary, C language requires all variables to be cast to a specific data type (or declared, as it is known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C and hence requires the programmer to implement them while relying on structure constructs of primitive types.
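The following short example illustrates the ordering pitfall mentioned in the list above; the image dimensions and function names are arbitrary.

#define W 640
#define H 480

/* C layout is row-major: iterating rows in the outer loop touches
 * consecutive memory addresses and preserves spatial locality. */
void process_cache_friendly(float img[H][W])
{
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            img[y][x] *= 2.0f;
}

/* Transcribing a column-first MATLAB loop literally strides through
 * memory by W elements on every access, hurting cache performance. */
void process_cache_unfriendly(float img[H][W])
{
    for (int x = 0; x < W; x++)
        for (int y = 0; y < H; y++)
            img[y][x] *= 2.0f;
}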

4.3 Visualization

This section describes the different steps involved in the visualization module developed

to display the reconstructed 3D models by means of the embedded projector contained

in the hand-held device Figure 42 extends the general overview of the application

presented in 31 by incorporating the visualization module This figure shows that a

resulting 3D model of the face reconstruction process consists of 4 different elements a

set of vertices a set of faces a set of UV coordinates and a texture image

Figure 4.2: Simplified diagram of the 3D face scanner application. (The camera frame sequence and the XML file feed the 3D face reconstruction, which passes faces, vertices, UV coordinates and the texture 1 image to the visualization module.)

Vertices and faces describe the geometry of the reconstructed model Each face consists

of three index values that determine the vertices that conform a triangle On the other

hand UV coordinates together with the texture image describe the texture of the model

Figure 43 shows how UV coordinates are used to map portions of the texture image


to individual parts of the model Each vertex is associated with an UV coordinate

When a triangle is rendered the corresponding UV coordinates of each vertex are used

to extract a portion of the texture image to place it on top of the triangle

Figure 4.3: UV coordinate system.

Figure 44 presents an overview of the visualization module The first step of the process

is to simplify the 3D model ie to reduce the number of triangles (and vertices) used

to represent the surface Note that while a high resolution is needed for the algorithms

that determine the fit quality of the different mask models a much lower resolution can

be used for visualization purposes In fact due to the limited available resources in

embedded systems such simplification becomes necessary to avoid lag when zooming

rotating or panning the model Edge collapse is a common term used for the simpli-

fication process which is shown in Figure 44 Input vertices and faces of this block

are converted into a smaller set denoted as New vertices and New faces on the diagram

However since the new set of vertices and faces do not have a one-to-one correspondence

to the original set of UV coordinates such coordinates have to be updated as well The

manner in which this is accomplished is by using the Nearest Neighbor algorithm Every

new vertex is assigned the UV coordinate of its closest original vertex

The next stage of the process is to format the new set of vertices faces and UV co-

ordinates together with the texture 1 image such that OpenGL can render the model


Subsequently normal vectors are calculated for every triangle which are mainly used

by OpenGL for lighting calculations Every vertex of the model has to be associated

with one normal vector To do this an average normal vector is calculated for each

vertex based on the normal vectors of the triangles that are connected to it Moreover

a cross-product multiplication is used to calculate the normal vector of each triangle

Once these four elements that characterize the 3D model are provided to OpenGL the

program enters in an infinite running state where the model is redrawn every time a

timer expires or when an interactive operation is sent to the program
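A minimal sketch of the per-triangle normal calculation via a cross product is shown below; the Vec3 type and the function name are illustrative assumptions.

#include <math.h>

typedef struct { float x, y, z; } Vec3;

/* Normal of the triangle (a, b, c), computed as the normalized cross product
 * of two of its edges. */
static Vec3 triangle_normal(Vec3 a, Vec3 b, Vec3 c)
{
    Vec3 u = { b.x - a.x, b.y - a.y, b.z - a.z };
    Vec3 v = { c.x - a.x, c.y - a.y, c.z - a.z };
    Vec3 n = { u.y * v.z - u.z * v.y,      /* cross product u x v */
               u.z * v.x - u.x * v.z,
               u.x * v.y - u.y * v.x };
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len > 0.0f) { n.x /= len; n.y /= len; n.z /= len; }
    return n;
}

/* A vertex normal is then obtained by summing the normals of all triangles
 * that share the vertex and normalizing the result. */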

Figure 4.4: Diagram of the visualization module. (Mesh simplification by edge collapse produces new vertices and faces; UV coordinates are reassigned with the Nearest Neighbor algorithm; the result is converted to OpenGL format, normals are calculated, and the model is rendered with OpenGL together with the texture 1 image.)

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner

application ranging from high-level optimizations such as modification of the algo-

rithms to low-level optimizations such as the implementation of time-consuming parts

in assembly language

In order to verify that the achieved optimizations were valid in general and not for

specific cases 10 scans of different persons were used for profiling the performance of the

application Every profile consisted of running the application 10 times for each scan and

then averaging the results in order to reduce the influence that external factors might

have in the measured times Figure 51 presents an example of the graphs that will be

used throughout this and the following chapters to represent the changes in performance

Here each bar is divided into different colors that represent the distribution of the total

execution time among the various stages of the application described in Chapter 3 and

summarized in Figure 31

The translation from MATLAB to C code corresponds to the first optimization per-

formed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available in the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's O2 optimization level.

The bottom bar in Figure 51 represents the starting point for a set of optimization

procedures that will be described in the following sections The order in which these are

presented corresponds to the same order in which they were applied to the application


Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

5.1 Double to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C implementations was necessary to compare both results in each step of the translation

process The original C implementation was implemented using double-precision format

because this is the format used in the MATLAB code Taking into account that the

additional precision offered by double-precision format over single-precision was not

essential and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double to single-precision format was made. Figure 5.2 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec.

Figure 5.2: Difference in execution time when double-precision format is changed to single-precision.

5.2 Tuned compiler flags

While the previous versions of the C code were compiled with O2 performance level

the goal of this step was to determine a combination of compiler options that would


translate into faster running code A full list of the options supported by GCC can be

found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of

compiler flags that produced best performance at this stage of the optimization process

were

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution

-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp

Figure 5.3: Execution time before and after tuning GCC's compiler options.

5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented to further

exploit the concept of spatial locality of the program As noted in Section 33 many of

the operations in the normalization stage involve pixels from pairs of consecutive frames

ie first and second third and fourth fifth and sixth and so on Data of the camera

frames were placed in memory in a manner such that corresponding pixels between frame

pairs lay next to each other in memory. The procedure is shown in Figure 5.4.

However this modification yielded no improvement on the execution time of the appli-

cation as can be seen from Figure 55

5.4 Reimplementation of C's standard power function

The generation of Texture 1 frame in the normalization stage starts by averaging the last

two camera frames followed by a gamma correction procedure The process of gamma

correction in this application consists of raising each pixel to the power 0.85. After

profiling the application it was found that the power function from the standard math

C library was taking most of the time inside this process Taking into account that the


Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green and purple circles represent pixels of the first, second, third and fourth frames, respectively.

Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.

high accuracy offered by such function was not required and that the overhead involved

in validating the input could be removed a different implementation of such function

was adopted

A novel approach was proposed by Ian Stephenson in [42], explained as follows. The power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a, which consists of an exponent field and a mantissa field:

a = M * 2^E

where M is the mantissa and E is the exponent. Taking the logarithm of both sides gives

log2(a) = log2(M) + E

and since M is normalized, log2(M) is always small, therefore

log2(a) ≈ E

This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.
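As an illustration of this idea, the following is a minimal C sketch of such an approximate power function. The bit-level constants assume IEEE 754 single-precision floats; the function names and the accuracy/validity trade-offs are assumptions for illustration and do not reproduce the precise routine used in the application.

#include <stdint.h>

/* Approximate log2(a) for a > 0 by reinterpreting the float bits: the
 * exponent field dominates and the mantissa bits act as a rough linear
 * interpolation between powers of two. 2^23 = 8388608. */
static inline float fast_log2(float a)
{
    union { float f; uint32_t i; } v = { a };
    return (float)v.i * (1.0f / 8388608.0f) - 127.0f;
}

/* Approximate 2^p by building the float bits directly (valid while
 * p + 127 stays positive and within the exponent range). */
static inline float fast_pow2(float p)
{
    union { float f; uint32_t i; } v;
    v.i = (uint32_t)((p + 127.0f) * 8388608.0f);
    return v.f;
}

/* Approximate a^b for a > 0, e.g. the per-pixel gamma correction a^0.85. */
static inline float fast_pow(float a, float b)
{
    return fast_pow2(b * fast_log2(a));
}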

Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.

5.5 Reduced memory accesses

The original order of execution was modified to reduce the amount of memory access and

to increase the temporal locality of the program Temporal locality is a principle stating

that referenced memory locations will tend to be referenced again soon Moreover

the reordering allowed to replace floating-point calculations with integer calculations in

the modulation stage which are known to typically execute faster in ARM processors

Figure 57 shows the order in which the algorithms are executed before and after this

optimization By moving the calculation of the modular frame to the preprocessing

stage the values of the camera frames do not have to be re-read Moreover the processes

of discarding cropping and scaling frames are now being performed in an alternating

fashion together with the calculation of the modular frame This loop merging improves

the locality of data and reduces loop overhead Figure 58 shows the change in execution

time of the application for this optimization step


(a) Original order of execution: preprocessing (parse XML file, discard frames, crop frames, scale) followed by normalization (texture 1, modulation, texture 2, normalize). (b) Modified order of execution: the modulation step is moved into the preprocessing loop, so normalization computes only texture 1, texture 2 and normalize.

Figure 5.7: Order of execution before and after the optimization.

Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.


5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the applica-

tion was presented in Chapter 3 Figure 38 shows the different stages of this process

However this figure does not reflect the manner in which the GMC was initially imple-

mented in the MATLAB code In fact this figure describes the GMC implementation

after being modified with the optimization described in this section A more detailed

picture of the original GMC implementation is given in Figure 59 Previous research

found that optimal results were achieved when GMC is applied in the y direction only

The manner in which this was implemented was by estimating GMC for both directions

but only performing the shift in the y direction The optimization consisted in removing

all unnecessary calculations related to the estimation of GMC in the x direction This

optimization provides the improvement of the execution time shown in Figure 510

Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code. (For every pair of consecutive normalized frames: sum rows and columns of both frames, minimize the SAD in x and y, and shift frame B in the y dimension only.)

Figure 5.10: Difference in execution time before and after modifying the GMC stage.


5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation A series of examples available

in [43] were used as references for our implementation Despite the fact that OpenCV

constructs the triangulation while abstracting the complete algorithm from the pro-

grammer a not so straightforward approach is required to extract the triangles from

a so called subdivision OpenCV offers a series of functions that can be used to nav-

igate through the edges that form the triangulation It is therefore the responsibility

of the programmer to extract each of the triangles while stepping through these edges

Moreover care must be taken to avoid repeated triangles in the final set An error was

detected at this point of the optimization process in the mechanism that was being used

to avoid repeated triangles Figure 511 shows the increase in execution time after this

bug was resolved

Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.

5.8 Modified line shifting in GMC stage

A series of optimizations performed to the original line shifting mechanism in the GMC

stage are explained in this section The MATLAB implementation uses the circular shift

function to perform the alignment of the frames (last step in Figure 38) Given that

there is no justification for applying a circular shift a regular shift was implemented

instead in which the last line of a frame is discarded rather than copied to the opposite

border Initially this was implemented using a for loop Later this was optimized even

further by replacing such for loop with the more optimized memcpy function available

in the standard C library This in turn led to a faster execution time

A further optimization was obtained in the GMC stage which yielded better memory

usage and faster execution time The original shifting approach used two equally sized

portions of memory in order to avoid overwriting the frame that was being shifted The


need for a second portion of memory was removed by adding some extra logic to the

shifting process A conditional statement was included in order to determine if the shift

has to be performed in the positive or negative direction In case the shift is negative ie

upwards the shifting operation traverses the image from top to bottom while copying

each line a certain number of rows above it In case the shift is positive ie downwards

the shifting operation traverses the image from bottom to top while copying each line a

certain number of rows below it The result of this set of optimizations is presented in

Figure 512
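The following is a minimal C sketch of such an in-place, non-circular vertical shift using memcpy; the float frame representation and the function name are assumptions for illustration.

#include <string.h>

/* Shift a row-major width*height frame by 'shift' rows. A positive shift
 * moves the content downwards, a negative shift upwards; rows that fall off
 * the image are simply discarded (no circular wrap-around). */
static void shift_frame_y(float *frame, int width, int height, int shift)
{
    if (shift < 0) {
        /* Upward shift: traverse top to bottom, copying each source row
         * |shift| rows above it, so sources are read before being overwritten. */
        for (int y = -shift; y < height; y++)
            memcpy(&frame[(y + shift) * width], &frame[y * width],
                   width * sizeof(float));
    } else if (shift > 0) {
        /* Downward shift: traverse bottom to top, copying each source row
         * 'shift' rows below it. */
        for (int y = height - 1 - shift; y >= 0; y--)
            memcpy(&frame[(y + shift) * width], &frame[y * width],
                   width * sizeof(float));
    }
}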

Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.

5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is

presented by Rippa [44] who proves that such triangulation minimizes the roughness of

the resulting model Nevertheless an important characteristic of the decoding process

used in our application allows the adoption of a different triangulation mechanism that

improved the execution time significantly while sacrificing smoothness in a very small

amount This characteristic refers to the fact that the resulting set of vertices from

the decoding stage are sorted in an increasing manner This in turn removes the need

to search for the nearest vertices and therefore allows the triangulation to be greatly

simplified More specifically the vertices are ordered in increasing order from left to

right and bottom to top in the plane Moreover they are equally spaced along the y

dimension which simplifies even further the algorithm needed to connect such vertices

into triangles

The developed algorithm traverses the set of vertices row by row from bottom to top

creating triangles between every pair of consecutive rows Moreover each pair of con-

secutive rows is traversed from left to right while connecting the vertices into triangles


The algorithm is presented in Algorithm 1 Note that for each pair of rows this algo-

rithm describes the connection of vertices until the moment in which the last vertex of

either row is reached The unconnected vertices that remain in the other longer row

are connected with the last vertex of the shorter row in a later step (not included in

Algorithm 1)

Algorithm 1 New tessellation algorithm

1:  for all pairs of consecutive rows do
2:      find the left-most vertices in both rows and store them in vertex row A and vertex row B
3:      while the last vertex in either row has not been reached do
4:          if vertex row A is more to the left than vertex row B then
5:              connect vertex row A with the next vertex on the same row and with vertex row B
6:              change vertex row A to the next vertex on the same row
7:          else
8:              connect vertex row B with the next vertex on the same row and with vertex row A
9:              change vertex row B to the next vertex on the same row
10:         end if
11:     end while
12: end for

Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's Delaunay triangulation implementation.

(a) Delaunay triangulation. (b) Optimized triangulation.

Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted.

5.10 Modified decoding stage

A major improvement was achieved in the execution time of the application after op-

timizing several time-consuming parts of the decoding stage As a first step two fre-

quently called functions of the standard math C library namely ceil() and floor()


Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

were replaced with faster implementations that used pre-processor directives to avoid the

function call overhead Moreover the time spent in validating the input was also avoided

since it was not required However the property that allowed the new implementations

of the ceil() and floor() functions to increase the performance to a greater extent

was the fact that such functions only operate on index values Given that index values

only assume non-negative numbers the implementation of each of these functions was

further simplified
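As an illustration, simplified rounding helpers restricted to non-negative inputs could look as follows; the macro names are assumptions and the exact definitions used in the application may differ.

/* Valid for x >= 0 only: truncation replaces the library call, avoiding the
 * call overhead and input validation of floor() and ceil(). Note that the
 * macro argument is evaluated more than once. */
#define FLOOR_POS(x)  ((int)(x))
#define CEIL_POS(x)   ((int)(x) + ((x) > (float)(int)(x)))

/* Example usage on an index value computed during decoding:
 *   int row = FLOOR_POS(position);
 *   int next_row = CEIL_POS(position);                                   */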

A second optimization applied to the decoding stage was to replace dynamically allocated

memory on the heap with statically allocated memory on the stack while controlling that

the amount of memory to be stored would not cause a stack overflow Stack allocation

is usually faster since it is memory that is faster addressable

The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. Such tasks were present in the application because several alternatives had been implemented for achieving a common goal during the algorithmic design stage; after assessing and choosing the best option, the other ones were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 515

An important reduction of approximately 1 second was achieved As a rough estimate

half of this speedup can be attributed to the removal of the nonfunctional code

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed to the GMC stage The algorithm

presented in Figure 38 has the following shortcoming for every pair of consecutive


Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

frames the sum of pixels in each column is calculated for both frames This means that

the column-sum vector is calculated twice for each image except for the first and last

frame (n = 1 and n = N) By reusing the column-sum vector calculated in the previous

iteration, such recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.

5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series proces-

sors was exploited for the last series of optimizations performed to the 3D face scanner

application The first step was to detect the stages of the application that exhibit rich

amount of exploitable data operations where the NEON technology could be applied

The vast majority of the operations performed in the preprocessing normalization and

global motion compensation stages are data independent and therefore suitable for

being computed in parallel on the ARM NEON architecture extension

There are four major approaches to integrate NEON technology into an existent application: (i) by using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) by using existent C/C++ libraries based on NEON technology; (iii) by using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions but with the compiler doing some of the work associated with writing assembly instructions; and (iv) by directly writing NEON assembly instructions linked to the C/C++ project in the compilation process. A detailed explanation of each of these

approaches can be found in [45] Based on the results achieved in [46] directly writing

NEON assembly instructions outperforms the other alternatives and therefore it was

this approach that was adopted


Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums. (The first pair of consecutive frames is processed as before; for every remaining pair, from n = 3 to n = N, the column-sum vector of frame n-1 computed in the previous iteration is reused, so only the columns of frame n are summed before minimizing the SAD and shifting frame n.)

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.


Figure 518 presents the basic principle behind the SIMD architecture extension along

with the related terminology Depending on the data type of the elements involved in

the operation either 2 4 8 or 16 elements can be operated with a single instruction

The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15)

or as thirty-two 64-bit registers (D0-D31) where each Q0-Q15 registers map to a pair

of D registers Figure 518 may be interpreted either as an operation of 2 Q registers

where each of the 8 elements would have 16 bits or as an operation of 2 D registers

where each of the 8 elements would be 8 bits wide

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology (elements, lanes, operation, source and destination registers).

An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating point value that ranges from 0 to 1.
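To make the data flow concrete, the following scalar C sketch mirrors the per-pixel arithmetic that the NEON code performs eight pixels at a time. The function name, the scale factor used to map the results into [0, 1] and the zero-denominator guard are assumptions for illustration, not the thesis code.

```c
#include <stdint.h>

/* Scalar reference (sketch) of the texture 2 and normalization arithmetic.
 * v1 and v2 are corresponding 8-bit pixels of the two frames of a pattern
 * pair; scaling and the zero-denominator guard are assumptions. */
void texture2_and_normalize(const uint8_t *v1, const uint8_t *v2,
                            float *texture2, float *normalized, int n)
{
    for (int i = 0; i < n; i++) {
        float sum  = (float)v1[i] + (float)v2[i];     /* texture 2: v1 + v2     */
        float diff = (float)v1[i] - (float)v2[i];
        texture2[i]   = sum / 510.0f;                 /* scaled into [0, 1]     */
        normalized[i] = (sum > 0.0f) ? diff / sum     /* (v1 - v2) / (v1 + v2)  */
                                     : 0.0f;
    }
}
```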

Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain such an increment. First, note that the stage of the application that most contributed to the increase in time was the read binary file stage. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was indeed probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this could be found in the modulation stage. The first step of this process is to find the smallest and largest values of every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C language, the processor makes use of a branch prediction mechanism in order to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, ignoring the existence of the branch prediction mechanism.
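As an illustration of this point, the sketch below contrasts the if-based scalar update with a branch-free version using NEON intrinsics (intrinsics are shown for readability; the thesis implementation uses hand-written assembly, and the function and buffer names are assumptions).

```c
#include <arm_neon.h>
#include <stdint.h>

/* Sketch of the running minimum/maximum over the frame sequence
 * (modulation, step 1), computed branch-free with NEON. */
void update_min_max_u8(const uint8_t *frame, uint8_t *min_img,
                       uint8_t *max_img, int n)
{
    int i;
    for (i = 0; i + 8 <= n; i += 8) {
        uint8x8_t px = vld1_u8(frame + i);
        uint8x8_t mn = vld1_u8(min_img + i);
        uint8x8_t mx = vld1_u8(max_img + i);
        vst1_u8(min_img + i, vmin_u8(mn, px));  /* elementwise minimum */
        vst1_u8(max_img + i, vmax_u8(mx, px));  /* elementwise maximum */
    }
    for (; i < n; i++) {                        /* if-based scalar tail */
        if (frame[i] < min_img[i]) min_img[i] = frame[i];
        if (frame[i] > max_img[i]) max_img[i] = frame[i];
    }
}
```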

5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2^(n+1) − 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
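A minimal sketch of such a LUT for the real 8-bit case is shown below; the table size follows from 2^(n+1) − 1 with n = 8, while the function names and the absence of any output scaling are assumptions made for illustration.

```c
#include <math.h>
#include <stdint.h>

/* Gamma-correction LUT for 8-bit pixels, indexed directly by
 * pixel A + pixel B (already available from the texture 2 stage). */
#define LUT_SIZE 511                      /* 2^(n+1) - 1 sums: 0..510 */
static float gamma_lut[LUT_SIZE];

void build_gamma_lut(void)
{
    for (int s = 0; s < LUT_SIZE; s++) {
        float average = s / 2.0f;             /* (pixel A + pixel B) / 2 */
        gamma_lut[s] = powf(average, 0.85f);  /* average^0.85            */
    }
}

/* Usage: texture 1 value for one pixel pair. */
static inline float texture1_pixel(uint8_t a, uint8_t b)
{
    return gamma_lut[(int)a + (int)b];
}
```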

As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2 and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. The merging of these 5 processes for the last two frames is convenient since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes.
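The reordered flow (illustrated later in Figure 5.23) can be summarized with the following outline. The function names are placeholders and the exact placement of the XML parsing and the 5×5 mean filter is simplified, so this is a sketch of the idea rather than the actual implementation.

```c
/* Placeholder prototypes standing in for the real routines. */
void modulation_step1(int frame);
void texture2_and_normalize_frame(int frame);
void average_and_gamma_frame(int frame);
void modulation_step2(void);

/* Outline (sketch) of the two-stage frame processing described above. */
void preprocess_and_normalize(void)
{
    for (int f = 1; f <= 14; f++) {        /* camera frames 1..14           */
        modulation_step1(f);               /* running min/max per pixel     */
        texture2_and_normalize_frame(f);   /* v1 + v2 and (v1-v2)/(v1+v2)   */
    }
    for (int f = 15; f <= 16; f++) {       /* last two frames               */
        modulation_step1(f);
        texture2_and_normalize_frame(f);
        average_and_gamma_frame(f);        /* texture 1: reuses the v1+v2
                                              sums to index the gamma LUT   */
    }
    modulation_step2();                    /* completes the modulation      */
}
```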


[Figure 5.19 (flow diagram): for camera frames 1, 2, ..., 15, 16, each row and each vector go through modulation (step 1), scale, texture 2 (v1 + v2), scale, normalize ((v1 − v2)/(v1 + v2)) and crop row; the XML file is parsed, and camera frames 15 and 16 are then re-read for modulation (step 2), scale and texture 1, before the rest of the program runs.]

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.


Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

[Figure 5.21 (table): pixel A and pixel B each take the values 0, 1, 2, 3; their sum, pixel A + pixel B, ranges from 0 to 6 and indexes the table; the corresponding averages are 0, 0.5, 1, 1.5, 2, 2.5, 3, and the stored values average^0.85 are 0, 0.555, 1, 1.411, 1.803, 2.179, 2.544.]

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.

These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. Moreover, the resulting improvement of the execution time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.


[Figure 5.23 (flow diagram): camera frames 1 to 14 go through modulation (step 1), scale, texture 2 (v1 + v2), scale, normalize ((v1 − v2)/(v1 + v2)) and crop row; the XML file is parsed and the 5×5 mean filter is applied; camera frames 15 and 16 go through modulation (step 2) and scale, and then through modulation (step 1), scale, texture 2, scale, normalize, crop row and average & gamma correction, after which the rest of the program runs.]

Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.

6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, after being processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).



On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation after being processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer and (bottom) the C implementation on the BeagleBoard-xM.

6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model has approximately 2% of the faces contained in the high resolution model, the quality degradation is hardly visible when comparing the two textured models.

6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations described in Chapter 5 were used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times.


(a) High-resolution 3D model with texture (63743 faces). (b) High-resolution 3D model wireframe (63743 faces). (c) Low-resolution 3D model with texture (1229 faces). (d) Low-resolution 3D model wireframe (1229 faces).

Figure 6.2: Example of the visualization module developed.

Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which provides an idea of the speedup achieved for each individual stage.


[Figure 6.3 (bar chart): one bar per version of the application, from top to bottom: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in Y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2. Each bar is broken down into the application stages listed in the legend; time in seconds.]

Figure 6.3: Performance evolution of the 3D face scanner's C implementation.


[Figure 6.4 (bar charts): before/after execution times, in seconds, for each stage: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.]

Figure 6.4: Execution time for each stage of the application before and after the complete optimization process.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, presented an increase of the execution time by a factor of 27, i.e. an increase of approximately 14 seconds.

A wide range of optimizations were performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used with no lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing such a process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system.



Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment. Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5×5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions (a sketch of the column-sum case is given below). Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
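For illustration, the following sketch shows how the column-sum operation mentioned above could be expressed with NEON intrinsics; the function name, the image layout and the use of intrinsics rather than hand-written assembly are assumptions.

```c
#include <arm_neon.h>
#include <stdint.h>

/* Sketch of a NEON column sum for the GMC stage: 8 columns are
 * accumulated per outer iteration, with 32-bit accumulators to avoid
 * overflow over the full image height. */
void sum_columns_u32(const uint8_t *img, int width, int height,
                     uint32_t *col_sum)
{
    for (int x = 0; x + 8 <= width; x += 8) {
        uint32x4_t acc_lo = vdupq_n_u32(0);
        uint32x4_t acc_hi = vdupq_n_u32(0);
        for (int y = 0; y < height; y++) {
            uint8x8_t  px = vld1_u8(img + y * width + x);  /* 8 pixels of one row */
            uint16x8_t w  = vmovl_u8(px);                  /* widen to 16 bits    */
            acc_lo = vaddw_u16(acc_lo, vget_low_u16(w));   /* columns x..x+3      */
            acc_hi = vaddw_u16(acc_hi, vget_high_u16(w));  /* columns x+4..x+7    */
        }
        vst1q_u32(col_sum + x,     acc_lo);
        vst1q_u32(col_sum + x + 4, acc_hi);
    }
    /* Columns beyond the last multiple of 8 would be handled with scalar code. */
}
```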

According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase of time is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e. changing the data type from double to float and tuning


the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program is exploiting spatial and temporal locality could reveal directions for further optimizations.
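As a generic illustration of the kind of locality issue referred to above (not code from the calibration stage itself), compare the two traversal orders below: the row-first loop visits consecutive addresses of a row-major image, while the column-first loop jumps by a full row stride on every access and defeats the cache.

```c
#define W 640
#define H 480
static float img[H][W];

/* Poor spatial locality: stride of W floats between consecutive accesses. */
float sum_column_first(void)
{
    float s = 0.0f;
    for (int x = 0; x < W; x++)
        for (int y = 0; y < H; y++)
            s += img[y][x];
    return s;
}

/* Cache-friendly order: consecutive addresses within each row. */
float sum_row_first(void)
{
    float s = 0.0f;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            s += img[y][x];
    return s;
}
```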

Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that, as part of the current project, a few of the other options were tested, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast-running code.

Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: the journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, ISSN: 0162-8828. DOI: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. DOI: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, ISSN: 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, ISSN: 0162-8828. DOI: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, ISSN: 0031-3203. DOI: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, ISSN: 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. DOI: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, ISSN: 0932-8092. DOI: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, ISSN: 0018-9162. DOI: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. DOI: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. DOI: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. DOI: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, ISSN: 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. DOI: 10.1364/OE.15.012318 (cit. on p. 14).

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14).

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, ISBN: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and Universitat de Girona, Departament d'Electrònica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals, Universitat de Girona, 1998, ISBN: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, ISSN: 0882-4967. DOI: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, ISSN: 0162-8828. DOI: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, ISBN: 0-201-48560-5. DOI: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, ISBN: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, December 2009, p. 81 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization: an example," Tech. Rep., 2009 (cit. on p. 54).

  • Abstract
  • Acknowledgements
  • List of Figures
  • 1 Introduction
    • 1.1 3D Mask Sizing project
    • 1.2 Objectives
    • 1.3 Report organization
  • 2 Literature study
    • 2.1 Surface reconstruction
      • 2.1.1 Stereo analysis
      • 2.1.2 Structured lighting
        • 2.1.2.1 Triangulation technique
        • 2.1.2.2 Pattern coding strategies
        • 2.1.2.3 3D human face reconstruction
    • 2.2 Camera calibration
      • 2.2.1 Definition
      • 2.2.2 Popular techniques
  • 3 3D face scanner application
    • 3.1 Read binary file
    • 3.2 Preprocessing
      • 3.2.1 Parse XML file
      • 3.2.2 Discard frames
      • 3.2.3 Crop frames
      • 3.2.4 Scale
    • 3.3 Normalization
      • 3.3.1 Normalization
      • 3.3.2 Texture 2
      • 3.3.3 Modulation
      • 3.3.4 Texture 1
    • 3.4 Global motion compensation
    • 3.5 Decoding
    • 3.6 Tessellation
    • 3.7 Calibration
      • 3.7.1 Offline process
      • 3.7.2 Online process
    • 3.8 Vertex filtering
      • 3.8.1 Filter vertices based on decoding constraints
      • 3.8.2 Filter vertices outside the measurement range
      • 3.8.3 Filter vertices based on a maximum edge length
    • 3.9 Hole filling
    • 3.10 Smoothing
  • 4 Embedded system development
    • 4.1 Development tools
      • 4.1.1 Hardware
        • 4.1.1.1 Single-board computer survey
        • 4.1.1.2 BeagleBoard-xM features
      • 4.1.2 Software
        • 4.1.2.1 Software libraries
        • 4.1.2.2 Software development tools
    • 4.2 MATLAB to C code translation
      • 4.2.1 Motivation for developing in C language
      • 4.2.2 Translation approach
    • 4.3 Visualization
  • 5 Performance optimizations
    • 5.1 Double to single-precision floating-point numbers
    • 5.2 Tuned compiler flags
    • 5.3 Modified memory layout
    • 5.4 Reimplementation of C's standard power function
    • 5.5 Reduced memory accesses
    • 5.6 GMC in y dimension only
    • 5.7 Error in Delaunay triangulation
    • 5.8 Modified line shifting in GMC stage
    • 5.9 New tessellation algorithm
    • 5.10 Modified decoding stage
    • 5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage
    • 5.12 NEON assembly optimization 1
    • 5.13 NEON assembly optimization 2
  • 6 Results
    • 6.1 MATLAB to C code translation
    • 6.2 Visualization
    • 6.3 Performance optimizations
  • 7 Conclusions
    • 7.1 Future work
  • Bibliography


List of Figures

1.1 A subset of the CPAP masks offered by Philips 2

1.2 A 3D hand-held scanner developed in Philips Research 4

2.1 Standard stereo geometry 7

2.2 Assumed model for triangulation as proposed in [4] 10

2.3 Examples of pattern coding strategies 12

2.4 A reference framework assumed in [25] 14

3.1 General flow diagram of the 3D face scanner application 17

3.2 Example of the 16 frames that are captured by the hand-held scanner 18

3.3 Flow diagram of the preprocessing stage 18

3.4 Flow diagram of the normalization stage 20

3.5 Example of the 18 frames produced in the normalization stage 21

3.6 Camera frame sequence in a coordinate system 22

3.7 Flow diagram for the calculation of the texture 1 image 22

3.8 Flow diagram for the global motion compensation process 23

3.9 Difference between pixel-based and edge-based decoding 24

3.10 Vertices before and after the tessellation process 25

3.11 The Delaunay tessellation with all the circumcircles and their centers [33] 26

3.12 The calibration chart 27

3.13 The 3D model before and after the calibration process 28

3.14 3D resulting models after various filtering steps 29

3.15 Forehead of the 3D model before and after applying the smoothing process 30

4.1 The BeagleBoard-xM offered by Texas Instruments 35

4.2 Simplified diagram of the 3D face scanner application 39

4.3 UV coordinate system 40

4.4 Diagram of the visualization module 41

5.1 Execution times of the MATLAB and C implementations after run on different platforms 44

5.3 Execution time before and after tuning GCC's compiler options 45

5.4 Modification of the memory layout of the camera frames 46

5.5 Execution time with a different memory layout 46

5.6 Execution time before and after reimplementing C's standard power function 47

5.7 Order of execution before and after the optimization 48

5.8 Difference in execution time before and after reordering the preprocessing stage 48

5.9 Flow diagram for the GMC process as implemented in the MATLAB code 49

5.10 Difference in execution time before and after modifying the GMC stage 49

5.11 Execution time of the application after fixing an error in the tessellation stage 50

5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage 51

5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted 52

5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach 53

5.15 Execution time of the application before and after optimizing the decoding stage 54

5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums 55

5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage 55

5.18 NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology 56

5.19 Execution flow after first NEON assembly optimization 58

5.20 Execution times of the application before and after applying the first NEON assembly optimization 59

5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels 59

5.22 Execution times of the application before and after applying the second NEON assembly optimization 59

5.23 Final execution flow after second NEON assembly optimization 60

6.1 Execution times of the MATLAB and C implementations after run on different platforms 62

6.2 Example of the visualization module developed 63

6.3 Performance evolution of the 3D face scanner's C implementation 64

6.4 Execution times for each stage of the application 65

Dedicated to my grandmother


Chapter 1

Introduction

The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.

Within the wide range of products offered by Philips, there is a specific group, categorized under the name of sleep solutions, that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]. It reported that sleep apnea was prevalent in more than 18 million Americans, i.e. 6.62% of the country's population.

While aiming to attend to the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors, which include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia,



Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic, (c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel, (h) Simplicity, (i) ComfortGel.

amongst others. A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.

Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable


of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.

1.1 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop some technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.

2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.

3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.

4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.

The focus of this thesis project is based on the first step.

As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique was already developed and is the base for the present project. Figure 1.2a shows the hardware setup of such a device. In short, this scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently being performed offline and are mostly implemented in MATLAB.

1.2 Objectives

The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a way that extra-functional requirements are taken into account.


Figure 1.2: A 3D hand-held scanner developed in Philips Research: (a) hardware, (b) 3D model example.

These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application. Specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to ARM's Cortex-A8, the new implementation should be targeted to this processor in particular, by making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.

1.3 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.

Matrix4x4() if(mshSelected) target=mshSelected var trans=targettransform var parent=targetparent while(parenttransform) build local to world transformation matrix transmultiplyInPlace(parenttransform) also build world to local back-transformation matrix backtransmultiplyInPlace(parenttransforminversetranspose) parent=parentparent backtranstransposeInPlace() else try target=scenenodesgetByName(Clipping Plane) catch(e) var ndcnt=scenenodescount target=scenecreateClippingPlane() if(ndcnt=scenenodescount) targetremove() target=null if(target) return switch(echaracterCode) case 30tilt up tiltTarget(target -MathPI900) break case 31tilt down tiltTarget(target MathPI900) break case 28spin right spinTarget(target -MathPI900) break case 29spin left spinTarget(target MathPI900) break case 120 x translateTarget(target new Vector3(100) e) break case 121 y translateTarget(target new Vector3(010) e) break case 122 z translateTarget(target new Vector3(001) e) break case 88 shift + x translateTarget(target new Vector3(-100) e) break case 89 shift + y translateTarget(target new Vector3(0-10) e) break case 90 shift + z translateTarget(target new Vector3(00-1) e) break case 115 s scaleTarget(target 1 e) break case 83 shift + s scaleTarget(target -1 e) break if(mshSelected) targettransformmultiplyInPlace(backtrans)runtimeaddEventHandler(keyEventHandler)function tiltTarget(ta) var centre=new Vector3() if(mshSelected) centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) else centreset(ttransformtranslation) var rotVec=ttransformtransformDirection(new Vector3(010)) rotVecnormalize() ttransformtranslateInPlace(centrescale(-1)) ttransformrotateAboutVectorInPlace(a rotVec) ttransformtranslateInPlace(centre)function spinTarget(ta) var centre=new Vector3() var rotVec=new Vector3(001) if(mshSelected) centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) rotVecset(ttransformtransformDirection(rotVec)) rotVecnormalize() else centreset(ttransformtranslation) ttransformtranslateInPlace(centrescale(-1)) ttransformrotateAboutVectorInPlace(a rotVec) ttransformtranslateInPlace(centre)translates object by amount calculated based on Canvas sizefunction translateTarget(t d e) var cam=scenecamerasgetByIndex(0) if(camprojectionType==camTYPE_PERSPECTIVE) var scale=Mathtan(camfov2) camtargetPositionsubtract(camposition)length Mathmin(ecanvasPixelWidthecanvasPixelHeight) else var scale=camviewPlaneSize2 Mathmin(ecanvasPixelWidthecanvasPixelHeight) ttransformtranslateInPlace(dscale(scale))scales object by amount calculated based on Canvas sizefunction scaleTarget(t d e) if(mshSelected) var bbox=tcomputeBoundingBox() var diag=new Vector3(bboxmaxx bboxmaxy bboxmaxz) diagsubtractInPlace(bboxmin) var dlen=diaglength var cam=scenecamerasgetByIndex(0) if(camprojectionType==camTYPE_PERSPECTIVE) var scale=Mathtan(camfov2) camtargetPositionsubtract(camposition)length dlen Mathmin(ecanvasPixelWidthecanvasPixelHeight) else var scale=camviewPlaneSize2 dlen Mathmin(ecanvasPixelWidthecanvasPixelHeight) var centre=new Vector3() centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) ttransformtranslateInPlace(centrescale(-1)) ttransformscaleInPlace(1+dscale) ttransformtranslateInPlace(centre) function addremoveClipPlane(chk) var clip=scenecreateClippingPlane() if(chk) add Clipping Plane and place its center either into the camera target position or into the centre of the currently selected mesh node var centre=new Vector3() if(mshSelected) local to parent transformation matrix var trans=mshSelectedtransform 
build local to world transformation matrix by recursively multiplying the parents transf matrix on the right var parent=mshSelectedparent while(parenttransform) trans=transmultiply(parenttransform) parent=parentparent get the centre of the mesh (local coordinates) centreset(mshSelectedcomputeBoundingBox()center) transform the local coordinates to world coords centreset(transtransformPosition(centre)) mshSelected=null else centreset(scenecamerasgetByIndex(0)targetPosition) cliptransformsetView( new Vector3(000) new Vector3(100) new Vector3(010)) cliptransformtranslateInPlace(centre) else clipremove() function to store current transformation matrix of all mesh nodes in the scenefunction getCurTrans() var nc=scenemeshescount var tA=new Array(nc) for(var i=0 iltnc i++) var cm=scenemeshesgetByIndex(i) tA[cmname]=new Matrix4x4(cmtransform) return tAfunction to restore transformation matrices given as argfunction restoreTrans(tA) for(var i=0 ilttAlength i++) var msh=scenemeshesgetByIndex(i) mshtransformset(tA[mshname]) store original transformation matrix of all mesh nodes in the scenevar origtrans=getCurTrans()set initial state of Cross Section menu entrycameraEventHandleronEvent(1)hostconsoleclear()

Chapter 2

Literature study

This chapter presents a selective analysis of the state-of-the-art in the field of surface

reconstruction placing special emphasis on structured lighting techniques A brief

overview of the three main underlying technologies used for depth estimation is pre-

sented first This is followed by an example of stereo analysis which serves as the basis

for the more specific structured lighting techniques Moreover this example helps to

illustrate why stereo analysis is considered less preferable for 3D face reconstruction

applications when compared with the structured lighting techniques Special emphasis

is placed on the scientific principles underlying structured lighting techniques Further-

more a classification of the different types of pattern coding strategies available in the

literature is given along with an analysis of their suitability for our application Fi-

nally the chapter concludes with a brief discussion of camera calibration and its most

representative techniques

21 Surface reconstruction

Surface reconstruction has a wide range of practical applications such as computer mod-

eling of 3D objects (such as those found in areas like architecture mechanical engi-

neering or surgery) distance measurements for vehicle control surface inspections for

quality control approximate or exact estimates of the location of 3D objects for auto-

mated assembly and fast location of obstacles for efficient navigation [4]

Technologies for surface reconstruction include contact and non-contact techniques the

latter being our principal interest Non-contact techniques may be further categorized

as echo-metric reflecto-metric and stereo-metric as proposed in [5] Echo-metric tech-

niques use time-of-flight measurements to determine the distance to an object ie they


are based on the time it takes for a wave (acoustic micro electromagnetic) to reflect

from an object's surface through a given medium Reflecto-metric techniques process

one or more images of the object to determine its surface orientation and consequently

its shape Finally stereo-metric techniques determine the location of the object's surface

by triangulating each point with its corresponding projections in two or more images

Echo-metric techniques suffer from a number of drawbacks Systems employing such

techniques are heavily affected by environmental parameters such as temperature and

humidity [6] These parameters affect the velocity at which waves travel through a

given medium thus introducing errors in depth measurement On the other hand

both reflecto-metric and stereo-metric techniques are less affected by environmental

parameters However reflecto-metric techniques entail a major difficulty ie they

require an estimation of the model of the environment In the remainder of this section

we will limit the discussion to the stereo-metric category and focus on the structured

lighting techniques

211 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded

as an extension of the more general stereo-vision technique an introductory example of

stereo analysis is presented in this section This example intends to show why the use

of structured lighting becomes essential for our application This example is presented

in [4]

Surface reconstruction can be achieved by means of the visual disparity that results

when an object is observed from different camera viewpoints In its simplest form two

cameras can be used for this purpose Triangulation between a point in the object and

its respective projection in each of the camera projection planes can be used to calculate

the depth at which this point lies from a certain reference Note however that in order

to calculate the triangulation more parameters are required These parameters refer for

example to the distance at which the cameras are located from one another (extrinsic

parameter) or to the focal length of each of the cameras (intrinsic parameter)

Figure 21 illustrates the so-called standard stereo geometry [4] of two cameras In this

model the origin of the XYZ-coordinate system O = (0 0 0) is located at the focal

point of the left camera The focal point of the right camera lies at a distance b along

the X-axis from the left camera ie at the point (b 0 0) Both cameras are assumed

to have the same focal length f As a consequence the images of both cameras are

located in the same image plane The Z-axis coincides with the optical axis of the

left camera Moreover the optical axes of both cameras are parallel to each other and


oriented towards the scene objects Also note that because the x-axes of both images

are identically oriented rows with same row-number in the two different images lie on

the same straight line


Figure 21 Standard stereo geometry

In this model a scene point P = (X, Y, Z) is projected onto two corresponding image points

pleft = (xleft, yleft) and pright = (xright, yright)

in the left and right images respectively assuming that the scene point is visible from
both camera viewpoints The disparity with respect to pleft is a vector given by

∆(xleft, yleft) = (xleft − xright, yleft − yright)^T (21)

between two corresponding image points

In the standard stereo geometry pinhole camera models are used to represent the con-

sidered cameras The basic idea of a pinhole camera is that it projects scene points P

onto image points p according to a central projection given by

p = (x, y) = (f·X/Z, f·Y/Z) (22)

assuming that Z > f

According to the ideal assumptions considered in the standard stereo geometry of the

two cameras it holds that y = yleft = yright Therefore for the left camera the cen-

tral projection equation is given directly by Equation 22 considering that the pinhole

camera model assumes that the Z-axis is identified to be the optical axis of the camera

Furthermore given the displacement of the right camera by b along the X axis the


central projection equation is given by

(xright, y) = (f·(X − b)/Z, f·Y/Z)

Rather than calculating a disparity vector given by Equation 21 for all corresponding

pairs of points in the different images the scalar disparity proves to be sufficient under

the assumptions made in the standard stereo geometry The scalar disparity of two

corresponding points in each one of the images with respect to pleft is given by

∆ssg(xleft, yleft) = √((xleft − xright)² + (yleft − yright)²)

However because rows with same row numbers in the two images have the same y value

the scalar disparity of a pair of corresponding points reduces to

∆ssg(xleft, yleft) = |xleft − xright| = xleft − xright (23)

Note that it is valid to remove the absolute value operator because of the chosen arrange-

ment of the cameras A disparity map ∆(x y) is defined by applying equation 23 to all

corresponding points in the two images For those points that could not be associated

with a corresponding point in the other image (for example because of occlusion) the
value "undefined" is recorded

Finally in order to come up with the equations that determine the 3D location of each

point in the scene note that from the two central projection equations of the two cameras

it follows that

Z = f·X/xleft = f·(X − b)/xright

and therefore

X = b·xleft / (xleft − xright)

Using the previous equation it follows that

Z = b·f / (xleft − xright)

By substituting this result into the projection equation for y it follows that

Y = b·y / (xleft − xright)

The last three equations allow the reconstruction of the coordinates of the projected

points P within the three-dimensional XYZ-space assuming that the parameters f and


b are known and that the disparity map ∆(x y) was measured for each pair of corre-

sponding points in the two images Note that a variety of methods exists to calibrate

different types of camera configuration systems ie to determine their intrinsic and ex-

trinsic parameters More on these calibration procedures is further discussed in Section

22
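To make the reconstruction step concrete, a minimal C sketch is given below. It applies the last three equations to one corresponding point pair, assuming that f and b are known from calibration and that the disparity xleft − xright has been measured; the type and function names are illustrative and not part of the original implementation.

typedef struct { double X, Y, Z; } Point3D;

/* Reconstruct a 3D scene point from a corresponding image point pair in the
   standard stereo geometry: f is the focal length, b the base distance,
   (xleft, y) the image coordinates in the left image and disp the measured
   disparity xleft - xright. A zero struct is returned when the disparity is
   undefined or non-positive. */
static Point3D reconstruct_point(double f, double b,
                                 double xleft, double y, double disp)
{
    Point3D p = {0.0, 0.0, 0.0};
    if (disp > 0.0) {
        p.X = (b * xleft) / disp;
        p.Y = (b * y) / disp;
        p.Z = (b * f) / disp;
    }
    return p;
}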

The process of determining corresponding point pairs is known as the correspondence

problem A wide variety of techniques are used to solve the correspondence problem in

stereo image analysis Such techniques generally involve the extraction and matching

of features between two or more images These features are typically corners or edges

contained within the images Although these techniques are found to be appropriate for

a certain number of applications it turns out that they present a number of drawbacks

that make their applicability unfeasible for many others The main drawbacks are (i)

feature extraction and matching is generally computationally expensive (ii) features

might not be available depending on the nature of the environment or the placement

of the cameras and (iii) low lighting conditions generally increase the complexity of the

matching procedure thus making the system more error prone Such problems in solving

the correspondence problem can generally be overcome by resorting to a different but

similar type of techniques known by the name of structured lighting techniques While

structured lighting techniques involve a completely different methodology on how to solve
the correspondence problem they share a large part of the theory presented in this section

regarding the depth reconstruction process

212 Structured lighting

Structured lighting methods can be thought of as a modification of the previously de-

scribed stereo analysis approach where one of the cameras is replaced by a light source

which projects a light pattern actively into the scene The location of an object in space

can then be determined by analyzing the deformation of the projected light pattern

The idea behind this modification is to simplify the complexity of the correspondence

analysis by actively manipulating the scene

It is important to note that stereoscopic based systems do not assume complex require-

ments for image acquisition since they mostly rely on theoretical mathematical and

algorithmic analyses to solve the reconstruction problem On the other hand the idea

behind structured lighting methods is to shift this complexity to another level such as

the engineering prerequisites of the overall system [4]

A wide variety of light patterns have been proposed by the research community [5], [7]–

[17] Their aim is to reduce the large number of images that would have to be captured


when using the most basic of all approaches ie a light spot In Section 2122 a

classification of the encoded patterns available is presented Nevertheless the light spot

projection technique serves as a solid starting point to introduce the main principle

underlying the depth recovery of most other encoded light patterns the triangulation

technique

2121 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring

angles formed from it to points at either end of a fixed baseline Various approaches

have been proposed for accomplishing this task An early analysis was described by Hall

et al [18] in 1982 Klette also presented his own analysis in [4] In the following an

overview of Klette's triangulation approach is explained

Figure 22 Assumed model for triangulation as proposed in [4]

Figure 22 shows the simplified model that Klette assumes in his analysis Note that the
system can be thought of as a 2D object scene ie it has no vertical dimension As a

consequence the object light source and camera all lie in the same plane The angles

α and β are given by the calibration As in the previous example the base distance b

is assumed to be known and the origin of the coordinate system O coincides with the

projection center of the camera


The goal is to calculate the distance d between the origin O and the object point

P = (X0, Z0) This can be done using the law of sines as follows

d / sin(α) = b / sin(γ)

From γ = π − (α + β) and sin(π − γ) = sin(γ) it holds that

d / sin(α) = b / sin(π − γ) = b / sin(α + β)

Therefore distance d is given by

d = b·sin(α) / sin(α + β)

which holds for any point P lying on the surface of the object
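As a small illustration, the distance d follows directly from the calibrated quantities; the C function below is a sketch under the assumption that α and β are given in radians, and its name is purely illustrative.

#include <math.h>

/* Distance from the camera origin O to the object point, given the base
   distance b and the calibration angles alpha (camera side) and beta
   (light-source side), both in radians. */
static double triangulate_distance(double b, double alpha, double beta)
{
    return (b * sin(alpha)) / sin(alpha + beta);
}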

2122 Pattern coding strategies

As stated earlier there is a wide variety of pattern coding strategies available in the lit-

erature that aim to fulfill all requirements found in different scenarios and applications

In coded structure light systems every coded pixel in the pattern has its own codeword

that allows direct mapping ie every codeword is mapped to the corresponding coordi-

nates of a given pixel or group of pixels in the pattern A codeword can be represented

using grey levels colors or even geometrical characteristics The following classification

of pattern coding strategies was proposed by Salvi et al in [19]

• Time-multiplexing This is one of the most commonly used strategies The

idea is to project a set of patterns onto the scene one after the other The

sequence of illuminated values determines the codeword for each pixel The main

advantage of this kind of pattern is that it can achieve high spatial resolution in

the measurements However its accuracy is highly sensitive to movement of either

the structured light system or objects in the scene during the time period when the

acquisition process takes place Previous research in this area includes the work of

[5] [7] [8] An example of this coding strategy is the binary coded pattern shown

in Figure 23a

• Spatial Neighborhood In this strategy the codeword that is assigned to a given

pixel depends on its neighborhood Codification is done on the basis of intensity

[9]ndash[11] color [12] or a unique structure of the neighborhood [13] In contrast with

time-multiplexing strategies spatial neighborhood strategies allow for all coding

information to be condensed into a single projection pattern making them highly


suitable for applications that involve timing constraints such as autonomous nav-

igation The compromise however is deterioration in spatial resolution Figure

23b is an example of this strategy proposed by Griffin et al [14]

• Direct coding In direct coding strategies every pixel in the pattern is labeled

by the information it represents In other words the entire codeword for a given

point is contained in a unique pixel as explained in [19] Basically there are two

ways to achieve this either by using a large range of color values [15] [16] or

by introducing periodicity [17] Although in theory this group of strategies can

be used to reconstruct objects with high resolution a major problem occurs in

practice the colors imaged by camera(s) of the system do not only depend on the

projected colors but also on the intrinsic colors of the measuring surface and light

source The consequence is that reference images become necessary Figure 23c

shows an example of a direct coding strategy proposed in [16]

(a) Time-multiplexing (b) Spatial Neighborhood (c) Direct coding

Figure 23 Examples of pattern coding strategies

2123 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields such as security

forensics or even entertainment it is no surprise that special focus has been devoted

to this area by the research community over the last decades A comparative study

of three different 3D face reconstruction approaches is presented in [20] Here the

most representative techniques of three different domains are tested These domains are

binocular stereo structured lighting and photometric stereo The experimental results

show that active reconstruction techniques perform better than purely passive ones for

this application

The majority of analysis on vision based reconstruction has focused on general perfor-

mance for arbitrary scenes rather than on specific objects as reported in [20] Neverthe-

less some effort has been made on evaluating structured lighting techniques with special

focus on human face reconstruction In [21] a comparison is presented between three


structured lighting techniques (Gray Code Gray Code Shift and Stripe Boundary) to

assess 3D reconstruction for human faces by using mono and stereo systems The results

show that the Gray Code shift coding performs best given the high number of emitted

patterns it uses A further study on this topic was performed by the same author in

[22] Again it was found that time-multiplexing techniques such as binary encoding

using Gray Code provide the highest accuracy With a rather different objective than

that sought by Woodward et al in [21] and [22] Fechteler et al [23] also focus their

effort on presenting a framework that captures 3D models of faces in high resolutions

with low computational load Here the system uses a single colored stripe pattern for

the reconstruction purpose plus a picture of the face illuminated with regular white light

that is used as texture

Particular aspects of 3D human face reconstruction such as proximity size and texture

involved make structured lighting a suitable approach On the contrary other recon-

struction techniques might be less suitable when dealing with these particular aspects

For example stereoscopic approaches fail to provide positive results when the textures

involved do not contain features that can be easily extracted and matched by means of

algorithms as in the case of the human face On the other hand the concepts behind

structured lighting make it very convenient to reconstruct these kind of surfaces given

the proximity involved and the size limits of the object in question (appropriate for

projecting encoded patterns)

With regard to the suitability of the different pattern coding strategies for our application

(3D human face reconstruction by means of a hand-held scanner) there are several

factors to consider Spatial neighborhood strategies do not offer high spatial resolution

which is needed by the algorithms that assess the fit quality of the various mask models

Direct coding strategies suffer from practical problems that affect their robustness to

different scenarios This centers the attention on the time-multiplexing techniques which

are known to provide high spatial resolution The problem with such techniques is

that they are highly sensitive to movement which is likely to be present on a hand-
held device Fortunately there are several approaches to how such a problem can be

solved Consequently it is a time-multiplexing technique which is being employed in

our application

22 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement

This section presents a review of some of the most popular techniques with special focus

on those that are regarded as adequate for our application


221 Definition

Camera calibration is the process of determining a mathematical approximation of the

physical and optical behavior of an imaging system by using a set of parameters These

parameters can be estimated by means of direct or iterative methods and they are divided

in two groups On the one hand intrinsic parameters determine how light is projected

through the lens onto the image plane of the sensor The focal length projection center

and lens distortion are all examples of intrinsic parameters On the other hand extrinsic

parameters measure the position and orientation of the camera with respect to a world

coordinate system as defined in [24] To better illustrate these ideas consider Figure

24 which corresponds to the optical system for the structured pattern projection and

triangulation considered in [25] The focal length fc and the projection center Oc are

examples of intrinsic parameters of the camera while the distance D between the camera

and the projector corresponds to an extrinsic parameter


Figure 24 A reference framework assumed in [25]

222 Popular techniques

In 1982 Hall et al [18] proposed a technique consisting of an implicit camera calibration

that uses a 3×4 transformation matrix which maps 3D object points to their respective

2D image projections Here the model of the camera does not consider any lens distor-

tion For a detailed description of this method refer to [18] Some years later in 1986

Faugeras improved Hall's work by proposing a technique that was based on extracting

the physical parameters of the camera from the transformation technique proposed in

[18] The description of this technique is given in [26] and [27] A non-linear explicit

camera calibration that included radial lens distortion was proposed by Salvi in his PhD


thesis [28] which as he mentions can be regarded as a simple adaptation of Faugeras' lin-

ear method However a method that would become much more popular and that is still

widely used was proposed by Tsai in 1987 [29] Here the author proposes a two-step

technique that models only radial lens distortion Also worth mentioning is the model

proposed by Weng [30] in 1992 which includes three different types of lens distortion

The calibration mechanism that is currently being used in our application is based on

the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although

this mechanism focuses on stereo camera calibration it was generalized for a system

with one camera and one projector It involves imaging a controlled scene from different

positions and orientations The controlled scene consists of a rigid calibration chart with

several markers The geometric and photometric properties of such markers are known

precisely so that they can be detected After corresponding markers in the different

images are found an algorithm searches the optimal set of camera parameters for which

triangulation of all corresponding marker-point pairs gives an accurate reconstruction of

the calibration chart This calibration mechanism is discussed further in Section 37

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed

by the Smart Sensing amp Analysis research group and provided as a starting point for the

current project Figure 31 presents the main steps involved in the 3D reconstruction

process

[Flow diagram: from the binary and XML input, through the stages Read binary file, Preprocessing, Normalization, Global motion compensation, Decoding, Tessellation, Calibration, Vertex filtering and Hole filling, to the final 3D model]

Figure 31 General flow diagram of the 3D face scanner application

The current scanner uses a total of 16 binary coded patterns that are sequentially pro-

jected onto the scene For each projection the scene is captured by means of the

embedded camera hence producing 16 different grayscale frames (Figure 32) that are

fed to the application in the form of a binary file This falls in line with the discussion

presented in Section 2123 of the literature study of why time-multiplexing strategies

prove more suitable than spatial neighborhood or direct coding strategies for face recon-

struction applications In Sections 31 to 39 each of the steps shown in Figure 31 is

described
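As an illustration of how such a set of patterns could be generated, the C sketch below produces binary stripe patterns together with their inverted counterparts (with num_bits set to 8 this yields 16 patterns). The use of Gray code ordering, the pixel values of 0 and 255 and the function name are illustrative assumptions and do not necessarily correspond to the patterns projected by the actual scanner.

#include <stdint.h>

/* Generate num_bits binary stripe patterns of the given width plus their
   inverted versions (2 * num_bits one-row patterns in total). The column code
   is mapped to a Gray code here as an illustrative choice. */
static void generate_patterns(uint8_t *patterns, int width, int num_bits)
{
    for (int b = 0; b < num_bits; b++) {
        for (int x = 0; x < width; x++) {
            int code = (int)(((long)x * (1 << num_bits)) / width);  /* column code   */
            int gray = code ^ (code >> 1);                          /* Gray code     */
            int bit  = (gray >> (num_bits - 1 - b)) & 1;            /* current plane */
            patterns[(2 * b) * width + x]     = bit ? 255 : 0;      /* pattern       */
            patterns[(2 * b + 1) * width + x] = bit ? 0 : 255;      /* inverted      */
        }
    }
}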


Figure 32 Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns This frame sequence is the input for the 3D face scanner application

31 Read binary file

The first step of the application is to read the binary file that contains the required

information for the 3D reconstruction The binary file is composed of two parts the

header and the actual data The header contains metadata of the acquired frames such

as the number of frames and the resolution of each one The second part contains the

actual data of the captured frames Figure 32 shows an example of such frame sequence

which from now on will be referred to as camera frames
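A possible way of parsing such a file in C is sketched below. The concrete header layout (number of frames, width and height stored as 32-bit unsigned integers, followed by the raw 8-bit pixel data) is an assumption made for illustration only; the actual format used by the scanner may differ.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header layout preceding the raw frame data. */
typedef struct {
    uint32_t num_frames;   /* number of captured frames            */
    uint32_t width;        /* horizontal resolution of each frame  */
    uint32_t height;       /* vertical resolution of each frame    */
} ScanHeader;

/* Read the header and the frame data; returns NULL on failure.
   The caller owns the returned buffer. */
static uint8_t *read_scan_file(const char *path, ScanHeader *hdr)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) return NULL;
    if (fread(hdr, sizeof(ScanHeader), 1, fp) != 1) { fclose(fp); return NULL; }

    size_t n = (size_t)hdr->num_frames * hdr->width * hdr->height;
    uint8_t *frames = malloc(n);
    if (frames && fread(frames, 1, n, fp) != n) { free(frames); frames = NULL; }
    fclose(fp);
    return frames;
}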

32 Preprocessing

The preprocessing stage comprises the four steps shown in figure 33 Each of these steps

is described in the following subsections

[Flow diagram: Parse XML file → Discard frames → Crop frames → Scale (convert to float, range 0–1)]

Figure 33 Flow diagram of the preprocessing stage

321 Parse XML file

In this stage the application first reads an XML file that is included for every scan

This file contains relevant information for the structured light reconstruction This


information includes (i) the type of structured light patterns that were projected when

acquiring the data (ii) the number of frames captured while structured light patterns

were being projected (iii) the image resolution of each frame to be considered and (iv)

the calibration data

322 Discard frames

Based on the number of frames value read from the XML file the application discards

extra frames that do not contain relevant information for the structured light approach

but that are provided as part of the input

323 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain
a new more suitable resolution for the subsequent algorithms of the program (480 × 754) This is accomplished by cropping the pixels that are close to the top border

of the images Note that this operation does not imply a loss of information in this

application in particular This is because pixels near the frame borders do not contain

facial information and therefore can be safely removed

324 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is

represented by an 8-bit unsigned integer value that ranges from 0 to 255 In this stage

the data type is transformed from unsigned integer to floating point while dividing each

pixel value by 255 The new set of values range between 0 and 1

33 Normalization

Even though this section is entitled Normalization a few more tasks are being performed

in this stage of the application as shown by the blue rectangles in Figure 34 Here wide

arrows represent flow of data whereas dashed lines represent the order of execution The

numbers inside the small data arrows pointing towards the different tasks represent the

number of frames used as input by each task The dashed line rectangle that encloses

the normalization and texture 2 tasks represents that there is not a clear sequential

execution between these two but rather that these are executed in an alternating fashion

This type of diagram will prove particularly useful in Chapter 5 in order to explain the


[Flow diagram: 16 camera frames in; Normalization (8 frames out), Texture 2 (8 frames out), Modulation (1 frame out) and Texture 1 (1 frame out); dashed lines indicate execution flow]

Figure 34 Flow diagram of the normalization stage

modifications that were made to the application to improve its performance An example

of the different frames that are produced in this stage are visualized in Figure 35 A

brief description of each of the tasks involved in this stage follows

331 Normalization

The purpose of this stage is to extract the reflectivity component (texture information)

from the camera frames while aiming at enhancing the deformed illumination patterns

in the resulting frame sequence Figure 35a illustrates the result of this process The

deformed patterns are essential for the 3D reconstruction process

In order to understand how this process takes place we need to look back at Figure

32 Here it is possible to observe that the projected patterns in the top row frames are

equal to their corresponding frame in the bottom row with the only difference being

that the values of the projected pattern are inverted For each corresponding pair a

new image frame is generated according to the following equation

Fnorm(x, y) = (Fcamera(x, y, a) − Fcamera(x, y, b)) / (Fcamera(x, y, a) + Fcamera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 32 respectively

An example of the resulting frame sequence is shown in Figure 35a


(a) Normalized frame sequence

(b) Texture 2 frame sequence

(c) Modulation frame (d) Texture 1 frame

Figure 35 Example of the 18 frames produced in the normalization stage

332 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one

used to calculate the normalized frame sequence In fact the output of this process is an

intermediate step in the calculation of the normalized frames which is the reason why

the two processes are said to be performed in an alternating fashion The mathematical

equation that describes the calculation of the texture 2 frame sequence is

Ftexture2(x, y) = Fcamera(x, y, a) + Fcamera(x, y, b)

The resulting frame sequence (Figure 35b) is used later in the global motion compen-

sation stage
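A rough per-pixel illustration of how one aligned frame pair could be processed to obtain both the normalized frame and the texture 2 frame in a single pass is given below; the variable names and the small guard against division by zero are illustrative choices rather than details of the original implementation.

/* Process one aligned pair of camera frames (pattern and inverted pattern):
   the texture 2 frame is the per-pixel sum, the normalized frame the
   difference divided by the sum. */
static void normalize_pair(const float *frame_a, const float *frame_b,
                           float *f_norm, float *f_texture2,
                           int width, int height)
{
    const float eps = 1e-6f;                 /* guard against division by zero */
    for (int i = 0; i < width * height; i++) {
        float sum  = frame_a[i] + frame_b[i];
        float diff = frame_a[i] - frame_b[i];
        f_texture2[i] = sum;
        f_norm[i]     = (sum > eps) ? diff / sum : 0.0f;
    }
}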


333 Modulation

The purpose of this stage is to find the range of measured values for each (x y) pixel of

the camera frame sequence along the time dimension This is done in two steps First

two frames are generated by finding the maximum and minimum values along the time

(t) dimension (Figure 36) for every (x y) value in a frame


Figure 36 Camera frame sequence in a coordinate system

Second a modulation frame is produced by finding the difference between the previously

generated frames ie

Fmod(x, y) = Fmax(x, y) − Fmin(x, y)

Such modulation frame (Figure 35c) is required later during the decoding stage
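An illustrative C sketch of this per-pixel computation over the camera frame sequence follows; the assumption that all frames are stored consecutively in a single buffer is made only for the sake of the example.

/* Compute the modulation frame as the per-pixel range (maximum minus minimum)
   of the camera frame sequence along the time dimension. */
static void compute_modulation(const float *frames, int num_frames,
                               int width, int height, float *f_mod)
{
    for (int i = 0; i < width * height; i++) {
        float vmin = frames[i], vmax = frames[i];
        for (int t = 1; t < num_frames; t++) {
            float v = frames[t * width * height + i];
            if (v < vmin) vmin = v;
            if (v > vmax) vmax = v;
        }
        f_mod[i] = vmax - vmin;
    }
}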

334 Texture 1

Finally the last task in the Normalization stage corresponds to the generation of the

texture image that will be mapped onto the final 3D model In contrast to the previous

three tasks this subprocess does not take the complete set of 16 camera frames as input

but only the 2 with finest projection patterns Figure 37 shows the four processing

steps that are applied to the input in order to generate a texture image such as the one

presented in Figure 35d

[Flow diagram: Average frames → Gamma correction → 5×5 mean filter → Histogram stretch]

Figure 37 Flow diagram for the calculation of the texture 1 image


34 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement

In fact if no measures are taken to correct the slight amount of movement of the scanner

or of the objects in the scene during the acquisition process the complete reconstruction

process fails Although the global motion compensation stage is only a minor part of

the mechanism that makes the entire application robust to motion it is not negligible

in the final result

Global motion compensation is an extensive field of research for which many different

approaches and methods have been contributed The approach used in this application

is amongst the simplest in level of complexity Nevertheless it suffices for the needs of the

current application

Figure 38 presents an overview of the algorithm used to achieve the global motion

compensation This process takes as input the normalized frame sequence introduced in

the previous section As noted at the bottom of the figure these steps are repeated for

every pair of consecutive frames As a first step the pixels in each column are added for

both frames This results in two vectors that hold the cumulative sums of each frame

The second step is to determine by how many pixels the second image is displaced with

respect to the first one In order to achieve this the sum of absolute differences between

elements of the two column-sum vectors is calculated while slowly displacing the two

vectors with respect to each other The result is a new vector containing the SAD value

for each displacement Subsequently the index of the smallest element in the SAD

values vector is searched in order to determine the number of pixels that the second

image needs to be shifted The process concludes by performing the actual shift of the

second frame
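The column-sum and SAD-based alignment described above can be sketched in C as follows; the search range, the averaging over the overlapping columns and the function name are illustrative assumptions rather than details of the actual implementation.

#include <stdlib.h>
#include <math.h>

/* Estimate the horizontal displacement (in pixels) of frame B with respect to
   frame A by comparing their column-sum vectors with a sum of absolute
   differences (SAD) criterion; max_shift limits the search range. */
static int estimate_shift(const float *frame_a, const float *frame_b,
                          int width, int height, int max_shift)
{
    float *col_a = calloc((size_t)width, sizeof(float));
    float *col_b = calloc((size_t)width, sizeof(float));
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            col_a[x] += frame_a[y * width + x];   /* cumulative column sums */
            col_b[x] += frame_b[y * width + x];
        }

    int best_shift = 0;
    float best_sad = INFINITY;
    for (int s = -max_shift; s <= max_shift; s++) {
        float sad = 0.0f;
        int overlap = 0;
        for (int x = 0; x < width; x++) {
            if (x + s >= 0 && x + s < width) {
                sad += fabsf(col_a[x] - col_b[x + s]);
                overlap++;
            }
        }
        sad /= (float)overlap;                    /* average over the overlap */
        if (sad < best_sad) { best_sad = sad; best_shift = s; }
    }
    free(col_a);
    free(col_b);
    return best_shift;   /* frame B is subsequently shifted to compensate */
}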


Figure 38 Flow diagram for the global motion compensation process


35 Decoding

In Section 211 of the literature study the correspondence problem was defined as the

process of determining corresponding point pairs between the captured images and the

projected patterns This is exactly what is being accomplished during the decoding

stage

A novel approach has been implemented in which the identification of the projector

stripes is based not on the values of the pixels themselves (as it is typically done) but

rather on the edges formed by the transitions of the projected patterns Figure 39

illustrates the different sets of decoded values that result with each of these methods

Here it is possible to observe that the pixel-based method produces a stair-casing effect

due to the decoding of neighboring pixels that lie on the same stripe of the projected

pattern On the other hand the edge-based method removes this undesirable effect by

decoding values for only parts of the image in which a transition occurs Furthermore

this approach enables sub-pixel accuracy for the determination of the positions where the

transitions occur meaning that the overall resolution of the 3D reconstruction increases

considerably
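The exact decoding scheme is not reproduced here, but the idea of locating a transition with sub-pixel accuracy can be illustrated by a linear interpolation between two neighbouring pixels of a normalized frame whose values change sign; treating a transition as a zero crossing is an illustrative simplification and not necessarily the method used in the application.

/* Given the normalized values v0 and v1 of two neighbouring pixels at integer
   positions y0 and y0 + 1, estimate the sub-pixel position of the pattern
   transition, assuming the transition corresponds to a zero crossing.
   A negative value is returned when no transition lies between the pixels. */
static double subpixel_transition(double v0, double v1, int y0)
{
    if ((v0 < 0.0) == (v1 < 0.0))
        return -1.0;                       /* no sign change, no transition        */
    return (double)y0 + v0 / (v0 - v1);    /* interpolated zero-crossing position  */
}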

[Plot: Edge vs pixel based decoding; decoded values plotted against pixels along the y dimension of the image, for edge-based and pixel-based decoding]

Figure 39 The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used

The decoding process results in a set of vertices each one associated with a depth code

Note however that the unit of measurement used to describe the position and depth of

each vertex is based on camera pixels and code values respectively meaning that these

vertices still do not represent the actual geometry of the face The calibration process

explained in a later section is the part of the application that translates the pixel and


code values to standard units (such as millimeters) thus recreating the actual shape of

the human face

36 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in

a manner such that no overlaps occur In computer graphics these geometric shapes

are generally chosen to be triangles also called "faces" The reason for using triangles
is that they have by definition their vertices on the same plane This in turn avoids

the generation of non-simple convex polygons that are not guaranteed to be rendered

correctly A complete example illustrating this point can be found in [32]

A set of 3D vertices calculated in the decoding stage is the input to the tessellation

process Here however the third dimension does not play a role and hence the z

coordinate for each of the vertices can be thought of as being equal to 0 This implies

that the new set of vertices consist only of (x y) coordinates that lie on the same plane

as shown in Figure 310a This graph corresponds to a very close view of the nose area

in the reconstructed face example

(a) Vertices before applying the Delaunay triangulation (zoomed-in model before tessellation, x-y view) (b) Result after applying the Delaunay triangulation (zoomed-in model after tessellation, x-y view)

Figure 310 Close view of the vertices in the nose area before and after the tessellation process

The question that arises here is how to connect the vertices in such a way that the com-

plete surface is covered with triangles The answer is to use the Delaunay triangulation

which is probably the most common triangulation used in computer vision The main

advantage that it has over other methods is that the Delaunay triangulation avoids
"skinny" triangles reducing potential numerical precision problems [33] Moreover the

Delaunay triangulation is independent of the order in which the vertices are processed


Figure 310b shows the result of applying the Delaunay triangulation to the vertices

shown in Figure 310a

Although there exists a number of different algorithms used to achieve the Delaunay

triangulation the final outcome of each conforms to the following definition a Delaunay

triangulation for a set P of points in a plane is a triangulation DT(P) such that no

point in P is inside the circumcircle of any triangle in DT(P) [33] Such definition can

be understood by examining Figure 311
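The predicate at the heart of this definition, namely whether a point lies inside the circumcircle of a triangle, can be expressed as a determinant test. The C sketch below is numerically naive and only meant to illustrate the idea; robust Delaunay implementations use exact or adaptive arithmetic for this test.

/* Returns a positive value when point d lies inside the circumcircle of the
   triangle (a, b, c), a negative value when it lies outside and zero when it
   lies exactly on the circle; a, b and c are assumed to be given in
   counter-clockwise order. */
static double in_circumcircle(const double a[2], const double b[2],
                              const double c[2], const double d[2])
{
    double ax = a[0] - d[0], ay = a[1] - d[1];
    double bx = b[0] - d[0], by = b[1] - d[1];
    double cx = c[0] - d[0], cy = c[1] - d[1];
    return (ax * ax + ay * ay) * (bx * cy - cx * by)
         - (bx * bx + by * by) * (ax * cy - cx * ay)
         + (cx * cx + cy * cy) * (ax * by - bx * ay);
}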


Figure 311 The Delaunay tessellation with all the circumcircles and their centers [33]

37 Calibration

The set of (x y) vertices with their corresponding depth code values that result from

the decoding process do not represent standard units of measure ie these still have to

be translated into standard units such as millimeters This is precisely the objective of

the calibration process

The calibration mechanism that is used in the application is based on the work of Peter-

Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts

an offline and an online process Moreover the offline process consists of two stages

the camera calibration and the system calibration It is important to clarify that while

the offline process is performed only once (camera properties and distances within the

system do not change with every scan) the online process is carried out for every scan

instance The calibration stage referred to in Figure 31 is the latter


371 Offline process

As already mentioned the offline process comprises the two stages described below

Camera calibration This part of the process is concerned with the calculation of the

intrinsic parameters of the camera as explained in Section 22 of the literature

study In short the objective is to precisely quantify the optical properties of the

camera The manner in which the current approach accomplishes this is by imag-

ing the special calibration chart shown in Figure 312 from different orientations

and distances After corresponding markers in the different images are found an

algorithm searches the optimal set of camera parameters for which triangulation

of all corresponding marker-point pairs gives an accurate reconstruction of the

calibration chart

Figure 312 The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system All absolute dimensions and photometric properties of the round markers are known precisely

System calibration The second part of the calibration process refers to the camera-

projector system calibration ie the determination of the extrinsic parameters

of the system Again this part of the process images the calibration chart from

different distances However this time structured light patterns are emitted by

the projector while the acquisition process takes place The result is that each

projector code is associated with a known depth and camera position

372 Online process

The result of the offline calibration is a set of parameters that model the optical proper-

ties of the scanner system These are passed to the application inside the XML file for

every scan Such parameters represent the coefficients of a fifth-order polynomial used

for translating the set of (x y) vertices with their corresponding depth code values into


standard units of measure In other words the online process consists of evaluating a

polynomial with all the x y and depth code values calculated in the decoding stage in

order to reconstruct the geometry of the face Figure 313 shows the state of the 3D

model before and after the reconstruction process
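Since neither the exact form of the fifth-order polynomial nor the ordering of its coefficients in the XML file is detailed here, the evaluation step can only be sketched generically. The C fragment below assumes one trivariate polynomial of total degree five per output coordinate and an arbitrary, purely illustrative monomial ordering; the real coefficient layout may differ.

#include <math.h>

/* Evaluate one trivariate polynomial of total degree <= 5 at (x, y, c), where
   c is the decoded depth code. The coefficient ordering implied by the loops
   is an assumption made for illustration. */
static double eval_calibration_poly(const double *coeff, double x, double y, double c)
{
    double result = 0.0;
    int k = 0;
    for (int i = 0; i <= 5; i++)                  /* power of x */
        for (int j = 0; j <= 5 - i; j++)          /* power of y */
            for (int m = 0; m <= 5 - i - j; m++)  /* power of c */
                result += coeff[k++] * pow(x, i) * pow(y, j) * pow(c, m);
    return result;
}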

(a) Before reconstruction (b) After reconstruction

Figure 313 The 3D model before and after the calibration process

38 Vertex filtering

As can be seen from Figure 313b there are a number of extra vertices (and faces)

that have not been correctly reconstructed and therefore should be removed from the

model Vertex filtering is applied to remove all these noisy vertices and faces based on

different criteria The process is divided in the following three steps

381 Filter vertices based on decoding constraints

First if the distance between consecutive decoded points is larger than a maximum

threshold in the (x) or (z) dimensions then these are removed Second in order to

avoid false decoded vertices due to camera noise (especially in the parts of the images

where light does not hit directly) a minimal modulation threshold needs to be exceeded

or else the associated decoded point is discarded Finally if the decoded vertices lie

outside a margin defined in accordance to the image dimensions then these are removed

as well


382 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum

and maximum values that each decoded point can have in the z dimension These values

are read from the XML file The long triangles shown in Figure 313b that either extend

far into the picture or on the other hand come close to the camera are all removed in

this stage The resulting 3D model after being filtered with the two previously described

criteria is shown in Figure 314a

383 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length

criterion Initially the length of every edge contained in the model is calculated This

is followed by determining a new set of edges L that contains the longest edge in each

face After this operation the mean length value for the longest edge set is calculated

Finally only faces that have their longest edge value less than seven times the mean value
ie L < 7 × mean(L) are kept Figure 314b shows the result after this operation
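A C sketch of this criterion is given below, under the assumption that the mesh is stored as a vertex array and an array of index triples; the names and the data layout are illustrative.

#include <math.h>

typedef struct { double x, y, z; } Vertex;
typedef struct { int v[3]; } Face;

static double edge_length(const Vertex *a, const Vertex *b)
{
    double dx = a->x - b->x, dy = a->y - b->y, dz = a->z - b->z;
    return sqrt(dx * dx + dy * dy + dz * dz);
}

/* Keep only faces whose longest edge is shorter than seven times the mean of
   the longest-edge set; keep[i] is set to 1 for surviving faces, 0 otherwise.
   longest[] is a caller-provided scratch array of num_faces elements. */
static void filter_long_faces(const Vertex *verts, const Face *faces,
                              int num_faces, double *longest, int *keep)
{
    double mean = 0.0;
    for (int i = 0; i < num_faces; i++) {
        double e0 = edge_length(&verts[faces[i].v[0]], &verts[faces[i].v[1]]);
        double e1 = edge_length(&verts[faces[i].v[1]], &verts[faces[i].v[2]]);
        double e2 = edge_length(&verts[faces[i].v[2]], &verts[faces[i].v[0]]);
        longest[i] = fmax(e0, fmax(e1, e2));
        mean += longest[i];
    }
    mean /= (double)num_faces;
    for (int i = 0; i < num_faces; i++)
        keep[i] = (longest[i] < 7.0 * mean) ? 1 : 0;
}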

(a) The 3D model after the filtering steps described in Subsections 381 and 382 (b) The 3D model after the filtering step described in Subsection 383 (c) The 3D model after the filtering step described in Section 39

Figure 314 3D resulting models after various filtering steps

39 Hole filling

In the last processing step of the 3D face scanner application two actions are performed

The first one is concerned with an algorithm that takes care of filling undesirable holes

that appear due to the removal of vertices and faces that were part of the face surface This

is accomplished by adding a vertex in the middle of the hole and then connecting every

surrounding edge with this point The second action refers to another filtering step of


vertices and faces In this last part of the application the program removes all but the

largest group of connected faces The final 3D model is shown in Figure 314c

310 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes

but not for the overall goal of the 3D mask sizing project this process was not taken

into account as part of the 3D face scanner application This is also the reason why it

is not included in Figure 31 Nevertheless this section provides a brief explanation of

the smoothing process that is currently used along with an example

A complete explanation of the algorithm that is being used to achieve the smoothing effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian operator that diffuses the vertices along the surface. An example of the resulting model before and after applying the smoothing process is shown in Figure 3.15.

Figure 3.15: Forehead of the 3D model before and after applying the smoothing process. (a) Before smoothing; (b) after smoothing.

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as two different domains but rather as two complementary parts of a whole. There are two important trends that have made such a unified view possible. First, integrated circuit (IC) technology has evolved to the point where multiple processors of different types coexist in a single IC. Second, the increasing complexity and average size of programs, together with the evolution of compiler technologies, have made C compilers (and even C++ or Java in some cases) commonplace in the development of embedded systems [35].

This chapter discusses the embedded hardware and software implementation of the 3D face scanner. A brief account of the hardware and software tools that were used during the development of the application is presented first. Subsequently, the first stage of the development process is described, which consists mainly of translating the algorithms and methods described in Chapter 3 into a different programming language more suitable for embedded systems. Finally, a preview of the developed visualization module that displays the 3D reconstructed face is presented, along with a brief description of its functionality.

4.1 Development tools

This section describes the set of tools used in the development of the embedded application. First, an overview of the hardware is presented, highlighting the most important aspects that are of interest to the 3D face scanner application. This is then followed by a list of the software tools, along with a short motivation for their selection. A so-called remote development methodology was used for the compilation process. The idea is to run an integrated development environment (IDE) on a client system for the creation of the project, editing of the files, and usage of code assistance features in the same manner as done with local projects. However, when the project is built, run, or debugged, the process runs on a remote server, with output and input transferred to the client system.

4.1.1 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as development platforms. SBCs combine most features of a conventional desktop computer into a single board, which can be as small as a credit card. One or more processors of different types, memory, on-board peripherals for multiple USB devices, single or dual gigabit Ethernet connections, and integrated graphics and audio capabilities, amongst others, are common features included in these devices. But perhaps what is most interesting for embedded developers is the availability of several SBCs that fall under the open source hardware category [36]. Such SBCs are suitable for the implementation of a wide range of applications on the basis of open operating systems.

Two different hardware environments were used in the development of the current embedded application: a conventional desktop personal computer (PC) with an Intel x86 architecture, and an SBC that was selected according to the following survey.

4.1.1.1 Single-board computer survey

A prior survey of popular SBCs available on the market was conducted with the intention of finding the most suitable model for our application. Table 4.1 presents a subset of the considered models, highlighting the most relevant characteristics for the 3D face scanner application. Refer to [37] for the complete survey.

The model to be chosen has to comply with several requirements imposed by the 3D face scanner application. First, support for both a camera and a projector had to be offered. While all of the considered models showed special support for video output, not all of them provided suitable characteristics for camera signal acquisition. In fact, most of them rely on USB or Ethernet connections for this purpose. The problem of using USB technology for camera acquisition is that it is highly resource demanding. On the other hand, Ethernet connections imply streaming video in formats such as MPEG, which require additional computational resources and buffering for decoding the video stream. Explicit peripheral support for camera acquisition was only offered by two of the considered models: the BeagleBoard-xM and the PandaBoard.


Table 4.1: Single-board computer survey

BeagleBoard-xM
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 512 MB
  Video output: DVI-D, HDMI, S-Video
  GPU: PowerVR SGX, OpenGL ES 2.0
  Camera port: Yes

Raspberry Pi Model B
  CPU: ARM1176, 700 MHz
  RAM: 256 MB
  Video output: Composite RCA, HDMI, DSI
  GPU: Broadcom VideoCore IV, OpenGL ES 2.0
  Camera port: No

Cotton Candy
  CPU: dual-core ARM Cortex-A9, 1200 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
  Camera port: No

PandaBoard
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, DVI-D, LCD
  GPU: PowerVR SGX540, OpenGL ES 2.0
  Camera port: Yes

Via APC
  CPU: ARM11, 800 MHz
  RAM: 512 MB
  Video output: HDMI, VGA
  GPU: built-in 2D/3D graphics, OpenGL ES 2.0
  Camera port: No

MK802
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No

Snowball
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, CVBS
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No


A second issue in the selection of the SBC was concerned with the project objective of developing a module capable of visualizing the 3D reconstructed model by means of the embedded projector. It was considered that the achievement of this objective could be greatly simplified by selecting an SBC model that offered support for rendering 3D computer graphics by means of an API, preferably OpenGL ES. Nevertheless, all of the SBC models considered in the survey featured a graphics processing unit (GPU) with such support.

Finally, one last important motivation for the selection came from the experience gathered through related projects. The BeagleBoard-xM had been used as the embedded computing unit in other projects [6] at Philips Research Eindhoven, and therefore valuable implementation effort could be saved if this option were adopted. Consequently, it was the BeagleBoard-xM that was selected as the SBC model for the development of the current project.

4.1.1.2 BeagleBoard-xM features

The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power open-source hardware system that was designed specifically to address the open source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz and 512 MB of LPDDR RAM. Several open operating systems have been made compatible with this processor, including Linux, FreeBSD, RISC OS, Symbian, and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU to provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].

4.1.2 Software

The main factors involved in the selection of software tools were (i) available support by a large development community and (ii) acquisition costs and licensing charges. Open source software was adopted where possible. Moreover, prior experience with the tools was also taken into account. The software can be divided into two categories: (i) software libraries that are used within the application and therefore are necessary for its execution, and (ii) software tools used specifically for the development of the application and hence not required for its execution. In what follows, each of these is briefly described.

Figure 4.1: The BeagleBoard-xM offered by Texas Instruments.

4.1.2.1 Software libraries

The following software libraries are being used throughout the implementation of the

embedded application

libxml2: It is a software library used for parsing XML documents, which was originally developed for the GNOME project and was later made available for outside projects as well. The current application makes use of this tool for extracting the required information from the XML file that is included with each scan.

OpenCV: It is an open source computer vision and machine learning software library initiated by Intel. It provides the necessary functionality to construct the Delaunay triangulation described in Chapter 3. Though it was used in the initial versions of the application, later optimizations replaced the OpenCV implementations.

CGAL: It consists of a software library that aims to provide access to algorithms in computational geometry. It is being used in the current application as a means to simplify the resulting mesh surface, i.e. to reduce the number of faces used to represent the surface while keeping the overall shape of the reconstructed model.

OpenGL ES: OpenGL ES is a subset of the more general OpenGL, designed specifically for embedded systems. It consists of a cross-language, multi-platform Application Programming Interface (API) for rendering 2D and 3D computer graphics. It is used in the current application as the means to visualize the 3D reconstructed model.

GLUT: The OpenGL Utility Toolkit consists of a system-independent API for OpenGL used to create windows and/or frame buffers. It is used in the visualization module of the application as well.

4.1.2.2 Software development tools

The following list presents a description of the most important software tools used for

the development of the embedded application

GNU toolchain It refers to a collection of programming tools produced by the GNU

Project that provide developing facilities for applications and operating systems

Among the several projects that comprise the GNU toolchain the following were

used

GNU Make It is a utility that automates the building process of executable

programs by reading the so-called makefiles which specify how to create the

target program

GCC It is the official compiler of the GNU operating system and has been

adopted as standard by most modern Unix-like computer operating systems

GNU Binutils Involves a set of programming tools that are used in the develop-

ment process of creating and managing programs object files libraries profile

data and assembly source code The commands as (assembler) ld (linker)

and gprof (profiler) were used among the complete set of binutil commands

GNU Project debugger It is the standard debugger for the GNU operating

system which was made available for the development of applications outside

this project as well

Valgrind It is a programming tool that can automatically detect memory management

errors It also provides the functionality of a profiler

Ubuntu A Linux based operating system that is distributed as free and open source

software It was installed in both the desktop PC and the SBC


4.2 MATLAB to C code translation

This section describes the first stage of the embedded application development, which involves the translation of a series of algorithms originally written in MATLAB code to C.

Despite the fact that there are a number of available tools that automatically translate MATLAB code to C language, such as MATLAB Coder by MathWorks, MATLAB-to-C Synthesis (MCS) by Catalytic Inc., and AccelDSP by Xilinx, these have a number of pitfalls that compromise their applicability, especially when the performance aspect is of ultimate importance. Perhaps what is most concerning is that each one of these tools only supports a subset of the MATLAB language and functions, meaning that the complete functionality of MATLAB is immediately constrained by this requirement. In many cases this would imply a modification of the MATLAB code prior to the translation process in order to filter out any feature or function not included in the subset, which adds overhead to the development process. Examples of features not supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization, and try/catch statements. The use of an automatic translation tool was discarded for this project, taking into account that several of these unsupported features are present in the MATLAB code.

4.2.1 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular programming languages used for the development of embedded systems. The first is that the C language lies at an intermediate point between higher and lower level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher level languages lies in the fact that they do not provide suitable means for optimizing the performance of applications, such as low-level memory manipulation. Furthermore, unlike many of these higher level programming languages, C provides deterministic resource use, which is an important feature when the target devices contain limited resources. On the other hand, C outperforms lower level languages in a number of aspects, such as scalability and maintainability. Two final motivations for using C are that (i) C compilers are available for almost all embedded devices and are supported by a large pool of experienced C programmers, and (ii) the vast majority of hardware APIs/drivers are written in C.


4.2.2 Translation approach

As mentioned earlier, a manual translation approach was chosen over the use of automatic translation tools. A key part in the process of manually translating MATLAB to C code is the verification process. There are two major techniques used to achieve such verification. The first one consists of a systematic method of converting the translated C code into a compiled MEX-file that can be merged into the original MATLAB project. Then, by comparing the results generated by the MATLAB project containing the C implementation wrapped in a MEX-file with those generated by the original MATLAB project, one should be able to verify the correctness of the translation. The second approach consists of writing corresponding intermediate results of both the MATLAB and C implementations to external files, and then using a file comparison tool, such as diff for Linux environments, in order to validate the equality of both results. It was the latter approach that was chosen for the development of the current application, for the following reason. The former approach requires the C implementation to be wrapped in a so-called MEX wrapper, which takes care of the communication between MATLAB and C. This task is considered to be error prone, since crashes, segmentation violations, or incorrect results can easily occur if the MEX wrapper does not allocate and access the data properly, as reported by Marc Barberis in [40] from Catalytic Inc.

A number of pitfalls that add complexity to the manual translation process were identified throughout the development of this stage. The most important are:

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e. the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in performance of the resulting code (see the sketch after this list).

• MATLAB is an interpreted language, i.e. data types and variable dimensions are only known at run-time, and thus these cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc, or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions, and sizes. On the contrary, C language requires all variables to be cast to a specific data type (or declared, as known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C, and hence requires the programmer to implement them while relying on structure constructs of primitive types.
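As a small illustration of the first two pitfalls, the following hedged sketch shows how a MATLAB-style, 1-based, column-wise access pattern is typically rewritten in C so that the row-major layout is traversed contiguously; the array name and dimensions are hypothetical.

    /* MATLAB (column-major, 1-based): s = s + A(r, c), accessed column by column.
       C (row-major, 0-based): iterate rows in the outer loop so that consecutive
       memory locations are accessed in the inner loop. */
    float sum_matrix(const float *A, int rows, int cols)
    {
        float s = 0.0f;
        int r, c;
        for (r = 0; r < rows; r++) {          /* outer loop over rows          */
            for (c = 0; c < cols; c++) {      /* inner loop walks contiguously */
                s += A[r * cols + c];         /* row-major index: r*cols + c   */
            }
        }
        return s;
    }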

4.3 Visualization

This section describes the different steps involved in the visualization module developed to display the reconstructed 3D models by means of the embedded projector contained in the hand-held device. Figure 4.2 extends the general overview of the application presented in Figure 3.1 by incorporating the visualization module. This figure shows that a resulting 3D model of the face reconstruction process consists of four different elements: a set of vertices, a set of faces, a set of UV coordinates, and a texture image.

Figure 4.2: Simplified diagram of the 3D face scanner application.

Vertices and faces describe the geometry of the reconstructed model. Each face consists of three index values that determine the vertices that form a triangle. On the other hand, UV coordinates together with the texture image describe the texture of the model. Figure 4.3 shows how UV coordinates are used to map portions of the texture image to individual parts of the model. Each vertex is associated with a UV coordinate. When a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image to place it on top of the triangle.

Figure 4.3: UV coordinate system.

Figure 4.4 presents an overview of the visualization module. The first step of the process is to simplify the 3D model, i.e. to reduce the number of triangles (and vertices) used to represent the surface. Note that while a high resolution is needed for the algorithms that determine the fit quality of the different mask models, a much lower resolution can be used for visualization purposes. In fact, due to the limited available resources in embedded systems, such simplification becomes necessary to avoid lag when zooming, rotating, or panning the model. Edge collapse is a common term used for the simplification process, which is shown in Figure 4.4. Input vertices and faces of this block are converted into a smaller set, denoted as New vertices and New faces in the diagram. However, since the new set of vertices and faces does not have a one-to-one correspondence to the original set of UV coordinates, such coordinates have to be updated as well. This is accomplished by using the nearest neighbor algorithm: every new vertex is assigned the UV coordinate of its closest original vertex.

The next stage of the process is to format the new set of vertices, faces, and UV coordinates, together with the texture 1 image, such that OpenGL can render the model. Subsequently, normal vectors are calculated for every triangle, which are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex based on the normal vectors of the triangles that are connected to it. Moreover, a cross-product multiplication is used to calculate the normal vector of each triangle. Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state where the model is redrawn every time a timer expires or when an interactive operation is sent to the program. A sketch of the normal calculation is given below.
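The per-face and per-vertex normal computation just described can be sketched as follows; the flat array layout (3 floats per vertex, 3 indices per face) and the function name are assumptions made for illustration, not the exact structures used in the application.

    #include <math.h>

    /* Computes one normal per vertex by averaging the cross-product normals
       of all triangles that share the vertex, then normalizing the result. */
    void compute_vertex_normals(const float *v, const int *faces, int num_faces,
                                int num_vertices, float *normals)
    {
        int i, f, k;
        for (i = 0; i < 3 * num_vertices; i++)
            normals[i] = 0.0f;

        for (f = 0; f < num_faces; f++) {
            int a = faces[3 * f], b = faces[3 * f + 1], c = faces[3 * f + 2];
            float e1[3], e2[3], n[3];
            for (k = 0; k < 3; k++) {
                e1[k] = v[3 * b + k] - v[3 * a + k];   /* edge a -> b */
                e2[k] = v[3 * c + k] - v[3 * a + k];   /* edge a -> c */
            }
            /* Face normal as the cross product of the two edges. */
            n[0] = e1[1] * e2[2] - e1[2] * e2[1];
            n[1] = e1[2] * e2[0] - e1[0] * e2[2];
            n[2] = e1[0] * e2[1] - e1[1] * e2[0];
            /* Accumulate the face normal on the three vertices of the face. */
            for (k = 0; k < 3; k++) {
                normals[3 * a + k] += n[k];
                normals[3 * b + k] += n[k];
                normals[3 * c + k] += n[k];
            }
        }
        /* Normalize each accumulated vertex normal. */
        for (i = 0; i < num_vertices; i++) {
            float *n = &normals[3 * i];
            float len = sqrtf(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
            if (len > 0.0f) {
                n[0] /= len; n[1] /= len; n[2] /= len;
            }
        }
    }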

Figure 4.4: Diagram of the visualization module.

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner application, ranging from high-level optimizations, such as modification of the algorithms, to low-level optimizations, such as the implementation of time-consuming parts in assembly language.

In order to verify that the achieved optimizations were valid in general and not only for specific cases, 10 scans of different persons were used for profiling the performance of the application. Every profile consisted of running the application 10 times for each scan and then averaging the results, in order to reduce the influence that external factors might have on the measured times. Figure 5.1 presents an example of the graphs that will be used throughout this and the following chapters to represent the changes in performance. Here, each bar is divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1.

The translation from MATLAB to C code corresponds to the first optimization performed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available in the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's -O2 optimization level.

The bottom bar in Figure 5.1 represents the starting point for a set of optimization procedures that will be described in the following sections. The order in which these are presented corresponds to the same order in which they were applied to the application.



Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

5.1 Double to single-precision floating-point numbers

The same representation format of floating-point numbers was necessary for the MATLAB and C implementations in order to compare both results in each step of the translation process. The original C implementation therefore used the double-precision format, because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double to single-precision format was made. Figure 5.2 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec.

Figure 5.2: Difference in execution time when double-precision format is changed to single-precision.

5.2 Tuned compiler flags

While the previous versions of the C code were compiled with the -O2 optimization level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution

-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp

Figure 5.3: Execution time before and after tuning GCC's compiler options.

5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality in the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e. first and second, third and fourth, fifth and sixth, and so on. Data of the camera frames were placed in memory in such a manner that corresponding pixels between frame pairs lay next to each other in memory. The procedure is shown in Figure 5.4.

However, this modification yielded no improvement in the execution time of the application, as can be seen from Figure 5.5.

Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green, and purple circles represent pixels of the first, second, third, and fourth frames, respectively.

Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.

5.4 Reimplementation of C's standard power function

The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the power 0.85. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process. Taking into account that the high accuracy offered by this function was not required, and that the overhead involved in validating the input could be removed, a different implementation of the power function was adopted.

A novel approach, proposed by Ian Stephenson in [42], is explained as follows. The power function is usually implemented using logarithms as

    pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a, which stores an exponent and a mantissa:

    a = M * 2^E

where M is the mantissa and E is the exponent. Taking the logarithm of both sides gives

    log2(a) = log2(M) + E

and since M is normalized, log2(M) is always small, therefore

    log2(a) ≈ E

This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.
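A minimal sketch of this approximation, assuming IEEE 754 single-precision floats and ignoring special cases such as zero, negative inputs, NaN, and infinity, could look as follows; the function names are illustrative and not necessarily those used in the thesis code.

    #include <stdint.h>

    /* Approximate log2(a) directly from the IEEE 754 bit pattern: the exponent
       field dominates and the mantissa bits act as a linear interpolation. */
    static float fast_log2(float a)
    {
        union { float f; uint32_t i; } u = { a };
        /* Bits interpreted as an integer, scaled by 2^-23, minus the bias 127. */
        return (float)u.i * (1.0f / 8388608.0f) - 127.0f;   /* 2^23 = 8388608 */
    }

    /* Approximate 2^x by rebuilding the bit pattern from the exponent value. */
    static float fast_pow2(float x)
    {
        union { float f; uint32_t i; } u;
        u.i = (uint32_t)((x + 127.0f) * 8388608.0f);
        return u.f;
    }

    /* pow(a, b) = 2^(log2(a) * b), using the two approximations above. */
    static float fast_pow(float a, float b)
    {
        return fast_pow2(b * fast_log2(a));
    }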

Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.

5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering made it possible to replace floating-point calculations with integer calculations in the modulation stage, which are known to typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modular frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping, and scaling frames are now performed in an alternating fashion together with the calculation of the modular frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.

Figure 5.7: Order of execution (a) before and (b) after the optimization.

Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.

5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, this figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results were achieved when GMC is applied in the y direction only. The manner in which this was implemented was by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted in removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement of the execution time shown in Figure 5.10.

Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code.

Figure 5.10: Difference in execution time before and after modifying the GMC stage.

5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation. A series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation. It is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. An error was detected at this point of the optimization process in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.

Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.

5.8 Modified line shifting in GMC stage

A series of optimizations performed on the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later it was optimized even further by replacing the for loop with the more optimized memcpy function available in the standard C library. This in turn led to a faster execution time.

A further optimization was obtained in the GMC stage which yielded better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included in order to determine whether the shift has to be performed in the positive or negative direction. In case the shift is negative, i.e. upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. In case the shift is positive, i.e. downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. A sketch of this in-place shift is given below, and the result of this set of optimizations is presented in Figure 5.12.
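The in-place, memcpy-based line shift could look roughly as follows; the row-major image layout and the function name are assumptions introduced for this illustration.

    #include <string.h>

    /* Shifts an image by `shift` rows in place (negative = upwards, positive =
       downwards). Lines shifted out of the image are discarded; the lines left
       vacant at the border keep their previous content. */
    void shift_rows_inplace(unsigned char *img, int width, int height, int shift)
    {
        int y;
        if (shift < 0) {
            /* Upward shift: traverse top to bottom, pulling each line from below. */
            for (y = 0; y < height + shift; y++)
                memcpy(&img[y * width], &img[(y - shift) * width], width);
        } else if (shift > 0) {
            /* Downward shift: traverse bottom to top, pulling each line from above. */
            for (y = height - 1; y >= shift; y--)
                memcpy(&img[y * width], &img[(y - shift) * width], width);
        }
    }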

Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.

5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing smoothness by a very small amount. This characteristic refers to the fact that the resulting set of vertices from the decoding stage is sorted in an increasing manner. This in turn removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered increasingly from left to right and from bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies even further the algorithm needed to connect the vertices into triangles.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Moreover, each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles. The algorithm is presented in Algorithm 1, and a C sketch is given after it. Note that for each pair of rows, this algorithm describes the connection of vertices until the moment in which the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).

Algorithm 1: New tessellation algorithm

 1: for all pairs of rows do
 2:   find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
 3:   while the last vertex in either row has not been reached do
 4:     if vertex_row_A is more to the left than vertex_row_B then
 5:       connect vertex_row_A with the next vertex on the same row and with vertex_row_B
 6:       change vertex_row_A to the next vertex on the same row
 7:     else
 8:       connect vertex_row_B with the next vertex on the same row and with vertex_row_A
 9:       change vertex_row_B to the next vertex on the same row
10:     end if
11:   end while
12: end for
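A C sketch of this row-pair traversal, under the assumption that the vertices of each row are already sorted by their x coordinate and referenced through per-row index arrays, might look as follows; the names and the emit_triangle() helper are hypothetical. Unlike Algorithm 1, the sketch also includes the later step that connects the leftover vertices of the longer row.

    /* Connects two consecutive rows of decoded vertices into triangles.
       rowA/rowB hold vertex indices sorted by x; x[] gives the x coordinate
       of every vertex. emit_triangle() is a placeholder that stores one face. */
    void triangulate_row_pair(const int *rowA, int lenA,
                              const int *rowB, int lenB,
                              const float *x,
                              void (*emit_triangle)(int, int, int))
    {
        int a = 0, b = 0;   /* current position in each row */

        while (a < lenA - 1 && b < lenB - 1) {
            if (x[rowA[a]] < x[rowB[b]]) {
                /* Triangle between the current A vertex, its right neighbor,
                   and the current B vertex; then advance on row A. */
                emit_triangle(rowA[a], rowA[a + 1], rowB[b]);
                a++;
            } else {
                emit_triangle(rowB[b], rowB[b + 1], rowA[a]);
                b++;
            }
        }
        /* Remaining vertices of the longer row are fanned to the last reached
           vertex of the shorter row (the later step mentioned in the text). */
        while (a < lenA - 1) { emit_triangle(rowA[a], rowA[a + 1], rowB[b]); a++; }
        while (b < lenB - 1) { emit_triangle(rowB[b], rowB[b + 1], rowA[a]); b++; }
    }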

Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 12.5 times over OpenCV's Delaunay triangulation implementation.

Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted. (a) Delaunay triangulation; (b) optimized triangulation.

Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

5.10 Modified decoding stage

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard math C library, namely ceil() and floor(), were replaced with faster implementations that use pre-processor directives to avoid the function call overhead. Moreover, the time spent in validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions could be further simplified.
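A possible form of such simplified replacements, assuming the arguments are always non-negative (as index values are), is sketched below; the macro names are illustrative, not the ones used in the application.

    /* Fast floor/ceil for non-negative values only: truncation towards zero
       already gives the floor, and the ceil needs at most one increment.
       No input validation is performed, unlike the standard library versions.
       Note that the argument is evaluated more than once, so it must be free
       of side effects. */
    #define FAST_FLOOR(x)  ((int)(x))
    #define FAST_CEIL(x)   ((int)(x) + ((float)(int)(x) < (x) ? 1 : 0))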

A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while checking that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since such memory can be addressed more quickly.

The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. The reason why such tasks were present in the application is that several alternatives were implemented for achieving a common goal during the algorithmic design stage; however, after assessing and choosing the best option, the other ones were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.

Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, such recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.

5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization, and global motion compensation stages are data independent and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrating NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions, (ii) using existing C/C++ libraries based on NEON technology, (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions but with the compiler doing some of the work associated with writing assembly instructions, and (iv) directly writing NEON assembly instructions linked to the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and therefore it was this approach that was adopted.

Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums.

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.


Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on two Q registers, where each of the 8 elements would have 16 bits, or as an operation on two D registers, where each of the 8 elements would be 8 bits wide.

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology.

An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
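To give an idea of what processing 8 pixels at a time looks like, the sketch below uses NEON C intrinsics rather than the hand-written assembly actually employed in this work; the data types, function name, and the exact normalization arithmetic (a plain sum and normalized difference, with no guard against a zero denominator) are simplifying assumptions made only for illustration.

    #include <arm_neon.h>

    /* Processes 8 pixel pairs per iteration: the pixel sum (texture 2) and a
       normalized difference (v1 - v2) / (v1 + v2) stored as 32-bit floats. */
    void texture2_and_normalize(const uint8_t *frame1, const uint8_t *frame2,
                                uint16_t *texture2, float *normalized, int n)
    {
        int i;
        for (i = 0; i + 8 <= n; i += 8) {
            uint8x8_t  a    = vld1_u8(frame1 + i);   /* 8 pixels of frame 1 */
            uint8x8_t  b    = vld1_u8(frame2 + i);   /* 8 pixels of frame 2 */
            uint16x8_t sum  = vaddl_u8(a, b);        /* widen sum to 16 bits */
            int16x8_t  diff = vsubq_s16(vreinterpretq_s16_u16(vmovl_u8(a)),
                                        vreinterpretq_s16_u16(vmovl_u8(b)));
            vst1q_u16(texture2 + i, sum);

            /* Convert low and high halves to float and divide, using the NEON
               reciprocal estimate refined with one Newton-Raphson step. */
            float32x4_t sum_lo  = vcvtq_f32_u32(vmovl_u16(vget_low_u16(sum)));
            float32x4_t sum_hi  = vcvtq_f32_u32(vmovl_u16(vget_high_u16(sum)));
            float32x4_t diff_lo = vcvtq_f32_s32(vmovl_s16(vget_low_s16(diff)));
            float32x4_t diff_hi = vcvtq_f32_s32(vmovl_s16(vget_high_s16(diff)));
            float32x4_t rlo = vrecpeq_f32(sum_lo);
            float32x4_t rhi = vrecpeq_f32(sum_hi);
            rlo = vmulq_f32(rlo, vrecpsq_f32(sum_lo, rlo));
            rhi = vmulq_f32(rhi, vrecpsq_f32(sum_hi, rhi));
            vst1q_f32(normalized + i,     vmulq_f32(diff_lo, rlo));
            vst1q_f32(normalized + i + 4, vmulq_f32(diff_hi, rhi));
        }
        /* Remaining pixels (n not a multiple of 8) would be handled in plain C. */
    }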

Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain such an increase. First, note that the stage of the application that most contributed to the increase in time was the reading of the binary file. The execution time of that process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that indeed another process was probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this could be found in the modulation stage. The first step of that process is to find the smallest and largest values for every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C language, the processor makes use of a branch prediction mechanism in order to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, ignoring the existence of the branch prediction mechanism.

5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for such purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain the approach taken to implement the LUT, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2^(n+1) − 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
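A plain C sketch of this idea for 8-bit camera pixels (so the LUT has 2 · 255 + 1 = 511 entries, indexed directly by the pixel sum) could look as follows; the function and variable names are illustrative assumptions.

    #include <math.h>

    #define MAX_PIXEL_SUM (2 * 255)          /* sum of two 8-bit pixels */
    static float gamma_lut[MAX_PIXEL_SUM + 1];

    /* Precompute (sum / 2)^0.85 for every possible pixel sum. */
    void init_gamma_lut(void)
    {
        int s;
        for (s = 0; s <= MAX_PIXEL_SUM; s++)
            gamma_lut[s] = powf(s / 2.0f, 0.85f);
    }

    /* Averaging plus gamma correction collapses into a single table lookup,
       indexed by the pixel sum that the texture 2 stage already computed. */
    void average_and_gamma(const unsigned short *pixel_sums, float *texture1, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            texture1[i] = gamma_lut[pixel_sums[i]];
    }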

As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid such overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2, and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. The merging of these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes.

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels. With pixel values 0 to 3, the possible averages are 0, 0.5, 1, 1.5, 2, 2.5, and 3, and the corresponding LUT entries (average^0.85) are 0, 0.555, 1, 1.411, 1.803, 2.179, and 2.544, indexed by pixel A + pixel B.

These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the final execution flow diagram for the preprocessing and normalization stages. The corresponding improvement of the execution time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.

Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.

6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and the results were averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution time of the original MATLAB and C implementations, respectively, when processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).

On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's -O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism, which results in a much lower resolution model (1229 faces) suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model contains approximately 2% of the faces of the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.

6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations described in Chapter 5 were used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times.

Figure 6.2: Example of the visualization module developed. (a) High-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model wireframe (63,743 faces); (c) low-resolution 3D model with texture (1,229 faces); (d) low-resolution 3D model wireframe (1,229 faces).

Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which give an idea of the speedup achieved in each individual stage.
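Chapter 5 describes each optimization in detail. As one illustration of the kind of change behind the "pow func reimplemented" entry, a generic power call with a known small integer exponent can be replaced by repeated multiplication. The sketch below is only indicative; the replacement actually used in the application may differ.

    /* Generic powf() handles arbitrary real exponents and is comparatively
     * expensive. When the exponent is known to be a small non-negative
     * integer, repeated multiplication is sufficient and much cheaper. */
    static inline float fast_powi(float base, unsigned int exp)
    {
        float result = 1.0f;
        while (exp--)
            result *= base;
        return result;
    }

    /* e.g. fast_powi(x, 2) instead of powf(x, 2.0f) */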


Figure 6.3: Performance evolution of the 3D face scanner's C implementation. The bars show, from top to bottom, the execution time with no optimizations and after each successive optimization: doubles to floats, tuned compiler flags, modified memory layout, pow function reimplemented, reduced memory accesses, GMC in Y direction only, Delaunay bug, line shifting in GMC, new tessellation algorithm, modified decoding stage, no recalculations in GMC, ASM + NEON implementation 1, and ASM + NEON implementation 2. Each bar is split by stage (read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering, hole filling, other); the horizontal axis is time in seconds.


Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling. Each panel compares the time in seconds before and after the optimizations.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e. an increase of approximately 14 seconds.

A wide range of optimizations were performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing such a process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system.

Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment. Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5×5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions.
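These candidates are left as future work. Purely to illustrate the data-parallel structure of such a column sum with NEON intrinsics, a sketch is given below; it assumes a row-major 8-bit image whose width is a multiple of 16, which is a simplification and not necessarily the layout used in the GMC stage.

    #include <arm_neon.h>
    #include <stdint.h>

    /* Sum each column of a row-major 8-bit image into 32-bit accumulators.
     * Sixteen columns are processed per outer iteration. */
    void column_sums(const uint8_t *img, int width, int height, uint32_t *sums)
    {
        for (int x = 0; x < width; x += 16) {
            uint32x4_t acc[4] = { vdupq_n_u32(0), vdupq_n_u32(0),
                                  vdupq_n_u32(0), vdupq_n_u32(0) };
            for (int y = 0; y < height; y++) {
                uint8x16_t px = vld1q_u8(img + y * width + x);   /* 16 pixels  */
                uint16x8_t lo = vmovl_u8(vget_low_u8(px));       /* widen 0-7  */
                uint16x8_t hi = vmovl_u8(vget_high_u8(px));      /* widen 8-15 */
                acc[0] = vaddw_u16(acc[0], vget_low_u16(lo));
                acc[1] = vaddw_u16(acc[1], vget_high_u16(lo));
                acc[2] = vaddw_u16(acc[2], vget_low_u16(hi));
                acc[3] = vaddw_u16(acc[3], vget_high_u16(hi));
            }
            for (int i = 0; i < 4; i++)
                vst1q_u32(sums + x + 4 * i, acc[i]);
        }
    }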

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.

According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase of time is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e. changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.
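As a generic illustration of what such an analysis looks for (not code taken from the calibration stage), the order in which nested loops traverse a row-major array determines whether consecutive accesses hit the same cache line:

    /* Row-major image: element (x, y) is stored at a[y * width + x]. */
    float sum_rowmajor(const float *a, int width, int height)
    {
        float sum = 0.0f;

        /* Cache-friendly order: the inner loop walks consecutive
         * addresses, so every fetched cache line is fully used.
         * Swapping the two loops (x outer, y inner) would stride by
         * 'width' floats per access and thrash the cache instead. */
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                sum += a[y * width + x];

        return sum;
    }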

Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment no longer is for the current state of the application; therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that, as part of the current project, a few of the other options were tested, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast-running code.

Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: the journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1)

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1)

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1)

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, isbn: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10)

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, issn: 0146-664X. doi: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11)

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34)

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11)

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11)

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, issn: 0162-8828. doi: 10.1109/34.216735 (cit. on pp. 9, 11)

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. doi: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11)

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, issn: 0031-3203. doi: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11)

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, issn: 0162-8828. doi: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11)

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11)

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, issn: 0031-3203. doi: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12)

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, issn: 0734-189X. doi: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12)

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. doi: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12)

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, issn: 0932-8092. doi: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12)

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, issn: 0018-9162. doi: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14)

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12)

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. doi: 10.1109/ICME.2006.262619 (cit. on p. 12)

[21] D. An, A. Woodward, P. Delmas, G. Gimel'farb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. doi: 10.1109/ENC.2006.8 (cit. on pp. 12, 13)

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13)

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. doi: 10.1109/ICIP.2007.4379251 (cit. on p. 13)

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, issn: 0031-3203. doi: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14)

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. doi: 10.1364/OE.15.012318 (cit. on p. 14)

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14)

[27] G. Toscani, Systemes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, isbn: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14)

[28] J. Mas and Universitat de Girona. Departament d'Electronica, Informatica i Automatica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals, Universitat de Girona. Universitat de Girona, 1998, isbn: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15)

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, issn: 0882-4967. doi: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15)

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, issn: 0162-8828. doi: 10.1109/34.159901 (cit. on p. 15)

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26)

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, isbn: 0201604582 (cit. on p. 25)

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26)

[34] M. Desbrun, M. Meyer, P. Schroder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th annual conference on Computer graphics and interactive techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, isbn: 0-201-48560-5. doi: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30)

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, isbn: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31)

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32)

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32)

[38] G. Coley, "Beagleboard system reference manual," BeagleBoard.org, December, p. 81, 2009 (cit. on p. 34)

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34)

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38)

[41] W. Von Hagen, The definitive guide to GCC. Apress, 2006 (cit. on p. 45)

[42] I. Stephenson, Production rendering: design and implementation. Springer, 2005 (cit. on p. 46)

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer vision with the OpenCV library. O'Reilly, 2008 (cit. on p. 50)

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51)

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54)

[46] N. Pipenbrinck, "ARM NEON optimization, an example," Tech. Rep., 2009 (cit. on p. 54)


List of Figures

1.1 A subset of the CPAP masks offered by Philips 2
1.2 A 3D hand-held scanner developed in Philips Research 4
2.1 Standard stereo geometry 7
2.2 Assumed model for triangulation as proposed in [4] 10
2.3 Examples of pattern coding strategies 12
2.4 A reference framework assumed in [25] 14
3.1 General flow diagram of the 3D face scanner application 17
3.2 Example of the 16 frames that are captured by the hand-held scanner 18
3.3 Flow diagram of the preprocessing stage 18
3.4 Flow diagram of the normalization stage 20
3.5 Example of the 18 frames produced in the normalization stage 21
3.6 Camera frame sequence in a coordinate system 22
3.7 Flow diagram for the calculation of the texture 1 image 22
3.8 Flow diagram for the global motion compensation process 23
3.9 Difference between pixel-based and edge-based decoding 24
3.10 Vertices before and after the tessellation process 25
3.11 The Delaunay tessellation with all the circumcircles and their centers [33] 26
3.12 The calibration chart 27
3.13 The 3D model before and after the calibration process 28
3.14 3D resulting models after various filtering steps 29
3.15 Forehead of the 3D model before and after applying the smoothing process 30
4.1 The BeagleBoard-xM offered by Texas Instruments 35
4.2 Simplified diagram of the 3D face scanner application 39
4.3 UV coordinate system 40
4.4 Diagram of the visualization module 41
5.1 Execution times of the MATLAB and C implementations after run on different platforms 44
5.3 Execution time before and after tuning GCC's compiler options 45
5.4 Modification of the memory layout of the camera frames 46
5.5 Execution time with a different memory layout 46
5.6 Execution time before and after reimplementing C's standard power function 47
5.7 Order of execution before and after the optimization 48
5.8 Difference in execution time before and after reordering the preprocessing stage 48
5.9 Flow diagram for the GMC process as implemented in the MATLAB code 49
5.10 Difference in execution time before and after modifying the GMC stage 49
5.11 Execution time of the application after fixing an error in the tessellation stage 50
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage 51
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted 52
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach 53
5.15 Execution time of the application before and after optimizing the decoding stage 54
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's columns sum 55
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage 55
5.18 NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology 56
5.19 Execution flow after first NEON assembly optimization 58
5.20 Execution times of the application before and after applying the first NEON assembly optimization 59
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels 59
5.22 Execution times of the application before and after applying the second NEON assembly optimization 59
5.23 Final execution flow after second NEON assembly optimization 60
6.1 Execution times of the MATLAB and C implementations after run on different platforms 62
6.2 Example of the visualization module developed 63
6.3 Performance evolution of the 3D face scanner's C implementation 64
6.4 Execution times for each stage of the application 65

Dedicated to my grandmother


Chapter 1

Introduction

The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.

Within the wide range of products offered by Philips, there is a specific group categorized under the name of sleep solutions that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]. It reported that sleep apnea was prevalent in more than 18 million Americans, i.e. 6.62% of the country's population.

While aiming to attend the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors, which include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia, amongst others.


Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic, (c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel, (h) Simplicity, (i) ComfortGel.

A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.

Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.

1.1 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop some technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.

2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.

3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.

4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.

The focus of this thesis project is on the first step.

As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique was already developed and is the base for the present project. Figure 1.2a shows the hardware setup of such a device. In short, this scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently performed offline and are mostly implemented in MATLAB.

1.2 Objectives

The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform.


Figure 1.2: A 3D hand-held scanner developed in Philips Research: (a) hardware, (b) 3D model example.

This implies transforming the already developed methods and algorithms in such a way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application; specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to an ARM Cortex-A8, the new implementation should be targeted to this processor in particular, by making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.

1.3 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.


Chapter 2

Literature study

This chapter presents a selective analysis of the state-of-the-art in the field of surface

reconstruction placing special emphasis on structured lighting techniques A brief

overview of the three main underlying technologies used for depth estimation is pre-

sented first This is followed by an example of stereo analysis which serves as the basis

for the more specific structured lighting techniques Moreover this example helps to

illustrate why stereo analysis is considered less preferable for 3D face reconstruction

applications when compared with the structured lighting techniques Special emphasis

is placed on the scientific principles underlying structured lighting techniques Further-

more a classification of the different types of pattern coding strategies available in the

literature is given along with an analysis of their suitability for our application Fi-

nally the chapter concludes with a brief discussion of camera calibration and its most

representative techniques

2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (such as those found in areas like architecture, mechanical engineering or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].

Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e., they are based on the time it takes for a wave (acoustic, microwave, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in depth measurement. On the other hand, both reflecto-metric and stereo-metric techniques are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they require an estimation of the model of the environment. In the remainder of this section we will limit the discussion to the stereo-metric category and focus on the structured lighting techniques.

2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis is presented in this section. This example intends to show why the use of structured lighting becomes essential for our application. The example is presented in [4].

Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point in the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies from a certain reference. Note, however, that in order to calculate the triangulation more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (extrinsic parameter) or to the focal length of each of the cameras (intrinsic parameter).

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and oriented towards the scene objects. Also note that, because the x-axes of both images are identically oriented, rows with the same row number in the two different images lie on the same straight line.

Figure 2.1: Standard stereo geometry.

In this model a scene point P = (X, Y, Z) is projected onto two corresponding image points

p_left = (x_left, y_left)   and   p_right = (x_right, y_right)

in the left and right images respectively, assuming that the scene point is visible from both camera viewpoints. The disparity with respect to p_left is a vector given by

\Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\; y_{left} - y_{right})^T    (2.1)

between two corresponding image points.

In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

p = (x, y) = \left( \frac{f \cdot X}{Z},\; \frac{f \cdot Y}{Z} \right)    (2.2)

assuming that Z > f.

According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that y = y_left = y_right. Therefore, for the left camera the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes that the Z-axis is identified to be the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X-axis, the central projection equation is given by

(x_{right}, y) = \left( \frac{f \cdot (X - b)}{Z},\; \frac{f \cdot Y}{Z} \right)

Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points in each one of the images with respect to p_left is given by

\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}

However, because rows with the same row numbers in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right}    (2.3)

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map Δ(x, y) is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.

Finally, in order to come up with the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}

and therefore

X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}

Using the previous equation, it follows that

Z = \frac{b \cdot f}{x_{left} - x_{right}}

By substituting this result into the projection equation for y, it follows that

Y = \frac{b \cdot y}{x_{left} - x_{right}}

The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and b are known and that the disparity map Δ(x, y) was measured for each pair of corresponding points in the two images. Note that a variety of methods exists to calibrate different types of camera configuration systems, i.e., to determine their intrinsic and extrinsic parameters. More on these calibration procedures is further discussed in Section 2.2.
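To make the use of these equations concrete, the following minimal C sketch (not part of the original application; function and variable names are illustrative) converts a measured scalar disparity into metric coordinates, assuming the focal length f and base distance b are known from calibration:

    #include <math.h>

    /* Reconstruct the metric coordinates (X, Y, Z) of a scene point from its
     * left-image coordinates (x_left, y) and the scalar disparity
     * d = x_left - x_right. f is the focal length and b the base distance. */
    static int reconstruct_point(double x_left, double y, double disparity,
                                 double f, double b,
                                 double *X, double *Y, double *Z)
    {
        if (disparity <= 0.0)   /* "undefined" disparity: no correspondence found */
            return -1;

        *X = (b * x_left) / disparity;
        *Y = (b * y) / disparity;
        *Z = (b * f) / disparity;
        return 0;
    }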

The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques are used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images. These features are typically corners or edges contained within the images. Although these techniques are found to be appropriate for a certain number of applications, it turns out that they present a number of drawbacks that make their applicability unfeasible for many others. The main drawbacks are: (i) feature extraction and matching is generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. Such problems in solving the correspondence problem can generally be overcome by resorting to a different but similar type of techniques known by the name of structured lighting techniques. While structured lighting techniques involve a completely different methodology on how to solve the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.

2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source which projects a light pattern actively into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to simplify the complexity of the correspondence analysis by actively manipulating the scene.

It is important to note that stereoscopic-based systems do not assume complex requirements for image acquisition, since they mostly rely on theoretical, mathematical and algorithmic analyses to solve the reconstruction problem. On the other hand, the idea behind structured lighting methods is to shift this complexity to another level, such as the engineering prerequisites of the overall system [4].

A wide variety of light patterns have been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2 a classification of the encoded patterns available is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.

2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is explained.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation as proposed in [4].

Note that the

system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a consequence, the object, light source and camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known, and the origin of the coordinate system O coincides with the projection center of the camera.


The goal is to calculate the distance d between the origin O and the object point P = (X_0, Z_0). This can be done using the law of sines as follows:

\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}

From γ = π − (α + β) and sin(π − γ) = sin(γ) it holds that

\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}

Therefore, distance d is given by

d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}

which holds for any point P lying on the surface of the object.
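As a minimal illustration (this sketch is ours and not taken from the scanner code; names are illustrative), the distance d follows directly from the calibrated angles and the base distance:

    #include <math.h>

    /* Distance d from the camera origin O to the illuminated object point,
     * following the law-of-sines triangulation described above.
     * alpha and beta are the calibrated angles in radians, b the base distance. */
    static double triangulate_distance(double alpha, double beta, double b)
    {
        return (b * sin(alpha)) / sin(alpha + beta);
    }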

2.1.2.2 Pattern coding strategies

As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill all requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19].

• Time-multiplexing: This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or objects in the scene during the time period when the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a (a minimal sketch of how such binary patterns can be generated is given after Figure 2.3).

• Spatial neighborhood: In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12], or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow for all coding information to be condensed into a single projection pattern, making them highly suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding: In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16] or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not only depend on the projected colors, but also on the intrinsic colors of the measuring surface and light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].

Figure 2.3: Examples of pattern coding strategies: (a) time-multiplexing, (b) spatial neighborhood, (c) direct coding.
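To illustrate the time-multiplexing idea referred to above, the following C sketch (illustrative only; resolution and number of patterns are assumed values, and the actual patterns projected by the scanner are generated elsewhere) fills a set of binary stripe patterns in which the n-th pattern encodes the n-th bit of each projector column, so that the temporal sequence of observed intensities at a pixel forms its codeword:

    #include <stdint.h>

    #define PROJ_WIDTH   1024   /* projector columns (assumed value) */
    #define PROJ_HEIGHT   768   /* projector rows (assumed value)    */
    #define NUM_PATTERNS   10   /* 2^10 = 1024 distinguishable columns */

    /* patterns[n][y][x] is 1 when the projector pixel is lit in pattern n.
     * Pattern n encodes one bit of the column index (MSB first), so the
     * sequence of bits observed at a pixel over time identifies its column. */
    static void generate_binary_patterns(
        uint8_t patterns[NUM_PATTERNS][PROJ_HEIGHT][PROJ_WIDTH])
    {
        for (int n = 0; n < NUM_PATTERNS; n++) {
            int bit = NUM_PATTERNS - 1 - n;          /* most significant bit first */
            for (int y = 0; y < PROJ_HEIGHT; y++)
                for (int x = 0; x < PROJ_WIDTH; x++)
                    patterns[n][y][x] = (uint8_t)((x >> bit) & 1);
        }
    }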

2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields, such as security, forensics or even entertainment, it is no surprise that special focus has been devoted to this area by the research community over the last decades. A comparative study of three different 3D face reconstruction approaches is presented in [20]. Here, the most representative techniques of three different domains are tested. These domains are binocular stereo, structured lighting and photometric stereo. The experimental results show that active reconstruction techniques perform better than purely passive ones for this application.

The majority of analysis on vision-based reconstruction has focused on general performance for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless, some effort has been made on evaluating structured lighting techniques with special focus on human face reconstruction. In [21] a comparison is presented between three structured lighting techniques (Gray Code, Gray Code Shift and Stripe Boundary) to assess 3D reconstruction for human faces by using mono and stereo systems. The results show that the Gray Code Shift coding performs best, given the high number of emitted patterns it uses. A further study on this topic was performed by the same author in [22]. Again, it was found that time-multiplexing techniques such as binary encoding using Gray Code provide the highest accuracy. With a rather different objective than that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] also focus their effort on presenting a framework that captures 3D models of faces in high resolution with low computational load. Here, the system uses a single colored stripe pattern for the reconstruction purpose, plus a picture of the face illuminated with regular white light that is used as texture.

Particular aspects of 3D human face reconstruction, such as the proximity, size and texture involved, make structured lighting a suitable approach. On the contrary, other reconstruction techniques might be less suitable when dealing with these particular aspects. For example, stereoscopic approaches fail to provide positive results when the textures involved do not contain features that can be easily extracted and matched by means of algorithms, as in the case of the human face. On the other hand, the concepts behind structured lighting make it very convenient to reconstruct these kinds of surfaces, given the proximity involved and the size limits of the object in question (appropriate for projecting encoded patterns).

With regard to the suitability of the different pattern coding strategies for our application (3D human face reconstruction by means of a hand-held scanner), there are several factors to consider. Spatial neighborhood strategies do not offer the high spatial resolution which is needed by the algorithms that assess the fit quality of the various mask models. Direct coding strategies suffer from practical problems that affect their robustness to different scenarios. This centers the attention on the time-multiplexing techniques, which are known to provide high spatial resolution. The problem with such techniques is that they are highly sensitive to movement, which is likely to be present on a hand-held device. Fortunately, there are several approaches as to how this problem can be solved. Consequently, it is a time-multiplexing technique that is being employed in our application.

2.2 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement. This section presents a review of some of the most popular techniques, with special focus on those that are regarded as adequate for our application.


2.2.1 Definition

Camera calibration is the process of determining a mathematical approximation of the physical and optical behavior of an imaging system by using a set of parameters. These parameters can be estimated by means of direct or iterative methods, and they are divided in two groups. On the one hand, intrinsic parameters determine how light is projected through the lens onto the image plane of the sensor. The focal length, projection center and lens distortion are all examples of intrinsic parameters. On the other hand, extrinsic parameters measure the position and orientation of the camera with respect to a world coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for the structured pattern projection and triangulation considered in [25]. The focal length f_c and the projection center O_c are examples of intrinsic parameters of the camera, while the distance D between the camera and the projector corresponds to an extrinsic parameter.

Figure 2.4: A reference framework assumed in [25].

2.2.2 Popular techniques

In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration that uses a 3×4 transformation matrix which maps 3D object points to their respective 2D image projections. Here, the model of the camera does not consider any lens distortion. For a detailed description of this method, refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique that was based on extracting the physical parameters of the camera from the transformation technique proposed in [18]. The description of this technique is given in [26] and [27]. A non-linear explicit camera calibration that included radial lens distortion was proposed by Salvi in his PhD thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular, and that is still widely used, was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.

The calibration mechanism that is currently being used in our application is based on the work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although this mechanism focuses on stereo camera calibration, it was generalized for a system with one camera and one projector. It involves imaging a controlled scene from different positions and orientations. The controlled scene consists of a rigid calibration chart with several markers. The geometric and photometric properties of such markers are known precisely, so that they can be detected. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart. This calibration mechanism is discussed further in Section 3.7.

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed by the Smart Sensing & Analysis research group and provided as a starting point for the current project. Figure 3.1 presents the main steps involved in the 3D reconstruction process.

Figure 3.1: General flow diagram of the 3D face scanner application. Starting from the binary and XML input, the application comprises the stages described in Sections 3.1 to 3.9 (read binary file, preprocessing, normalization, tessellation, decoding, global motion compensation, calibration, vertex filtering, and hole filling), producing the final 3D model.

The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene. For each projection, the scene is captured by means of the embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study of why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is described.


Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.

3.1 Read binary file

The first step of the application is to read the binary file that contains the required information for the 3D reconstruction. The binary file is composed of two parts: the header and the actual data. The header contains metadata of the acquired frames, such as the number of frames and the resolution of each one. The second part contains the actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence, which from now on will be referred to as camera frames.
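A simplified sketch of this step is given below. The exact header layout of the scan files is not reproduced here; the number of frames and the frame resolution are assumed to be stored as 32-bit integers, followed by the raw 8-bit pixel data of all frames, and the function name is ours:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Read a scan file: a small header followed by the raw camera frames. */
    static uint8_t *read_scan_file(const char *path, uint32_t *num_frames,
                                   uint32_t *width, uint32_t *height)
    {
        FILE *f = fopen(path, "rb");
        if (!f)
            return NULL;

        if (fread(num_frames, sizeof(uint32_t), 1, f) != 1 ||
            fread(width, sizeof(uint32_t), 1, f) != 1 ||
            fread(height, sizeof(uint32_t), 1, f) != 1) {
            fclose(f);
            return NULL;
        }

        size_t count = (size_t)*num_frames * *width * *height;
        uint8_t *data = malloc(count);
        if (data && fread(data, 1, count, f) != count) {
            free(data);
            data = NULL;
        }
        fclose(f);
        return data;
    }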

3.2 Preprocessing

The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps is described in the following subsections.

Figure 3.3: Flow diagram of the preprocessing stage: parse XML file, discard frames, crop frames, and scale (convert to float, range from 0 to 1).

3.2.1 Parse XML file

In this stage the application first reads an XML file that is included for every scan. This file contains relevant information for the structured light reconstruction. This information includes (i) the type of structured light patterns that were projected when acquiring the data, (ii) the number of frames captured while structured light patterns were being projected, (iii) the image resolution of each frame to be considered, and (iv) the calibration data.
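The application relies on libxml2 for this step (see Section 4.1.2.1). The sketch below only shows the general parsing pattern; the element names ("numFrames", "width", "height") are hypothetical placeholders rather than the actual tags used in the scan files:

    #include <libxml/parser.h>
    #include <libxml/tree.h>
    #include <stdlib.h>

    /* Walk the children of the root element and pick out a few values. */
    static int parse_scan_xml(const char *path, int *num_frames, int *width, int *height)
    {
        xmlDocPtr doc = xmlReadFile(path, NULL, 0);
        if (!doc)
            return -1;

        xmlNodePtr root = xmlDocGetRootElement(doc);
        for (xmlNodePtr cur = root->children; cur; cur = cur->next) {
            if (cur->type != XML_ELEMENT_NODE)
                continue;
            xmlChar *text = xmlNodeGetContent(cur);
            if (!text)
                continue;
            if (!xmlStrcmp(cur->name, (const xmlChar *)"numFrames"))
                *num_frames = atoi((const char *)text);
            else if (!xmlStrcmp(cur->name, (const xmlChar *)"width"))
                *width = atoi((const char *)text);
            else if (!xmlStrcmp(cur->name, (const xmlChar *)"height"))
                *height = atoi((const char *)text);
            xmlFree(text);
        }
        xmlFreeDoc(doc);
        return 0;
    }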

3.2.2 Discard frames

Based on the number of frames value read from the XML file, the application discards extra frames that do not contain relevant information for the structured light approach but that are provided as part of the input.

3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this application in particular. This is because pixels near the frame borders do not contain facial information and therefore can be safely removed.

3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage, the data type is transformed from unsigned integer to floating point while dividing each pixel value by 255. The new set of values ranges between 0 and 1.
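A minimal sketch of this conversion (illustrative; buffer names are ours) is:

    #include <stdint.h>
    #include <stddef.h>

    /* Convert an 8-bit frame to floating point in the range [0, 1]. */
    static void scale_frame(const uint8_t *src, float *dst, size_t num_pixels)
    {
        for (size_t i = 0; i < num_pixels; i++)
            dst[i] = (float)src[i] / 255.0f;
    }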

3.3 Normalization

Even though this section is entitled Normalization, a few more tasks are performed in this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide arrows represent flow of data, whereas dashed lines represent the order of execution. The numbers inside the small data arrows pointing towards the different tasks represent the number of frames used as input by each task. The dashed-line rectangle that encloses the normalization and texture 2 tasks represents that there is not a clear sequential execution between these two, but rather that they are executed in an alternating fashion. This type of diagram will prove particularly useful in Chapter 5 in order to explain the modifications that were made to the application to improve its performance. An example of the different frames that are produced in this stage is visualized in Figure 3.5. A brief description of each of the tasks involved in this stage follows.

Figure 3.4: Flow diagram of the normalization stage. The 16 camera frames are the input; the normalization and texture 2 tasks each produce 8 frames, while the modulation and texture 1 tasks each produce a single frame.

3.3.1 Normalization

The purpose of this stage is to extract the reflectivity component (texture information) from the camera frames, while aiming at enhancing the deformed illumination patterns in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The deformed patterns are essential for the 3D reconstruction process.

In order to understand how this process takes place, we need to look back at Figure 3.2. Here it is possible to observe that the projected patterns in the top row frames are equal to their corresponding frame in the bottom row, with the only difference being that the values of the projected pattern are inverted. For each corresponding pair, a new image frame is generated according to the following equation:

F_{norm}(x, y) = \frac{F_{camera}(x, y, a) - F_{camera}(x, y, b)}{F_{camera}(x, y, a) + F_{camera}(x, y, b)}

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively. An example of the resulting frame sequence is shown in Figure 3.5a.
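A minimal sketch of this computation for one pair of frames (a, b) might look as follows; the small epsilon guarding against division by zero in unlit regions is our own addition rather than a detail taken from the application:

    #include <stddef.h>

    /* Compute one normalized frame from a pattern frame and its inverted
     * counterpart, both already scaled to the range [0, 1]. */
    static void normalize_pair(const float *frame_a, const float *frame_b,
                               float *frame_norm, size_t num_pixels)
    {
        const float eps = 1e-6f;   /* avoids division by zero in dark regions (our assumption) */
        for (size_t i = 0; i < num_pixels; i++) {
            float sum = frame_a[i] + frame_b[i];
            frame_norm[i] = (frame_a[i] - frame_b[i]) / (sum + eps);
        }
    }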


Figure 3.5: Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence, (b) texture 2 frame sequence, (c) modulation frame, (d) texture 1 frame.

3.3.2 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one used to calculate the normalized frame sequence. In fact, the output of this process is an intermediate step in the calculation of the normalized frames, which is the reason why the two processes are said to be performed in an alternating fashion. The mathematical equation that describes the calculation of the texture 2 frame sequence is

F_{texture2}(x, y) = F_{camera}(x, y, a) + F_{camera}(x, y, b)

The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation stage.


3.3.3 Modulation

The purpose of this stage is to find the range of measured values for each (x, y) pixel of the camera frame sequence along the time dimension. This is done in two steps. First, two frames are generated by finding the maximum and minimum values along the time (t) dimension (Figure 3.6) for every (x, y) value in a frame.

Figure 3.6: The camera frame sequence represented in an (x, y, t) coordinate system.

Second, a modulation frame is produced by finding the difference between the previously generated frames, i.e.,

F_{mod}(x, y) = F_{max}(x, y) - F_{min}(x, y)

Such a modulation frame (Figure 3.5c) is required later during the decoding stage.
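The two steps can be combined in a single pass over the frame sequence, as in the following sketch (illustrative; the frames are assumed to be stored contiguously, one after the other):

    #include <stddef.h>

    /* Compute the modulation frame: the per-pixel range of values over time. */
    static void compute_modulation(const float *frames, size_t num_frames,
                                   size_t num_pixels, float *modulation)
    {
        for (size_t i = 0; i < num_pixels; i++) {
            float min = frames[i], max = frames[i];
            for (size_t t = 1; t < num_frames; t++) {
                float v = frames[t * num_pixels + i];
                if (v < min) min = v;
                if (v > max) max = v;
            }
            modulation[i] = max - min;
        }
    }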

3.3.4 Texture 1

Finally, the last task in the normalization stage corresponds to the generation of the texture image that will be mapped onto the final 3D model. In contrast to the previous three tasks, this subprocess does not take the complete set of 16 camera frames as input, but only the two with the finest projection patterns. Figure 3.7 shows the four processing steps that are applied to the input in order to generate a texture image such as the one presented in Figure 3.5d.

Figure 3.7: Flow diagram for the calculation of the texture 1 image: average frames, gamma correction, 5×5 mean filter, and histogram stretch.


3.4 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement. In fact, if no measures are taken to correct the slight amount of movement of the scanner or of the objects in the scene during the acquisition process, the complete reconstruction process fails. Although the global motion compensation stage is only a minor part of the mechanism that makes the entire application robust to motion, it is not negligible in the final result.

Global motion compensation is an extensive field of research for which many different approaches and methods have been contributed. The approach used in this application is amongst the simplest in level of complexity. Nevertheless, it meets the needs of the current application.

Figure 3.8 presents an overview of the algorithm used to achieve the global motion compensation. This process takes as input the normalized frame sequence introduced in the previous section. As noted at the bottom of the figure, these steps are repeated for every pair of consecutive frames. As a first step, the pixels in each column are added for both frames. This results in two vectors that hold the cumulative sums of each frame. The second step is to determine by how many pixels the second image is displaced with respect to the first one. In order to achieve this, the sum of absolute differences (SAD) between elements of the two column-sum vectors is calculated while slowly displacing the two vectors with respect to each other. The result is a new vector containing the SAD value for each displacement. Subsequently, the index of the smallest element in the SAD values vector is searched in order to determine the number of pixels that the second image needs to be shifted. The process concludes by performing the actual shift of the second frame.

Figure 3.8: Flow diagram for the global motion compensation process. For every pair of consecutive frames of the normalized frame sequence, the columns of frames A and B are summed, the SAD between the column-sum vectors is minimized, and frame B is shifted accordingly.
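The shift estimation step can be sketched as follows (illustrative only; the search range max_shift and the handling of non-overlapping columns are assumptions on our part):

    #include <stddef.h>
    #include <math.h>

    /* Estimate the horizontal shift (in pixels) between two frames by comparing
     * their column-sum vectors with a sum of absolute differences (SAD). */
    static int estimate_shift(const float *col_sum_a, const float *col_sum_b,
                              int width, int max_shift)
    {
        int best_shift = 0;
        float best_sad = -1.0f;

        for (int s = -max_shift; s <= max_shift; s++) {
            float sad = 0.0f;
            for (int x = 0; x < width; x++) {
                int xb = x + s;
                if (xb < 0 || xb >= width)
                    continue;                  /* ignore non-overlapping columns */
                sad += fabsf(col_sum_a[x] - col_sum_b[xb]);
            }
            if (best_sad < 0.0f || sad < best_sad) {
                best_sad = sad;
                best_shift = s;
            }
        }
        return best_shift;   /* shift to apply to frame B to align it with frame A */
    }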


3.5 Decoding

In Section 2.1.1 of the literature study, the correspondence problem was defined as the process of determining corresponding point pairs, in this case between the captured images and the projected patterns. This is exactly what is being accomplished during the decoding stage.

A novel approach has been implemented in which the identification of the projector stripes is based not on the values of the pixels themselves (as is typically done) but rather on the edges formed by the transitions of the projected patterns. Figure 3.9 illustrates the different sets of decoded values that result with each of these methods. Here it is possible to observe that the pixel-based method produces a stair-casing effect due to the decoding of neighboring pixels that lie on the same stripe of the projected pattern. On the other hand, the edge-based method removes this undesirable effect by decoding values only for parts of the image in which a transition occurs. Furthermore, this approach enables sub-pixel accuracy for the determination of the positions where the transitions occur, meaning that the overall resolution of the 3D reconstruction increases considerably.

Figure 3.9: Edge-based vs. pixel-based decoding (decoded values plotted against the pixels along the y dimension of the image). The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used.

The decoding process results in a set of vertices, each one associated with a depth code. Note, however, that the unit of measurement used to describe the position and depth of each vertex is based on camera pixels and code values, respectively, meaning that these vertices still do not represent the actual geometry of the face. The calibration process, explained in a later section, is the part of the application that translates the pixel and code values to standard units (such as millimeters), thus recreating the actual shape of the human face.

3.6 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in a manner such that no overlaps occur. In computer graphics, these geometric shapes are generally chosen to be triangles, also called "faces". The reason for using triangles is that they have, by definition, their vertices on the same plane. This, in turn, avoids the generation of non-simple convex polygons that are not guaranteed to be rendered correctly. A complete example illustrating this point can be found in [32].

A set of 3D vertices calculated in the decoding stage is the input to the tessellation process. Here, however, the third dimension does not play a role, and hence the z coordinate for each of the vertices can be thought of as being equal to 0. This implies that the new set of vertices consists only of (x, y) coordinates that lie on the same plane, as shown in Figure 3.10a. This graph corresponds to a very close view of the nose area in the reconstructed face example.

Figure 3.10: Close view of the vertices in the nose area (a) before and (b) after applying the Delaunay triangulation.

The question that arises here is how to connect the vertices in such a way that the complete surface is covered with triangles. The answer is to use the Delaunay triangulation, which is probably the most common triangulation used in computer vision. The main advantage that it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the Delaunay triangulation is independent of the order in which the vertices are processed. Figure 3.10b shows the result of applying the Delaunay triangulation to the vertices shown in Figure 3.10a.

Although there exist a number of different algorithms used to achieve the Delaunay triangulation, the final outcome of each conforms to the following definition: a Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [33]. Such a definition can be understood by examining Figure 3.11.

Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33].
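The circumcircle criterion in this definition can be expressed with the standard in-circle determinant test. The following sketch is illustrative and not taken from the application (which relied on OpenCV for the triangulation itself); it returns a positive value when point d lies strictly inside the circumcircle of the counter-clockwise oriented triangle (a, b, c):

    /* In-circle test used by Delaunay triangulation algorithms: positive when d
     * lies inside the circumcircle of the counter-clockwise triangle (a, b, c),
     * negative when outside, and zero when it lies exactly on the circle. */
    typedef struct { double x, y; } point2d;

    static double in_circle(point2d a, point2d b, point2d c, point2d d)
    {
        double ax = a.x - d.x, ay = a.y - d.y;
        double bx = b.x - d.x, by = b.y - d.y;
        double cx = c.x - d.x, cy = c.y - d.y;

        return (ax * ax + ay * ay) * (bx * cy - cx * by)
             - (bx * bx + by * by) * (ax * cy - cx * ay)
             + (cx * cx + cy * cy) * (ax * by - bx * ay);
    }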

3.7 Calibration

The set of (x, y) vertices with their corresponding depth code values that result from the decoding process do not represent standard units of measure, i.e., they still have to be translated into standard units such as millimeters. This is precisely the objective of the calibration process.

The calibration mechanism that is used in the application is based on the work of Peter-Andre Redert as part of his PhD thesis [31]. The entire process is divided into two parts: an offline and an online process. Moreover, the offline process consists of two stages: the camera calibration and the system calibration. It is important to clarify that, while the offline process is performed only once (camera properties and distances within the system do not change with every scan), the online process is carried out for every scan instance. The calibration stage referred to in Figure 3.1 is the latter.


3.7.1 Offline process

As already mentioned, the offline process comprises the two stages described below.

Camera calibration: This part of the process is concerned with the calculation of the intrinsic parameters of the camera, as explained in Section 2.2 of the literature study. In short, the objective is to precisely quantify the optical properties of the camera. The manner in which the current approach accomplishes this is by imaging the special calibration chart shown in Figure 3.12 from different orientations and distances. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart.

Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.

System calibration: The second part of the calibration process refers to the camera-projector system calibration, i.e., the determination of the extrinsic parameters of the system. Again, this part of the process images the calibration chart from different distances. However, this time structured light patterns are emitted by the projector while the acquisition process takes place. The result is that each projector code is associated with a known depth and camera position.

3.7.2 Online process

The result of the offline calibration is a set of parameters that model the optical properties of the scanner system. These are passed to the application inside the XML file for every scan. Such parameters represent the coefficients of a fifth-order polynomial used for translating the set of (x, y) vertices with their corresponding depth code values into standard units of measure. In other words, the online process consists of evaluating a polynomial with all the x, y and depth code values calculated in the decoding stage in order to reconstruct the geometry of the face. Figure 3.13 shows the state of the 3D model before and after the reconstruction process.

Figure 3.13: The 3D model (a) before and (b) after the calibration process.
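As an illustration of what such an evaluation step may look like, the sketch below evaluates a fifth-order polynomial in a single variable with Horner's rule. The actual calibration polynomial combines the x, y and code values, and its exact form and coefficients are read from the XML calibration data, so this one-variable version is only a simplified stand-in:

    /* Evaluate c[0] + c[1]*v + ... + c[5]*v^5 using Horner's rule. */
    static double eval_poly5(const double c[6], double v)
    {
        double result = c[5];
        for (int i = 4; i >= 0; i--)
            result = result * v + c[i];
        return result;
    }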

3.8 Vertex filtering

As can be seen from Figure 3.13b, there are a number of extra vertices (and faces) that have not been correctly reconstructed and therefore should be removed from the model. Vertex filtering is applied to remove all these noisy vertices and faces based on different criteria. The process is divided in the following three steps.

3.8.1 Filter vertices based on decoding constraints

First, if the distance between consecutive decoded points is larger than a maximum threshold in the (x) or (z) dimensions, then these are removed. Second, in order to avoid false decoded vertices due to camera noise (especially in the parts of the images where light does not hit directly), a minimal modulation threshold needs to be exceeded, or else the associated decoded point is discarded. Finally, if the decoded vertices lie outside a margin defined in accordance with the image dimensions, then these are removed as well.


3.8.2 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum and maximum values that each decoded point can have in the z dimension. These values are read from the XML file. The long triangles shown in Figure 3.13b, which either extend far into the picture or, on the other hand, come close to the camera, are all removed in this stage. The resulting 3D model after being filtered with the two previously described criteria is shown in Figure 3.14a.

3.8.3 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length criterion. Initially, the length of every edge contained in the model is calculated. This is followed by determining a new set of edges L that contains the longest edge in each face. After this operation, the mean length value for the longest edge set is calculated. Finally, only faces whose longest edge is less than seven times the mean value, i.e., L < 7 × mean(L), are kept. Figure 3.14b shows the result after this operation.
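A minimal sketch of this criterion (our own illustration; the data structures and function names are not those of the application) is:

    #include <math.h>
    #include <stddef.h>

    typedef struct { double x, y, z; } vertex3d;
    typedef struct { int v0, v1, v2; } face3d;   /* indices into the vertex array */

    static double edge_length(vertex3d a, vertex3d b)
    {
        double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return sqrt(dx * dx + dy * dy + dz * dz);
    }

    static double longest_edge(const vertex3d *v, face3d f)
    {
        double l01 = edge_length(v[f.v0], v[f.v1]);
        double l12 = edge_length(v[f.v1], v[f.v2]);
        double l20 = edge_length(v[f.v2], v[f.v0]);
        return fmax(l01, fmax(l12, l20));
    }

    /* keep[i] is set to 1 for faces whose longest edge stays below seven times
     * the mean longest-edge length, and 0 for faces that should be removed. */
    static void filter_long_faces(const vertex3d *v, const face3d *f,
                                  size_t nf, int *keep)
    {
        if (nf == 0)
            return;

        double mean = 0.0;
        for (size_t i = 0; i < nf; i++)
            mean += longest_edge(v, f[i]);
        mean /= (double)nf;

        for (size_t i = 0; i < nf; i++)
            keep[i] = (longest_edge(v, f[i]) < 7.0 * mean) ? 1 : 0;
    }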

Figure 3.14: Resulting 3D models after various filtering steps: (a) after the filtering steps described in Subsections 3.8.1 and 3.8.2, (b) after the filtering step described in Subsection 3.8.3, (c) after the filtering step described in Section 3.9.

3.9 Hole filling

In the last processing step of the 3D face scanner application, two actions are performed. The first one is concerned with an algorithm that takes care of filling undesirable holes that appear due to the removal of vertices and faces that were part of the face surface. This is accomplished by adding a vertex in the middle of the hole and then connecting every surrounding edge with this point. The second action refers to another filtering step of vertices and faces. In this last part of the application, the program removes all but the largest group of connected faces. The final 3D model is shown in Figure 3.14c.

3.10 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes but not for the overall goal of the 3D mask sizing project, this process was not taken into account as part of the 3D face scanner application. This is also the reason why it is not included in Figure 3.1. Nevertheless, this section provides a brief explanation of the smoothing process that is currently used, along with an example.

A complete explanation of the algorithm that is being used to achieve the smoothing effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian operator that diffuses the vertices along the surface. An example of the resulting model before and after applying the smoothing process is shown in Figure 3.15.

Figure 3.15: Forehead of the 3D model (a) before and (b) after applying the smoothing process.

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as two different domains, but rather as two complementary parts of a whole. There are two important trends that have made such a unified view possible. First, integrated circuit (IC) technology has evolved to the point where multiple processors of different types coexist in a single IC. Second, the increasing complexity and average size of programs, added to the evolution of compiler technologies, raised C compilers (and even C++ or Java in some cases) to become commonplace in the development of embedded systems [35].

This chapter discusses the embedded hardware and software implementation of the 3D face scanner. A brief account of the hardware and software tools that were used during the development of the application is presented first. Subsequently, the first stage of the development process is described, which consists mainly of translating the algorithms and methods described in Chapter 3 into a different programming language more suitable for embedded systems. Finally, a preview of the developed visualization module that displays the 3D reconstructed face is presented, along with a brief description of its functionality.

4.1 Development tools

This section describes the set of tools used in the development of the embedded application. First, an overview of the hardware is presented, highlighting the most important aspects that are of interest to the 3D face scanner application. This is then followed by a list of the software tools, along with a short motivation for their selection. A so-called remote development methodology was used for the compilation process. The idea is to run an integrated development environment (IDE) on a client system for the creation of the project, editing of the files, and usage of code assistance features in the same manner as done with local projects. However, when the project is built, run or debugged, the process runs on a remote server, with output and input transferred to the client system.

4.1.1 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as development platforms. SBCs combine most features of a conventional desktop computer into a single board, which can be as small as a credit card. One or more processors of different types, memory, on-board peripherals for multiple USB devices, single or dual gigabit Ethernet connections, and integrated graphics and audio capabilities, amongst others, are common features included in these devices. But perhaps what is most interesting for embedded developers is the availability of several SBCs that come under the open source hardware category [36]. Such SBCs are suitable for the implementation of a wide range of applications on the basis of open operating systems.

Two different hardware environments were used in the development of the current embedded application: a conventional desktop personal computer (PC) with an Intel x86 architecture, and an SBC that was selected according to the following survey.

4.1.1.1 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention of finding the most suitable model for our application. Table 4.1 presents a subset of the considered models, highlighting the most relevant characteristics for the 3D face scanner application. Refer to [37] for the complete survey.

The model to be chosen had to comply with several requirements imposed by the 3D face scanner application. First, support for both a camera and a projector had to be offered. While all of the considered models showed special support for video output, not all of them provided suitable characteristics for camera signal acquisition. In fact, most of them rely on USB or Ethernet connections for this purpose. The problem of using USB technology for camera acquisition is that it is highly resource demanding. On the other hand, Ethernet connections imply streaming video in formats such as MPEG, which require additional computational resources and buffering for decoding the video stream. Explicit peripheral support for camera acquisition was only offered by two of the considered models: the BeagleBoard-xM and the PandaBoard.


Table 4.1: Single-board computer survey

BeagleBoard-xM
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 512 MB
  Video output: DVI-D, HDMI, S-Video
  GPU: PowerVR SGX, OpenGL ES 2.0
  Camera port: Yes

Raspberry Pi Model B
  CPU: ARM1176, 700 MHz
  RAM: 256 MB
  Video output: Composite RCA, HDMI, DSI
  GPU: Broadcom VideoCore IV, OpenGL ES 2.0
  Camera port: No

Cotton Candy
  CPU: dual-core ARM Cortex-A9, 1200 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
  Camera port: No

PandaBoard
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, DVI-D, LCD
  GPU: PowerVR SGX540, OpenGL ES 2.0
  Camera port: Yes

Via APC
  CPU: ARM11, 800 MHz
  RAM: 512 MB
  Video output: HDMI, VGA
  GPU: built-in 2D/3D graphics, OpenGL ES 2.0
  Camera port: No

MK802
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No

Snowball
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, CVBS
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No


A second issue in the selection of the SBC was concerned with the project objective of developing a module capable of visualizing the 3D reconstructed model by means of the embedded projector. It was considered that the achievement of this objective could be greatly simplified by selecting an SBC model that offered support for rendering of 3D computer graphics by means of an API, preferably OpenGL ES. Nevertheless, all of the SBC models considered in the survey featured a graphical processor unit (GPU) with such support.

Finally, one last important motivation for the selection came from the experience gathered through related projects. The BeagleBoard-xM had been used as the embedded computing unit in other projects [6] at Philips Research Eindhoven, and therefore valuable implementation effort could be saved if this option were adopted. Consequently, it was the BeagleBoard-xM that was selected as the SBC model for the development of the current project.

4.1.1.2 BeagleBoard-xM features

The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power, open-source hardware system that was designed specifically to address the open source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz and 512 MB of LPDDR RAM. Several open operating systems have been made compatible with such a processor, including Linux, FreeBSD, RISC OS, Symbian and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU to provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].
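As a small, generic illustration of what NEON offers (this example is ours and not code from the scanner application; it requires compiling with NEON support enabled), four single-precision values can be processed per instruction using the GCC NEON intrinsics:

    #include <arm_neon.h>
    #include <stddef.h>

    /* Add two float arrays four elements at a time using NEON intrinsics.
     * num is assumed to be a multiple of 4 to keep the example short. */
    static void add_frames_neon(const float *a, const float *b, float *out, size_t num)
    {
        for (size_t i = 0; i < num; i += 4) {
            float32x4_t va = vld1q_f32(a + i);          /* load 4 floats */
            float32x4_t vb = vld1q_f32(b + i);
            vst1q_f32(out + i, vaddq_f32(va, vb));      /* store their sums */
        }
    }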

Figure 4.1: The BeagleBoard-xM offered by Texas Instruments.

4.1.2 Software

The main factors involved in the selection of software tools were (i) available support by a large development community, and (ii) acquisition costs and licensing charges. Open source software was adopted where possible. Moreover, prior experience with the tools was also taken into account. The software can be divided in two categories: (i) software libraries that are used within the application and therefore are necessary for its execution, and (ii) software tools used specifically for the development of the application and hence are not required for its execution. In what follows, each of these is briefly described.

4.1.2.1 Software libraries

The following software libraries are used throughout the implementation of the embedded application.

libxml2: A software library used for parsing XML documents, which was originally developed for the Gnome project and was later made available for outside projects as well. The current application makes use of this tool for extracting the required information from the XML file that is included with each scan.

OpenCV: An open source computer vision and machine learning software library initiated by Intel. It provides the necessary functionality to construct the Delaunay triangulation described in Chapter 3. Though it was used in the initial versions of the application, later optimizations replaced the OpenCV implementations.

CGAL: A software library that aims to provide access to algorithms in computational geometry. It is used in the current application as a means to simplify the resulting mesh surface, i.e. to reduce the number of faces used to represent the surface while keeping the overall shape of the reconstructed model.

OpenGL ES: A subset of the more general OpenGL designed specifically for embedded systems. It consists of a cross-language, multi-platform Application Programming Interface (API) for rendering 2D and 3D computer graphics. It is used in the current application as the means to visualize the 3D reconstructed model.

GLUT: The OpenGL Utility Toolkit, a system-independent API for OpenGL used to create windows and/or frame buffers. It is used in the visualization module of the application as well.

4.1.2.2 Software development tools

The following list presents a description of the most important software tools used for the development of the embedded application.

GNU toolchain: A collection of programming tools produced by the GNU Project that provides development facilities for applications and operating systems. Among the several projects that comprise the GNU toolchain, the following were used:

  GNU Make: A utility that automates the building process of executable programs by reading the so-called makefiles, which specify how to create the target program.

  GCC: The official compiler of the GNU operating system, which has been adopted as standard by most modern Unix-like computer operating systems.

  GNU Binutils: A set of programming tools used in the development process of creating and managing programs, object files, libraries, profile data and assembly source code. The commands as (assembler), ld (linker) and gprof (profiler) were used among the complete set of binutils commands.

  GNU Project debugger: The standard debugger for the GNU operating system, which was made available for the development of applications outside this project as well.

Valgrind: A programming tool that can automatically detect memory management errors. It also provides the functionality of a profiler.

Ubuntu: A Linux-based operating system that is distributed as free and open source software. It was installed on both the desktop PC and the SBC.


4.2 MATLAB to C code translation

This section describes the first stage of the embedded application development, which involves the translation of a series of algorithms originally written in MATLAB code to C.

Despite the fact that there are a number of available tools that automatically translate MATLAB code to C language, such as MATLAB Coder by MathWorks, MATLAB-to-C Synthesis (MCS) by Catalytic Inc., and AccelDSP by Xilinx, these have a number of pitfalls that compromise their applicability, especially when the performance aspect is of ultimate importance. Perhaps what is most concerning is that each one of these tools only supports a subset of the MATLAB language and functions, meaning that the complete functionality of MATLAB is immediately constrained by this requirement. In many cases this would imply a modification to the MATLAB code prior to the translation process in order to filter out any feature or function not included in the subset, which adds overhead to the development process. Examples of features not supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization, or try/catch statements. The use of an automatic translation tool was discarded for this project, taking into account that several of these unsupported features are present in the MATLAB code.

4.2.1 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular programming languages used for the development of embedded systems. The first is that the C language lies at an intermediate point between higher and lower level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher level languages lies in the fact that they do not provide suitable characteristics for optimizing the performance of applications, such as low-level memory manipulation. Furthermore, unlike many of these higher level programming languages, C provides deterministic resource use, which is an important feature when the target devices contain limited resources. On the other hand, C outperforms lower level languages in a number of aspects, such as scalability and maintainability. Two final motivations for using C are that (i) C compilers are available for almost all embedded devices, which are supported by a large pool of experienced C programmers, and (ii) the vast majority of hardware APIs/drivers are written in C.


4.2.2 Translation approach

As mentioned earlier, a manual translation approach of the code was chosen over the use of automatic translation tools. A key part in the process of manually translating MATLAB to C code is the verification process. There are two major techniques used to achieve such verification. The first one consists of a systematic method of converting the translated C code into a compiled MEX-file that can be merged into the original MATLAB project. Then, by comparing the results generated by the MATLAB project containing the C implementation wrapped in a MEX-file with those generated by the original MATLAB project, one should be able to verify the correctness of the translation. The second approach consists of writing corresponding intermediate results of both the MATLAB and C implementations to external files and then using a file comparison tool, such as diff for Linux environments, in order to validate the equality of both results. It was the latter approach that was chosen for the development of the current application, for the following reason: the former approach requires the C implementation to be wrapped in a so-called MEX wrapper, which takes care of the communication between MATLAB and C. This task is considered to be error prone, since crashes, segmentation violations or incorrect results can easily occur if the MEX wrapper does not allocate and access the data properly, as reported by Marc Barberis in [40] from Catalytic Inc.
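As an illustration of the second verification approach, the following sketch shows how an intermediate result could be written to a text file with a fixed number of decimals, so that it can be compared against the corresponding MATLAB dump with diff. The function name and the row-major matrix layout are illustrative assumptions, not part of the actual scanner code.

#include <stdio.h>

/* Hypothetical helper that dumps a row-major float matrix with 4 decimal
 * places, matching the precision used for the MATLAB dumps. */
static int dump_matrix(const char *path, const float *data, int rows, int cols)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++)
            fprintf(f, "%.4f ", data[r * cols + c]);   /* 4 decimal places */
        fprintf(f, "\n");
    }
    fclose(f);
    return 0;
}

On the MATLAB side, a call such as dlmwrite('texture1_matlab.txt', texture1, 'delimiter', ' ', 'precision', '%.4f') would produce a comparable file (file names illustrative), after which diff reveals any mismatch between the two implementations.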

A number of pitfalls that add complexity to the manual translation process were identified throughout the development of this stage. The most important are:

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e. the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in performance of the resulting code (see the sketch after this list).

• MATLAB is an interpreted language, i.e. data types and variable dimensions are only known at run-time, and thus these cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions and sizes. On the contrary, C language requires all variables to be cast to a specific data type (or declared, as known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C and hence requires the programmer to implement them while relying on structure constructs of primitive types.
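To illustrate the second pitfall, the following sketch (not taken from the scanner code) shows two loop orderings over the same image buffer. Both compute the same sum, but only the first traverses memory in the row-major order in which C lays out the data; the second mimics MATLAB's column-major traversal and produces strided accesses.

#define ROWS 480
#define COLS 640

float sum_row_major(const float img[ROWS][COLS])
{
    float sum = 0.0f;
    for (int r = 0; r < ROWS; r++)          /* row-major: friendly to C layout */
        for (int c = 0; c < COLS; c++)
            sum += img[r][c];
    return sum;
}

float sum_column_major(const float img[ROWS][COLS])
{
    float sum = 0.0f;
    for (int c = 0; c < COLS; c++)          /* column-major order, as MATLAB    */
        for (int r = 0; r < ROWS; r++)      /* stores data: strided accesses in C */
            sum += img[r][c];
    return sum;
}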

4.3 Visualization

This section describes the different steps involved in the visualization module developed to display the reconstructed 3D models by means of the embedded projector contained in the hand-held device. Figure 4.2 extends the general overview of the application presented in Figure 3.1 by incorporating the visualization module. This figure shows that a resulting 3D model of the face reconstruction process consists of 4 different elements: a set of vertices, a set of faces, a set of UV coordinates and a texture image.

Figure 4.2: Simplified diagram of the 3D face scanner application. The 3D face reconstruction block takes the camera frame sequence and the XML file as inputs and produces the faces, vertices, UV coordinates and texture 1 image that are passed to the visualization module.

Vertices and faces describe the geometry of the reconstructed model. Each face consists of three index values that determine the vertices that form a triangle. On the other hand, UV coordinates together with the texture image describe the texture of the model. Figure 4.3 shows how UV coordinates are used to map portions of the texture image to individual parts of the model. Each vertex is associated with a UV coordinate. When a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image to place it on top of the triangle.

Figure 4.3: UV coordinate system, with u and v ranging from (0,0) to (1,1).

Figure 4.4 presents an overview of the visualization module. The first step of the process is to simplify the 3D model, i.e. to reduce the number of triangles (and vertices) used to represent the surface. Note that while a high resolution is needed for the algorithms that determine the fit quality of the different mask models, a much lower resolution can be used for visualization purposes. In fact, due to the limited available resources in embedded systems, such simplification becomes necessary to avoid lag when zooming, rotating or panning the model. Edge collapse is a common term used for the simplification process, which is shown in Figure 4.4. Input vertices and faces of this block are converted into a smaller set, denoted as new vertices and new faces in the diagram. However, since the new set of vertices and faces does not have a one-to-one correspondence to the original set of UV coordinates, such coordinates have to be updated as well. The manner in which this is accomplished is by using the nearest neighbor algorithm: every new vertex is assigned the UV coordinate of its closest original vertex.

The next stage of the process is to format the new set of vertices, faces and UV coordinates, together with the texture 1 image, such that OpenGL can render the model. Subsequently, normal vectors are calculated for every triangle, which are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex based on the normal vectors of the triangles that are connected to it. Moreover, a cross-product multiplication is used to calculate the normal vector of each triangle. Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state where the model is redrawn every time a timer expires or when an interactive operation is sent to the program.
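The following sketch illustrates this normal calculation under the assumption of a simple vec3 type and index-triplet faces; the names are illustrative and the exact code in the application may differ.

#include <math.h>

typedef struct { float x, y, z; } vec3;

static vec3 sub(vec3 a, vec3 b) { vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }

static vec3 cross(vec3 a, vec3 b)
{
    vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

void compute_vertex_normals(const vec3 *verts, const int (*faces)[3],
                            int num_faces, int num_verts, vec3 *normals)
{
    for (int v = 0; v < num_verts; v++)
        normals[v] = (vec3){ 0.0f, 0.0f, 0.0f };

    for (int f = 0; f < num_faces; f++) {
        vec3 a = verts[faces[f][0]], b = verts[faces[f][1]], c = verts[faces[f][2]];
        vec3 n = cross(sub(b, a), sub(c, a));      /* triangle normal via cross product */
        for (int i = 0; i < 3; i++) {              /* accumulate on each of its vertices */
            int v = faces[f][i];
            normals[v].x += n.x; normals[v].y += n.y; normals[v].z += n.z;
        }
    }
    for (int v = 0; v < num_verts; v++) {          /* normalize the per-vertex averages */
        float len = sqrtf(normals[v].x * normals[v].x +
                          normals[v].y * normals[v].y +
                          normals[v].z * normals[v].z);
        if (len > 0.0f) {
            normals[v].x /= len; normals[v].y /= len; normals[v].z /= len;
        }
    }
}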

Figure 4.4: Diagram of the visualization module. Mesh simplification via edge collapse produces new vertices and new faces; a nearest neighbor step produces the new UV coordinates; the result is converted to the OpenGL format, normals are calculated, and the GL vertices, faces, UV coordinates, normals and texture 1 are passed to OpenGL.

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner application, ranging from high-level optimizations, such as modification of the algorithms, to low-level optimizations, such as the implementation of time-consuming parts in assembly language.

In order to verify that the achieved optimizations were valid in general and not for specific cases, 10 scans of different persons were used for profiling the performance of the application. Every profile consisted of running the application 10 times for each scan and then averaging the results, in order to reduce the influence that external factors might have on the measured times. Figure 5.1 presents an example of the graphs that will be used throughout this and the following chapters to represent the changes in performance. Here, each bar is divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1.

The translation from MATLAB to C code corresponds to the first optimization performed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available in the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's -O2 optimization level.

The bottom bar in Figure 5.1 represents the starting point for a set of optimization procedures that will be described in the following sections. The order in which these are presented corresponds to the same order in which they were applied to the application.


Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

5.1 Double to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C implementations was necessary to compare both results in each step of the translation process. The original C implementation used the double-precision format because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double to single-precision format was made. Figure 5.2 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec.

Figure 5.2: Difference in execution time when double-precision format is changed to single-precision.

5.2 Tuned compiler flags

While the previous versions of the C code were compiled with the -O2 optimization level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp

Figure 5.3: Execution time before and after tuning GCC's compiler options.

5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality of the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e. first and second, third and fourth, fifth and sixth, and so on. Data of the camera frames were placed in memory in a manner such that corresponding pixels between frame pairs lay next to each other in memory. The procedure is shown in Figure 5.4.

However, this modification yielded no improvement on the execution time of the application, as can be seen from Figure 5.5.

5.4 Reimplementation of C's standard power function

Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green and purple circles represent pixels of the first, second, third and fourth frames, respectively.

Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.

The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the 0.85 power. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process. Taking into account that the high accuracy offered by such function was not required, and that the overhead involved in validating the input could be removed, a different implementation of such function was adopted.

A novel approach was proposed by Ian Stephenson in [42], explained as follows. The power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a:

a = M * 2^E

where M is the mantissa and E is the exponent. Taking the logarithm of both sides gives

log2(a) = log2(M) + E

and since M is normalized, log2(M) is always small, therefore

log2(a) ≈ E
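A minimal sketch of this idea is given below, working directly on the IEEE 754 bit pattern of a single-precision float; the accuracy is limited but sufficient for gamma correction, and the exact implementation used in the application may differ.

#include <stdint.h>

/* Approximate log2() by reading the exponent (plus a fractional contribution
 * of the mantissa) out of the float bit pattern, and pow2() by writing it
 * back; together they give an approximate a^b = 2^(b*log2(a)). */
static inline float fast_log2(float a)
{
    union { float f; uint32_t i; } u = { a };
    return (float)u.i / (1 << 23) - 127.0f;     /* exponent + fraction - bias */
}

static inline float fast_pow2(float p)
{
    union { float f; uint32_t i; } u;
    u.i = (uint32_t)((p + 127.0f) * (1 << 23)); /* rebuild the bit pattern    */
    return u.f;
}

static inline float fast_pow(float a, float b)
{
    return fast_pow2(b * fast_log2(a));         /* a^b = 2^(b*log2(a))        */
}

For gamma correction, a call such as fast_pow(avg, 0.85f) then replaces the standard pow() call, with no input validation overhead.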

This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.

Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.

5.5 Reduced memory accesses

The original order of execution was modified to reduce the amount of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering allowed floating-point calculations in the modulation stage to be replaced with integer calculations, which are known to typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modular frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping and scaling frames are now performed in an alternating fashion, together with the calculation of the modular frame. This loop merging improves the locality of data and reduces loop overhead, as illustrated in the sketch below. Figure 5.8 shows the change in execution time of the application for this optimization step.
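A schematic sketch of this loop merging is shown below, with illustrative names: the cropped and scaled pixel is produced and the running minimum and maximum of the modulation frame are updated in the same pass over the data, instead of in a separate loop that re-reads the frame.

#include <stdint.h>

void preprocess_and_modulate(const uint8_t *frame, uint8_t *out,
                             uint8_t *mod_min, uint8_t *mod_max, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++) {
        uint8_t p = frame[i];                /* stands in for the crop/scale output */
        out[i] = p;
        if (p < mod_min[i]) mod_min[i] = p;  /* modulation extrema updated in the   */
        if (p > mod_max[i]) mod_max[i] = p;  /* same pass, improving data locality  */
    }
}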


Figure 5.7: Order of execution before and after the optimization. (a) Original order of execution: the preprocessing stage (parse XML file, discard frames, crop frames, scale) is followed by the normalization stage (texture 1, modulation, texture 2, normalize). (b) Modified order of execution: the modulation step is moved into the preprocessing stage.

Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.


5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3; Figure 3.8 shows the different stages of this process. However, this figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code. In fact, this figure describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results were achieved when GMC is applied in the y direction only. The manner in which this was implemented was by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted in removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement of the execution time shown in Figure 5.10.

Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code: for every pair of consecutive frames in the normalized frame sequence, the rows and columns of both frames are summed, the SAD is minimized in x and y, and frame B is shifted in the y dimension only.

Figure 5.10: Difference in execution time before and after modifying the GMC stage.


5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation; a series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation. It is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. An error was detected at this point of the optimization process in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.

Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.

5.8 Modified line shifting in GMC stage

A series of optimizations performed to the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop. Later this was optimized even further by replacing such for loop with the more optimized memcpy function available in the standard C library. This in turn led to a faster execution time.

A further optimization was obtained in the GMC stage, which yielded better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included in order to determine whether the shift has to be performed in the positive or negative direction. In case the shift is negative, i.e. upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. In case the shift is positive, i.e. downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
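The following sketch captures the in-place shifting logic described above (illustrative signature, assuming one byte per pixel). Each destination row is written before it is needed as a source, which removes the need for a second buffer; rows shifted out of the image are simply discarded, with no circular wrap-around.

#include <stdint.h>
#include <string.h>

void shift_frame_y(uint8_t *frame, int width, int height, int shift)
{
    if (shift < 0) {                          /* shift upwards: top to bottom   */
        for (int row = 0; row < height + shift; row++)
            memcpy(frame + row * width,
                   frame + (row - shift) * width, width);
    } else if (shift > 0) {                   /* shift downwards: bottom to top */
        for (int row = height - 1; row >= shift; row--)
            memcpy(frame + row * width,
                   frame + (row - shift) * width, width);
    }
}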

Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.

5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing smoothness in a very small amount. This characteristic refers to the fact that the resulting set of vertices from the decoding stage is sorted in an increasing manner. This in turn removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered in increasing order from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies the algorithm needed to connect such vertices into triangles even further.

The developed algorithm traverses the set of vertices row by row from bottom to top, creating triangles between every pair of consecutive rows. Moreover, each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles. The algorithm is presented in Algorithm 1, and a compact C sketch of its inner loop is given after it. Note that for each pair of rows, this algorithm describes the connection of vertices until the moment in which the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).

Algorithm 1 New tessellation algorithm
 1: for all pairs of rows do
 2:     find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
 3:     while the last vertex in either row has not been reached do
 4:         if vertex_row_A is more to the left than vertex_row_B then
 5:             connect vertex_row_A with the next vertex on the same row and with vertex_row_B
 6:             change vertex_row_A to the next vertex on the same row
 7:         else
 8:             connect vertex_row_B with the next vertex on the same row and with vertex_row_A
 9:             change vertex_row_B to the next vertex on the same row
10:         end if
11:     end while
12: end for

Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's Delaunay triangulation implementation.

Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted. (a) Delaunay triangulation; (b) Optimized triangulation.

5.10 Modified decoding stage

Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard math C library, namely ceil() and floor(), were replaced with faster implementations that used pre-processor directives to avoid the function call overhead. Moreover, the time spent in validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that such functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions was further simplified.
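The simplified replacements can be pictured as pre-processor macros of the following form (illustrative, not necessarily the exact macros used in the code); truncation towards zero is a valid floor only because the index arguments are never negative.

/* Valid only for non-negative arguments; note that x is evaluated twice. */
#define FAST_FLOOR(x)  ((int)(x))
#define FAST_CEIL(x)   ((int)(x) + ((x) > (int)(x)))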

A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while controlling that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster since it is memory that is faster addressable.

The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. The reason why such tasks were present in the application is that several alternatives were implemented for achieving a common goal during the algorithmic design stage; however, after assessing and choosing the best option, the other ones were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.

Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed to the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, such recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.

5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed to the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization and global motion compensation stages are data independent and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrate NEON technology into an existing application: (i) by using a vectorizing compiler that automatically translates C/C++ code into NEON instructions, (ii) by using existing C/C++ libraries based on NEON technology, (iii) by using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions but with the compiler doing some of the work associated with writing assembly instructions, and (iv) by directly writing NEON assembly instructions linked to the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and therefore it was this approach that was adopted.


Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums: the columns of the first pair of consecutive frames are summed once, and for every remaining pair of consecutive frames (from n = 3 to n = N) the column vector of frame n-1 is reused, so only frame n is summed before minimizing the SAD and shifting frame n.

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.


Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8 or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers, where each of the 8 elements would have 16 bits, or as an operation on 2 D registers, where each of the 8 elements would be 8 bits wide.

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology (lanes and elements of the source and destination registers).

An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
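As an illustration of how 8 pixels can be processed at a time, the following sketch expresses the (v1 - v2)/(v1 + v2) normalization with NEON C intrinsics rather than the hand-written assembly used in the application; variable names are illustrative, the scaling steps and the handling of a zero denominator are omitted, and the actual code differs.

#include <arm_neon.h>

void normalize8(const uint8_t *frame_a, const uint8_t *frame_b, float *out)
{
    uint8x8_t a8 = vld1_u8(frame_a);             /* load 8 pixels of each frame  */
    uint8x8_t b8 = vld1_u8(frame_b);

    int16x8_t a16 = vreinterpretq_s16_u16(vmovl_u8(a8));   /* widen to 16 bits   */
    int16x8_t b16 = vreinterpretq_s16_u16(vmovl_u8(b8));

    int16x8_t sum  = vaddq_s16(a16, b16);        /* v1 + v2 (texture 2 input)    */
    int16x8_t diff = vsubq_s16(a16, b16);        /* v1 - v2                      */

    /* convert the low and high halves to 32-bit floats */
    float32x4_t sum_lo  = vcvtq_f32_s32(vmovl_s16(vget_low_s16(sum)));
    float32x4_t sum_hi  = vcvtq_f32_s32(vmovl_s16(vget_high_s16(sum)));
    float32x4_t diff_lo = vcvtq_f32_s32(vmovl_s16(vget_low_s16(diff)));
    float32x4_t diff_hi = vcvtq_f32_s32(vmovl_s16(vget_high_s16(diff)));

    /* NEON has no divide instruction: reciprocal estimate plus one
       Newton-Raphson refinement step */
    float32x4_t r_lo = vrecpeq_f32(sum_lo);
    r_lo = vmulq_f32(r_lo, vrecpsq_f32(sum_lo, r_lo));
    float32x4_t r_hi = vrecpeq_f32(sum_hi);
    r_hi = vmulq_f32(r_hi, vrecpsq_f32(sum_hi, r_hi));

    vst1q_f32(out,     vmulq_f32(diff_lo, r_lo));
    vst1q_f32(out + 4, vmulq_f32(diff_hi, r_hi));
}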

Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that explain what might have caused such an increment. First, note that the stage of the application that most contributed to the increase in time was the read binary file stage. The execution time of such process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved with the NEON optimization also increased. This suggests that indeed another process was probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason to explain this could be found in the modulation stage. The first step of such process is to find the smallest and largest values for every camera frame pixel in the time dimension by means of if statements. When such a task is implemented with conventional C language, the processor makes use of a branch prediction mechanism in order to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, ignoring the existence of the branch prediction mechanism.

5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for such purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain the approach of how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2 · 2^n - 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
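For the 8-bit pixels actually used in the application, the same idea can be sketched as follows: the sum of two pixels can take 2 · 2^8 - 1 = 511 values, so a 511-entry table built once with the standard powf() function replaces the per-pixel power computation. Names are illustrative.

#include <math.h>
#include <stdint.h>

static float gamma_lut[511];

void build_gamma_lut(void)
{
    for (int sum = 0; sum < 511; sum++)
        gamma_lut[sum] = powf(sum / 2.0f, 0.85f);   /* average, then gamma    */
}

static inline float texture1_pixel(uint8_t a, uint8_t b)
{
    return gamma_lut[a + b];                        /* the sum indexes the LUT */
}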

As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid such overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2 and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. The merging of these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. Moreover, the improvement of the execution time is shown in Figure 5.22.

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels: the possible averages 0, 0.5, 1, 1.5, 2, 2.5 and 3 map to the table entries 0, 0.555, 1, 1.411, 1.803, 2.179 and 2.544, indexed by the sum pixel A + pixel B.

This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.


Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.

6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).


On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's -O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device; Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model has approximately 2% of the faces contained in the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.

Figure 6.2: Example of the visualization module developed. (a) High-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model wireframe (63,743 faces); (c) low-resolution 3D model with texture (1229 faces); (d) low-resolution 3D model wireframe (1229 faces).

6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations described in Chapter 5 were used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times. Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which provides an idea of the speedup achieved for each individual stage.


Figure 6.3: Performance evolution of the 3D face scanner's C implementation. Bars, from top to bottom: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2.


Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, presented an increase of the execution time by a factor of 27 times, i.e. an increase of approximately 14 seconds.

A wide range of optimizations were performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited available resources on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used with no lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing such a process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment.

Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5 × 5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions. Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.

According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase of time is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e. changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program is exploiting spatial and temporal locality could reveal directions for further optimizations.

Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application; therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries that are used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that, as part of the current project, a few of the other options were tested, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.

Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: the journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1)

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1)

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1)

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, isbn: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10)

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, issn: 0146-664X. doi: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11)

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34)

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11)

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11)

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, issn: 0162-8828. doi: 10.1109/34.216735 (cit. on pp. 9, 11)

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. doi: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11)

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, issn: 0031-3203. doi: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11)

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, issn: 0162-8828. doi: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11)

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11)

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, issn: 0031-3203. doi: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12)

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, issn: 0734-189X. doi: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12)

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. doi: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12)

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, issn: 0932-8092. doi: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12)

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, issn: 0018-9162. doi: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14)

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12)

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. doi: 10.1109/ICME.2006.262619 (cit. on p. 12)

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: Application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. doi: 10.1109/ENC.2006.8 (cit. on pp. 12, 13)

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13)

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. doi: 10.1109/ICIP.2007.4379251 (cit. on p. 13)

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, issn: 0031-3203. doi: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14)

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. doi: 10.1364/OE.15.012318 (cit. on p. 14)

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22-26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14)

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, isbn: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14)

[28] J. Mas and Universitat de Girona. Departament d'Electrònica, Informàtica i Automàtica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals, Universitat de Girona. Universitat de Girona, 1998, isbn: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15)

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, issn: 0882-4967. doi: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15)

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, issn: 0162-8828. doi: 10.1109/34.159901 (cit. on p. 15)

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26)

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, isbn: 0201604582 (cit. on p. 25)

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26)

[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, isbn: 0-201-48560-5. doi: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30)

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, isbn: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31)

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32)

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32)

[38] G. Coley, "Beagleboard system reference manual," BeagleBoard.org, Dec. 2009, p. 81 (cit. on p. 34)

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34)

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38)

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45)

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46)

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50)

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51)

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54)

[46] N. Pipenbrinck, "ARM NEON optimization, an example," Tech. Rep., 2009 (cit. on p. 54)

• Abstract
• Acknowledgements
• List of Figures
• 1 Introduction
  • 1.1 3D Mask Sizing project
  • 1.2 Objectives
  • 1.3 Report organization
• 2 Literature study
  • 2.1 Surface reconstruction
    • 2.1.1 Stereo analysis
    • 2.1.2 Structured lighting
      • 2.1.2.1 Triangulation technique
      • 2.1.2.2 Pattern coding strategies
      • 2.1.2.3 3D human face reconstruction
  • 2.2 Camera calibration
    • 2.2.1 Definition
    • 2.2.2 Popular techniques
• 3 3D face scanner application
  • 3.1 Read binary file
  • 3.2 Preprocessing
    • 3.2.1 Parse XML file
    • 3.2.2 Discard frames
    • 3.2.3 Crop frames
    • 3.2.4 Scale
  • 3.3 Normalization
    • 3.3.1 Normalization
    • 3.3.2 Texture 2
    • 3.3.3 Modulation
    • 3.3.4 Texture 1
  • 3.4 Global motion compensation
  • 3.5 Decoding
  • 3.6 Tessellation
  • 3.7 Calibration
    • 3.7.1 Offline process
    • 3.7.2 Online process
  • 3.8 Vertex filtering
    • 3.8.1 Filter vertices based on decoding constraints
    • 3.8.2 Filter vertices outside the measurement range
    • 3.8.3 Filter vertices based on a maximum edge length
  • 3.9 Hole filling
  • 3.10 Smoothing
• 4 Embedded system development
  • 4.1 Development tools
    • 4.1.1 Hardware
      • 4.1.1.1 Single-board computer survey
      • 4.1.1.2 BeagleBoard-xM features
    • 4.1.2 Software
      • 4.1.2.1 Software libraries
      • 4.1.2.2 Software development tools
  • 4.2 MATLAB to C code translation
    • 4.2.1 Motivation for developing in C language
    • 4.2.2 Translation approach
  • 4.3 Visualization
• 5 Performance optimizations
  • 5.1 Double to single-precision floating-point numbers
  • 5.2 Tuned compiler flags
  • 5.3 Modified memory layout
  • 5.4 Reimplementation of C's standard power function
  • 5.5 Reduced memory accesses
  • 5.6 GMC in y dimension only
  • 5.7 Error in Delaunay triangulation
  • 5.8 Modified line shifting in GMC stage
  • 5.9 New tessellation algorithm
  • 5.10 Modified decoding stage
  • 5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage
  • 5.12 NEON assembly optimization 1
  • 5.13 NEON assembly optimization 2
• 6 Results
  • 6.1 MATLAB to C code translation
  • 6.2 Visualization
  • 6.3 Performance optimizations
• 7 Conclusions
  • 7.1 Future work
• Bibliography

List of Figures

5.9 Flow diagram for the GMC process as implemented in the MATLAB code 49
5.10 Difference in execution time before and after modifying the GMC stage 49
5.11 Execution time of the application after fixing an error in the tessellation stage 50
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage 51
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted 52
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach 53
5.15 Execution time of the application before and after optimizing the decoding stage 54
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's columns sum 55
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage 55
5.18 NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology 56
5.19 Execution flow after first NEON assembly optimization 58
5.20 Execution times of the application before and after applying the first NEON assembly optimization 59
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels 59
5.22 Execution times of the application before and after applying the second NEON assembly optimization 59
5.23 Final execution flow after second NEON assembly optimization 60
6.1 Execution times of the MATLAB and C implementations after run on different platforms 62
6.2 Example of the visualization module developed 63
6.3 Performance evolution of the 3D face scanner's C implementation 64
6.4 Execution times for each stage of the application 65

Dedicated to my grandmother


Chapter 1

Introduction

The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.

Within the wide range of products offered by Philips, there is a specific group, categorized under the name of sleep solutions, that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]. It reported that sleep apnea was prevalent in more than 18 million Americans, i.e. 6.62% of the country's population.

While aiming to attend the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors, which include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, and diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia, amongst others. A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.

Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic, (c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel, (h) Simplicity, (i) ComfortGel.

Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.

1.1 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.

2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.

3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.

4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.

The focus of this thesis project is on the first step.

As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique was already developed and is the basis for the present project. Figure 1.2a shows the hardware setup of this device. In short, this scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently performed offline and are mostly implemented in MATLAB.

1.2 Objectives

The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application; specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to ARM's Cortex-A8, the new implementation should be targeted to this processor in particular, by making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.

Figure 1.2: A 3D hand-held scanner developed in Philips Research: (a) hardware, (b) 3D model example.

1.3 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie the different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application is described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvement of the presented work.


Chapter 2

Literature study

This chapter presents a selective analysis of the state of the art in the field of surface reconstruction, placing special emphasis on structured lighting techniques. A brief overview of the three main underlying technologies used for depth estimation is presented first. This is followed by an example of stereo analysis, which serves as the basis for the more specific structured lighting techniques. Moreover, this example helps to illustrate why stereo analysis is considered less preferable for 3D face reconstruction applications when compared with structured lighting techniques. Special emphasis is placed on the scientific principles underlying structured lighting techniques. Furthermore, a classification of the different types of pattern coding strategies available in the literature is given, along with an analysis of their suitability for our application. Finally, the chapter concludes with a brief discussion of camera calibration and its most representative techniques.

2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (as found in areas like architecture, mechanical engineering or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].

Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e. they are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in the depth measurement. On the other hand, both reflecto-metric and stereo-metric techniques are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty, i.e. they require an estimation of the model of the environment. In the remainder of this section we limit the discussion to the stereo-metric category and focus on structured lighting techniques.

2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis is presented in this section. This example intends to show why the use of structured lighting becomes essential for our application. The example is taken from [4].

Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point on the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies with respect to a certain reference. Note, however, that in order to perform the triangulation, more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (extrinsic parameter) or to the focal length of each of the cameras (intrinsic parameter).

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e. at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and oriented towards the scene objects. Also note that, because the x-axes of both images are identically oriented, rows with the same row number in the two different images lie on the same straight line.

Figure 2.1: Standard stereo geometry (left and right cameras with parallel optical axes, separated by the base distance b; a scene point (X, Y, Z) projects onto corresponding rows y of the left and right images).

In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

p_{left} = (x_{left}, y_{left}) and p_{right} = (x_{right}, y_{right})

in the left and right images, respectively, assuming that the scene point is visible from both camera viewpoints. The disparity between two corresponding image points, with respect to p_{left}, is the vector

\Delta(x_{left}, y_{left}) = (x_{left} - x_{right}, \; y_{left} - y_{right})^T    (2.1)

In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to the central projection

p = (x, y) = \left( \frac{f \cdot X}{Z}, \; \frac{f \cdot Y}{Z} \right)    (2.2)

assuming that Z > f.

According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that y = y_{left} = y_{right}. Therefore, for the left camera the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X-axis, its central projection equation is given by

(x_{right}, y) = \left( \frac{f \cdot (X - b)}{Z}, \; \frac{f \cdot Y}{Z} \right)

Rather than calculating the disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points with respect to p_{left} is given by

\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}

However, because rows with the same row number in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right}    (2.3)

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map \Delta(x, y) is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.

Finally, in order to derive the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}

and therefore

X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}

Using the previous equation, it follows that

Z = \frac{b \cdot f}{x_{left} - x_{right}}

By substituting this result into the projection equation for y, it follows that

Y = \frac{b \cdot y}{x_{left} - x_{right}}

The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and b are known and that the disparity map \Delta(x, y) was measured for each pair of corresponding points in the two images. Note that a variety of methods exist to calibrate different types of camera configuration systems, i.e. to determine their intrinsic and extrinsic parameters. More on these calibration procedures is discussed in Section 2.2.
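A minimal sketch of these reconstruction equations in C is given below, assuming the standard stereo geometry with known focal length f and base distance b; the function name, structure name and example values are illustrative only and not taken from the scanner code.

    #include <stdio.h>

    /* Minimal sketch of the reconstruction equations above for the
       standard stereo geometry. */
    typedef struct { double X, Y, Z; } Point3D;

    Point3D reconstruct(double f, double b,
                        double x_left, double x_right, double y)
    {
        double disparity = x_left - x_right;   /* Equation (2.3) */
        Point3D p;
        p.X = b * x_left / disparity;
        p.Y = b * y      / disparity;
        p.Z = b * f      / disparity;
        return p;
    }

    int main(void)
    {
        /* f = 8, b = 100, corresponding points (12, 5) and (10, 5) */
        Point3D p = reconstruct(8.0, 100.0, 12.0, 10.0, 5.0);
        printf("X = %.1f, Y = %.1f, Z = %.1f\n", p.X, p.Y, p.Z);
        return 0;
    }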

The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques is used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images; these features are typically corners or edges contained within the images. Although these techniques are appropriate for a certain number of applications, they present a number of drawbacks that make them unfeasible for many others. The main drawbacks are that (i) feature extraction and matching is generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. Such problems in solving the correspondence problem can generally be overcome by resorting to a different but related class of techniques known as structured lighting techniques. While structured lighting techniques involve a completely different methodology for solving the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.

2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously de-

scribed stereo analysis approach where one of the cameras is replaced by a light source

which projects a light pattern actively into the scene The location of an object in space

can then be determined by analyzing the deformation of the projected light pattern

The idea behind this modification is to simplify the complexity of the correspondence

analysis by actively manipulating the scene

It is important to note that stereoscopic based systems do not assume complex require-

ments for image acquisition since they mostly rely on theoretical mathematical and

algorithmic analyses to solve the reconstruction problem On the other hand the idea

behind structured lighting methods is to shift this complexity to another level such as

the engineering prerequisites of the overall system [4]

A wide variety of light patterns have been proposed by the research community [5] [7]ndash

[17] Their aim is to reduce the large number of images that would have to be captured


when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2 a classification of the available encoded patterns is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.

2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring

angles formed from it to points at either end of a fixed baseline Various approaches

have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is given.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation as proposed in [4]. (The diagram shows the camera with projection center O at the origin, the light source at base distance b, the object point P, and the angles α, β and γ.)

Note that the

system can be thought of as a 2D object scene ie it has no vertical dimension As a

consequence the object light source and camera all lie in the same plane The angles

α and β are given by the calibration As in the previous example the base distance b

is assumed to be known and the origin of the coordinate system O coincides with the

projection center of the camera


The goal is to calculate the distance d between the origin O and the object point P = (X_0, Z_0). This can be done using the law of sines as follows:

d / sin(α) = b / sin(γ)

From γ = π − (α + β) and sin(π − γ) = sin(γ) it holds that

d / sin(α) = b / sin(π − γ) = b / sin(α + β)

Therefore, distance d is given by

d = b · sin(α) / sin(α + β)

which holds for any point P lying on the surface of the object
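A direct transcription of this result into C could look as follows; the function name is illustrative, and the angles are assumed to be given in radians by the calibration.

#include <math.h>

/* Distance d from the camera's projection center O to the object point P,
 * following the law-of-sines derivation above. alpha and beta come from the
 * calibration and b is the base distance between camera and light source. */
static double triangulation_distance(double b, double alpha, double beta)
{
    return (b * sin(alpha)) / sin(alpha + beta);
}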

2.1.2.2 Pattern coding strategies

As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill the requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19]:

• Time-multiplexing: This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or objects in the scene during the time period when the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a.

• Spatial neighborhood: In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12] or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow all coding information to be condensed into a single projection pattern, making them highly suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding: In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16] or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not only depend on the projected colors but also on the intrinsic colors of the measuring surface and light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].

Figure 2.3: Examples of pattern coding strategies: (a) time-multiplexing, (b) spatial neighborhood, (c) direct coding.
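To illustrate the time-multiplexing idea in code, the sketch below generates the bit planes of a binary-reflected Gray-coded stripe pattern, so that every projector column receives a unique codeword over the frame sequence. This is a generic illustration, not necessarily the exact pattern set projected by the scanner described later; all names and dimensions are hypothetical.

#include <stdint.h>
#include <stdio.h>

/* Binary-reflected Gray code of a column index. */
static uint32_t gray_code(uint32_t column) { return column ^ (column >> 1); }

/* Fill one bit plane of a time-multiplexed stripe pattern:
 * pattern[x] is 1 if projector column x is lit in frame 'bit'. */
static void fill_bit_plane(uint8_t *pattern, int columns, int bit)
{
    for (int x = 0; x < columns; ++x)
        pattern[x] = (gray_code((uint32_t)x) >> bit) & 1u;
}

int main(void)
{
    enum { COLUMNS = 1024, BITS = 10 };   /* 2^10 = 1024 distinguishable columns */
    static uint8_t plane[COLUMNS];

    for (int bit = BITS - 1; bit >= 0; --bit) {
        fill_bit_plane(plane, COLUMNS, bit);
        printf("frame %d starts with %d%d%d%d\n", BITS - 1 - bit,
               plane[0], plane[1], plane[2], plane[3]);
    }
    return 0;
}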

2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields such as security

forensics or even entertainment it is no surprise that special focus has been devoted

to this area by the research community over the last decades A comparative study

of three different 3D face reconstruction approaches is presented in [20] Here the

most representative techniques of three different domains are tested These domains are

binocular stereo structured lighting and photometric stereo The experimental results

show that active reconstruction techniques perform better than purely passive ones for

this application

The majority of analysis on vision based reconstruction has focused on general perfor-

mance for arbitrary scenes rather than on specific objects as reported in [20] Neverthe-

less some effort has been made on evaluating structured lighting techniques with special

focus on human face reconstruction In [21] a comparison is presented between three

Chapter 2 Literature study 13

structured lighting techniques (Gray Code Gray Code Shift and Stripe Boundary) to

assess 3D reconstruction for human faces by using mono and stereo systems The results

show that the Gray Code shift coding performs best given the high number of emitted

patterns it uses A further study on this topic was performed by the same author in

[22] Again it was found that time-multiplexing techniques such as binary encoding

using Gray Code provide the highest accuracy With a rather different objective than

that sought by Woodward et al in [21] and [22] Fechteler et al [23] also focus their

effort on presenting a framework that captures 3D models of faces in high resolutions

with low computational load Here the system uses a single colored stripe pattern for

the reconstruction purpose plus a picture of the face illuminated with regular white light

that is used as texture

Particular aspects of 3D human face reconstruction such as proximity size and texture

involved make structured lighting a suitable approach On the contrary other recon-

struction techniques might be less suitable when dealing with these particular aspects

For example stereoscopic approaches fail to provide positive results when the textures

involved do not contain features that can be easily extracted and matched by means of

algorithms as in the case of the human face On the other hand the concepts behind

structured lighting make it very convenient to reconstruct these kind of surfaces given

the proximity involved and the size limits of the object in question (appropriate for

projecting encoded patterns)

With regard to the suitability of the different pattern coding strategies for our application

(3D human face reconstruction by means of a hand-held scanner) there are several

factors to consider Spatial neighborhood strategies do not offer high spatial resolution

which is needed by the algorithms that assess the fit quality of the various mask models

Direct coding strategies suffer from practical problems that affect their robustness to

different scenarios This centers the attention on the time-multiplexing techniques which

are known to provide high spatial resolution. The problem with such techniques is that they are highly sensitive to movement, which is likely to be present on a hand-held device. Fortunately, there are several approaches to how such a problem can be solved. Consequently, it is a time-multiplexing technique that is employed in our application.

2.2 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement

This section presents a review of some of the most popular techniques with special focus

on those that are regarded as adequate for our application


2.2.1 Definition

Camera calibration is the process of determining a mathematical approximation of the

physical and optical behavior of an imaging system by using a set of parameters These

parameters can be estimated by means of direct or iterative methods and they are divided

in two groups On the one hand intrinsic parameters determine how light is projected

through the lens onto the image plane of the sensor The focal length projection center

and lens distortion are all examples of intrinsic parameters On the other hand extrinsic

parameters measure the position and orientation of the camera with respect to a world

coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for the structured pattern projection and triangulation considered in [25]. The focal length f_c and the projection center O_c are examples of intrinsic parameters of the camera, while the distance D between the camera and the projector corresponds to an extrinsic parameter.

Figure 2.4: A reference framework assumed in [25].

2.2.2 Popular techniques

In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration that uses a 3×4 transformation matrix which maps 3D object points to their respective 2D image projections. Here, the model of the camera does not consider any lens distortion. For a detailed description of this method refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique that was based on extracting the physical parameters of the camera from the transformation technique proposed in [18]. The description of this technique is given in [26] and [27]. A non-linear explicit camera calibration that included radial lens distortion was proposed by Salvi in his PhD thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular and that is still widely used was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.

The calibration mechanism that is currently being used in our application is based on

the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although

this mechanism focuses on stereo camera calibration it was generalized for a system

with one camera and one projector It involves imaging a controlled scene from different

positions and orientations The controlled scene consists of a rigid calibration chart with

several markers The geometric and photometric properties of such markers are known

precisely so that they can be detected After corresponding markers in the different

images are found an algorithm searches the optimal set of camera parameters for which

triangulation of all corresponding marker-point pairs gives an accurate reconstruction of

the calibration chart This calibration mechanism is discussed further in Section 37

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed

by the Smart Sensing & Analysis research group and provided as a starting point for the current project. Figure 3.1 presents the main steps involved in the 3D reconstruction process.

Figure 3.1: General flow diagram of the 3D face scanner application. Starting from the binary and XML input files, the steps are: read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering and hole filling, resulting in the 3D model.

The current scanner uses a total of 16 binary coded patterns that are sequentially pro-

jected onto the scene For each projection the scene is captured by means of the

embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study of why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is described.



Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.

3.1 Read binary file

The first step of the application is to read the binary file that contains the required

information for the 3D reconstruction The binary file is composed of two parts the

header and the actual data The header contains metadata of the acquired frames such

as the number of frames and the resolution of each one The second part contains the

actual data of the captured frames Figure 32 shows an example of such frame sequence

which from now on will be referred to as camera frames

3.2 Preprocessing

The preprocessing stage comprises the four steps shown in figure 33 Each of these steps

is described in the following subsections

Figure 3.3: Flow diagram of the preprocessing stage: parse XML file, discard frames, crop frames and scale (convert to float, range from 0 to 1).

3.2.1 Parse XML file

In this stage the application first reads an XML file that is included for every scan

This file contains relevant information for the structured light reconstruction This


information includes (i) the type of structured light patterns that were projected when

acquiring the data (ii) the number of frames captured while structured light patterns

were being projected (iii) the image resolution of each frame to be considered and (iv)

the calibration data

3.2.2 Discard frames

Based on the number of frames value read from the XML file the application discards

extra frames that do not contain relevant information for the structured light approach

but that are provided as part of the input

3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border

of the images Note that this operation does not imply a loss of information in this

application in particular This is because pixels near the frame borders do not contain

facial information and therefore can be safely removed

3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is

represented by an 8-bit unsigned integer value that ranges from 0 to 255 In this stage

the data type is transformed from unsigned integer to floating point while dividing each

pixel value by 255 The new set of values range between 0 and 1

3.3 Normalization

Even though this section is entitled Normalization a few more tasks are being performed

in this stage of the application as shown by the blue rectangles in Figure 34 Here wide

arrows represent flow of data whereas dashed lines represent the order of execution The

numbers inside the small data arrows pointing towards the different tasks represent the

number of frames used as input by each task The dashed line rectangle that encloses

the normalization and texture 2 tasks represents that there is not a clear sequential

execution between these two but rather that these are executed in an alternating fashion

This type of diagram will result particularly useful in Chapter 5 in order to explain the


Figure 3.4: Flow diagram of the normalization stage. The 16 camera frames are the input to the normalization, texture 2, modulation and texture 1 tasks, which produce 8, 8, 1 and 1 output frames respectively.

modifications that were made to the application to improve its performance. Examples of the different frames that are produced in this stage are visualized in Figure 3.5. A brief description of each of the tasks involved in this stage follows.

3.3.1 Normalization

The purpose of this stage is to extract the reflectivity component (texture information)

from the camera frames while aiming at enhancing the deformed illumination patterns

in the resulting frame sequence Figure 35a illustrates the result of this process The

deformed patterns are essential for the 3D reconstruction process

In order to understand how this process takes place, we need to look back at Figure 3.2. Here it is possible to observe that the projected patterns in the top row frames are equal to their corresponding frame in the bottom row, with the only difference being that the values of the projected pattern are inverted. For each corresponding pair, a new image frame is generated according to the following equation:

F_norm(x, y) = (F_camera(x, y, a) − F_camera(x, y, b)) / (F_camera(x, y, a) + F_camera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively. An example of the resulting frame sequence is shown in Figure 3.5a.
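A per-pixel C sketch of this computation for one aligned frame pair is shown below. The epsilon guard against division by zero in unlit regions is an assumption of this sketch, not necessarily how the original MATLAB or C code handles that case.

/* Per-pixel normalization of one aligned frame pair, following the equation
 * above. frame_a and frame_b hold the pattern and its inverse (values already
 * scaled to the range 0..1), and 'pixels' is width * height. */
static void normalize_pair(const float *frame_a, const float *frame_b,
                           float *out, int pixels)
{
    const float eps = 1e-6f;                    /* guard for unlit regions (assumption) */
    for (int i = 0; i < pixels; ++i) {
        float sum  = frame_a[i] + frame_b[i];   /* this sum is also the texture 2 value */
        float diff = frame_a[i] - frame_b[i];
        out[i] = diff / (sum + eps);
    }
}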


Figure 3.5: Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence, (b) texture 2 frame sequence, (c) modulation frame, (d) texture 1 frame.

3.3.2 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one used to calculate the normalized frame sequence. In fact, the output of this process is an intermediate step in the calculation of the normalized frames, which is the reason why the two processes are said to be performed in an alternating fashion. The mathematical equation that describes the calculation of the texture 2 frame sequence is

F_texture2(x, y) = F_camera(x, y, a) + F_camera(x, y, b)

The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation stage.


3.3.3 Modulation

The purpose of this stage is to find the range of measured values for each (x y) pixel of

the camera frame sequence along the time dimension This is done in two steps First

two frames are generated by finding the maximum and minimum values along the time

(t) dimension (Figure 36) for every (x y) value in a frame

Figure 3.6: Camera frame sequence in an (x, y, t) coordinate system.

Second, a modulation frame is produced by finding the difference between the previously generated frames, i.e.,

F_mod(x, y) = F_max(x, y) − F_min(x, y)

Such a modulation frame (Figure 3.5c) is required later during the decoding stage.

3.3.4 Texture 1

Finally the last task in the Normalization stage corresponds to the generation of the

texture image that will be mapped onto the final 3D model In contrast to the previous

three tasks this subprocess does not take the complete set of 16 camera frames as input

but only the 2 with finest projection patterns Figure 37 shows the four processing

steps that are applied to the input in order to generate a texture image such as the one

presented in Figure 35d

Figure 3.7: Flow diagram for the calculation of the texture 1 image: average frames, gamma correction, 5×5 mean filter and histogram stretch.


3.4 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement.

In fact if no measures are taken to correct the slight amount of movement of the scanner

or of the objects in the scene during the acquisition process the complete reconstruction

process fails Although the global motion compensation stage is only a minor part of

the mechanism that makes the entire application robust to motion it is not negligible

in the final result

Global motion compensation is an extensive field of research for which many different

approaches and methods have been contributed. The approach used in this application is amongst the simplest in level of complexity. Nevertheless, it suffices for the needs of the current application.

Figure 38 presents an overview of the algorithm used to achieve the global motion

compensation This process takes as input the normalized frame sequence introduced in

the previous section As noted at the bottom of the figure these steps are repeated for

every pair of consecutive frames As a first step the pixels in each column are added for

both frames This results in two vectors that hold the cumulative sums of each frame

The second step is to determine by how many pixels the second image is displaced with

respect to the first one In order to achieve this the sum of absolute differences between

elements of the two column-sum vectors is calculated while slowly displacing the two

vectors with respect to each other The result is a new vector containing the SAD value

for each displacement Subsequently the index of the smallest element in the SAD

values vector is searched in order to determine the number of pixels that the second

image needs to be shifted The process concludes by performing the actual shift of the

second frame

Figure 3.8: Flow diagram for the global motion compensation process. For every pair of consecutive frames A and B of the normalized frame sequence, the columns of each frame are summed, the SAD between the two column-sum vectors is minimized, and frame B is shifted accordingly.
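The following C sketch mirrors the steps of Figure 3.8 for one frame pair: column sums, SAD minimization over a small search range, and the resulting shift. The search range and the handling of non-overlapping columns are assumptions of this sketch; the function name is illustrative.

#include <stdlib.h>

/* Estimate the horizontal shift (in pixels) between two frames by summing
 * each column and minimizing the sum of absolute differences (SAD) between
 * the two column-sum vectors. Frame B is then shifted by the returned value. */
static int estimate_shift(const float *a, const float *b,
                          int width, int height, int max_shift)
{
    float *col_a = calloc((size_t)width, sizeof *col_a);
    float *col_b = calloc((size_t)width, sizeof *col_b);
    int best_shift = 0;
    float best_sad = -1.0f;

    /* Step 1: column sums of both frames. */
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            col_a[x] += a[y * width + x];
            col_b[x] += b[y * width + x];
        }

    /* Step 2: SAD between the column-sum vectors for every candidate shift. */
    for (int s = -max_shift; s <= max_shift; ++s) {
        float sad = 0.0f;
        for (int x = 0; x < width; ++x) {
            int xs = x + s;
            if (xs < 0 || xs >= width) continue;   /* ignore non-overlapping columns */
            float d = col_a[x] - col_b[xs];
            sad += (d < 0.0f) ? -d : d;
        }
        if (best_sad < 0.0f || sad < best_sad) { best_sad = sad; best_shift = s; }
    }

    free(col_a);
    free(col_b);
    return best_shift;
}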


3.5 Decoding

In Section 211 of the literature study the correspondence problem was defined as the

process of determining corresponding point pairs between the captured images and the

projected patterns This is exactly what is being accomplished during the decoding

stage

A novel approach has been implemented in which the identification of the projector

stripes is based not on the values of the pixels themselves (as it is typically done) but

rather on the edges formed by the transitions of the projected patterns Figure 39

illustrates the different sets of decoded values that result with each of these methods

Here it is possible to observe that the pixel-based method produces a stair-casing effect

due to the decoding of neighboring pixels that lie on the same stripe of the projected

pattern On the other hand the edge-based method removes this undesirable effect by

decoding values for only parts of the image in which a transition occurs Furthermore

this approach enables sub-pixel accuracy for the determination of the positions where the

transitions occur meaning that the overall resolution of the 3D reconstruction increases

considerably

Figure 3.9: Edge-based vs. pixel-based decoding. The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used.
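One possible way to obtain such sub-pixel transition positions is to interpolate linearly between the two pixels that bracket a sign change of the normalized pattern, as in the C sketch below. This illustrates the principle rather than the exact decoder used in the application; names and the zero-crossing criterion are assumptions.

/* Scan one image row of a normalized frame (values roughly in -1..1) and
 * report the sub-pixel x positions where the projected pattern changes sign.
 * Linear interpolation between the bracketing pixels gives the fractional
 * position of the transition. Returns the number of transitions found. */
static int find_transitions(const float *row, int width,
                            float *positions, int max_positions)
{
    int count = 0;
    for (int x = 0; x + 1 < width && count < max_positions; ++x) {
        float a = row[x], b = row[x + 1];
        if ((a < 0.0f && b >= 0.0f) || (a >= 0.0f && b < 0.0f)) {
            float t = a / (a - b);             /* fraction of the way from x to x+1 */
            positions[count++] = (float)x + t;
        }
    }
    return count;
}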

The decoding process results in a set of vertices each one associated with a depth code

Note however that the unit of measurement used to describe the position and depth of

each vertex is based on camera pixels and code values respectively meaning that these

vertices still do not represent the actual geometry of the face The calibration process

explained in a later section is the part of the application that translates the pixel and


code values to standard units (such as millimeters) thus recreating the actual shape of

the human face

3.6 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in a manner such that no overlaps occur. In computer graphics, these geometric shapes are generally chosen to be triangles, also called "faces". The reason for using triangles is that they have, by definition, their vertices on a same plane. This, in turn, avoids the generation of non-simple convex polygons that are not guaranteed to be rendered correctly. A complete example illustrating this point can be found in [32].

A set of 3D vertices calculated in the decoding stage is the input to the tessellation

process Here however the third dimension does not play a role and hence the z

coordinate for each of the vertices can be thought of as being equal to 0. This implies that the new set of vertices consists only of (x, y) coordinates that lie on the same plane, as shown in Figure 3.10a. This graph corresponds to a very close view of the nose area in the reconstructed face example.

Figure 3.10: Close view of the vertices in the nose area before and after the tessellation process: (a) vertices before applying the Delaunay triangulation, (b) result after applying the Delaunay triangulation.

The question that arises here is how to connect the vertices in such a way that the com-

plete surface is covered with triangles. The answer is to use the Delaunay triangulation, which is probably the most common triangulation used in computer vision. The main advantage that it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the Delaunay triangulation is independent of the order in which the vertices are processed.


Figure 310b shows the result of applying the Delaunay triangulation to the vertices

shown in Figure 310a

Although there exists a number of different algorithms used to achieve the Delaunay

triangulation, the final outcome of each conforms to the following definition: a Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [33]. Such a definition can be understood by examining Figure 3.11.


Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33].
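The circumcircle condition in this definition reduces to the classic incircle determinant test, sketched below in C (shown without the robust arithmetic a production implementation would need). The type and function names are illustrative.

/* Returns a positive value when point d lies inside the circumcircle of the
 * counter-clockwise triangle (a, b, c), a negative value when outside, and
 * zero when on the circle: the test behind the Delaunay condition above. */
typedef struct { double x, y; } Point2D;

static double incircle(Point2D a, Point2D b, Point2D c, Point2D d)
{
    double ax = a.x - d.x, ay = a.y - d.y;
    double bx = b.x - d.x, by = b.y - d.y;
    double cx = c.x - d.x, cy = c.y - d.y;

    double a2 = ax * ax + ay * ay;
    double b2 = bx * bx + by * by;
    double c2 = cx * cx + cy * cy;

    return ax * (by * c2 - b2 * cy)
         - ay * (bx * c2 - b2 * cx)
         + a2 * (bx * cy - by * cx);
}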

3.7 Calibration

The set of (x y) vertices with their corresponding depth code values that result from

the decoding process do not represent standard units of measure ie these still have to

be translated into standard units such as millimeters This is precisely the objective of

the calibration process

The calibration mechanism that is used in the application is based on the work of Peter-

Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts

an offline and an online process Moreover the offline process consists of two stages

the camera calibration and the system calibration It is important to clarify that while

the offline process is performed only once (camera properties and distances within the

system do not change with every scan) the online process is carried out for every scan

instance The calibration stage referred to in Figure 31 is the latter


3.7.1 Offline process

As already mentioned the offline process comprises the two stages described below

Camera calibration This part of the process is concerned with the calculation of the

intrinsic parameters of the camera as explained in Section 22 of the literature

study In short the objective is to precisely quantify the optical properties of the

camera The manner in which the current approach accomplishes this is by imag-

ing the special calibration chart shown in Figure 312 from different orientations

and distances After corresponding markers in the different images are found an

algorithm searches the optimal set of camera parameters for which triangulation

of all corresponding marker-point pairs gives an accurate reconstruction of the

calibration chart

Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.

System calibration The second part of the calibration process refers to the camera-

projector system calibration ie the determination of the extrinsic parameters

of the system Again this part of the process images the calibration chart from

different distances However this time structured light patterns are emitted by

the projector while the acquisition process takes place The result is that each

projector code is associated with a known depth and camera position

3.7.2 Online process

The result of the offline calibration is a set of parameters that model the optical proper-

ties of the scanner system These are passed to the application inside the XML file for

every scan Such parameters represent the coefficients of a fifth-order polynomial used

for translating the set of (x y) vertices with their corresponding depth code values into


standard units of measure In other words the online process consists of evaluating a

polynomial with all the x y and depth code values calculated in the decoding stage in

order to reconstruct the geometry of the face Figure 313 shows the state of the 3D

model before and after the reconstruction process

Figure 3.13: The 3D model before and after the calibration process: (a) before reconstruction, (b) after reconstruction.
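Since the exact form of the calibration polynomial is not spelled out here, the sketch below only illustrates the evaluation step with Horner's scheme for a generic fifth-order polynomial in a single variable; the real mapping combines the x, y and depth code values, and the coefficient array would be read from the calibration section of the XML file.

/* Evaluate a fifth-order polynomial c[0] + c[1]*v + ... + c[5]*v^5 with
 * Horner's scheme. A single-variable version is shown only to illustrate
 * the evaluation step of the online calibration. */
static double eval_poly5(const double c[6], double v)
{
    double result = c[5];
    for (int i = 4; i >= 0; --i)
        result = result * v + c[i];
    return result;
}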

3.8 Vertex filtering

As can be seen from Figure 3.13b, there are a number of extra vertices (and faces) that have not been correctly reconstructed and therefore should be removed from the model. Vertex filtering is applied to remove all these noisy vertices and faces based on different criteria. The process is divided into the following three steps.

3.8.1 Filter vertices based on decoding constraints

First if the distance between consecutive decoded points is larger than a maximum

threshold in the (x) or (z) dimensions then these are removed Second in order to

avoid false decoded vertices due to camera noise (especially in the parts of the images

where light does not hit directly) a minimal modulation threshold needs to be exceeded

or else the associated decoded point is discarded Finally if the decoded vertices lie

outside a margin defined in accordance to the image dimensions then these are removed

as well


3.8.2 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum

and maximum values that each decoded point can have in the z dimension These values

are read from the XML file The long triangles shown in Figure 313b that either extend

far into the picture or on the other hand come close to the camera are all removed in

this stage The resulting 3D model after being filtered with the two previously described

criteria is shown in Figure 314a

3.8.3 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length

criterion Initially the length of every edge contained in the model is calculated This

is followed by determining a new set of edges L that contains the longest edge in each

face After this operation the mean length value for the longest edge set is calculated

Finally, only faces that have their longest edge value less than seven times the mean value, i.e., L < 7 × mean(L), are kept. Figure 3.14b shows the result after this operation.
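A C sketch of this filter is given below; the vertex and face structures are illustrative, and the in-place compaction of the face array is an implementation choice of the sketch, not necessarily that of the original code.

#include <stdlib.h>
#include <math.h>

typedef struct { double x, y, z; } Vertex;
typedef struct { int v0, v1, v2; } Face;

static double edge_length(Vertex a, Vertex b)
{
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return sqrt(dx * dx + dy * dy + dz * dz);
}

/* Keep only faces whose longest edge is below 7 times the mean longest-edge
 * length, as described above. Returns the new number of faces. */
static int filter_long_faces(const Vertex *v, Face *f, int nfaces)
{
    if (nfaces <= 0) return 0;

    double *longest = malloc((size_t)nfaces * sizeof *longest);
    double mean = 0.0;
    int kept = 0;

    for (int i = 0; i < nfaces; ++i) {
        double e0 = edge_length(v[f[i].v0], v[f[i].v1]);
        double e1 = edge_length(v[f[i].v1], v[f[i].v2]);
        double e2 = edge_length(v[f[i].v2], v[f[i].v0]);
        longest[i] = fmax(e0, fmax(e1, e2));
        mean += longest[i];
    }
    mean /= nfaces;

    for (int i = 0; i < nfaces; ++i)
        if (longest[i] < 7.0 * mean)
            f[kept++] = f[i];

    free(longest);
    return kept;
}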

Figure 3.14: Resulting 3D models after various filtering steps: (a) the 3D model after the filtering steps described in Subsections 3.8.1 and 3.8.2, (b) the 3D model after the filtering step described in Subsection 3.8.3, (c) the 3D model after the filtering step described in Section 3.9.

3.9 Hole filling

In the last processing step of the 3D face scanner application two actions are performed

The first one is concerned with an algorithm that takes care of filling undesirable holes

that appear due to the removal of vertices and faces that were part of the face surface. This

is accomplished by adding a vertex in the middle of the hole and then connecting every

surrounding edge with this point The second action refers to another filtering step of


vertices and faces In this last part of the application the program removes all but the

largest group of connected faces The final 3D model is shown in Figure 314c

3.10 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes

but not for the overall goal of the 3D mask sizing project this process was not taken

into account as part of the 3D face scanner application This is also the reason why it

is not included in Figure 31 Nevertheless this section provides a brief explanation of

the smoothing process that is currently used along with an example

A complete explanation of the algorithm that is being used to achieve the smoothing

effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian

operator that diffuses the vertices along the surface An example of the resulting model

before and after applying the smoothing process is shown in Figure 315

Figure 3.15: Forehead of the 3D model before and after applying the smoothing process: (a) the 3D model before smoothing, (b) the 3D model after smoothing.
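The algorithm actually used is the scale-dependent Laplacian operator of [34]; the sketch below shows only the simpler uniform (umbrella) variant to illustrate the idea of diffusing each vertex towards the centroid of its neighbors. The adjacency representation and the damping factor lambda are assumptions of this sketch.

typedef struct { float x, y, z; } Vec3;

/* One iteration of uniform Laplacian ("umbrella") smoothing: every vertex is
 * moved a fraction lambda towards the centroid of its edge-connected
 * neighbors. neighbors[i] lists the indices of the vertices adjacent to
 * vertex i and degree[i] their count. Vertices are updated in place, so
 * later vertices already see smoothed neighbors; a double-buffered version
 * would avoid this. */
static void laplacian_smooth(Vec3 *v, int nverts,
                             const int *const *neighbors, const int *degree,
                             float lambda)
{
    for (int i = 0; i < nverts; ++i) {
        if (degree[i] == 0) continue;
        Vec3 c = {0.0f, 0.0f, 0.0f};
        for (int k = 0; k < degree[i]; ++k) {
            Vec3 n = v[neighbors[i][k]];
            c.x += n.x; c.y += n.y; c.z += n.z;
        }
        c.x /= degree[i]; c.y /= degree[i]; c.z /= degree[i];
        v[i].x += lambda * (c.x - v[i].x);
        v[i].y += lambda * (c.y - v[i].y);
        v[i].z += lambda * (c.z - v[i].z);
    }
}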

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as

two different domains but rather as two complementary parts of a whole There are two

important trends that have made such unified view possible First integrated circuit

(IC) technology has evolved to the point where multiple processors of different types

coexist in a single IC. Second, the increasing complexity and average size of programs, together with the evolution of compiler technologies, have led C compilers (and even C++ or Java in some cases) to become commonplace in the development of embedded systems

[35]

This chapter discusses the embedded hardware and software implementation of the 3D

face scanner A brief account of the hardware and software tools that were used during

the development of the application is presented first Subsequently the first stage of the

development process is described which consists mainly of translating the algorithms

and methods described in Chapter 3 into a different programming language more suitable

for embedded systems Finally a preview of the developed visualization module that

displays the 3D reconstructed face is presented along with a brief description of its

functionality

4.1 Development tools

This section describes the set of tools used in the development of the embedded applica-

tion First an overview of the hardware is presented highlighting the most important

aspects that are of interest to the 3D face scanner application This is then followed by

a list of the software tools along with a short motivation for their selection A so called

remote development methodology was used for the compilation process The idea is to



run an integrated development environment (IDE) on a client system for the creation of

the project editing of the files and usage of code assistance features in the same manner

as done with local projects However when the project is built run or debugged the

process runs on a remote server with output and input transferred to the client system

4.1.1 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as

development platforms SBCs combine most features of a conventional desktop computer

into a single board which can be as small as a credit card One or more processors of

different types memory on-board peripherals for multiple USB devices single or dual

gigabit Ethernet connections integrated graphics and audio capabilities amongst others

are common features included in these devices But perhaps what is most interesting

for embedded developers is the availability of several SBCs that come under open source

hardware category [36] Such SBCs are suitable for the implementation of a wide range

of applications on the basis of open operating systems

Two different hardware environments were used in the development of the current em-

bedded application a conventional desktop personal computer (PC) with an Intel x86

architecture and a SBC that was selected according to the following survey

4.1.1.1 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention

of finding the most suitable model for our application Table 41 presents a subset of the

considered models highlighting the most relevant characteristics for the 3D face scanner

application Refer to [37] for the complete survey

The model to be chosen has to comply with several requirements imposed by the 3D

face scanner application First support for both a camera and a projector had to be

offered While all of the considered models showed special support for video output

not all of them provided suitable characteristics for camera signal acquisition In fact

most of them rely on USB or Ethernet connections for this purpose The problem of

using USB technology for camera acquisition is that it is highly resource demanding On

the other hand Ethernet connections imply streaming video in formats such as MPEG

which require additional computational resources and buffering for decoding the video

stream Explicit periphery support for camera acquisition was only offered by two of

the considered models the BeagleBoard-xM and the PandaBoard


Table 4.1: Single-board computer survey

BeagleBoard-xM

CPU ARM Cortex-A8 1000 MHz

RAM 512 MB

Video output DVI-D HDMI S-Video

GPU PowerVR SGX OpenGL ES 20

Camera port Yes

Raspberry Pi Model B

CPU ARM1176 700 MHz

RAM 256 MB

Video output Composite RCA HDMI DSI

GPU Broadcom VideoCore IV OpenGL ES 20

Camera port No

Cotton candy

CPU dual-core ARM Cortex-A9 1200 MHz

RAM 1 GB

Video output HDMI

GPU quad-core 200 MHz Mali-400 MP OpenGL ES 20

Camera port No

PandaBoard

CPU dual-core ARM Cortex-A9 1000 MHz

RAM 1 GB

Video output HDMI DVI-D LCD

GPU PowerVR SGX540 OpenGL ES 20

Camera port Yes

Via APC

CPU ARM11 800 MHz

RAM 512 MB

Video output HDMI VGA

GPU Built-in 2D3D Graphic OpenGL ES 20

Camera port No

MK802

CPU ARM Cortex-A8 1000 MHz

RAM 1 GB

Video output HDMI

GPU Mali-400 MP OpenGL ES 20

Camera port No

Snowball

CPU dual-core ARM Cortex-A9 1000 MHz

RAM 1 GB

Video output HDMI CVBS

GPU Mali-400 MP OpenGL ES 20

Camera port No


A second issue in the selection of the SBC was concerned with the project objective of

developing a module capable of visualizing the 3D reconstructed model by means of the

embedded projector It was considered that the achievement of this objective could be

greatly simplified by selecting an SBC model that offered support for rendering of 3D

computer graphics by means of an API preferably OpenGL ES Nevertheless all of the

SBC models considered in the survey featured a graphical processor unit (GPU) with

such support

Finally one last important motivation for the selection came from the experience gath-

ered through related projects The BeagleBoard-xM had been used as the embedded

computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-

able implementation effort could be saved if this option were adopted Consequently it

was the BeagleBoard-xM that was selected as the SBC model for the development of

the current project

4.1.1.2 BeagleBoard-xM features

The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power open-source hardware system that was designed specifically to address the Open Source Community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1

GHz and 512 MB of LPDDR RAM Several open operating systems have been made

compatible with such processor including Linux FreeBSD RISC OS Symbian and

Android Moreover the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated

video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to

provide accelerated 2D and 3D rendering that supports OpenGL ES 20 [38]

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].

4.1.2 Software

The main factors involved in the selection of software tools were (i) available support by

a large development community and (ii) acquisition costs and licensing charges Open

source software was adopted where possible Moreover prior experience with the tools

was also taken into account The software can be divided in two categories (i) software


Figure 4.1: The BeagleBoard-xM offered by Texas Instruments.

libraries that are used within the application and therefore are necessary for its execution

and (ii) software tools used specifically for the development of the application and hence

are not required for its execution In what follows each of these is briefly described

4.1.2.1 Software libraries

The following software libraries are being used throughout the implementation of the

embedded application

libxml2 It is a software library used for parsing XML documents which was originally

developed for the Gnome project and was later made available for outside projects

as well The current application makes use of such tool for extracting the required

information from the XML file that is included for each scan

OpenCV Is an open source computer vision and machine learning software library

initiated by Intel It provides the necessary functionality to construct the Delaunay

triangulation described in Chapter 3 Though it was used in the initial versions of

the application later optimizations replaced OpenCV implementations

CGAL Consists of a software library that aims to provide access to algorithms in

computational geometry It is being used in the current application as a means

to simplify the resulting mesh surface ie to reduce the number of faces used to

represent the surface while keeping the overall shape of the reconstructed model

OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-

cally for embedded systems It consists of a cross-language multi-platform Appli-

cation Programming Interface (API) for rendering 2D and 3D computer graphics


It is used in the current application as the means to visualize the 3D reconstructed

model

GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL

used to create windows andor frame buffers It is being used in the visualization

module of the application as well

4.1.2.2 Software development tools

The following list presents a description of the most important software tools used for

the development of the embedded application

GNU toolchain It refers to a collection of programming tools produced by the GNU

Project that provide developing facilities for applications and operating systems

Among the several projects that comprise the GNU toolchain the following were

used

GNU Make It is a utility that automates the building process of executable

programs by reading the so-called makefiles which specify how to create the

target program

GCC It is the official compiler of the GNU operating system and has been

adopted as standard by most modern Unix-like computer operating systems

GNU Binutils Involves a set of programming tools that are used in the develop-

ment process of creating and managing programs object files libraries profile

data and assembly source code The commands as (assembler) ld (linker)

and gprof (profiler) were used among the complete set of binutil commands

GNU Project debugger It is the standard debugger for the GNU operating

system which was made available for the development of applications outside

this project as well

Valgrind It is a programming tool that can automatically detect memory management

errors It also provides the functionality of a profiler

Ubuntu A Linux based operating system that is distributed as free and open source

software It was installed in both the desktop PC and the SBC


4.2 MATLAB to C code translation

This section describes the first stage of the embedded application development that

involves the translation of a series of algorithms originally written in MATLAB code to

C

Despite the fact that there are a number of available tools that automatically translate

MATLAB code to C language such as MATLAB Coder by MathWorks MATLAB-to-

C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx these have a number

of pitfalls that compromise their applicability, especially when the performance aspect

is of ultimate importance Perhaps what is most concerning is that each one of these

tools only supports a subset of the MATLAB language and functions meaning that

the complete functionality of MATLAB is immediately constrained by this requirement

In many cases this would imply a modification to the MATLAB code prior to the

translation process in order to filter out any feature or function not included in the

subset which adds overhead to the development process Examples of features not

supported by automatic translation tools are amongst others objects cell arrays nested

functions visualization or trycatch statements The use of an automatic translation

tool was discarded for this project taking into account that several of these unsupported

features are present in the MATLAB code

4.2.1 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular pro-

gramming languages used for the development of embedded systems The first is that

C language lies in an intermediate point between higher and lower level languages pro-

viding suitable characteristics for embedded system development from both sides The

problem with higher level languages lies in the fact that they do not provide suitable

characteristics for optimizing performance of the applications such as low-level memory

manipulation Furthermore unlike many of these higher level programming languages

C provides deterministic resource use which is an important feature when the target de-

vices contain limited resources On the other hand C outperforms lower level languages

in a number of aspects such as scalability and maintainability Two final motivations

for using C are (i) C compilers are available for almost all embedded devices which are

supported by a large pool of experienced C programmers and (ii) the vast majority of

hardware APIdrivers are written in C


4.2.2 Translation approach

As mentioned earlier a manual translation approach of the code was chosen over the

use of automatic translation tools A key part in the process of manually translating

MATLAB to C code is the verification process There are two major techniques used

to achieve such verification The first one consists of a systematic method of converting

the translated C code into a compiled MEX-file that can be merged into the original

MATLAB project Then by comparing the results generated by the MATLAB project

containing the C implementation wrapped in a MEX-file with those generated by the

original MATLAB project one should be able to verify the correctness of the translation

The second approach consists of writing corresponding intermediate results of both the

MATLAB and C implementations to external files and then using a file comparison tool

such as diff for Linux environments in order to validate equality of both results It was

the latter approach that was chosen for the development of the current application for

the following reason The former approach requires the C implementation to be wrapped

in a so called MEX wrapper which takes care of the communication between MATLAB

and C This task is considered to be error prone since crashes segmentation violations

or incorrect results can easily occur if the MEX wrapper does not allocate and access

the data properly as reported by Marc Barberis in [40] from Catalytic Inc

A number of pitfalls that add complexity to the manual translation process were iden-

tified throughout the development of this stage. The most important are:

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e., the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in performance of the resulting code (see the sketch after this list).

• MATLAB is an interpreted language, i.e., data types and variable dimensions are only known at run-time, thus these cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions and sizes. On the contrary, C language requires all variables to be cast to a specific data type (or declared, as known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C and hence requires the programmer to implement them while relying on structure constructs of primitive types.
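As an illustration of the ordering pitfall mentioned above, the sketch below traverses a frame in row-major order so that the inner loop walks over contiguous memory; a literal transcription of column-oriented MATLAB code would swap the two loops and lose spatial locality. The frame dimensions and function name are only examples.

#define ROWS 480
#define COLS 754

/* In C a 2D array is laid out row by row, so keeping the column index in the
 * inner loop visits memory sequentially and preserves spatial locality. */
static void scale_frame(float frame[ROWS][COLS], float factor)
{
    for (int r = 0; r < ROWS; ++r)          /* row-major friendly order    */
        for (int c = 0; c < COLS; ++c)      /* contiguous inner dimension  */
            frame[r][c] *= factor;
}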

4.3 Visualization

This section describes the different steps involved in the visualization module developed

to display the reconstructed 3D models by means of the embedded projector contained

in the hand-held device Figure 42 extends the general overview of the application

presented in 31 by incorporating the visualization module This figure shows that a

resulting 3D model of the face reconstruction process consists of 4 different elements a

set of vertices a set of faces a set of UV coordinates and a texture image

Figure 4.2: Simplified diagram of the 3D face scanner application — the reconstruction stage takes the camera frame sequence and an XML file as input and produces the faces, vertices, UV coordinates and texture 1 image that feed the visualization module.

Vertices and faces describe the geometry of the reconstructed model. Each face consists of three index values that determine the vertices that form a triangle. On the other hand, UV coordinates together with the texture image describe the texture of the model. Figure 4.3 shows how UV coordinates are used to map portions of the texture image to individual parts of the model. Each vertex is associated with a UV coordinate. When a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image and place it on top of the triangle.
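As an illustration of how these four elements could be laid out in memory on the C side (the exact structures used in the project are not shown in this report, so the type and field names below are assumptions):

#include <stddef.h>

typedef struct { float x, y, z; } Vertex;      /* 3D position            */
typedef struct { float u, v;    } UVCoord;     /* texture coordinate     */
typedef struct { int   v[3];    } Face;        /* indices of 3 vertices  */

typedef struct {
    Vertex        *vertices;   /* one entry per vertex                   */
    UVCoord       *uv;         /* one UV coordinate per vertex           */
    Face          *faces;      /* each face indexes three vertices       */
    unsigned char *texture;    /* texture image, e.g. 8-bit grayscale    */
    size_t         n_vertices;
    size_t         n_faces;
    int            tex_width, tex_height;
} Model3D;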

Figure 4.3: UV coordinate system, with u and v ranging from (0,0) to (1,1).

Figure 4.4 presents an overview of the visualization module. The first step of the process is to simplify the 3D model, i.e. to reduce the number of triangles (and vertices) used to represent the surface. Note that while a high resolution is needed for the algorithms that determine the fit quality of the different mask models, a much lower resolution can be used for visualization purposes. In fact, due to the limited resources available in embedded systems, such simplification becomes necessary to avoid lag when zooming, rotating or panning the model. Edge collapse is a common term used for the simplification process, which is shown in Figure 4.4. The input vertices and faces of this block are converted into a smaller set, denoted as New vertices and New faces in the diagram. However, since the new set of vertices and faces does not have a one-to-one correspondence to the original set of UV coordinates, such coordinates have to be updated as well. This is accomplished by using the nearest neighbor algorithm: every new vertex is assigned the UV coordinate of its closest original vertex.
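A minimal sketch of this nearest-neighbor reassignment is given below. It reuses the illustrative Vertex and UVCoord structures introduced earlier and assumes a brute-force search, which is sufficient for the model sizes involved here; the project itself may use a different search structure.

#include <float.h>
#include <stddef.h>

/* Assign to each simplified vertex the UV coordinate of the closest
 * vertex in the original (high-resolution) model.  Brute-force O(n*m). */
static void reassign_uv(const Vertex *old_v, const UVCoord *old_uv, size_t n_old,
                        const Vertex *new_v, UVCoord *new_uv, size_t n_new)
{
    for (size_t i = 0; i < n_new; i++) {
        float  best_d = FLT_MAX;
        size_t best_j = 0;
        for (size_t j = 0; j < n_old; j++) {
            float dx = new_v[i].x - old_v[j].x;
            float dy = new_v[i].y - old_v[j].y;
            float dz = new_v[i].z - old_v[j].z;
            float d  = dx * dx + dy * dy + dz * dz;   /* squared distance */
            if (d < best_d) { best_d = d; best_j = j; }
        }
        new_uv[i] = old_uv[best_j];
    }
}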

The next stage of the process is to format the new set of vertices, faces and UV coordinates, together with the texture 1 image, such that OpenGL can render the model. Subsequently, normal vectors are calculated for every triangle; these are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex based on the normal vectors of the triangles that are connected to it. The normal vector of each triangle is in turn obtained by means of a cross product. Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state in which the model is redrawn every time a timer expires or when an interactive operation is sent to the program.
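The per-vertex normal computation described above could be sketched as follows, again using the assumed Vertex and Face structures from the earlier sketch; this is an illustration of the averaging scheme, not the project's exact code.

#include <math.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* Accumulate each triangle's normal into its three vertices, then
 * normalize the per-vertex sums to obtain the averaged vertex normals. */
static void compute_vertex_normals(const Vertex *v, const Face *f,
                                   size_t n_vertices, size_t n_faces,
                                   Vec3 *normals)
{
    for (size_t i = 0; i < n_vertices; i++)
        normals[i] = (Vec3){ 0.0f, 0.0f, 0.0f };

    for (size_t i = 0; i < n_faces; i++) {
        Vertex a = v[f[i].v[0]], b = v[f[i].v[1]], c = v[f[i].v[2]];
        Vec3 e1 = { b.x - a.x, b.y - a.y, b.z - a.z };
        Vec3 e2 = { c.x - a.x, c.y - a.y, c.z - a.z };
        Vec3 n  = cross(e1, e2);                 /* triangle normal        */
        for (int k = 0; k < 3; k++) {
            normals[f[i].v[k]].x += n.x;
            normals[f[i].v[k]].y += n.y;
            normals[f[i].v[k]].z += n.z;
        }
    }

    for (size_t i = 0; i < n_vertices; i++) {    /* normalize the averages */
        float len = sqrtf(normals[i].x * normals[i].x +
                          normals[i].y * normals[i].y +
                          normals[i].z * normals[i].z);
        if (len > 0.0f) {
            normals[i].x /= len; normals[i].y /= len; normals[i].z /= len;
        }
    }
}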

Figure 4.4: Diagram of the visualization module — edge collapse reduces the input vertices and faces to a new, smaller set, a nearest-neighbor step produces the new UV coordinates, and the results are converted to OpenGL format together with the computed normals and the texture 1 image before being rendered with OpenGL.

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner application, ranging from high-level optimizations, such as modification of the algorithms, to low-level optimizations, such as the implementation of time-consuming parts in assembly language.

In order to verify that the achieved optimizations were valid in general and not only for specific cases, 10 scans of different persons were used for profiling the performance of the application. Every profile consisted of running the application 10 times for each scan and then averaging the results, in order to reduce the influence that external factors might have on the measured times. Figure 5.1 presents an example of the graphs that will be used throughout this and the following chapters to represent the changes in performance. Here, each bar is divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1.

The translation from MATLAB to C code corresponds to the first optimization performed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available in the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's O2 optimization level.

The bottom bar in Figure 5.1 represents the starting point for a set of optimization procedures that will be described in the following sections. The order in which these are presented corresponds to the order in which they were applied to the application.

Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

5.1 Double to single-precision floating-point numbers

The same floating-point representation format was needed in the MATLAB and C implementations in order to compare their results at each step of the translation process. The original C implementation therefore used double-precision format, because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double to single-precision format was made. Figure 5.2 shows that with this modification the total execution time decreased from 14.53 to 12.52 seconds.

Figure 5.2: Difference in execution time when double-precision format is changed to single-precision.

5.2 Tuned compiler flags

While the previous versions of the C code were compiled with the O2 performance level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp

Figure 5.3: Execution time before and after tuning GCC's compiler options.

5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented in order to further exploit the concept of spatial locality in the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e. first and second, third and fourth, fifth and sixth, and so on. The data of the camera frames were therefore placed in memory such that corresponding pixels of a frame pair lie next to each other. The procedure is shown in Figure 5.4.

However, this modification yielded no improvement in the execution time of the application, as can be seen in Figure 5.5.

Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green and purple circles represent pixels of the first, second, third and fourth frames, respectively.

Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.

5.4 Reimplementation of C's standard power function

The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the power 0.85. After profiling the application, it was found that the power function from the standard C math library was taking most of the time inside this process. Taking into account that the high accuracy offered by this function was not required, and that the overhead involved in validating the input could be removed, a different implementation of the function was adopted.

A novel approach proposed by Ian Stephenson in [42] was used, explained as follows. The power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a,

a = M * 2^E

where M is the mantissa and E is the exponent. Taking the logarithm of both sides gives

log2(a) = log2(M) + E

and since M is normalized, log2(M) is always small, therefore

log2(a) ≈ E.

This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.
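A sketch of this idea in C is shown below. It is an illustration of the technique rather than the exact implementation used in the project: it only handles positive inputs (which is sufficient for pixel data), and it refines the exponent-only approximation slightly by treating the mantissa bits linearly.

#include <stdint.h>
#include <string.h>

/* Approximate log2(a), a > 0: reinterpreting the float's bits as an
 * integer places the biased exponent (plus a linear approximation of the
 * mantissa) in the integer value. */
static inline float fast_log2(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);             /* safe type punning       */
    return (float)bits * (1.0f / (1 << 23)) - 127.0f;
}

/* Approximate 2^x by rebuilding a float bit pattern from the scaled and
 * re-biased exponent. */
static inline float fast_pow2(float x)
{
    uint32_t bits = (uint32_t)((x + 127.0f) * (1 << 23));
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* pow(a, b) = 2^(log2(a) * b); accuracy is far below the C library's but
 * sufficient for gamma correction of pixel values. */
static inline float fast_pow(float a, float b)
{
    return fast_pow2(fast_log2(a) * b);
}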

Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.

5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is the principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering made it possible to replace floating-point calculations with integer calculations in the modulation stage, which typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modulation frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping and scaling frames are now performed in an alternating fashion together with the calculation of the modulation frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.

Figure 5.7: Order of execution before (a) and after (b) the optimization — in the modified flow, the modulation step is merged into the preprocessing stage.

Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.

5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, that figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results are achieved when GMC is applied in the y direction only. This was implemented by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted in removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement of the execution time shown in Figure 5.10.

Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code — for every pair of consecutive frames, row and column sums are computed for both frames, the SAD is minimized in x and y, and frame B is shifted in the y dimension only.

Figure 5.10: Difference in execution time before and after modifying the GMC stage.

5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation, and a series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation. It is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. At this point of the optimization process, an error was detected in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.

Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.

5.8 Modified line shifting in GMC stage

A series of optimizations performed on the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop. Later, this was optimized even further by replacing the for loop with the more optimized memcpy function available in the standard C library, which in turn led to a faster execution time.

A further optimization was obtained in the GMC stage which yielded better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included in order to determine whether the shift has to be performed in the positive or negative direction. In case the shift is negative, i.e. upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. In case the shift is positive, i.e. downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
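A minimal sketch of this in-place, direction-aware shift is given below (function and parameter names are illustrative, and clearing the rows that become undefined is an added detail for completeness); the traversal order guarantees that no line is overwritten before it has been copied.

#include <string.h>

/* Shift an 8-bit image `shift` rows downwards (positive) or upwards
 * (negative), in place.  Rows that fall off the border are discarded. */
static void shift_rows(unsigned char *img, int rows, int cols, int shift)
{
    if (shift > 0) {                       /* downwards: copy bottom-to-top */
        for (int r = rows - 1; r >= shift; r--)
            memcpy(img + r * cols, img + (r - shift) * cols, cols);
        memset(img, 0, (size_t)shift * cols);
    } else if (shift < 0) {                /* upwards: copy top-to-bottom   */
        int s = -shift;
        for (int r = 0; r < rows - s; r++)
            memcpy(img + r * cols, img + (r + s) * cols, cols);
        memset(img + (rows - s) * cols, 0, (size_t)s * cols);
    }
}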

Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.

5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic refers to the fact that the set of vertices resulting from the decoding stage is sorted in an increasing manner, which removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies the algorithm needed to connect the vertices into triangles even further.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles. The algorithm is presented in Algorithm 1. Note that for each pair of rows, this algorithm describes the connection of vertices up to the moment in which the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).

Algorithm 1: New tessellation algorithm

 1: for all pairs of consecutive rows do
 2:   find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
 3:   while the last vertex in either row has not been reached do
 4:     if vertex_row_A is more to the left than vertex_row_B then
 5:       connect vertex_row_A with the next vertex on the same row and with vertex_row_B
 6:       change vertex_row_A to the next vertex on the same row
 7:     else
 8:       connect vertex_row_B with the next vertex on the same row and with vertex_row_A
 9:       change vertex_row_B to the next vertex on the same row
10:     end if
11:   end while
12: end for
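A sketch of how Algorithm 1 could be expressed in C for a single pair of rows is shown below; the data layout (index arrays per row plus a per-vertex x coordinate) is an assumption made for illustration, and leftover vertices of the longer row are handled in a separate step, as in the text.

#include <stddef.h>

/* Connect two consecutive rows of vertices (already sorted left-to-right)
 * into triangles.  rowA and rowB hold vertex indices into the global
 * vertex array; index triplets are written to `faces` and the number of
 * triangles produced is returned. */
static size_t triangulate_row_pair(const int *rowA, size_t lenA,
                                   const int *rowB, size_t lenB,
                                   const float *x,       /* x per vertex */
                                   int (*faces)[3])
{
    size_t a = 0, b = 0, n = 0;
    while (a + 1 < lenA && b + 1 < lenB) {      /* stop at last vertex of either row */
        if (x[rowA[a]] < x[rowB[b]]) {
            /* connect rowA[a] with its right neighbour and with rowB[b] */
            faces[n][0] = rowA[a];
            faces[n][1] = rowA[a + 1];
            faces[n][2] = rowB[b];
            a++;
        } else {
            /* connect rowB[b] with its right neighbour and with rowA[a] */
            faces[n][0] = rowB[b];
            faces[n][1] = rowB[b + 1];
            faces[n][2] = rowA[a];
            b++;
        }
        n++;
    }
    return n;
}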

Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's Delaunay triangulation implementation.

Figure 5.13: The Delaunay triangulation (a) was replaced with a different algorithm (b) that takes advantage of the fact that the vertices are sorted.

Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

5.10 Modified decoding stage

A major improvement in the execution time of the application was achieved after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard C math library, namely ceil() and floor(), were replaced with faster implementations that use preprocessor directives to avoid the function call overhead. Moreover, the time spent in validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions could be simplified further.
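A sketch of such simplified replacements is shown below; the macro names are illustrative. Since the inputs are known to be non-negative index values, truncation alone implements floor, and ceil reduces to a single comparison.

/* Valid only for non-negative inputs, as is the case for index values.
 * Note that the macro arguments are evaluated more than once.            */
#define FAST_FLOOR(x) ((int)(x))                      /* truncation = floor */
#define FAST_CEIL(x)  ((int)(x) + ((float)(int)(x) < (x) ? 1 : 0))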

A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while checking that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since stack memory is faster to address.

The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. Such tasks were present in the application because several alternatives had been implemented for achieving a common goal during the algorithmic design stage; after assessing and choosing the best option, the other ones had not been entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the non-functional code.

Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, such recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
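A sketch of this reuse pattern is given below (names and the frame layout are illustrative): each frame's column sums are computed exactly once and carried over to the next iteration by swapping two buffers.

#include <stddef.h>

/* Sum the pixels of each column of a rows-by-cols frame into col_sum. */
static void column_sums(const float *frame, size_t rows, size_t cols, float *col_sum)
{
    for (size_t c = 0; c < cols; c++) col_sum[c] = 0.0f;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            col_sum[c] += frame[r * cols + c];
}

/* GMC main loop: the column sums of frame n become the "previous" sums of
 * the next iteration, so no frame is summed twice. */
void gmc_all_frames(const float *frames, size_t n_frames, size_t rows, size_t cols,
                    float *sum_prev, float *sum_cur)
{
    column_sums(frames, rows, cols, sum_prev);                 /* frame 1 */
    for (size_t n = 1; n < n_frames; n++) {
        column_sums(frames + n * rows * cols, rows, cols, sum_cur);
        /* ... minimize SAD between sum_prev and sum_cur, shift frame n ... */
        float *tmp = sum_prev; sum_prev = sum_cur; sum_cur = tmp;  /* reuse */
    }
}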

5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series of processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization and global motion compensation stages are data independent and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrate NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology; (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions but with the compiler doing some of the work associated with writing assembly instructions; and (iv) directly writing NEON assembly instructions that are linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and it was therefore this approach that was adopted.

Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums — the column-sum vector computed for frame n−1 in the previous iteration is reused when processing the pair (n−1, n).

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.

Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8 or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers, where each of the 8 elements would have 16 bits, or as an operation on 2 D registers, where each of the 8 elements would be 8 bits wide.

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology (source and destination registers, lanes and elements).

An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
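The project implements this processing in hand-written NEON assembly; purely as an illustration of the data flow just described, a sketch of the texture 2 arithmetic using NEON intrinsics is shown below. The scaling constant used to map the result to the range 0..1 is an assumption (510 being the maximum possible sum of two 8-bit pixels).

#include <arm_neon.h>

/* Process 8 pixel pairs at a time: corresponding 8-bit pixels of two
 * frames are added into 16-bit lanes (texture 2 = v1 + v2) and then
 * converted to 32-bit floats scaled to 0..1. */
static void texture2_block8(const uint8_t *p1, const uint8_t *p2, float *out)
{
    uint8x8_t   a   = vld1_u8(p1);                     /* 8 pixels, frame 1 */
    uint8x8_t   b   = vld1_u8(p2);                     /* 8 pixels, frame 2 */
    uint16x8_t  sum = vaddl_u8(a, b);                  /* widening add      */

    /* widen the low and high halves to 32 bits and convert to float */
    float32x4_t lo = vcvtq_f32_u32(vmovl_u16(vget_low_u16(sum)));
    float32x4_t hi = vcvtq_f32_u32(vmovl_u16(vget_high_u16(sum)));

    vst1q_f32(out,     vmulq_n_f32(lo, 1.0f / 510.0f));
    vst1q_f32(out + 4, vmulq_n_f32(hi, 1.0f / 510.0f));
}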

Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain such an increment. First, note that the stage of the application that contributed most to the increase in time was reading the binary file. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was indeed probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage. The first step of this process is to find the smallest and largest values of every camera frame pixel along the time dimension by means of if statements. When such a task is implemented in conventional C, the processor makes use of a branch prediction mechanism in order to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, without benefiting from the branch prediction mechanism.

5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2^(n+1) − 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the power 0.85. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
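A sketch of how such a table could be built and used for the real 8-bit case is shown below; the names are illustrative, and the table stores the gamma-corrected average as a plain float, matching the example values of Figure 5.21.

#include <math.h>

#define PIXEL_BITS 8
#define LUT_SIZE   (2 * ((1 << PIXEL_BITS) - 1) + 1)   /* 511 possible sums */

static float gamma_lut[LUT_SIZE];

/* Build the table once: entry s holds (s / 2)^0.85, i.e. the gamma-corrected
 * average of two pixels whose sum is s.  The expensive powf() call is thus
 * paid only 511 times instead of once per pixel. */
static void init_gamma_lut(void)
{
    for (int s = 0; s < LUT_SIZE; s++)
        gamma_lut[s] = powf((float)s / 2.0f, 0.85f);
}

/* During texture 1 generation, the pixel sum already computed in the
 * texture 2 stage indexes the table directly:
 *     texture1[i] = gamma_lut[pixel_a + pixel_b];                          */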

As a final step in the optimization process, a further improvement of the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames in order to calculate the texture 1 frame. To avoid this overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2 and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. Merging these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes.

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.

These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.

Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.

6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs shows the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and the results were averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds). On the other hand, the bottom bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model contains approximately 2% of the faces of the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.

Figure 6.2: Example of the visualization module developed: (a) high-resolution 3D model with texture (63743 faces); (b) high-resolution 3D model, wireframe (63743 faces); (c) low-resolution 3D model with texture (1229 faces); (d) low-resolution 3D model, wireframe (1229 faces).

6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations described in Chapter 5 were used to reduce the execution time of the application from 14.5 to 5.1 seconds, which translates into a speedup of approximately 2.85 times. Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which gives an idea of the speedup achieved for each individual stage.

Figure 6.3: Performance evolution of the 3D face scanner's C implementation, from the unoptimized version down through each optimization step (doubles to floats, tuned compiler flags, modified memory layout, reimplemented pow function, reduced memory accesses, GMC in y direction only, Delaunay bug fix, line shifting in GMC, new tessellation algorithm, modified decoding stage, no recalculations in GMC, and the two NEON assembly implementations).

Figure 6.4: Execution time of each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding pair of models yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e. by approximately 14 seconds.

A wide range of optimizations was performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system.

Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment. Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, these optimizations open new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5 × 5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates for implementation using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.

According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e. changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.

Finally, it is worth noting a few more ideas on how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process; it is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.

Bibliography

[1] F J Nieto T B Young B K Lind E Shahar J M Samet S Redline R B

DrsquoAgostino A B Newman M D Lebowitz T G Pickering et al ldquoAssociation

of sleep-disordered breathing sleep apnea and hypertension in a large community-

based studyrdquo JAMA the journal of the American Medical Association vol 283

no 14 pp 1829ndash1836 2000 [Online] Available httpjamaama-assnorg

content283141829short (cit on p 1)

[2] J Bruysters ldquoLarge dutch sleep survey reveals that 4 out of 5 people suffering

from sleep apnea are unaware of itrdquo University of Twente Tech Rep Mar 2013

[Online] Available httpwwwutwentenlenarchive201303large_

dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_

sleep_apnea_are_unaware_of_itdocx (cit on p 1)

[3] S Garrigue P Bordier S S Barold and J Clementy ldquoSleep apneardquo Pacing and

clinical electrophysiology vol 27 no 2 pp 204ndash211 2004 [Online] Available

httponlinelibrarywileycomdoi101111j1540-8159200400411

xfull (cit on p 1)

[4] R Klette K Schluns and A Koschan Computer Vision Three-Dimensional Data

from Images Springer 1998 isbn 9789813083714 [Online] Available http

booksgooglenlbooksid=qOJRAAAAMAAJ (cit on pp 5 6 9 10)

[5] J Posdamer and M Altschuler ldquoSurface measurement by space-encoded projected

beam systemsrdquo Computer Graphics and Image Processing vol 18 no 1 pp 1 ndash17

1982 issn 0146-664X doi 1010160146-664X(82)90096-X [Online] Available

httpwwwsciencedirectcomsciencearticlepii0146664X8290096X

(cit on pp 5 9 11)

[6] M Rocque ldquo3D map creation using the structured light technique for obstacle

avoidancerdquo Masterrsquos thesis Eindhoven University of Technology Den Dolech 2

- 5612 AZ Eindhoven - The Netherlands 2011 [Online] Available http

alexandriatuenlextra1afstverslwsk-irocque2011pdf (cit on pp 6

34)



[7] S Inokuchi K Sato and F Matsuda ldquoRange imaging system for 3-D object

recognitionrdquo in International Conference on Pattern Recognition 1984 (cit on

pp 9 11)

[8] M Minou T Kanade and T Sakai ldquoA method of time-coded parallel planes of

light for depth measurementrdquo Trans Institute of Electronics and Communication

Engineers of Japan vol E64 no 8 pp 521ndash528 Aug 1981 (cit on pp 9 11)

[9] M Maruyama and S Abe ldquoRange sensing by projecting multiple slits with random

cutsrdquo Pattern Analysis and Machine Intelligence IEEE Transactions on vol 15

no 6 pp 647 ndash651 Jun 1993 issn 0162-8828 doi 10110934216735 (cit on

pp 9 11)

[10] N Durdle J Thayyoor and V Raso ldquoAn improved structured light technique

for surface reconstruction of the human trunkrdquo in Electrical and Computer Engi-

neering 1998 IEEE Canadian Conference on vol 2 May 1998 874 ndash877 vol2

doi 101109CCECE1998685637 (cit on pp 9 11)

[11] M Ito and A Ishii ldquoA three-level checkerboard pattern (TCP) projection method

for curved surface measurementrdquo Pattern Recognition vol 28 no 1 pp 27 ndash40

1995 issn 0031-3203 doi 1010160031-3203(94)E0047-O [Online] Available

httpwwwsciencedirectcomsciencearticlepii0031320394E0047O

(cit on pp 9 11)

[12] K L Boyer and A C Kak ldquoColor-encoded structured light for rapid active

rangingrdquo Pattern Analysis and Machine Intelligence IEEE Transactions on vol

PAMI-9 no 1 pp 14 ndash28 Jan 1987 issn 0162-8828 doi 101109TPAMI1987

4767869 (cit on pp 9 11)

[13] C-S Chen Y-P Hung C-C Chiang and J-L Wu ldquoRange data acquisition using

color structured lighting and stereo visionrdquo Image Vision Comput pp 445ndash456

1997 (cit on pp 9 11)

[14] P M Griffin L S Narasimhan and S R Yee ldquoGeneration of uniquely encoded

light patterns for range data acquisitionrdquo Pattern Recognition vol 25 no 6

pp 609 ndash616 1992 issn 0031-3203 doi 1010160031- 3203(92)90078- W

[Online] Available httpwwwsciencedirectcomsciencearticlepii

003132039290078W (cit on pp 9 12)

[15] B Carrihill and R Hummel ldquoExperiments with the intensity ratio depth sensorrdquo

Computer Vision Graphics and Image Processing vol 32 no 3 pp 337 ndash358

1985 issn 0734-189X doi 1010160734-189X(85)90056-8 [Online] Available

httpwwwsciencedirectcomsciencearticlepii0734189X85900568

(cit on pp 9 12)


[16] J Tajima and M Iwakawa ldquo3-D data acquisition by rainbow range finderrdquo in

Pattern Recognition 1990 Proceedings 10th International Conference on vol i

Jun 1990 309 ndash313 vol1 doi 101109ICPR1990118121 (cit on pp 9 12)

[17] C Wust and D Capson ldquoSurface profile measurement using color fringe projec-

tionrdquo English Machine Vision and Applications vol 4 pp 193ndash203 3 1991 issn

0932-8092 doi 101007BF01230201 [Online] Available httpdxdoiorg

101007BF01230201 (cit on pp 9 12)

[18] E Hall J Tio C McPherson and F Sadjadi ldquoMeasuring curved surfaces for

robot visionrdquo Computer vol 15 no 12 pp 42 ndash54 Dec 1982 issn 0018-9162

doi 101109MC19821653915 (cit on pp 10 14)

[19] J Salvi J Pags and J Batlle ldquoPattern codification strategies in structured light

systemsrdquo Pattern Recognition vol 37 pp 827ndash849 2004 (cit on pp 11 12)

[20] A Woodward D An G Gimelrsquofarb and P Delmas ldquoA comparison of three 3-D

facial reconstruction approachesrdquo in Multimedia and Expo 2006 IEEE Interna-

tional Conference on Jul 2006 pp 2057 ndash2060 doi 101109ICME2006262619

(cit on p 12)

[21] D An A Woodward P Delmas G Gimelfarb and J Morris ldquoComparison of

active structure lighting mono and stereo camera systems application to 3D face

acquisitionrdquo in Computer Science 2006 ENC rsquo06 Seventh Mexican International

Conference on Sep 2006 pp 135 ndash141 doi 101109ENC20068 (cit on pp 12

13)

[22] A Woodward D An P Delmas and C-Y Chen ldquoComparison of structured

lightning techniques with a view for facial reconstructionrdquo in Proc Image and

Vision Computing New Zealand Conf Dunedin New Zealand 2005 pp 195ndash200

[Online] Available httppixelotagoacnzipapers35pdf (cit on p 13)

[23] P Fechteler P Eisert and J Rurainsky ldquoFast and high resolution 3D face scan-

ningrdquo in Image Processing 2007 ICIP 2007 IEEE International Conference on

vol 3 Oct 2007 pp III ndash81 III ndash84ndash doi 101109ICIP20074379251 (cit on

p 13)

[24] J Salvi X Armangu and J Batlle ldquoA comparative review of camera calibrating

methods with accuracy evaluationrdquo Pattern Recognition vol 35 no 7 pp 1617

ndash1635 2002 issn 0031-3203 doi 101016S0031- 3203(01)00126- 1 [On-

line] Available http www sciencedirect com science article pii

S0031320301001261 (cit on p 14)

[25] H J Chen J Zhang D J Lv and J Fang ldquo3-D shape measurement by composite

pattern projection and hybrid processingrdquo Optics Express vol 15 p 12 318 2007

doi 101364OE15012318 (cit on p 14)


[26] O D Faugeras and G Toscani ldquoThe calibration problem for stereordquo in Proceed-

ings CVPR rsquo86 (IEEE Computer Society Conference on Computer Vision and

Pattern Recognition Miami Beach FL June 22ndash26 1986) ser IEEE Publ86CH2290-

5 IEEE 1986 pp 15ndash20 (cit on p 14)

[27] G Toscani Systemes de calibration et perception du mouvement en vision ar-

tificielle Institut de recherche ne informatique et en automatique 1987 isbn

9782726105726 [Online] Available http books google nl books id =

Rrz5OwAACAAJ (cit on p 14)

[28] J Mas and I i A Universitat de Girona Departament drsquoElectronica An Approach

to Coded Structured Light to Obtain Three Dimensional Information[ ser Tesis

doctorals Universitat de Girona Universitat de Girona 1998 isbn 9788495138118

[Online] Available httpbooksgooglenlbooksid=mmM5twAACAAJ (cit on

p 15)

[29] R Tsai ldquoA versatile camera calibration technique for high-accuracy 3D machine

vision metrology using off-the-shelf tv cameras and lensesrdquo Robotics and Automa-

tion IEEE Journal of vol 3 no 4 pp 323ndash344 Aug 1987 issn 0882-4967 doi

101109JRA19871087109 [Online] Available httpdxdoiorg101109

JRA19871087109 (cit on p 15)

[30] J Weng P Cohen and M Herniou ldquoCamera calibration with distortion mod-

els and accuracy evaluationrdquo Pattern Analysis and Machine Intelligence IEEE

Transactions on vol 14 no 10 pp 965 ndash980 Oct 1992 issn 0162-8828 doi

10110934159901 (cit on p 15)

[31] P Redert ldquoMulti-viewpoint systems for 3-D visual communicationrdquo Masterrsquos the-

sis Delft University of Technology Stevinweg 1 - 2628 CN Delft - The Netherlands

2000 (cit on pp 15 26)

[32] M Woo J Neider T Davis and D Shreiner OpenGL Programming Guide The

Official Guide to Learning OpenGL Version 12 3rd Boston MA USA Addison-

Wesley Longman Publishing Co Inc 1999 isbn 0201604582 (cit on p 25)

[33] L P Chew ldquoConstrained Delaunay triangulationsrdquo Algorithmica vol 4 no 1-4

pp 97ndash108 1989 [Online] Available httplinkspringercomarticle10

1007BF01553881 (cit on pp 25 26)

[34] M Desbrun M Meyer P Schroder and A H Barr ldquoImplicit fairing of irregu-

lar meshes using diffusion and curvature flowrdquo in Proceedings of the 26th annual

conference on Computer graphics and interactive techniques ser SIGGRAPH rsquo99

New York NY USA ACM PressAddison-Wesley Publishing Co 1999 pp 317ndash

324 isbn 0-201-48560-5 doi 10 1145 311535 311576 [Online] Available

httpdxdoiorg101145311535311576 (cit on p 30)


[35] F Vahid Embedded System Design A Unified HardwareSoftware Introduction

Wiley India Pvt Limited 2006 isbn 9788126508372 [Online] Available http

booksgooglenlbooksid=HloqCOqcHvoC (cit on p 31)

[36] S Dhadiwal Baid ldquoSingle-board computers for embedded applicationsrdquo Electron-

ics For You Tech Rep 2010 [Online] Available httpwwwefymagonline

compdfsingle-board-computers_aug10pdf (cit on p 32)

[37] M Roa Villescas ldquoThesis preparationrdquo Eindhoven University of Technology Tech

Rep Jan 2013 (cit on p 32)

[38] G Coley ldquoBeagleboard system reference manualrdquo BeagleBoard org December

p 81 2009 (cit on p 34)

[39] V G Reddy ldquoNEON technology introductionrdquo ARM Corporation 2008 (cit on

p 34)

[40] M Barberis and L Semeria ldquoHow-to MATLAB-to-C translationrdquo Catalytic Tech

Rep 2008 (cit on p 38)

[41] W Von Hagen The definitive guide to GCC Apress 2006 (cit on p 45)

[42] I Stephenson Production rendering design and implementation Springer 2005

(cit on p 46)

[43] G Bradski and A Kaehler Learning OpenCV Computer vision with the OpenCV

library Orsquoreilly 2008 (cit on p 50)

[44] S Rippa ldquoMinimal roughness property of the Delaunay triangulationrdquo Computer

Aided Geometric Design vol 7 no 6 pp 489ndash497 1990 [Online] Available

httpwwwsciencedirectcomsciencearticlepii016783969090011F

(cit on p 51)

[45] ARM ldquoCortex-a series version 30 programmerrsquos guiderdquo Tech Rep 2012 (cit on

p 54)

[46] N Pipenbrinck ldquoARM NEON optimization an examplerdquo Tech Rep 2009 (cit

on p 54)

• Abstract
• Acknowledgements
• List of Figures
• 1 Introduction
  • 1.1 3D Mask Sizing project
  • 1.2 Objectives
  • 1.3 Report organization
• 2 Literature study
  • 2.1 Surface reconstruction
    • 2.1.1 Stereo analysis
    • 2.1.2 Structured lighting
      • 2.1.2.1 Triangulation technique
      • 2.1.2.2 Pattern coding strategies
      • 2.1.2.3 3D human face reconstruction
  • 2.2 Camera calibration
    • 2.2.1 Definition
    • 2.2.2 Popular techniques
• 3 3D face scanner application
  • 3.1 Read binary file
  • 3.2 Preprocessing
    • 3.2.1 Parse XML file
    • 3.2.2 Discard frames
    • 3.2.3 Crop frames
    • 3.2.4 Scale
  • 3.3 Normalization
    • 3.3.1 Normalization
    • 3.3.2 Texture 2
    • 3.3.3 Modulation
    • 3.3.4 Texture 1
  • 3.4 Global motion compensation
  • 3.5 Decoding
  • 3.6 Tessellation
  • 3.7 Calibration
    • 3.7.1 Offline process
    • 3.7.2 Online process
  • 3.8 Vertex filtering
    • 3.8.1 Filter vertices based on decoding constraints
    • 3.8.2 Filter vertices outside the measurement range
    • 3.8.3 Filter vertices based on a maximum edge length
  • 3.9 Hole filling
  • 3.10 Smoothing
• 4 Embedded system development
  • 4.1 Development tools
    • 4.1.1 Hardware
      • 4.1.1.1 Single-board computer survey
      • 4.1.1.2 BeagleBoard-xM features
    • 4.1.2 Software
      • 4.1.2.1 Software libraries
      • 4.1.2.2 Software development tools
  • 4.2 MATLAB to C code translation
    • 4.2.1 Motivation for developing in C language
    • 4.2.2 Translation approach
  • 4.3 Visualization
• 5 Performance optimizations
  • 5.1 Double to single-precision floating-point numbers
  • 5.2 Tuned compiler flags
  • 5.3 Modified memory layout
  • 5.4 Reimplementation of C's standard power function
  • 5.5 Reduced memory accesses
  • 5.6 GMC in y dimension only
  • 5.7 Error in Delaunay triangulation
  • 5.8 Modified line shifting in GMC stage
  • 5.9 New tessellation algorithm
  • 5.10 Modified decoding stage
  • 5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage
  • 5.12 NEON assembly optimization 1
  • 5.13 NEON assembly optimization 2
• 6 Results
  • 6.1 MATLAB to C code translation
  • 6.2 Visualization
  • 6.3 Performance optimizations
• 7 Conclusions
  • 7.1 Future work
• Bibliography

Dedicated to my grandmother


Chapter 1

Introduction

The potential of science and technology to improve every aspect of life seems to be

boundless or at least this is what the innovations of the previous centuries suggest

Among the many different interests that advocate the development of science and tech-

nology human healthcare has always been an important stimulant New technologies

are constantly being developed by leading companies all around the world to improve the

quality of people's lives. A clear example is the case of the Dutch multinational Royal
Philips Electronics, which devotes special interest to the development and introduction
of meaningful innovations that improve people's lives.

Within the wide range of products offered by Philips there is a specific group cate-

gorized under the name of sleep solutions that aims at improving the sleep quality of

people A well-known family of products contained within this category are the so called

CPAP (Continuous Positive Airway Pressure) masks Such masks are used primarily

in the treatment of sleep apnea a sleep disorder characterized by pauses in breathing

or instances of very low breathing during sleep [1]. According to a recent study con-
ducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed
population was found to suffer from this disorder [2]. A total number of 4206 people,

comprising women and men of different ages and levels of education took part in the

2-year study A similar survey was undertaken by the National Institutes of Health in

the United States of America [3]. It reported that sleep apnea was prevalent in more
than 18 million Americans, i.e., 6.62% of the country's population.

While aiming to attend the large demand for CPAP masks Philips has designed and

introduced a wide variety of mask models that seek to fulfill the different needs and

constraints that arise due to several factors which include the large diversity of size

and shape of human faces inclination towards breathing through the mouth or nose

diagnosis of diseases such as sinusitis or dermatitis or disorders such as claustrophobia



Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic, (c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel, (h) Simplicity, (i) ComfortGel.

amongst others. A subset of these models is shown in Figure 1.1. It is important to
mention that a poor selection of a CPAP mask might cause undesirable side effects to the
patient, such as marks or even pressure ulcers. Consequently, the physical dimensions
of each patient's face play a crucial role in the selection of the most appropriate CPAP
mask.

Unfortunately the current practices used to assess the adequacy of CPAP masks based

on facial dimensions are quite error prone They rely on trial-and-error procedures in

which the patient tries on different mask models and selects the one he thinks is the

most comfortable In order to alleviate this problem Philips Research launched the

3D Mask Sizing project which aims to develop an automated embedded system capable


of assisting sleep technicians in prescribing the most appropriate CPAP mask for each

patient

1.1 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop some techno-

logical means that can assist sleep technicians in the selection of a proper CPAP mask

model for each patient A series of algorithms methods and hardware prototypes are the

result of several years of research carried out by the Smart Sensing amp Analysis research

group in Philips Research Eindhoven The resulting automated mask advising system

comprises four main parts

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry
2. The extraction of facial landmarks from the reconstructed model by means of
computer vision algorithms
3. The actual fit quality assessment by virtually fitting a series of 3D mask models
to the reconstructed face
4. The creation of a custom cushion that optimizes for uniform pressure along the
cushion contour

The focus of this thesis project is based on the first step

As part of the progress made in the 3D Mask Sizing project at Philips Research Eind-

hoven a first prototype of a 3D hand-held scanner using the structured lighting technique

was already developed and is the base for the present project. Figure 1.2a shows the
hardware setup of such device. In short, this scanner is capable of capturing a picture
sequence of a patient's face while illuminating it with specific structured light patterns.
Such picture sequence is processed by means of a series of algorithms in order to re-
construct a 3D model of the face. An example of a resulting 3D model is presented in
Figure 1.2b. The reconstruction process and all other calculations are currently being
performed offline and are mostly implemented in MATLAB.

1.2 Objectives

The main objective of this thesis project is to extend the functionality of the mentioned

scanner such that the 3D reconstruction is computed locally on the embedded platform

This implies transforming the already developed methods and algorithms in such a


Figure 1.2: A 3D hand-held scanner developed in Philips Research: (a) hardware, (b) 3D model example.

way that extra-functional requirements are taken into account These extra-functional

requirements involve an optimal use of the available computational resources Highest

priority should be given to the execution time of the application Specifically the 3D

reconstruction should be running on the embedded device in less than 5 seconds on

average Because the embedded processor contained in the final product will be similar

to an ARMrsquos Cortex-A8 the new implementation should be targeted to this processor

in particular by making proper use of the specific features it provides Moreover the

visualization of the reconstructed face model should be made possible by means of the

embedded projector contained in the device

1.3 Report organization

This report is organized as follows Chapter 2 presents the basic principles that underlay

different technologies for surface reconstruction placing special emphasis on structured

lighting techniques In Chapter 3 an overview of the 3D face scanner application is

provided which functions as the starting point for the current project Chapter 4

details the most relevant aspects that pertain to the implementation of the 3D face

scanner application on an embedded device In Chapter 5 a series of optimizations

used to reduce the execution time of the application are described Chapter 6 highlights

the most important results of the development process namely the MATLAB to C

translation the visualization module and the set of optimizations Finally Chapter 7

concludes the thesis while delineating paths for further improvements of the presented

work


Chapter 2

Literature study

This chapter presents a selective analysis of the state-of-the-art in the field of surface

reconstruction placing special emphasis on structured lighting techniques A brief

overview of the three main underlying technologies used for depth estimation is pre-

sented first This is followed by an example of stereo analysis which serves as the basis

for the more specific structured lighting techniques Moreover this example helps to

illustrate why stereo analysis is considered less preferable for 3D face reconstruction

applications when compared with the structured lighting techniques Special emphasis

is placed on the scientific principles underlying structured lighting techniques Further-

more a classification of the different types of pattern coding strategies available in the

literature is given along with an analysis of their suitability for our application Fi-

nally the chapter concludes with a brief discussion of camera calibration and its most

representative techniques

2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications such as computer mod-

eling of 3D objects (such as those found in areas like architecture mechanical engi-

neering or surgery) distance measurements for vehicle control surface inspections for

quality control approximate or exact estimates of the location of 3D objects for auto-

mated assembly and fast location of obstacles for efficient navigation [4]

Technologies for surface reconstruction include contact and non-contact techniques the

latter being our principal interest Non-contact techniques may be further categorized

as echo-metric reflecto-metric and stereo-metric as proposed in [5] Echo-metric tech-

niques use time-of-flight measurements to determine the distance to an object, i.e., they



are based on the time it takes for a wave (acoustic micro electromagnetic) to reflect

from an object's surface through a given medium. Reflecto-metric techniques process

one or more images of the object to determine its surface orientation and consequently

its shape. Finally, stereo-metric techniques determine the location of the object's surface

by triangulating each point with its corresponding projections in two or more images

Echo-metric techniques suffer from a number of drawbacks Systems employing such

techniques are heavily affected by environmental parameters such as temperature and

humidity [6]. These parameters affect the velocity at which waves travel through a
given medium, thus introducing errors in depth measurement. On the other hand,
both reflecto-metric and stereo-metric techniques are less affected by environmental
parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they
require an estimation of the model of the environment. In the remainder of this section

we will limit the discussion to the stereo-metric category and focus on the structured

lighting techniques

2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded

as an extension of the more general stereo-vision technique an introductory example of

stereo analysis is presented in this section This example intends to show why the use

of structured lighting becomes essential for our application This example is presented

in [4]

Surface reconstruction can be achieved by means of the visual disparity that results

when an object is observed from different camera viewpoints In its simplest form two

cameras can be used for this purpose Triangulation between a point in the object and

its respective projection in each of the camera projection planes can be used to calculate

the depth at which this point lies from a certain reference Note however that in order

to calculate the triangulation more parameters are required These parameters refer for

example to the distance at which the cameras are located from one another (extrinsic

parameter) or to the focal length of each of the cameras (intrinsic parameter)

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this
model the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal

point of the left camera The focal point of the right camera lies at a distance b along

the X-axis from the left camera ie at the point (b 0 0) Both cameras are assumed

to have the same focal length f As a consequence the images of both cameras are

located in the same image plane The Z-axis coincides with the optical axis of the

left camera Moreover the optical axes of both cameras are parallel to each other and


oriented towards the scene objects Also note that because the x-axes of both images

are identically oriented rows with same row-number in the two different images lie on

the same straight line

Figure 2.1: Standard stereo geometry. The figure shows the left and right image planes (rows y, coordinates x_left and x_right), the base distance b, and the optical axes of both cameras.

In this model a scene point P = (X, Y, Z) is projected onto two corresponding image
points
\[
p_{left} = (x_{left}, y_{left}) \quad \text{and} \quad p_{right} = (x_{right}, y_{right})
\]
in the left and right images respectively, assuming that the scene point is visible from
both camera viewpoints. The disparity with respect to p_left is a vector given by
\[
\Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\; y_{left} - y_{right})^{T} \tag{2.1}
\]

between two corresponding image points

In the standard stereo geometry pinhole camera models are used to represent the con-

sidered cameras The basic idea of a pinhole camera is that it projects scene points P

onto image points p according to a central projection given by

\[
p = (x, y) = \left( \frac{f \cdot X}{Z},\; \frac{f \cdot Y}{Z} \right) \tag{2.2}
\]
assuming that Z > f.

According to the ideal assumptions considered in the standard stereo geometry of the
two cameras, it holds that y = y_left = y_right. Therefore, for the left camera the cen-
tral projection equation is given directly by Equation 2.2, considering that the pinhole
camera model assumes that the Z-axis is identified to be the optical axis of the camera.
Furthermore, given the displacement of the right camera by b along the X axis, the
central projection equation is given by
\[
(x_{right}, y) = \left( \frac{f \cdot (X - b)}{Z},\; \frac{f \cdot Y}{Z} \right)
\]

Rather than calculating a disparity vector given by Equation 2.1 for all corresponding
pairs of points in the different images, the scalar disparity proves to be sufficient under
the assumptions made in the standard stereo geometry. The scalar disparity of two
corresponding points in each one of the images with respect to p_left is given by
\[
\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^{2} + (y_{left} - y_{right})^{2}}
\]
However, because rows with the same row numbers in the two images have the same y value,
the scalar disparity of a pair of corresponding points reduces to
\[
\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \tag{2.3}
\]

Note that it is valid to remove the absolute value operator because of the chosen arrange-

ment of the cameras. A disparity map ∆(x, y) is defined by applying Equation 2.3 to all
corresponding points in the two images. For those points that could not be associated
with a correspondent point in the other image (for example because of occlusion), the
value "undefined" is recorded.

Finally, in order to come up with the equations that determine the 3D location of each
point in the scene, note that from the two central projection equations of the two cameras
it follows that
\[
Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}
\]
and therefore
\[
X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}
\]
Using the previous equation it follows that
\[
Z = \frac{b \cdot f}{x_{left} - x_{right}}
\]
By substituting this result into the projection equation for y it follows that
\[
Y = \frac{b \cdot y}{x_{left} - x_{right}}
\]

The last three equations allow the reconstruction of the coordinates of the projected

points P within the three-dimensional XYZ-space assuming that the parameters f and


b are known and that the disparity map ∆(x y) was measured for each pair of corre-

sponding points in the two images. Note that a variety of methods exists to calibrate
different types of camera configuration systems, i.e., to determine their intrinsic and ex-
trinsic parameters. More on these calibration procedures is further discussed in Section
2.2.
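As an illustration of the reconstruction step described above, the following minimal C sketch (not taken from the scanner code; the focal length, base distance and image coordinates are purely illustrative) recovers the coordinates of a scene point from a measured scalar disparity, assuming the standard stereo geometry with known f and b.

#include <stdio.h>

/* Recover the 3D coordinates (X, Y, Z) of a scene point from a pair of
 * corresponding image points in the standard stereo geometry.
 * f is the focal length, b the base distance between the two cameras.
 * Returns 0 on success, -1 when the disparity is undefined. */
static int reconstruct_point(double f, double b,
                             double x_left, double x_right, double y,
                             double *X, double *Y, double *Z)
{
    double disparity = x_left - x_right;      /* scalar disparity, Eq. 2.3 */
    if (disparity <= 0.0)
        return -1;                            /* no valid correspondence   */

    *Z = (b * f) / disparity;
    *X = (b * x_left) / disparity;
    *Y = (b * y) / disparity;
    return 0;
}

int main(void)
{
    double X, Y, Z;
    /* hypothetical values: f = 6 mm, b = 65 mm, image coordinates in mm */
    if (reconstruct_point(6.0, 65.0, 1.2, 0.8, 0.5, &X, &Y, &Z) == 0)
        printf("P = (%.2f, %.2f, %.2f)\n", X, Y, Z);
    return 0;
}

In practice this computation is applied to every pixel of the disparity map, and pixels whose disparity is marked as undefined are simply skipped.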

The process of determining corresponding point pairs is known as the correspondence

problem A wide variety of techniques are used to solve the correspondence problem in

stereo image analysis Such techniques generally involve the extraction and matching

of features between two or more images These features are typically corners or edges

contained within the images Although these techniques are found to be appropriate for

a certain number of applications it turns out that they present a number of drawbacks

that make their applicability unfeasible for many others The main drawbacks are (i)

feature extraction and matching is generally computationally expensive (ii) features

might not be available depending on the nature of the environment or the placement

of the cameras and (iii) low lighting conditions generally increase the complexity of the

matching procedure thus making the system more error prone Such problems in solving

the correspondence problem can generally be overcome by resorting to a different but

similar type of techniques known by the name of structured lighting techniques While

structured lighting techniques involve a complete different methodology on how to solve

the correspondence problem they share large part of the theory presented in this section

regarding the depth reconstruction process

2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously de-

scribed stereo analysis approach where one of the cameras is replaced by a light source

which projects a light pattern actively into the scene The location of an object in space

can then be determined by analyzing the deformation of the projected light pattern

The idea behind this modification is to simplify the complexity of the correspondence

analysis by actively manipulating the scene

It is important to note that stereoscopic based systems do not assume complex require-

ments for image acquisition since they mostly rely on theoretical mathematical and

algorithmic analyses to solve the reconstruction problem On the other hand the idea

behind structured lighting methods is to shift this complexity to another level such as

the engineering prerequisites of the overall system [4]

A wide variety of light patterns have been proposed by the research community [5] [7]ndash

[17] Their aim is to reduce the large number of images that would have to be captured


when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2 a

classification of the encoded patterns available is presented Nevertheless the light spot

projection technique serves as a solid starting point to introduce the main principle

underlying the depth recovery of most other encoded light patterns the triangulation

technique

2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring

angles formed from it to points at either end of a fixed baseline Various approaches

have been proposed for accomplishing this task An early analysis was described by Hall

et al [18] in 1982 Klette also presented his own analysis in [4] In the following an

overview of Klettersquos triangulation approach is explained

Figure 2.2 shows the simplified model that Klette assumes in his analysis. Note that the

Figure 2.2: Assumed model for triangulation as proposed in [4]. The figure shows the camera projection center O, the light source at base distance b, the object point P, the angles α, β and γ, and the distance d.

system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a

consequence the object light source and camera all lie in the same plane The angles

α and β are given by the calibration As in the previous example the base distance b

is assumed to be known and the origin of the coordinate system O coincides with the

projection center of the camera


The goal is to calculate the distance d between the origin O and the object point
P = (X_0, Z_0). This can be done using the law of sines as follows
\[
\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}
\]
From γ = π − (α + β) and sin(π − γ) = sin(γ) it holds that
\[
\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}
\]
Therefore distance d is given by
\[
d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}
\]
which holds for any point P lying on the surface of the object.
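To make the formula concrete, the short C sketch below evaluates d for a given pair of angles and base distance. It is illustrative only; the calibration angles and the base distance used here are hypothetical values, not those of the actual scanner.

#include <math.h>
#include <stdio.h>

#define DEG_TO_RAD (3.14159265358979323846 / 180.0)

/* Distance from the camera projection center O to the object point P,
 * following d = b * sin(alpha) / sin(alpha + beta). Angles in radians. */
static double triangulate_distance(double b, double alpha, double beta)
{
    return b * sin(alpha) / sin(alpha + beta);
}

int main(void)
{
    /* hypothetical calibration: b = 100 mm, alpha = 60 deg, beta = 50 deg */
    double d = triangulate_distance(100.0, 60.0 * DEG_TO_RAD, 50.0 * DEG_TO_RAD);
    printf("d = %.2f mm\n", d);
    return 0;
}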

2.1.2.2 Pattern coding strategies

As stated earlier there is a wide variety of pattern coding strategies available in the lit-

erature that aim to fulfill all requirements found in different scenarios and applications

In coded structured light systems, every coded pixel in the pattern has its own codeword
that allows direct mapping, i.e., every codeword is mapped to the corresponding coordi-

nates of a given pixel or group of pixels in the pattern A codeword can be represented

using grey levels colors or even geometrical characteristics The following classification

of pattern coding strategies was proposed by Salvi et al in [19]

bull Time-multiplexing This is one of the most commonly used strategies The

idea is to project a set of patterns onto the scene one after the other The

sequence of illuminated values determines the codeword for each pixel The main

advantage of this kind of pattern is that it can achieve high spatial resolution in

the measurements. However, its accuracy is highly sensitive to movement of either
the structured light system or objects in the scene during the time period when the
acquisition process takes place. Previous research in this area includes the work of
[5], [7], [8]. An example of this coding strategy is the binary coded pattern shown
in Figure 2.3a (a small generation sketch in C is given after Figure 2.3).

bull Spatial Neighborhood In this strategy the codeword that is assigned to a given

pixel depends on its neighborhood Codification is done on the basis of intensity

[9]ndash[11] color [12] or a unique structure of the neighborhood [13] In contrast with

time-multiplexing strategies spatial neighborhood strategies allow for all coding

information to be condensed into a single projection pattern making them highly


suitable for applications that involve timing constraints such as autonomous nav-

igation. The compromise, however, is deterioration in spatial resolution. Figure
2.3b is an example of this strategy proposed by Griffin et al. [14].

bull Direct coding In direct coding strategies every pixel in the pattern is labeled

by the information it represents In other words the entire codeword for a given

point is contained in a unique pixel as explained in [19] Basically there are two

ways to achieve this either by using a large range of color values [15] [16] or

by introducing periodicity [17] Although in theory this group of strategies can

be used to reconstruct objects with high resolution a major problem occurs in

practice the colors imaged by camera(s) of the system do not only depend on the

projected colors but also on the intrinsic colors of the measuring surface and light

source. The consequence is that reference images become necessary. Figure 2.3c
shows an example of a direct coding strategy proposed in [16].

Figure 2.3: Examples of pattern coding strategies: (a) time-multiplexing, (b) spatial neighborhood, (c) direct coding.
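To illustrate the time-multiplexing strategy, the following C sketch generates a sequence of Gray-coded binary stripe patterns: the sequence of bright/dark values observed at a pixel over time forms its codeword, which identifies the projector column that illuminated it. The pattern dimensions and the number of patterns are assumptions made for the sake of the example and do not correspond to the projector used in this project.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define PROJ_WIDTH   608   /* assumed projector resolution (illustrative)    */
#define PROJ_HEIGHT  684
#define NUM_PATTERNS 10    /* 2^10 = 1024 codes, enough for PROJ_WIDTH cols  */

/* Fill one binary stripe image: bit 'bit' of the Gray code of each column
 * index decides whether that column is illuminated (255) or dark (0).      */
static void fill_gray_code_pattern(uint8_t *img, int bit)
{
    for (int x = 0; x < PROJ_WIDTH; x++) {
        unsigned gray = (unsigned)x ^ ((unsigned)x >> 1);  /* binary -> Gray */
        uint8_t value = ((gray >> bit) & 1u) ? 255 : 0;
        for (int y = 0; y < PROJ_HEIGHT; y++)
            img[y * PROJ_WIDTH + x] = value;
    }
}

int main(void)
{
    uint8_t *img = malloc(PROJ_WIDTH * PROJ_HEIGHT);
    if (img == NULL)
        return 1;
    /* generate the patterns one after the other, most significant bit first */
    for (int bit = NUM_PATTERNS - 1; bit >= 0; bit--) {
        fill_gray_code_pattern(img, bit);
        printf("pattern for bit %d ready to be projected\n", bit);
    }
    free(img);
    return 0;
}

Gray codes are commonly preferred over plain binary codes for this purpose because adjacent columns differ in only one bit, which limits the impact of decoding errors at stripe boundaries.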

2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields such as security

forensics or even entertainment it is no surprise that special focus has been devoted

to this area by the research community over the last decades A comparative study

of three different 3D face reconstruction approaches is presented in [20] Here the

most representative techniques of three different domains are tested These domains are

binocular stereo structured lighting and photometric stereo The experimental results

show that active reconstruction techniques perform better than purely passive ones for

this application

The majority of analysis on vision based reconstruction has focused on general perfor-

mance for arbitrary scenes rather than on specific objects as reported in [20] Neverthe-

less some effort has been made on evaluating structured lighting techniques with special

focus on human face reconstruction In [21] a comparison is presented between three


structured lighting techniques (Gray Code Gray Code Shift and Stripe Boundary) to

assess 3D reconstruction for human faces by using mono and stereo systems The results

show that the Gray Code shift coding performs best given the high number of emitted

patterns it uses A further study on this topic was performed by the same author in

[22] Again it was found that time-multiplexing techniques such as binary encoding

using Gray Code provide the highest accuracy With a rather different objective than

that sought by Woodward et al in [21] and [22] Fechteler et al [23] also focus their

effort on presenting a framework that captures 3D models of faces in high resolutions

with low computational load Here the system uses a single colored stripe pattern for

the reconstruction purpose plus a picture of the face illuminated with regular white light

that is used as texture

Particular aspects of 3D human face reconstruction such as proximity size and texture

involved make structured lighting a suitable approach On the contrary other reconstruction techniques might be less suitable when dealing with these particular aspects

For example stereoscopic approaches fail to provide positive results when the textures

involved do not contain features that can be easily extracted and matched by means of

algorithms as in the case of the human face On the other hand the concepts behind

structured lighting make it very convenient to reconstruct these kinds of surfaces given

the proximity involved and the size limits of the object in question (appropriate for

projecting encoded patterns)

With regard to the suitability of the different pattern coding strategies for our application

(3D human face reconstruction by means of a hand-held scanner) there are several

factors to consider Spatial neighborhood strategies do not offer high spatial resolution

which is needed by the algorithms that assess the fit quality of the various mask models

Direct coding strategies suffer from practical problems that affect their robustness to

different scenarios This centers the attention on the time-multiplexing techniques which

are known to provide high spatial resolution The problem with such techniques is

that they are highly sensitive to movement which is likely to be present on a hand-held device Fortunately there are several approaches as to how such a problem can be

solved Consequently it is a time-multiplexing technique which is being employed in

our application

22 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement

This section presents a review of some of the most popular techniques with special focus

on those that are regarded as adequate for our application


221 Definition

Camera calibration is the process of determining a mathematical approximation of the

physical and optical behavior of an imaging system by using a set of parameters These

parameters can be estimated by means of direct or iterative methods and they are divided

in two groups On the one hand intrinsic parameters determine how light is projected

through the lens onto the image plane of the sensor The focal length projection center

and lens distortion are all examples of intrinsic parameters On the other hand extrinsic

parameters measure the position and orientation of the camera with respect to a world

coordinate system as defined in [24] To better illustrate these ideas consider Figure

24 which corresponds to the optical system for the structured pattern projection and

triangulation considered in [25] The focal length fc and the projection center Oc are

examples of intrinsic parameters of the camera while the distance D between the camera

and the projector corresponds to an extrinsic parameter

Figure 24 A reference framework assumed in [25]

222 Popular techniques

In 1982 Hall et al [18] proposed a technique consisting of an implicit camera calibration

that uses a 3×4 transformation matrix which maps 3D object points to their respective

2D image projections Here the model of the camera does not consider any lens distortion For a detailed description of this method refer to [18] Some years later in 1986

Faugeras improved Hall's work by proposing a technique that was based on extracting

the physical parameters of the camera from the transformation technique proposed in

[18] The description of this technique is given in [26] and [27] A non-linear explicit

camera calibration that included radial lens distortion was proposed by Salvi in his PhD


thesis [28] which as he mentions can be regarded as a simple adaptation of Faugeras' linear method However a method that would become much more popular and that is still

widely used was proposed by Tsai in 1987 [29] Here the author proposes a two-step

technique that models only radial lens distortion Also worth mentioning is the model

proposed by Weng [30] in 1992 which includes three different types of lens distortion

The calibration mechanism that is currently being used in our application is based on

the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although

this mechanism focuses on stereo camera calibration it was generalized for a system

with one camera and one projector It involves imaging a controlled scene from different

positions and orientations The controlled scene consists of a rigid calibration chart with

several markers The geometric and photometric properties of such markers are known

precisely so that they can be detected After corresponding markers in the different

images are found an algorithm searches the optimal set of camera parameters for which

triangulation of all corresponding marker-point pairs gives an accurate reconstruction of

the calibration chart This calibration mechanism is discussed further in Section 37

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed

by the Smart Sensing amp Analysis research group and provided as a starting point for the

current project Figure 31 presents the main steps involved in the 3D reconstruction

process

Figure 31 General flow diagram of the 3D face scanner application

The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene For each projection the scene is captured by means of the

embedded camera hence producing 16 different grayscale frames (Figure 32) that are

fed to the application in the form of a binary file This falls in line with the discussion

presented in Section 2123 of the literature study of why time-multiplexing strategies

are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications In Sections 31 to 39 each of the steps shown in Figure 31 is

described



Figure 32 Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns This frame

sequence is the input for the 3D face scanner application

31 Read binary file

The first step of the application is to read the binary file that contains the required

information for the 3D reconstruction The binary file is composed of two parts the

header and the actual data The header contains metadata of the acquired frames such

as the number of frames and the resolution of each one The second part contains the

actual data of the captured frames Figure 32 shows an example of such frame sequence

which from now on will be referred to as camera frames
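As an illustration, a minimal C sketch of this reading step is given below. The field layout of the header (frame_count, width, height) and the function names are assumptions made for this example only and are not taken from the actual scanner file format.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical header layout: the real scanner format may differ. */
typedef struct {
    uint32_t frame_count;   /* number of captured frames, e.g. 16 */
    uint32_t width;         /* horizontal resolution of each frame */
    uint32_t height;        /* vertical resolution of each frame */
} scan_header_t;

static uint8_t *read_scan(const char *path, scan_header_t *hdr)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fread(hdr, sizeof(*hdr), 1, f) != 1) { fclose(f); return NULL; }

    size_t n = (size_t)hdr->frame_count * hdr->width * hdr->height;
    uint8_t *frames = malloc(n);            /* one byte per pixel */
    if (frames && fread(frames, 1, n, f) != n) { free(frames); frames = NULL; }
    fclose(f);
    return frames;                          /* caller frees the buffer */
}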

32 Preprocessing

The preprocessing stage comprises the four steps shown in figure 33 Each of these steps

is described in the following subsections

Figure 33 Flow diagram of the preprocessing stage

321 Parse XML file

In this stage the application first reads an XML file that is included for every scan

This file contains relevant information for the structured light reconstruction This


information includes (i) the type of structured light patterns that were projected when

acquiring the data (ii) the number of frames captured while structured light patterns

were being projected (iii) the image resolution of each frame to be considered and (iv)

the calibration data

322 Discard frames

Based on the number of frames value read from the XML file the application discards

extra frames that do not contain relevant information for the structured light approach

but that are provided as part of the input

323 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain

a new more suitable resolution for the subsequent algorithms of the program (480 × 754) This is accomplished by cropping the pixels that are close to the top border

of the images Note that this operation does not imply a loss of information in this

application in particular This is because pixels near the frame borders do not contain

facial information and therefore can be safely removed

324 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is

represented by an 8-bit unsigned integer value that ranges from 0 to 255 In this stage

the data type is transformed from unsigned integer to floating point while dividing each

pixel value by 255 The new set of values range between 0 and 1

33 Normalization

Even though this section is entitled Normalization a few more tasks are being performed

in this stage of the application as shown by the blue rectangles in Figure 34 Here wide

arrows represent flow of data whereas dashed lines represent the order of execution The

numbers inside the small data arrows pointing towards the different tasks represent the

number of frames used as input by each task The dashed line rectangle that encloses

the normalization and texture 2 tasks represents that there is not a clear sequential

execution between these two but rather that these are executed in an alternating fashion

This type of diagram will prove particularly useful in Chapter 5 in order to explain the


Figure 34 Flow diagram of the normalization stage

modifications that were made to the application to improve its performance An example

of the different frames that are produced in this stage are visualized in Figure 35 A

brief description of each of the tasks involved in this stage follows

331 Normalization

The purpose of this stage is to extract the reflectivity component (texture information)

from the camera frames while aiming at enhancing the deformed illumination patterns

in the resulting frame sequence Figure 35a illustrates the result of this process The

deformed patterns are essential for the 3D reconstruction process

In order to understand how this process takes place we need to look back at Figure

32 Here it is possible to observe that the projected patterns in the top row frames are

equal to their corresponding frame in the bottom row with the only difference being

that the values of the projected pattern are inverted For each corresponding pair a

new image frame is generated according to the following equation

Fnorm(x, y) = (Fcamera(x, y, a) − Fcamera(x, y, b)) / (Fcamera(x, y, a) + Fcamera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 32 respectively

An example of the resulting frame sequence is shown in Figure 35a
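The per-pixel computation can be sketched in C as follows. The function operates on one frame pair at a time; the frames are assumed to have already been converted to floating-point values, and the guard against a zero denominator is an addition of this sketch rather than part of the original algorithm.

#include <stddef.h>

/* Normalization of one frame pair: a holds the frame with the projected
   pattern, b the frame with the inverted pattern. The sum of the pair is
   stored as well, since it is reused as the texture 2 frame. */
void normalize_pair(const float *a, const float *b,
                    float *norm, float *texture2, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        float sum = a[i] + b[i];
        texture2[i] = sum;
        norm[i] = (sum > 0.0f) ? (a[i] - b[i]) / sum : 0.0f;  /* zero guard added here */
    }
}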


(a) Normalized frame sequence

(b) Texture 2 frame sequence

(c) Modulation frame (d) Texture 1 frame

Figure 35 Example of the 18 frames produced in the normalization stage

332 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one

used to calculate the normalized frame sequence In fact the output of this process is an

intermediate step in the calculation of the normalized frames which is the reason why

the two processes are said to be performed in an alternating fashion The mathematical

equation that describes the calculation of the texture 2 frame sequence is

Ftexture2(x, y) = Fcamera(x, y, a) + Fcamera(x, y, b)

The resulting frame sequence (Figure 35b) is used later in the global motion compensation stage


333 Modulation

The purpose of this stage is to find the range of measured values for each (x y) pixel of

the camera frame sequence along the time dimension This is done in two steps First

two frames are generated by finding the maximum and minimum values along the time

(t) dimension (Figure 36) for every (x y) value in a frame

Figure 36 Camera frame sequence in a coordinate system

Second a modulation frame is produced by finding the difference between the previously

generated frames ie

Fmod(x, y) = Fmax(x, y) − Fmin(x, y)

Such modulation frame (Figure 35c) is required later during the decoding stage
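A C sketch of this computation is shown below; the frame layout (an array of pointers, one per camera frame) and the function name are illustrative only.

#include <stddef.h>

/* Modulation frame: per-pixel range of the camera frames along the time
   dimension, Fmod(x, y) = Fmax(x, y) - Fmin(x, y). */
void modulation_frame(const float *const *frames, int n_frames,
                      float *mod, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        float min = frames[0][i];
        float max = frames[0][i];
        for (int t = 1; t < n_frames; t++) {
            float v = frames[t][i];
            if (v < min) min = v;
            if (v > max) max = v;
        }
        mod[i] = max - min;
    }
}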

334 Texture 1

Finally the last task in the Normalization stage corresponds to the generation of the

texture image that will be mapped onto the final 3D model In contrast to the previous

three tasks this subprocess does not take the complete set of 16 camera frames as input

but only the 2 with finest projection patterns Figure 37 shows the four processing

steps that are applied to the input in order to generate a texture image such as the one

presented in Figure 35d

(Processing steps: average frames, gamma correction, 5×5 mean filter, histogram stretch)
Figure 37 Flow diagram for the calculation of the texture 1 image


34 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement

In fact if no measures are taken to correct the slight amount of movement of the scanner

or of the objects in the scene during the acquisition process the complete reconstruction

process fails Although the global motion compensation stage is only a minor part of

the mechanism that makes the entire application robust to motion it is not negligible

in the final result

Global motion compensation is an extensive field of research for which many different

approaches and methods have been contributed The approach used in this application

is amongst the simplest in level of complexity Nevertheless it satisfies the needs of the

current application

Figure 38 presents an overview of the algorithm used to achieve the global motion

compensation This process takes as input the normalized frame sequence introduced in

the previous section As noted at the bottom of the figure these steps are repeated for

every pair of consecutive frames As a first step the pixels in each column are added for

both frames This results in two vectors that hold the cumulative sums of each frame

The second step is to determine by how many pixels the second image is displaced with

respect to the first one In order to achieve this the sum of absolute differences between

elements of the two column-sum vectors is calculated while slowly displacing the two

vectors with respect to each other The result is a new vector containing the SAD value

for each displacement Subsequently the index of the smallest element in the SAD

values vector is searched in order to determine the number of pixels that the second

image needs to be shifted The process concludes by performing the actual shift of the

second frame
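The core of this procedure, finding the displacement that minimizes the SAD between the two column-sum vectors, can be sketched in C as follows. The search range max_shift, the normalization of the SAD over the overlapping elements and the function name are assumptions of this sketch.

#include <math.h>

/* Find the displacement (within +/- max_shift, with max_shift < n) that
   minimizes the mean absolute difference between two 1-D projection vectors
   of length n, obtained by summing the pixels of two consecutive frames. */
int best_shift(const float *proj_a, const float *proj_b, int n, int max_shift)
{
    int best = 0;
    float best_score = INFINITY;
    for (int s = -max_shift; s <= max_shift; s++) {
        float sad = 0.0f;
        int count = 0;
        for (int i = 0; i < n; i++) {
            int j = i + s;
            if (j < 0 || j >= n) continue;      /* compare the overlapping part only */
            sad += fabsf(proj_a[i] - proj_b[j]);
            count++;
        }
        float score = sad / (float)count;       /* normalize so shifts are comparable */
        if (score < best_score) { best_score = score; best = s; }
    }
    return best;                                /* frame B is then shifted by this amount */
}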

Figure 38 Flow diagram for the global motion compensation process


35 Decoding

In Section 211 of the literature study the correspondence problem was defined as the

process of determining corresponding point pairs between the captured images and the

projected patterns This is exactly what is being accomplished during the decoding

stage

A novel approach has been implemented in which the identification of the projector

stripes is based not on the values of the pixels themselves (as it is typically done) but

rather on the edges formed by the transitions of the projected patterns Figure 39

illustrates the different sets of decoded values that result with each of these methods

Here it is possible to observe that the pixel-based method produces a stair-casing effect

due to the decoding of neighboring pixels that lie on the same stripe of the projected

pattern On the other hand the edge-based method removes this undesirable effect by

decoding values for only parts of the image in which a transition occurs Furthermore

this approach enables sub-pixel accuracy for the determination of the positions where the

transitions occur meaning that the overall resolution of the 3D reconstruction increases

considerably

(Plot of decoded values against pixel position along the y dimension of the image, comparing edge-based and pixel-based decoding)
Figure 39 The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used

The decoding process results in a set of vertices each one associated with a depth code

Note however that the unit of measurement used to describe the position and depth of

each vertex is based on camera pixels and code values respectively meaning that these

vertices still do not represent the actual geometry of the face The calibration process

explained in a later section is the part of the application that translates the pixel and


code values to standard units (such as millimeters) thus recreating the actual shape of

the human face

36 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in

a manner such that no overlaps occur In computer graphics these geometric shapes

are generally chosen to be triangles also called "faces" The reason for using triangles is that by definition their vertices lie on the same plane This in turn avoids

the generation of non-simple convex polygons that are not guaranteed to be rendered

correctly A complete example illustrating this point can be found in [32]

A set of 3D vertices calculated in the decoding stage is the input to the tessellation

process Here however the third dimension does not play a role and hence the z

coordinate for each of the vertices can be thought of as being equal to 0 This implies

that the new set of vertices consist only of (x y) coordinates that lie on the same plane

as shown in Figure 310a This graph corresponds to a very close view of the nose area

in the reconstructed face example

(a) Vertices before applying the Delaunay triangulation (b) Result after applying the Delaunay triangulation
Figure 310 Close view of the vertices in the nose area before and after the tessellation process

The question that arises here is how to connect the vertices in such a way that the complete surface is covered with triangles The answer is to use the Delaunay triangulation

which is probably the most common triangulation used in computer vision The main

advantage that it has over other methods is that the Delaunay triangulation avoids "skinny" triangles reducing potential numerical precision problems [33] Moreover the

Delaunay triangulation is independent of the order in which the vertices are processed


Figure 310b shows the result of applying the Delaunay triangulation to the vertices

shown in Figure 310a

Although there exists a number of different algorithms used to achieve the Delaunay

triangulation the final outcome of each conforms to the following definition a Delaunay

triangulation for a set P of points in a plane is a triangulation DT(P) such that no

point in P is inside the circumcircle of any triangle in DT(P) [33] Such definition can

be understood by examining Figure 311

Figure 311 The Delaunay tessellation with all the circumcircles and their centers [33]
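The circumcircle condition in this definition can be expressed with the standard in-circle predicate, sketched below for illustration; it is not taken from the application code. For a triangle (a, b, c) given in counter-clockwise order, a positive determinant means that point d lies inside the circumcircle and the Delaunay condition is therefore violated.

#include <stdbool.h>

typedef struct { double x, y; } point2d;

/* Returns true when d lies strictly inside the circumcircle of the
   counter-clockwise triangle (a, b, c). */
bool in_circumcircle(point2d a, point2d b, point2d c, point2d d)
{
    double ax = a.x - d.x, ay = a.y - d.y;
    double bx = b.x - d.x, by = b.y - d.y;
    double cx = c.x - d.x, cy = c.y - d.y;
    double det =
        (ax * ax + ay * ay) * (bx * cy - cx * by) -
        (bx * bx + by * by) * (ax * cy - cx * ay) +
        (cx * cx + cy * cy) * (ax * by - bx * ay);
    return det > 0.0;
}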

37 Calibration

The set of (x y) vertices with their corresponding depth code values that result from

the decoding process do not represent standard units of measure ie these still have to

be translated into standard units such as millimeters This is precisely the objective of

the calibration process

The calibration mechanism that is used in the application is based on the work of Peter-Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts

an offline and an online process Moreover the offline process consists of two stages

the camera calibration and the system calibration It is important to clarify that while

the offline process is performed only once (camera properties and distances within the

system do not change with every scan) the online process is carried out for every scan

instance The calibration stage referred to in Figure 31 is the latter


371 Offline process

As already mentioned the offline process comprises the two stages described below

Camera calibration This part of the process is concerned with the calculation of the

intrinsic parameters of the camera as explained in Section 22 of the literature

study In short the objective is to precisely quantify the optical properties of the

camera The manner in which the current approach accomplishes this is by imaging the special calibration chart shown in Figure 312 from different orientations

and distances After corresponding markers in the different images are found an

algorithm searches the optimal set of camera parameters for which triangulation

of all corresponding marker-point pairs gives an accurate reconstruction of the

calibration chart

Figure 312 The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system All absolute dimensions

and photometric properties of the round markers are known precisely

System calibration The second part of the calibration process refers to the camera-

projector system calibration ie the determination of the extrinsic parameters

of the system Again this part of the process images the calibration chart from

different distances However this time structured light patterns are emitted by

the projector while the acquisition process takes place The result is that each

projector code is associated with a known depth and camera position

372 Online process

The result of the offline calibration is a set of parameters that model the optical properties of the scanner system These are passed to the application inside the XML file for

every scan Such parameters represent the coefficients of a fifth-order polynomial used

for translating the set of (x y) vertices with their corresponding depth code values into


standard units of measure In other words the online process consists of evaluating a

polynomial with all the x y and depth code values calculated in the decoding stage in

order to reconstruct the geometry of the face Figure 313 shows the state of the 3D

model before and after the reconstruction process
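The exact multivariate form of the calibration polynomial is defined by the coefficients read from the XML file and is not reproduced here. The sketch below only illustrates how a fifth-order polynomial in a single variable can be evaluated efficiently with Horner's scheme; the variable names are examples.

/* Horner evaluation of p(v) = c[0] + c[1]*v + ... + c[5]*v^5. The actual
   calibration applies such polynomials to the x, y and depth code values. */
static float poly5(const float c[6], float v)
{
    float r = c[5];
    for (int i = 4; i >= 0; i--)
        r = r * v + c[i];
    return r;
}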

(a) Before reconstruction (b) After reconstruction

Figure 313 The 3D model before and after the calibration process

38 Vertex filtering

As can be seen from Figure 313b there are a number of extra vertices (and faces)

that have not been correctly reconstructed and therefore should be removed from the

model Vertex filtering is applied to remove all these noisy vertices and faces based on

different criteria The process is divided in the following three steps

381 Filter vertices based on decoding constraints

First if the distance between consecutive decoded points is larger than a maximum

threshold in the (x) or (z) dimensions then these are removed Second in order to

avoid false decoded vertices due to camera noise (especially in the parts of the images

where light does not hit directly) a minimal modulation threshold needs to be exceeded

or else the associated decoded point is discarded Finally if the decoded vertices lie

outside a margin defined in accordance to the image dimensions then these are removed

as well


382 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum

and maximum values that each decoded point can have in the z dimension These values

are read from the XML file The long triangles shown in Figure 313b that either extend

far into the picture or on the other hand come close to the camera are all removed in

this stage The resulting 3D model after being filtered with the two previously described

criteria is shown in Figure 314a

383 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length

criterion Initially the length of every edge contained in the model is calculated This

is followed by determining a new set of edges L that contains the longest edge in each

face After this operation the mean length value for the longest edge set is calculated

Finally only faces whose longest edge value is less than seven times the mean value ie L < 7 × mean(L) are kept Figure 314b shows the result after this operation
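A C sketch of this criterion is given below; the array names, the keep flags and the parameterized factor are illustrative. longest[] is assumed to hold the longest edge length of each face, computed beforehand.

#include <stdbool.h>

/* Keep a face only when its longest edge is below factor times the mean of
   the longest-edge lengths, i.e. L < 7 * mean(L) when factor = 7. */
void filter_by_edge_length(const float *longest, int n_faces,
                           float factor, bool *keep)
{
    double mean = 0.0;
    for (int i = 0; i < n_faces; i++)
        mean += longest[i];
    mean /= (double)n_faces;

    for (int i = 0; i < n_faces; i++)
        keep[i] = (longest[i] < factor * mean);
}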

(a) The 3D model after the filtering steps described in Subsections 381 and 382 (b) The 3D model after the filtering step described in Subsection 383 (c) The 3D model after the filtering step described in Section 39

Figure 314 3D resulting models after various filtering steps

39 Hole filling

In the last processing step of the 3D face scanner application two actions are performed

The first one is concerned with an algorithm that takes care of filling undesirable holes

that appear due to the removal of vertices and faces that were part of face surface This

is accomplished by adding a vertex in the middle of the hole and then connecting every

surrounding edge with this point The second action refers to another filtering step of


vertices and faces In this last part of the application the program removes all but the

largest group of connected faces The final 3D model is shown in Figure 314c

310 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes

but not for the overall goal of the 3D mask sizing project this process was not taken

into account as part of the 3D face scanner application This is also the reason why it

is not included in Figure 31 Nevertheless this section provides a brief explanation of

the smoothing process that is currently used along with an example

A complete explanation of the algorithm that is being used to achieve the smoothing

effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian

operator that diffuses the vertices along the surface An example of the resulting model

before and after applying the smoothing process is shown in Figure 315

(a) The 3D model before smoothing (b) The 3D model after smoothing

Figure 315 Forehead of the 3D model before and after applying the smoothing process

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as

two different domains but rather as two complementary parts of a whole There are two

important trends that have made such a unified view possible First integrated circuit

(IC) technology has evolved to the point where multiple processors of different types

coexist in a single IC Second the increasing complexity and average size of programs

added to the evolution of compiler technologies raised C compilers (and even C++ or

Java in some cases) to become commonplace in the development of embedded systems

[35]

This chapter discusses the embedded hardware and software implementation of the 3D

face scanner A brief account of the hardware and software tools that were used during

the development of the application is presented first Subsequently the first stage of the

development process is described which consists mainly of translating the algorithms

and methods described in Chapter 3 into a different programming language more suitable

for embedded systems Finally a preview of the developed visualization module that

displays the 3D reconstructed face is presented along with a brief description of its

functionality

41 Development tools

This section describes the set of tools used in the development of the embedded application First an overview of the hardware is presented highlighting the most important

aspects that are of interest to the 3D face scanner application This is then followed by

a list of the software tools along with a short motivation for their selection A so called

remote development methodology was used for the compilation process The idea is to



run an integrated development environment (IDE) on a client system for the creation of

the project editing of the files and usage of code assistance features in the same manner

as done with local projects However when the project is built run or debugged the

process runs on a remote server with output and input transferred to the client system

411 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as

development platforms SBCs combine most features of a conventional desktop computer

into a single board which can be as small as a credit card One or more processors of

different types memory on-board peripherals for multiple USB devices single or dual

gigabit Ethernet connections integrated graphics and audio capabilities amongst others

are common features included in these devices But perhaps what is most interesting

for embedded developers is the availability of several SBCs that come under open source

hardware category [36] Such SBCs are suitable for the implementation of a wide range

of applications on the basis of open operating systems

Two different hardware environments were used in the development of the current embedded application a conventional desktop personal computer (PC) with an Intel x86

architecture and a SBC that was selected according to the following survey

4111 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention

of finding the most suitable model for our application Table 41 presents a subset of the

considered models highlighting the most relevant characteristics for the 3D face scanner

application Refer to [37] for the complete survey

The model to be chosen has to comply with several requirements imposed by the 3D

face scanner application First support for both a camera and a projector had to be

offered While all of the considered models showed special support for video output

not all of them provided suitable characteristics for camera signal acquisition In fact

most of them rely on USB or Ethernet connections for this purpose The problem of

using USB technology for camera acquisition is that it is highly resource demanding On

the other hand Ethernet connections imply streaming video in formats such as MPEG

which require additional computational resources and buffering for decoding the video

stream Explicit periphery support for camera acquisition was only offered by two of

the considered models the BeagleBoard-xM and the PandaBoard


Table 41 Single-board computer survey

BeagleBoard-xM
CPU: ARM Cortex-A8, 1000 MHz
RAM: 512 MB
Video output: DVI-D, HDMI, S-Video
GPU: PowerVR SGX, OpenGL ES 2.0
Camera port: Yes

Raspberry Pi Model B
CPU: ARM1176, 700 MHz
RAM: 256 MB
Video output: Composite RCA, HDMI, DSI
GPU: Broadcom VideoCore IV, OpenGL ES 2.0
Camera port: No

Cotton Candy
CPU: dual-core ARM Cortex-A9, 1200 MHz
RAM: 1 GB
Video output: HDMI
GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
Camera port: No

PandaBoard
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, DVI-D, LCD
GPU: PowerVR SGX540, OpenGL ES 2.0
Camera port: Yes

Via APC
CPU: ARM11, 800 MHz
RAM: 512 MB
Video output: HDMI, VGA
GPU: built-in 2D/3D graphics, OpenGL ES 2.0
Camera port: No

MK802
CPU: ARM Cortex-A8, 1000 MHz
RAM: 1 GB
Video output: HDMI
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No

Snowball
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, CVBS
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No


A second issue in the selection of the SBC was concerned with the project objective of

developing a module capable of visualizing the 3D reconstructed model by means of the

embedded projector It was considered that the achievement of this objective could be

greatly simplified by selecting an SBC model that offered support for rendering of 3D

computer graphics by means of an API preferably OpenGL ES Nevertheless all of the

SBC models considered in the survey featured a graphics processing unit (GPU) with

such support

Finally one last important motivation for the selection came from the experience gathered through related projects The BeagleBoard-xM had been used as the embedded

computing unit in other projects [6] at Philips Research Eindhoven and therefore valuable implementation effort could be saved if this option were adopted Consequently it

was the BeagleBoard-xM that was selected as the SBC model for the development of

the current project

4112 BeagleBoard-xM features

The BeagleBoard-xM (Figure 41) is an SBC produced by Texas Instruments It is

a low-power open-source hardware system that was designed specifically to address

the Open Source Community It measures 82.55 by 82.55 mm and offers most of the

functionality of a desktop computer It is based on Texas Instruments' DM3730 system

on chip (SoC) At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1

GHz and 512 MB of LPDDR RAM Several open operating systems have been made

compatible with such processor including Linux FreeBSD RISC OS Symbian and

Android Moreover the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated

video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to

provide accelerated 2D and 3D rendering that supports OpenGL ES 20 [38]

In addition to the previously mentioned characteristics the ARM Cortex-A8 processor

comes with a general-purpose SIMD (Single instruction Multiple data) engine known as

NEON This technology is based on a 128-bit SIMD architecture extension that provides

flexible and powerful acceleration for consumer multimedia products as described [39]

412 Software

The main factors involved in the selection of software tools were (i) available support by

a large development community and (ii) acquisition costs and licensing charges Open

source software was adopted where possible Moreover prior experience with the tools

was also taken into account The software can be divided in two categories (i) software


Figure 41 The BeagleBoard-xM offered by Texas Instruments

libraries that are used within the application and therefore are necessary for its execution

and (ii) software tools used specifically for the development of the application and hence

are not required for its execution In what follows each of these is briefly described

4121 Software libraries

The following software libraries are being used throughout the implementation of the

embedded application

libxml2 It is a software library used for parsing XML documents which was originally

developed for the Gnome project and was later made available for outside projects

as well The current application makes use of such tool for extracting the required

information from the XML file that is included for each scan

OpenCV Is an open source computer vision and machine learning software library

initiated by Intel It provides the necessary functionality to construct the Delaunay

triangulation described in Chapter 3 Though it was used in the initial versions of

the application later optimizations replaced OpenCV implementations

CGAL Consists of a software library that aims to provide access to algorithms in

computational geometry It is being used in the current application as a means

to simplify the resulting mesh surface ie to reduce the number of faces used to

represent the surface while keeping the overall shape of the reconstructed model

OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifically for embedded systems It consists of a cross-language multi-platform Application Programming Interface (API) for rendering 2D and 3D computer graphics


It is used in the current application as the means to visualize the 3D reconstructed

model

GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL

used to create windows and/or frame buffers It is being used in the visualization

module of the application as well

4122 Software development tools

The following list presents a description of the most important software tools used for

the development of the embedded application

GNU toolchain It refers to a collection of programming tools produced by the GNU

Project that provide developing facilities for applications and operating systems

Among the several projects that comprise the GNU toolchain the following were

used

GNU Make It is a utility that automates the building process of executable

programs by reading the so-called makefiles which specify how to create the

target program

GCC It is the official compiler of the GNU operating system and has been

adopted as standard by most modern Unix-like computer operating systems

GNU Binutils Involves a set of programming tools that are used in the development process of creating and managing programs object files libraries profile

data and assembly source code The commands as (assembler) ld (linker)

and gprof (profiler) were used among the complete set of binutil commands

GNU Project debugger It is the standard debugger for the GNU operating

system which was made available for the development of applications outside

this project as well

Valgrind It is a programming tool that can automatically detect memory management

errors It also provides the functionality of a profiler

Ubuntu A Linux based operating system that is distributed as free and open source

software It was installed in both the desktop PC and the SBC


42 MATLAB to C code translation

This section describes the first stage of the embedded application development that

involves the translation of a series of algorithms originally written in MATLAB code to

C

Despite the fact that there are a number of available tools that automatically translate

MATLAB code to C language such as MATLAB Coder by MathWorks MATLAB-to-C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx these have a number

of pitfalls that compromise their applicability especially when the performance aspect

is of ultimate importance Perhaps what is most concerning is that each one of these

tools only supports a subset of the MATLAB language and functions meaning that

the complete functionality of MATLAB is immediately constrained by this requirement

In many cases this would imply a modification to the MATLAB code prior to the

translation process in order to filter out any feature or function not included in the

subset which adds overhead to the development process Examples of features not

supported by automatic translation tools are amongst others objects cell arrays nested

functions visualization or trycatch statements The use of an automatic translation

tool was discarded for this project taking into account that several of these unsupported

features are present in the MATLAB code

421 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular programming languages used for the development of embedded systems The first is that

C language lies at an intermediate point between higher and lower level languages providing suitable characteristics for embedded system development from both sides The

problem with higher level languages lies in the fact that they do not provide suitable

characteristics for optimizing performance of the applications such as low-level memory

manipulation Furthermore unlike many of these higher level programming languages

C provides deterministic resource use which is an important feature when the target devices contain limited resources On the other hand C outperforms lower level languages

in a number of aspects such as scalability and maintainability Two final motivations

for using C are (i) C compilers are available for almost all embedded devices which are

supported by a large pool of experienced C programmers and (ii) the vast majority of

hardware APIs and drivers are written in C


422 Translation approach

As mentioned earlier a manual translation approach of the code was chosen over the

use of automatic translation tools A key part in the process of manually translating

MATLAB to C code is the verification process There are two major techniques used

to achieve such verification The first one consists of a systematic method of converting

the translated C code into a compiled MEX-file that can be merged into the original

MATLAB project Then by comparing the results generated by the MATLAB project

containing the C implementation wrapped in a MEX-file with those generated by the

original MATLAB project one should be able to verify the correctness of the translation

The second approach consists of writing corresponding intermediate results of both the

MATLAB and C implementations to external files and then using a file comparison tool

such as diff for Linux environments in order to validate equality of both results It was

the latter approach that was chosen for the development of the current application for

the following reason The former approach requires the C implementation to be wrapped

in a so called MEX wrapper which takes care of the communication between MATLAB

and C This task is considered to be error prone since crashes segmentation violations

or incorrect results can easily occur if the MEX wrapper does not allocate and access

the data properly as reported by Marc Barberis in [40] from Catalytic Inc

A number of pitfalls that add complexity to the manual translation process were identified throughout the development of this stage The most important are

• Array elements in MATLAB code are indexed starting with 1 whereas C indexing

starts with 0 Although this does not seem like a major difference it was found

that such simple change could easily introduce errors

• MATLAB uses column-major ordering whereas C uses a row-major approach

Special care must be taken to guarantee that spatial locality is maintained after

the translation process takes place ie the order in which data is processed should

correspond to the order in which it is laid out in memory Not complying with

this idea could induce a serious loss in performance of the resulting code (a small sketch illustrating this and the indexing pitfall is given after this list)

• MATLAB is an interpreted language ie data types and variable dimensions are

only known at run-time thus these cannot be easily deduced from analyzing the

source code

• MATLAB supports dynamic sizing of arrays whereas such operations in C require

explicit allocationreallocationdeallocation of memory using constructs such as

malloc realloc or free


• MATLAB features a rich set of libraries that are not available in C This can imply

a large overhead in the development process if many of these functions have to be

implemented

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language For example mapping MATLAB's easy-to-use

concatenation operation to C involves considerable effort

• Last but not least MATLAB supports reusing the same variable for storing data

of different types dimensions and sizes On the contrary C language requires all

variables to be cast to a specific data type (or declared as it is known in the programming field) before they can be used Furthermore MATLAB uses a wide variety

of generic types that are not available in C and hence requires the programmer

to implement them while relying on structure constructs of primitive types
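The following short C sketch illustrates the first two pitfalls (indexing base and storage order). The matrix dimensions and the function names are examples only and are not taken from the application.

#define ROWS 480
#define COLS 754

/* MATLAB A(r, c) on a ROWS-by-COLS matrix, with r and c 1-based, maps to the
   0-based, row-major C access below. */
static float get(const float *A, int r, int c)
{
    return A[(r - 1) * COLS + (c - 1)];
}

/* Traversal order matters for spatial locality: keeping the column index in
   the inner loop makes consecutive accesses touch consecutive memory. */
static void scale_in_place(float *A)
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            A[r * COLS + c] /= 255.0f;
}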

43 Visualization

This section describes the different steps involved in the visualization module developed

to display the reconstructed 3D models by means of the embedded projector contained

in the hand-held device Figure 42 extends the general overview of the application

presented in 31 by incorporating the visualization module This figure shows that a

resulting 3D model of the face reconstruction process consists of 4 different elements a

set of vertices a set of faces a set of UV coordinates and a texture image

Figure 42 Simplified diagram of the 3D face scanner application

Vertices and faces describe the geometry of the reconstructed model Each face consists

of three index values that determine the vertices that conform a triangle On the other

hand UV coordinates together with the texture image describe the texture of the model

Figure 43 shows how UV coordinates are used to map portions of the texture image


to individual parts of the model Each vertex is associated with an UV coordinate

When a triangle is rendered the corresponding UV coordinates of each vertex are used

to extract a portion of the texture image to place it on top of the triangle

Figure 43 UV coordinate system

Figure 44 presents an overview of the visualization module The first step of the process

is to simplify the 3D model ie to reduce the number of triangles (and vertices) used

to represent the surface Note that while a high resolution is needed for the algorithms

that determine the fit quality of the different mask models a much lower resolution can

be used for visualization purposes In fact due to the limited available resources in

embedded systems such simplification becomes necessary to avoid lag when zooming

rotating or panning the model Edge collapse is a common term used for the simplification process which is shown in Figure 44 Input vertices and faces of this block

are converted into a smaller set denoted as New vertices and New faces on the diagram

However since the new set of vertices and faces do not have a one-to-one correspondence

to the original set of UV coordinates such coordinates have to be updated as well The

manner in which this is accomplished is by using the Nearest Neighbor algorithm Every

new vertex is assigned the UV coordinate of its closest original vertex

The next stage of the process is to format the new set of vertices faces and UV coordinates together with the texture 1 image such that OpenGL can render the model


Subsequently normal vectors are calculated for every triangle which are mainly used

by OpenGL for lighting calculations Every vertex of the model has to be associated

with one normal vector To do this an average normal vector is calculated for each

vertex based on the normal vectors of the triangles that are connected to it Moreover

a cross-product multiplication is used to calculate the normal vector of each triangle

Once these four elements that characterize the 3D model are provided to OpenGL the

program enters in an infinite running state where the model is redrawn every time a

timer expires or when an interactive operation is sent to the program
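A C sketch of the normal calculation is given below; the vector type and function names are illustrative and do not correspond to the actual implementation.

#include <math.h>

typedef struct { float x, y, z; } vec3;

static vec3 sub(vec3 a, vec3 b) { vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z }; return r; }

static vec3 cross(vec3 a, vec3 b)
{
    vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* Accumulate the cross-product normal of every triangle into its three
   vertices and normalize the sums, giving one averaged normal per vertex.
   faces holds three vertex indices per triangle. */
void vertex_normals(const vec3 *verts, const int (*faces)[3], int n_faces,
                    vec3 *normals, int n_verts)
{
    for (int i = 0; i < n_verts; i++)
        normals[i] = (vec3){ 0.0f, 0.0f, 0.0f };

    for (int f = 0; f < n_faces; f++) {
        int i0 = faces[f][0], i1 = faces[f][1], i2 = faces[f][2];
        vec3 n = cross(sub(verts[i1], verts[i0]), sub(verts[i2], verts[i0]));
        normals[i0].x += n.x; normals[i0].y += n.y; normals[i0].z += n.z;
        normals[i1].x += n.x; normals[i1].y += n.y; normals[i1].z += n.z;
        normals[i2].x += n.x; normals[i2].y += n.y; normals[i2].z += n.z;
    }

    for (int i = 0; i < n_verts; i++) {
        float len = sqrtf(normals[i].x * normals[i].x +
                          normals[i].y * normals[i].y +
                          normals[i].z * normals[i].z);
        if (len > 0.0f) {
            normals[i].x /= len; normals[i].y /= len; normals[i].z /= len;
        }
    }
}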

Figure 44 Diagram of the visualization module

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner

application ranging from high-level optimizations such as modification of the algorithms to low-level optimizations such as the implementation of time-consuming parts

in assembly language

In order to verify that the achieved optimizations were valid in general and not for

specific cases 10 scans of different persons were used for profiling the performance of the

application Every profile consisted of running the application 10 times for each scan and

then averaging the results in order to reduce the influence that external factors might

have in the measured times Figure 51 presents an example of the graphs that will be

used throughout this and the following chapters to represent the changes in performance

Here each bar is divided into different colors that represent the distribution of the total

execution time among the various stages of the application described in Chapter 3 and

summarized in Figure 31

The translation from MATLAB to C code corresponds to the first optimization performed The top two bars in Figure 51 show that the C implementation resulted in

a speedup of approximately 15 times over the MATLAB implementation running on

a desktop computer On the other hand the bottom two bars reflect the difference

in execution time after running the C implementation in two different platforms The

much more limited resources available in the BeagleBoard-xM have a clear impact on

the execution time The C code was compiled with GCC's O2 optimization level

The bottom bar in Figure 51 represents the starting point for a set of optimization

procedures that will be described in the following sections The order in which these are

presented corresponds to the same order in which they were applied to the application



Figure 51 Execution times of (Top) the MATLAB implementation on a desktop computer (Middle) the C implementation on a desktop computer (Bottom) the C implementation on the BeagleBoard-xM

51 Double to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C

implementations was necessary to compare both results in each step of the translation

process The original C implementation was implemented using double-precision format

because this is the format used in the MATLAB code Taking into account that the

additional precision offered by double-precision format over single-precision was not

essential and that the ARM Cortex-A8 processor features a 32 bit architecture the

conversion from double to single-precision format was made Figure 52 shows that with

this modification the total execution time decreased from 14.53 to 12.52 sec

Figure 52 Difference in execution time when double-precision format is changed to single-precision

52 Tuned compiler flags

While the previous versions of the C code were compiled with the O2 optimization level

the goal of this step was to determine a combination of compiler options that would


translate into faster running code A full list of the options supported by GCC can be

found in [41] Figure 53 shows that the execution time decreased by approximately 3

seconds (24% of the total time of 12.5 sec) after tuning the compiler flags The list of

compiler flags that produced best performance at this stage of the optimization process

were

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution

-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp

Figure 53 Execution time before and after tuning GCC's compiler options

53 Modified memory layout

A different memory layout for processing the camera frames was implemented to further

exploit the concept of spatial locality of the program As noted in Section 33 many of

the operations in the normalization stage involve pixels from pairs of consecutive frames

ie first and second third and fourth fifth and sixth and so on Data of the camera

frames were placed in memory in a manner such that corresponding pixels between frame

pairs laid next to each other in memory The procedure is shown in Figure 54

However this modification yielded no improvement on the execution time of the application as can be seen from Figure 55

54 Reimplementation of Crsquos standard power function

The generation of Texture 1 frame in the normalization stage starts by averaging the last

two camera frames followed by a gamma correction procedure The process of gamma

correction in this application consists of raising each pixel value to the 0.85 power After

profiling the application it was found that the power function from the standard math

C library was taking most of the time inside this process Taking into account that the


Figure 54 Modification of the memory layout of the camera frames The blue red green and purple circles represent pixels of the first second third and fourth frames respectively

Figure 55 The execution time of the program did not change with a different memory layout for the camera frames

high accuracy offered by such function was not required and that the overhead involved

in validating the input could be removed a different implementation of such function

was adopted

A novel approach was proposed by Ian Stephenson in [42] explained as follows The

power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value By choosing x = 2 the process of calculating the

power function reduces to finding fast pow2() and log2() functions Such functions can

be approximated with a few instructions For example the implementation of log2(a)

can be approximated based on the IEEE floating point representation of a


a = M * 2^E

where M is the mantissa and E is the exponent Taking log of both sides gives

log2(a) = log2(M) + E

and since M is normalized log2(M) is always small therefore

log2(a) ≈ E
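A possible realization of this idea in C is sketched below. It reinterprets the IEEE 754 bit pattern of a positive float to approximate log2 and 2^x, which is one common way of implementing the approach described above; the exact code used in the application may differ.

#include <stdint.h>
#include <string.h>

/* Approximate pow(a, b) = 2^(log2(a) * b) for a > 0 by reading and writing
   the IEEE 754 bit pattern of a float. The bit pattern, interpreted as an
   integer, is roughly (log2(a) + 127) * 2^23. */
static float fast_log2(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);
    return (float)bits * (1.0f / 8388608.0f) - 127.0f;   /* 8388608 = 2^23 */
}

static float fast_pow2(float e)
{
    uint32_t bits = (uint32_t)((e + 127.0f) * 8388608.0f);
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

static float fast_pow(float a, float b)
{
    return fast_pow2(fast_log2(a) * b);   /* e.g. gamma correction with b = 0.85 */
}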

This new implementation of the power function provides the improvement of the execution time shown in Figure 56

Figure 56 Difference in execution time before and after reimplementing C's standard power function

55 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering made it possible to replace floating-point calculations with integer calculations in the modulation stage, which typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modular frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping, and scaling frames are now performed in an alternating fashion together with the calculation of the modular frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.
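The loop-merging idea can be sketched as follows; the helper functions are hypothetical and only illustrate how several per-frame operations are combined into a single traversal.

    /* Before: each step traverses all frames in its own loop, so every
     * frame is fetched from memory several times.                        */
    for (int n = 0; n < N; n++) crop_frame(&frame[n]);
    for (int n = 0; n < N; n++) scale_frame(&frame[n]);
    for (int n = 0; n < N; n++) update_modulation(&modulation, &frame[n]);

    /* After: one traversal performs all per-frame work while the data is
     * still in the cache, so each frame is read from memory only once.   */
    for (int n = 0; n < N; n++) {
        crop_frame(&frame[n]);
        scale_frame(&frame[n]);
        update_modulation(&modulation, &frame[n]);
    }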


Figure 5.7: Order of execution before and after the optimization. (a) Original order of execution: preprocessing (parse XML file, discard frames, crop frames, scale) followed by normalization (texture 1, modulation, texture 2, normalize). (b) Modified order of execution: the modulation step is moved into the preprocessing stage.

Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.


5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3; Figure 3.8 shows the different stages of this process. However, that figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code. In fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results were achieved when GMC is applied in the y direction only. This was implemented by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted in removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement of the execution time shown in Figure 5.10.

Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code. For every pair of consecutive frames in the normalized frame sequence, the rows and columns of both frames are summed, the SAD is minimized in x and y, and frame B is shifted in the y dimension only.

Figure 5.10: Difference in execution time before and after modifying the GMC stage.


5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation, and a series of examples available in [43] were used as references for our implementation. Although OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation. It is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. At this point of the optimization process, an error was detected in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.

Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.

5.8 Modified line shifting in GMC stage

A series of optimizations performed on the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop. Later, it was optimized even further by replacing the for loop with the more optimized memcpy function available in the standard C library, which in turn led to a faster execution time.

A further optimization was obtained in the GMC stage, yielding better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted.


The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included to determine whether the shift has to be performed in the positive or negative direction. If the shift is negative, i.e., upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. If the shift is positive, i.e., downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
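A sketch of the resulting in-place shift is given below; the row-major 8-bit frame layout and the function name are assumptions made for this example.

    #include <stdint.h>
    #include <string.h>

    /* Shift a w x h frame by dy rows in place. Rows that fall off the border
     * are discarded (regular shift, not circular). The traversal direction
     * depends on the sign of the shift so that no source row is overwritten
     * before it has been copied, removing the need for a second buffer.      */
    static void shift_rows_inplace(uint8_t *frame, int w, int h, int dy)
    {
        if (dy < 0) {                            /* negative: shift upwards   */
            for (int y = 0; y < h + dy; y++)     /* top-to-bottom traversal   */
                memcpy(frame + y * w, frame + (y - dy) * w, w);
        } else if (dy > 0) {                     /* positive: shift downwards */
            for (int y = h - 1; y >= dy; y--)    /* bottom-to-top traversal   */
                memcpy(frame + y * w, frame + (y - dy) * w, w);
        }
    }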

Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.

5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic is the fact that the set of vertices resulting from the decoding stage is sorted in an increasing manner. This removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered increasingly from left to right and from bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies the algorithm needed to connect the vertices into triangles even further.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.


The algorithm is presented in Algorithm 1. Note that for each pair of rows, this algorithm describes the connection of vertices only until the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).

Algorithm 1: New tessellation algorithm

1. for all pairs of consecutive rows do
2.   find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
3.   while the last vertex in either row has not been reached do
4.     if vertex_row_A is more to the left than vertex_row_B then
5.       connect vertex_row_A with the next vertex on the same row and with vertex_row_B
6.       change vertex_row_A to the next vertex on the same row
7.     else
8.       connect vertex_row_B with the next vertex on the same row and with vertex_row_A
9.       change vertex_row_B to the next vertex on the same row
10.    end if
11.  end while
12. end for
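A sketch in C of the inner loop of Algorithm 1 for one pair of rows is shown below. The types, the index arrays, and the triangle buffer are illustrative assumptions; the handling of the vertices that remain in the longer row is omitted, as in Algorithm 1.

    typedef struct { float x, y; } Vertex;

    /* Connect two rows of vertex indices, each sorted by increasing x, into
     * triangles. Returns the updated number of triangles written to tri.    */
    static int connect_rows(const Vertex *v,
                            const int *rowA, int nA,
                            const int *rowB, int nB,
                            int (*tri)[3], int ntri)
    {
        int a = 0, b = 0;                      /* left-most vertex of each row */

        while (a + 1 < nA && b + 1 < nB) {     /* until a last vertex is reached */
            if (v[rowA[a]].x < v[rowB[b]].x) { /* row A vertex is more to the left */
                tri[ntri][0] = rowA[a];
                tri[ntri][1] = rowA[a + 1];
                tri[ntri][2] = rowB[b];
                a++;
            } else {                           /* row B vertex is more to the left */
                tri[ntri][0] = rowB[b];
                tri[ntri][1] = rowB[b + 1];
                tri[ntri][2] = rowA[a];
                b++;
            }
            ntri++;
        }
        return ntri;
    }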

Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's Delaunay triangulation implementation.

Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that the vertices are sorted. (a) Delaunay triangulation; (b) optimized triangulation.

5.10 Modified decoding stage

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage.


Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

As a first step, two frequently called functions of the standard math C library, namely ceil() and floor(), were replaced with faster implementations that use preprocessor directives to avoid the function call overhead. Moreover, the time spent in validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of ceil() and floor() to increase performance the most was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions was further simplified.
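A minimal sketch of what such simplified replacements might look like is given below; the macro names are illustrative, and the macros are valid only for non-negative values, as is the case for the index values used in the decoding stage.

    /* For x >= 0, truncation to int equals floor(x); ceil(x) adds 1 only when
     * x has a fractional part. Macros avoid the call overhead and the input
     * validation of the standard library versions.                          */
    #define FAST_FLOOR(x) ((int)(x))
    #define FAST_CEIL(x)  ((int)(x) + ((x) > (int)(x)))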

A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while making sure that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since such memory can be addressed more quickly.

The last optimization consisted of the detection and removal of several tasks that did not contribute to the final result. Such tasks were present in the application because several alternatives for achieving a common goal were implemented during the algorithmic design stage; after assessing and choosing the best option, however, the other alternatives were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the non-functional code.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage.


Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for every image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, this recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
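A minimal sketch of the reuse scheme is given below; the helper functions and the frame representation are hypothetical and only illustrate how the column-sum vector of the previous frame is carried over to the next iteration.

    #include <string.h>

    #define FRAME_WIDTH 480   /* illustrative width */

    /* Assumed to exist elsewhere in the application. */
    void sum_columns(const unsigned char *frame, int *col_sum);
    int  minimize_sad(const int *a, const int *b);
    void shift_frame_y(unsigned char *frame, int dy);

    void gmc_without_recalculation(unsigned char **frame, int N)
    {
        int prev[FRAME_WIDTH], cur[FRAME_WIDTH];

        sum_columns(frame[0], prev);              /* first frame summed once      */
        for (int n = 1; n < N; n++) {
            sum_columns(frame[n], cur);           /* only the new frame is summed */
            shift_frame_y(frame[n], minimize_sad(prev, cur));
            memcpy(prev, cur, sizeof prev);       /* reuse in the next iteration  */
        }
    }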

5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization, and global motion compensation stages are data independent and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrate NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions, (ii) using existing C/C++ libraries based on NEON technology, (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions but with the compiler doing some of the work associated with writing assembly instructions, and (iv) directly writing NEON assembly instructions linked to the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and it was therefore this approach that was adopted.


Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums. The column sums are computed for both frames of the first pair of consecutive frames only; for every remaining pair (from n = 3 to n = N), the column-sum vector of frame n-1 is reused, only frame n is summed, the SAD is minimized, and frame n is shifted.

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.


Figure 5.18 presents the basic principle behind the SIMD architecture extension along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers, where each of the 8 elements would have 16 bits, or as an operation on 2 D registers, where each of the 8 elements would be 8 bits wide.

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology: elements and lanes of the source and destination registers involved in an operation.

An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
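For illustration, the sketch below shows how such an 8-pixel step could be expressed with NEON C intrinsics; the project itself used hand-written NEON assembly, and the final scaling of the results to the range 0 to 1, as well as the handling of a zero denominator, are omitted here.

    #include <arm_neon.h>
    #include <stdint.h>

    /* Process 8 pixel pairs per iteration: texture 2 = v1 + v2 and the
     * normalized value (v1 - v2) / (v1 + v2), both stored as 32-bit floats.
     * n is assumed to be a multiple of 8 and v1[i] + v2[i] > 0.            */
    void texture2_and_normalize(const uint8_t *v1, const uint8_t *v2,
                                float *texture2, float *normalized, int n)
    {
        for (int i = 0; i < n; i += 8) {
            uint8x8_t a = vld1_u8(v1 + i);
            uint8x8_t b = vld1_u8(v2 + i);

            uint16x8_t sum  = vaddl_u8(a, b);                       /* v1 + v2 */
            int16x8_t  diff = vsubq_s16(vreinterpretq_s16_u16(vmovl_u8(a)),
                                        vreinterpretq_s16_u16(vmovl_u8(b)));

            float32x4_t sum_lo  = vcvtq_f32_u32(vmovl_u16(vget_low_u16(sum)));
            float32x4_t sum_hi  = vcvtq_f32_u32(vmovl_u16(vget_high_u16(sum)));
            float32x4_t diff_lo = vcvtq_f32_s32(vmovl_s16(vget_low_s16(diff)));
            float32x4_t diff_hi = vcvtq_f32_s32(vmovl_s16(vget_high_s16(diff)));

            /* reciprocal estimate refined with one Newton-Raphson step,
             * avoiding a full division                                     */
            float32x4_t r_lo = vrecpeq_f32(sum_lo);
            r_lo = vmulq_f32(r_lo, vrecpsq_f32(sum_lo, r_lo));
            float32x4_t r_hi = vrecpeq_f32(sum_hi);
            r_hi = vmulq_f32(r_hi, vrecpsq_f32(sum_hi, r_hi));

            vst1q_f32(texture2 + i,     sum_lo);
            vst1q_f32(texture2 + i + 4, sum_hi);
            vst1q_f32(normalized + i,     vmulq_f32(diff_lo, r_lo));
            vst1q_f32(normalized + i + 4, vmulq_f32(diff_hi, r_hi));
        }
    }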

Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain this increase. First, note that the stage of the application that contributed most to the increase in time was the reading of the binary file. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was probably running in parallel,


using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this lies in the modulation stage. The first step of this process is to find the smallest and largest values for every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C language, the processor makes use of a branch prediction mechanism in order to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, ignoring the branch prediction mechanism altogether.

5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2^{n+1} - 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
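A minimal sketch of how such a table could be built and used for the actual 8-bit pixels is given below; the names are illustrative and the project's implementation may differ.

    #include <math.h>

    /* One entry per possible pixel sum (0 .. 510). */
    static float gamma_lut[2 * 255 + 1];

    void build_gamma_lut(void)
    {
        for (int sum = 0; sum <= 2 * 255; sum++)
            gamma_lut[sum] = powf(sum / 2.0f, 0.85f);   /* average, then gamma */
    }

    /* Usage: the sum pixelA + pixelB is already produced during the texture 2
     * computation, so texture1[i] = gamma_lut[pixelA + pixelB];              */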

As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last two camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2, and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. Merging these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already


Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.


Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels:

    pixel A:          0      1      2      3
    pixel B:          0      1      2      3
    average:          0    0.5      1    1.5      2    2.5      3
    average^0.85:     0  0.555      1  1.411  1.803  2.179  2.544

The sum pixel A + pixel B (0 to 6) is used to index the table.

being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.


Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.

6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).



On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.

6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model contains approximately 2% of the faces of the high resolution model, the quality degradation is hardly visible when comparing the two textured models.

6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. The wide range of optimizations described in Chapter 5 was used to reduce the execution time of the application from 14.5 to 5.1 seconds, which translates into a speedup of approximately 2.85 times.


Figure 6.2: Example of the visualization module developed: (a) high-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model wireframe (63,743 faces); (c) low-resolution 3D model with texture (1229 faces); (d) low-resolution 3D model wireframe (1229 faces).

Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which give an idea of the speedup achieved for each individual stage.


Figure 6.3: Performance evolution of the 3D face scanner's C implementation. From top to bottom, the bars correspond to: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2.


Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, presented an increase of the execution time by a factor of 27, i.e., an increase of approximately 14 seconds.

A wide range of optimizations was performed to reduce the execution time of the application. These include high-level optimizations such as modifications to the algorithms and reordering of the execution flow, middle-level optimizations such as avoiding redundant calculations and function call overhead, and low-level optimizations such as reimplementing sections of code with NEON assembly instructions.

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first



step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment. Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, these optimizations open new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5x5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration, and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.

According to several observations, there is a high probability that the calibration stage can be optimized substantially. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner.


Moreover, note how the first two optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage does not utilize the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.

Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application; therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp, and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Although a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.
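For reference, a hypothetical compiler invocation enabling such a configuration might look as follows; the exact set of flags used in the project may differ, and the hard-float ABI additionally requires all linked libraries to be built with the same ABI.

    gcc -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=hard -o scanner *.c -lm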

Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2, 5612 AZ Eindhoven, The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, ISSN: 0162-8828. DOI: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. DOI: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, ISSN: 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, ISSN: 0162-8828. DOI: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image and Vision Computing, pp. 445–456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, ISSN: 0031-3203. DOI: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, ISSN: 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. DOI: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, ISSN: 0932-8092. DOI: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, ISSN: 0018-9162. DOI: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. DOI: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimel'farb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: Application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. DOI: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. DOI: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, ISSN: 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. DOI: 10.1364/OE.15.012318 (cit. on p. 14).

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14).

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, ISBN: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and Universitat de Girona, Departament d'Electrònica, Informàtica i Automàtica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, ISBN: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, ISSN: 0882-4967. DOI: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, ISSN: 0162-8828. DOI: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1, 2628 CN Delft, The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1–4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, ISBN: 0-201-48560-5. DOI: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, ISBN: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, Dec. 2009, p. 81 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization: An example," Tech. Rep., 2009 (cit. on p. 54).


Chapter 1

Introduction

The potential of science and technology to improve every aspect of life seems to be

boundless or at least this is what the innovations of the previous centuries suggest

Among the many different interests that advocate the development of science and tech-

nology human healthcare has always been an important stimulant New technologies

are constantly being developed by leading companies all around the world to improve the

quality of peoplersquos lives A clear example is the case of the Dutch multinational Royal

Philips Electronics which devotes special interest to the development and introduction

of meaningful innovations that improve peoplersquos lives

Within the wide range of products offered by Philips there is a specific group cate-

gorized under the name of sleep solutions that aims at improving the sleep quality of

people A well-known family of products contained within this category are the so called

CPAP (Continuous Positive Airway Pressure) masks Such masks are used primarily

in the treatment of sleep apnea a sleep disorder characterized by pauses in breathing

or instances of very low breathing during sleep [1] According to a recent study con-

ducted by Philips in collaboration with the University of Twente 64 of the surveyed

population was found to suffer from this disorder [2] A total number of 4206 people

comprising women and men of different ages and levels of education took part in the

2-year study A similar survey was undertaken by the National Institutes of Health in

the United States of America [3] It reported that sleep apnea was prevalent in more

than 18 million Americans ie 662 of the countryrsquos population

While aiming to attend the large demand for CPAP masks Philips has designed and

introduced a wide variety of mask models that seek to fulfill the different needs and

constraints that arise due to several factors which include the large diversity of size

and shape of human faces inclination towards breathing through the mouth or nose

diagnosis of diseases such as sinusitis or dermatitis or disorders such as claustrophobia

1

2 Chapter 1 Introduction

(a) Amara (b) ComfortClassic (c) ComfortGel Blue

(d) ComfortLite 2 (e) FitLife (f) GoLife

(g) ProfileLite Gel (h) Simplicity (i) ComfortGel

Figure 11 A subset of the CPAP masks offered by Philips

amongst others A subset of these models is shown in Figure 11 It is important to

mention that a poor selection of a CPAP mask might cause undesirable side effects to the

patient such as marks or even pressure ulcers Consequently the physical dimensions

of each patientrsquos face play a crucial role in the selection of the most appropriate CPAP

mask

Unfortunately the current practices used to assess the adequacy of CPAP masks based

on facial dimensions are quite error prone They rely on trial-and-error procedures in

which the patient tries on different mask models and selects the one he thinks is the

most comfortable In order to alleviate this problem Philips Research launched the

3D Mask Sizing project which aims to develop an automated embedded system capable

Chapter 1 Introduction 3

of assisting sleep technicians in prescribing the most appropriate CPAP mask for each

patient

11 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop some techno-

logical means that can assist sleep technicians in the selection of a proper CPAP mask

model for each patient A series of algorithms methods and hardware prototypes are the

result of several years of research carried out by the Smart Sensing amp Analysis research

group in Philips Research Eindhoven The resulting automated mask advising system

comprises four main parts

1 An accurate 3D model reconstruction of the patentrsquos face dimensions and geometry

2 The extraction of facial landmarks from the reconstructed model by means of

computer vision algorithms

3 The actual fit quality assessment by virtually fitting a series of 3D mask models

to the reconstructed face

4 The creation of a custom cushion that optimizes for uniform pressure along the

cushion contour

The focus of this thesis project is based on the first step

As part of the progress made in the 3D Mask Sizing project at Philips Research Eind-

hoven a first prototype of a 3D hand-held scanner using the structured lighting technique

was already developed and is the base for the present project Figure 12a shows the

hardware setup of such device In short this scanner is capable of capturing a picture

sequence of a patientrsquos face while illuminating it with specific structured light patterns

Such picture sequence is processed by means of a series of algorithms in order to re-

construct a 3D model of the face An example of a resulting 3D model is presented in

Figure 12b The reconstruction process and all other calculations are currently being

performed offline and are mostly implemented in MATLAB

12 Objectives

The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application; specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to an ARM Cortex-A8, the new implementation should be targeted to this processor in particular, by making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.

(a) Hardware (b) 3D model example

Figure 1.2: A 3D hand-held scanner developed at Philips Research

1.3 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module, and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.


Chapter 2

Literature study

This chapter presents a selective analysis of the state-of-the-art in the field of surface reconstruction, placing special emphasis on structured lighting techniques. A brief overview of the three main underlying technologies used for depth estimation is presented first. This is followed by an example of stereo analysis, which serves as the basis for the more specific structured lighting techniques. Moreover, this example helps to illustrate why stereo analysis is considered less preferable for 3D face reconstruction applications when compared with structured lighting techniques. Special emphasis is placed on the scientific principles underlying structured lighting techniques. Furthermore, a classification of the different types of pattern coding strategies available in the literature is given, along with an analysis of their suitability for our application. Finally, the chapter concludes with a brief discussion of camera calibration and its most representative techniques.

2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (as found in areas like architecture, mechanical engineering, or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].

Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e. they are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in the depth measurement. On the other hand, both reflecto-metric and stereo-metric techniques are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty, i.e. they require an estimation of the model of the environment. In the remainder of this section we will limit the discussion to the stereo-metric category and focus on structured lighting techniques.

2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis is presented in this section. This example, taken from [4], intends to show why the use of structured lighting becomes essential for our application.

Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point on the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies from a certain reference. Note, however, that in order to calculate the triangulation, more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (extrinsic parameter) or to the focal length of each of the cameras (intrinsic parameter).

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e. at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and oriented towards the scene objects. Also note that, because the x-axes of both images are identically oriented, rows with the same row number in the two different images lie on the same straight line.

Figure 2.1: Standard stereo geometry

In this model, a scene point $P = (X, Y, Z)$ is projected onto two corresponding image points

$$p_{left} = (x_{left}, y_{left}) \quad \text{and} \quad p_{right} = (x_{right}, y_{right})$$

in the left and right images respectively, assuming that the scene point is visible from both camera viewpoints. The disparity with respect to $p_{left}$ is a vector given by

$$\Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\; y_{left} - y_{right})^T \qquad (2.1)$$

between two corresponding image points

In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points $P$ onto image points $p$ according to a central projection given by

$$p = (x, y) = \left( \frac{f \cdot X}{Z},\; \frac{f \cdot Y}{Z} \right) \qquad (2.2)$$

assuming that $Z > f$.
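To make Equation 2.2 concrete, the following minimal C sketch applies the central projection to a scene point for both cameras of the standard stereo geometry; the function name and all numerical values are illustrative assumptions and not part of the scanner application.

#include <stdio.h>

/* Central projection of a scene point P = (X, Y, Z) onto the image plane
 * of a pinhole camera with focal length f (Equation 2.2). The camera
 * origin is assumed to lie at (x0, 0, 0): x0 = 0 for the left camera and
 * x0 = b for the right camera of the standard stereo geometry. */
static void project(double f, double x0,
                    double X, double Y, double Z,
                    double *x, double *y)
{
    *x = f * (X - x0) / Z;
    *y = f * Y / Z;
}

int main(void)
{
    double f = 0.05, b = 0.10;          /* focal length and baseline in meters (example values) */
    double X = 0.2, Y = 0.1, Z = 1.5;   /* scene point */
    double xl, yl, xr, yr;

    project(f, 0.0, X, Y, Z, &xl, &yl); /* left camera  */
    project(f, b,   X, Y, Z, &xr, &yr); /* right camera */

    printf("p_left = (%f, %f), p_right = (%f, %f)\n", xl, yl, xr, yr);
    printf("scalar disparity = %f\n", xl - xr);
    return 0;
}

Running this sketch for different depths Z illustrates that the scalar disparity x_left - x_right equals f * b / Z, which is exactly the relation derived below.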

According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that $y = y_{left} = y_{right}$. Therefore, for the left camera the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera. Furthermore, given the displacement of the right camera by $b$ along the X-axis, its central projection equation is given by

$$(x_{right}, y) = \left( \frac{f \cdot (X - b)}{Z},\; \frac{f \cdot Y}{Z} \right)$$

Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points in each one of the images, with respect to $p_{left}$, is given by

$$\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}$$

However, because rows with the same row numbers in the two images have the same $y$ value, the scalar disparity of a pair of corresponding points reduces to

$$\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \qquad (2.3)$$

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map $\Delta(x, y)$ is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.

Finally, in order to come up with the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

$$Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}$$

and therefore

$$X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}$$

Using the previous equation, it follows that

$$Z = \frac{b \cdot f}{x_{left} - x_{right}}$$

By substituting this result into the projection equation for $y$, it follows that

$$Y = \frac{b \cdot y}{x_{left} - x_{right}}$$

The last three equations allow the reconstruction of the coordinates of the projected points $P$ within the three-dimensional XYZ-space, assuming that the parameters $f$ and $b$ are known and that the disparity map $\Delta(x, y)$ was measured for each pair of corresponding points in the two images. Note that a variety of methods exists to calibrate different types of camera configuration systems, i.e. to determine their intrinsic and extrinsic parameters. More on these calibration procedures is discussed in Section 2.2.
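As an illustration of how these equations could be applied in practice, the following C sketch recovers the coordinates (X, Y, Z) of a scene point from one pair of corresponding image points; the names and numerical values are illustrative only and do not correspond to the actual scanner code.

#include <stdio.h>

/* Reconstructs the 3D coordinates of a scene point from a pair of
 * corresponding image points (x_left, y) and (x_right, y) in the
 * standard stereo geometry, given the focal length f and baseline b. */
static int reconstruct(double f, double b,
                       double x_left, double x_right, double y,
                       double *X, double *Y, double *Z)
{
    double disparity = x_left - x_right;    /* Equation 2.3 */
    if (disparity <= 0.0)
        return -1;                          /* no valid correspondence ("undefined") */
    *X = b * x_left / disparity;
    *Y = b * y      / disparity;
    *Z = b * f      / disparity;
    return 0;
}

int main(void)
{
    double f = 0.05, b = 0.10;              /* example focal length and baseline in meters */
    double X, Y, Z;

    if (reconstruct(f, b, 0.0100, 0.0067, 0.0033, &X, &Y, &Z) == 0)
        printf("P = (%f, %f, %f)\n", X, Y, Z);
    return 0;
}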

The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques are used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images. These features are typically corners or edges contained within the images. Although these techniques are found to be appropriate for a certain number of applications, it turns out that they present a number of drawbacks that make their applicability unfeasible for many others. The main drawbacks are: (i) feature extraction and matching is generally computationally expensive; (ii) features might not be available, depending on the nature of the environment or the placement of the cameras; and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. Such problems in solving the correspondence problem can generally be overcome by resorting to a different but similar type of technique known by the name of structured lighting. While structured lighting techniques involve a completely different methodology on how to solve the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.

2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source that actively projects a light pattern into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to reduce the complexity of the correspondence analysis by actively manipulating the scene.

It is important to note that stereoscopic-based systems do not impose complex requirements on image acquisition, since they mostly rely on theoretical, mathematical, and algorithmic analyses to solve the reconstruction problem. On the other hand, the idea behind structured lighting methods is to shift this complexity to another level, such as the engineering prerequisites of the overall system [4].

A wide variety of light patterns have been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured when using the most basic of all approaches, i.e. a light spot. In Section 2.1.2.2, a classification of the available encoded patterns is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.

2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring the angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982; Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is given.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation as proposed in [4]

Note that the system can be thought of as a 2D object scene, i.e. it has no vertical dimension. As a consequence, the object, light source, and camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known and the origin of the coordinate system O coincides with the projection center of the camera.


The goal is to calculate the distance $d$ between the origin $O$ and the object point $P = (X_0, Z_0)$. This can be done using the law of sines as follows:

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}$$

From $\gamma = \pi - (\alpha + \beta)$ and $\sin(\pi - \gamma) = \sin(\gamma)$, it holds that

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}$$

Therefore, the distance $d$ is given by

$$d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}$$

which holds for any point $P$ lying on the surface of the object.
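A minimal C sketch of this triangulation step, assuming that the angles α and β are already known from calibration, could look as follows; the numerical values are illustrative only.

#include <math.h>
#include <stdio.h>

/* Distance d between the camera origin O and the object point P for the
 * 2D triangulation model of Figure 2.2, given the calibrated angles
 * alpha and beta (in radians) and the base distance b. */
static double triangulate_distance(double b, double alpha, double beta)
{
    return b * sin(alpha) / sin(alpha + beta);
}

int main(void)
{
    const double pi = acos(-1.0);
    double b = 0.10;                     /* base distance in meters (example value) */
    double alpha = 60.0 * pi / 180.0;    /* angle at the camera side */
    double beta  = 70.0 * pi / 180.0;    /* angle at the light source side */

    printf("d = %f m\n", triangulate_distance(b, alpha, beta));
    return 0;
}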

2.1.2.2 Pattern coding strategies

As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill the requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e. every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors, or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19].

• Time-multiplexing. This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or the objects in the scene during the period in which the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a; a sketch of how such bit-plane patterns can be generated is given after Figure 2.3.

• Spatial neighborhood. In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12], or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow all coding information to be condensed into a single projection pattern, making them highly suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding. In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16], or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not only depend on the projected colors, but also on the intrinsic colors of the measured surface and the light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].

(a) Time-multiplexing (b) Spatial neighborhood (c) Direct coding

Figure 2.3: Examples of pattern coding strategies
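As an illustration of the time-multiplexing family, the following C sketch generates the bit-plane patterns of a Gray-code column encoding, so that the sequence of black and white values observed at a pixel over time forms its codeword. The resolution and number of patterns are assumptions made for illustration; this is not the exact pattern set used by the scanner.

#include <stdio.h>
#include <stdlib.h>

#define WIDTH   640   /* projector resolution (example values) */
#define HEIGHT  480
#define N_BITS  10    /* number of projected bit-plane patterns (2^10 >= WIDTH) */

/* Fills 'pattern' with bit plane 'bit' (0 = most significant) of the
 * Gray-code encoding of each projector column. A pixel is set to 255
 * (white) if the corresponding Gray-code bit is 1, and to 0 otherwise. */
static void gray_code_pattern(unsigned char *pattern, int bit)
{
    for (int x = 0; x < WIDTH; x++) {
        unsigned gray = (unsigned)x ^ ((unsigned)x >> 1);   /* binary-to-Gray conversion */
        unsigned char value = ((gray >> (N_BITS - 1 - bit)) & 1u) ? 255 : 0;
        for (int y = 0; y < HEIGHT; y++)
            pattern[y * WIDTH + x] = value;
    }
}

int main(void)
{
    unsigned char *pattern = malloc((size_t)WIDTH * HEIGHT);
    if (!pattern) return 1;
    for (int bit = 0; bit < N_BITS; bit++) {
        gray_code_pattern(pattern, bit);
        /* at this point each pattern would be handed to the projector */
    }
    free(pattern);
    return 0;
}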

2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields, such as security, forensics, or even entertainment, it is no surprise that special focus has been devoted to this area by the research community over the last decades. A comparative study of three different 3D face reconstruction approaches is presented in [20]. Here, the most representative techniques of three different domains are tested. These domains are binocular stereo, structured lighting, and photometric stereo. The experimental results show that active reconstruction techniques perform better than purely passive ones for this application.

The majority of analyses on vision-based reconstruction have focused on general performance for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless, some effort has been made on evaluating structured lighting techniques with special focus on human face reconstruction. In [21], a comparison is presented between three


structured lighting techniques (Gray Code, Gray Code Shift, and Stripe Boundary) to assess 3D reconstruction of human faces using mono and stereo systems. The results show that the Gray Code Shift coding performs best, given the high number of emitted patterns it uses. A further study on this topic was performed by the same author in [22]. Again, it was found that time-multiplexing techniques such as binary encoding using Gray Code provide the highest accuracy. With a rather different objective than that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] focus their effort on presenting a framework that captures 3D models of faces in high resolution with low computational load. Here, the system uses a single colored stripe pattern for the reconstruction, plus a picture of the face illuminated with regular white light that is used as texture.

Particular aspects of 3D human face reconstruction, such as the proximity, size, and texture involved, make structured lighting a suitable approach. In contrast, other reconstruction techniques might be less suitable when dealing with these particular aspects. For example, stereoscopic approaches fail to provide positive results when the textures involved do not contain features that can be easily extracted and matched by means of algorithms, as is the case for the human face. On the other hand, the concepts behind structured lighting make it very convenient to reconstruct this kind of surface, given the proximity involved and the size limits of the object in question (appropriate for projecting encoded patterns).

With regard to the suitability of the different pattern coding strategies for our application (3D human face reconstruction by means of a hand-held scanner), there are several factors to consider. Spatial neighborhood strategies do not offer the high spatial resolution that is needed by the algorithms that assess the fit quality of the various mask models. Direct coding strategies suffer from practical problems that affect their robustness in different scenarios. This centers the attention on the time-multiplexing techniques, which are known to provide high spatial resolution. The problem with such techniques is that they are highly sensitive to movement, which is likely to be present on a hand-held device. Fortunately, there are several approaches by which this problem can be solved. Consequently, it is a time-multiplexing technique that is employed in our application.

2.2 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement. This section presents a review of some of the most popular techniques, with special focus on those that are regarded as adequate for our application.


2.2.1 Definition

Camera calibration is the process of determining a mathematical approximation of the physical and optical behavior of an imaging system by using a set of parameters. These parameters can be estimated by means of direct or iterative methods, and they are divided into two groups. On the one hand, intrinsic parameters determine how light is projected through the lens onto the image plane of the sensor. The focal length, projection center, and lens distortion are all examples of intrinsic parameters. On the other hand, extrinsic parameters measure the position and orientation of the camera with respect to a world coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for the structured pattern projection and triangulation considered in [25]. The focal length fc and the projection center Oc are examples of intrinsic parameters of the camera, while the distance D between the camera and the projector corresponds to an extrinsic parameter.

Figure 2.4: A reference framework assumed in [25]

2.2.2 Popular techniques

In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration that uses a 3×4 transformation matrix to map 3D object points to their respective 2D image projections. Here, the model of the camera does not consider any lens distortion. For a detailed description of this method, refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique based on extracting the physical parameters of the camera from the transformation technique proposed in [18]. The description of this technique is given in [26] and [27]. A non-linear explicit camera calibration that included radial lens distortion was proposed by Salvi in his PhD


thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular, and that is still widely used, was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.
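To illustrate the implicit model used in Hall's technique, the sketch below projects a 3D object point with a 3×4 transformation matrix in homogeneous coordinates; the matrix entries are placeholders rather than actual calibration results.

#include <stdio.h>

/* Projects a 3D object point onto the image plane using a 3x4
 * transformation matrix M, as in an implicit camera model:
 * [u v w]^T = M [X Y Z 1]^T, with image coordinates (u/w, v/w). */
static void project_3x4(const double M[3][4], const double P[3],
                        double *x, double *y)
{
    double h[3];
    for (int i = 0; i < 3; i++)
        h[i] = M[i][0] * P[0] + M[i][1] * P[1] + M[i][2] * P[2] + M[i][3];
    *x = h[0] / h[2];
    *y = h[1] / h[2];
}

int main(void)
{
    /* Placeholder matrix: focal length of 800 pixels, principal point at
     * (320, 240), identity rotation and zero translation. */
    const double M[3][4] = {
        { 800.0,   0.0, 320.0, 0.0 },
        {   0.0, 800.0, 240.0, 0.0 },
        {   0.0,   0.0,   1.0, 0.0 }
    };
    const double P[3] = { 0.1, 0.05, 2.0 };
    double x, y;

    project_3x4(M, P, &x, &y);
    printf("image point = (%f, %f)\n", x, y);
    return 0;
}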

The calibration mechanism that is currently used in our application is based on the work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although this mechanism focuses on stereo camera calibration, it was generalized for a system with one camera and one projector. It involves imaging a controlled scene from different positions and orientations. The controlled scene consists of a rigid calibration chart with several markers. The geometric and photometric properties of these markers are known precisely, so that they can be detected. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart. This calibration mechanism is discussed further in Section 3.7.

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed by the Smart Sensing & Analysis research group and provided as a starting point for the current project. Figure 3.1 presents the main steps involved in the 3D reconstruction process.

Figure 3.1: General flow diagram of the 3D face scanner application. Starting from the binary and XML input files, the process comprises the following steps: read binary file (3.1), preprocessing (3.2), normalization (3.3), tessellation (3.4), decoding (3.5), global motion compensation (3.6), calibration (3.7), vertex filtering (3.8), and hole filling (3.9), resulting in the final 3D model.

The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene. For each projection, the scene is captured by means of the embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study on why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is described.


Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.

3.1 Read binary file

The first step of the application is to read the binary file that contains the required information for the 3D reconstruction. The binary file is composed of two parts: the header and the actual data. The header contains metadata of the acquired frames, such as the number of frames and the resolution of each one. The second part contains the actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence, which from now on will be referred to as the camera frames.
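A minimal sketch of how such a file could be read is shown below. The exact header layout of the scanner's binary format is not specified here, so the field order, the field types, and the file name are assumptions made purely for illustration.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical header layout: number of frames, frame width and frame
 * height, each stored as a 32-bit unsigned integer. The real binary
 * format used by the scanner may differ. */
struct scan_header {
    unsigned int n_frames;
    unsigned int width;
    unsigned int height;
};

int main(void)
{
    FILE *fp = fopen("scan.bin", "rb");   /* hypothetical file name */
    if (!fp) return 1;

    struct scan_header hdr;
    if (fread(&hdr, sizeof hdr, 1, fp) != 1) { fclose(fp); return 1; }

    /* Read all camera frames into one contiguous buffer of 8-bit pixels. */
    size_t frame_size = (size_t)hdr.width * hdr.height;
    unsigned char *frames = malloc(frame_size * hdr.n_frames);
    if (!frames) { fclose(fp); return 1; }

    size_t n = fread(frames, 1, frame_size * hdr.n_frames, fp);
    printf("read %u frames of %ux%u pixels (%zu bytes)\n",
           hdr.n_frames, hdr.width, hdr.height, n);

    free(frames);
    fclose(fp);
    return 0;
}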

3.2 Preprocessing

The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps is described in the following subsections.

Figure 3.3: Flow diagram of the preprocessing stage: parse XML file, discard frames, crop frames, and scale (convert to float, range from 0 to 1).

3.2.1 Parse XML file

In this stage, the application first reads an XML file that is included with every scan. This file contains relevant information for the structured light reconstruction. This information includes: (i) the type of structured light patterns that were projected when acquiring the data, (ii) the number of frames captured while structured light patterns were being projected, (iii) the image resolution of each frame to be considered, and (iv) the calibration data.

3.2.2 Discard frames

Based on the number-of-frames value read from the XML file, the application discards extra frames that do not contain relevant information for the structured light approach but that are provided as part of the input.

3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this particular application. This is because pixels near the frame borders do not contain facial information and can therefore be safely removed.

3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage, the data type is transformed from unsigned integer to floating point, while dividing each pixel value by 255. The new set of values ranges between 0 and 1.
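A sketch of how the crop and scale steps could be implemented for a single frame is given below; the row-major buffer layout and the choice of which 14 rows are dropped are assumptions made for illustration.

#include <stdlib.h>

#define WIDTH      480                       /* frame width in pixels */
#define IN_ROWS    768                       /* rows before cropping  */
#define OUT_ROWS   754                       /* rows after cropping   */
#define CROP_ROWS  (IN_ROWS - OUT_ROWS)      /* 14 rows near the top border */

/* Crops the rows close to the top border of an 8-bit camera frame and
 * converts the remaining pixels to floating point values in [0, 1].
 * The frame is assumed to be stored row-major with WIDTH columns. */
static void crop_and_scale(const unsigned char *in, float *out)
{
    for (int r = 0; r < OUT_ROWS; r++)
        for (int c = 0; c < WIDTH; c++)
            out[r * WIDTH + c] = in[(r + CROP_ROWS) * WIDTH + c] / 255.0f;
}

int main(void)
{
    unsigned char *in = calloc((size_t)WIDTH * IN_ROWS, 1);
    float *out = malloc(sizeof(float) * WIDTH * OUT_ROWS);
    if (!in || !out) return 1;
    crop_and_scale(in, out);
    free(in);
    free(out);
    return 0;
}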

3.3 Normalization

Even though this section is entitled Normalization, a few more tasks are performed in this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide arrows represent the flow of data, whereas dashed lines represent the order of execution. The numbers inside the small data arrows pointing towards the different tasks represent the number of frames used as input by each task. The dashed-line rectangle that encloses the normalization and texture 2 tasks represents that there is no clear sequential execution between these two, but rather that they are executed in an alternating fashion. This type of diagram will prove particularly useful in Chapter 5 in order to explain the modifications that were made to the application to improve its performance. An example of the different frames that are produced in this stage is visualized in Figure 3.5. A brief description of each of the tasks involved in this stage follows.

Figure 3.4: Flow diagram of the normalization stage. The 16 camera frames serve as input to the normalization, texture 2, modulation, and texture 1 tasks, which produce 8, 8, 1, and 1 output frames, respectively.

3.3.1 Normalization

The purpose of this stage is to extract the reflectivity component (texture information) from the camera frames, while aiming at enhancing the deformed illumination patterns in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The deformed patterns are essential for the 3D reconstruction process.

In order to understand how this process takes place we need to look back at Figure

32 Here it is possible to observe that the projected patterns in the top row frames are

equal to their corresponding frame in the bottom row with the only difference being

that the values of the projected pattern are inverted For each corresponding pair a

new image frame is generated according to the following equation

Fnorm(x y) =Fcamera(x y a)minus Fcamera(x y b)

Fcamera(x y a) + Fcamera(x y b)

where a and b correspond to aligned top and bottom frames in Figure 32 respectively

An example of the resulting frame sequence is shown in Figure 35a
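A minimal sketch of this per-pixel operation is given below; the identifiers are illustrative, and the small epsilon guarding the division by zero is an assumption that the actual implementation may handle differently.

/* Compute one normalized frame from an aligned pair of frames (a, b),
   where frame b contains the inverted projection pattern of frame a. */
static void normalize_pair(const float *frame_a, const float *frame_b,
                           float *frame_norm, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++) {
        float sum  = frame_a[i] + frame_b[i];
        float diff = frame_a[i] - frame_b[i];
        frame_norm[i] = (sum > 1e-6f) ? diff / sum : 0.0f;
    }
}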

Figure 35 Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence, (b) texture 2 frame sequence, (c) modulation frame, (d) texture 1 frame

332 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one used to calculate the normalized frame sequence. In fact, the output of this process is an intermediate step in the calculation of the normalized frames, which is the reason why the two processes are said to be performed in an alternating fashion. The mathematical equation that describes the calculation of the texture 2 frame sequence is

F_{texture2}(x, y) = F_{camera}(x, y, a) + F_{camera}(x, y, b)

The resulting frame sequence (Figure 35b) is used later in the global motion compensation stage.

333 Modulation

The purpose of this stage is to find the range of measured values for each (x, y) pixel of the camera frame sequence along the time dimension. This is done in two steps. First, two frames are generated by finding the maximum and minimum values along the time (t) dimension (Figure 36) for every (x, y) position.

Figure 36 Camera frame sequence in a coordinate system

Second, a modulation frame is produced by finding the difference between the previously generated frames, i.e.,

F_{mod}(x, y) = F_{max}(x, y) - F_{min}(x, y)

Such a modulation frame (Figure 35c) is required later during the decoding stage.
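The two steps can be sketched as follows; all identifiers are illustrative only.

/* Find the per-pixel range of the camera frame sequence along time.
   frames[t] points to frame t; every frame holds num_pixels values. */
static void modulation_frame(const float *const *frames, int num_frames,
                             float *f_mod, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++) {
        float fmin = frames[0][i];
        float fmax = frames[0][i];
        for (int t = 1; t < num_frames; t++) {
            float v = frames[t][i];
            if (v < fmin) fmin = v;
            if (v > fmax) fmax = v;
        }
        f_mod[i] = fmax - fmin;   /* F_mod = F_max - F_min */
    }
}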

334 Texture 1

Finally, the last task in the Normalization stage corresponds to the generation of the texture image that will be mapped onto the final 3D model. In contrast to the previous three tasks, this subprocess does not take the complete set of 16 camera frames as input, but only the 2 with the finest projection patterns. Figure 37 shows the four processing steps that are applied to the input in order to generate a texture image such as the one presented in Figure 35d.

Figure 37 Flow diagram for the calculation of the texture 1 image: average frames, gamma correction, 5x5 mean filter, histogram stretch

34 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement. In fact, if no measures are taken to correct the slight amount of movement of the scanner or of the objects in the scene during the acquisition process, the complete reconstruction process fails. Although the global motion compensation stage is only a minor part of the mechanism that makes the entire application robust to motion, its contribution to the final result is not negligible.

Global motion compensation is an extensive field of research for which many different approaches and methods have been contributed. The approach used in this application is amongst the simplest in level of complexity. Nevertheless, it suffices for the needs of the current application.

Figure 38 presents an overview of the algorithm used to achieve the global motion compensation. This process takes as input the normalized frame sequence introduced in the previous section. As noted at the bottom of the figure, these steps are repeated for every pair of consecutive frames. As a first step, the pixels in each column are added for both frames. This results in two vectors that hold the cumulative sums of each frame. The second step is to determine by how many pixels the second image is displaced with respect to the first one. In order to achieve this, the sum of absolute differences (SAD) between elements of the two column-sum vectors is calculated while slowly displacing the two vectors with respect to each other. The result is a new vector containing the SAD value for each displacement. Subsequently, the index of the smallest element in the SAD values vector is searched in order to determine the number of pixels that the second image needs to be shifted. The process concludes by performing the actual shift of the second frame.
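The column-sum and SAD-minimization steps can be sketched as follows; the search range max_shift and all other identifiers are assumptions made for illustration and are not taken from the actual implementation.

/* Sum the pixels of every column of a width x height frame. */
static void column_sums(const float *frame, float *sums, int width, int height)
{
    for (int x = 0; x < width; x++)
        sums[x] = 0.0f;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            sums[x] += frame[y * width + x];
}

/* Return the displacement that minimizes the SAD between two column-sum
   vectors, searched in the range [-max_shift, max_shift]. */
static int best_shift(const float *sums_a, const float *sums_b,
                      int width, int max_shift)
{
    int best = 0;
    float best_sad = 1e30f;
    for (int s = -max_shift; s <= max_shift; s++) {
        float sad = 0.0f;
        for (int x = 0; x < width; x++) {
            int xs = x + s;
            if (xs < 0 || xs >= width)
                continue;                      /* ignore the non-overlapping part */
            float d = sums_a[x] - sums_b[xs];
            sad += (d < 0.0f) ? -d : d;
        }
        if (sad < best_sad) {
            best_sad = sad;
            best = s;
        }
    }
    return best;
}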

Figure 38 Flow diagram for the global motion compensation process

35 Decoding

In Section 211 of the literature study, the correspondence problem was defined as the process of determining corresponding point pairs between the captured images and the projected patterns. This is exactly what is accomplished during the decoding stage.

A novel approach has been implemented in which the identification of the projector stripes is based not on the values of the pixels themselves (as is typically done) but rather on the edges formed by the transitions of the projected patterns. Figure 39 illustrates the different sets of decoded values that result with each of these methods. Here it is possible to observe that the pixel-based method produces a stair-casing effect due to the decoding of neighboring pixels that lie on the same stripe of the projected pattern. On the other hand, the edge-based method removes this undesirable effect by decoding values only for parts of the image in which a transition occurs. Furthermore, this approach enables sub-pixel accuracy in the determination of the positions where the transitions occur, meaning that the overall resolution of the 3D reconstruction increases considerably.

Figure 39 The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used (decoded values plotted against pixels along the y dimension of the image)

The decoding process results in a set of vertices, each one associated with a depth code. Note, however, that the unit of measurement used to describe the position and depth of each vertex is based on camera pixels and code values, respectively, meaning that these vertices do not yet represent the actual geometry of the face. The calibration process, explained in a later section, is the part of the application that translates the pixel and code values to standard units (such as millimeters), thus recreating the actual shape of the human face.

36 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in a manner such that no overlaps occur. In computer graphics, these geometric shapes are generally chosen to be triangles, also called “faces”. The reason for using triangles is that, by definition, their vertices lie on a common plane. This in turn avoids the generation of non-simple convex polygons that are not guaranteed to be rendered correctly. A complete example illustrating this point can be found in [32].

A set of 3D vertices calculated in the decoding stage is the input to the tessellation process. Here, however, the third dimension does not play a role, and hence the z coordinate of each vertex can be thought of as being equal to 0. This implies that the new set of vertices consists only of (x, y) coordinates that lie on the same plane, as shown in Figure 310a. This graph corresponds to a very close view of the nose area in the reconstructed face example.

Figure 310 Close view of the vertices in the nose area before (a) and after (b) the tessellation process

The question that arises here is how to connect the vertices in such a way that the complete surface is covered with triangles. The answer is to use the Delaunay triangulation, which is probably the most common triangulation used in computer vision. The main advantage that it has over other methods is that the Delaunay triangulation avoids “skinny” triangles, reducing potential numerical precision problems [33]. Moreover, the Delaunay triangulation is independent of the order in which the vertices are processed.


Figure 310b shows the result of applying the Delaunay triangulation to the vertices shown in Figure 310a.

Although there exist a number of different algorithms used to achieve the Delaunay triangulation, the final outcome of each conforms to the following definition: a Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [33]. Such a definition can be understood by examining Figure 311.

Figure 311 The Delaunay tessellation with all the circumcircles and their centers [33]

37 Calibration

The set of (x, y) vertices with their corresponding depth code values that result from the decoding process do not represent standard units of measure, i.e., these still have to be translated into standard units such as millimeters. This is precisely the objective of the calibration process.

The calibration mechanism that is used in the application is based on the work of Peter-Andre Redert as part of his PhD thesis [31]. The entire process is divided into two parts: an offline and an online process. Moreover, the offline process consists of two stages: the camera calibration and the system calibration. It is important to clarify that while the offline process is performed only once (camera properties and distances within the system do not change with every scan), the online process is carried out for every scan instance. The calibration stage referred to in Figure 31 is the latter.


371 Offline process

As already mentioned, the offline process comprises the two stages described below.

Camera calibration. This part of the process is concerned with the calculation of the intrinsic parameters of the camera, as explained in Section 22 of the literature study. In short, the objective is to precisely quantify the optical properties of the camera. The manner in which the current approach accomplishes this is by imaging the special calibration chart shown in Figure 312 from different orientations and distances. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart.

Figure 312 The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.

System calibration. The second part of the calibration process refers to the camera-projector system calibration, i.e., the determination of the extrinsic parameters of the system. Again, this part of the process images the calibration chart from different distances. However, this time structured light patterns are emitted by the projector while the acquisition process takes place. The result is that each projector code is associated with a known depth and camera position.

372 Online process

The result of the offline calibration is a set of parameters that model the optical properties of the scanner system. These are passed to the application inside the XML file for every scan. Such parameters represent the coefficients of a fifth-order polynomial used for translating the set of (x, y) vertices with their corresponding depth code values into standard units of measure. In other words, the online process consists of evaluating a polynomial with all the x, y and depth code values calculated in the decoding stage in order to reconstruct the geometry of the face. Figure 313 shows the state of the 3D model before and after the reconstruction process.

(a) Before reconstruction (b) After reconstruction

Figure 313 The 3D model before and after the calibration process

38 Vertex filtering

As can be seen from Figure 313b, there are a number of extra vertices (and faces) that have not been correctly reconstructed and therefore should be removed from the model. Vertex filtering is applied to remove all these noisy vertices and faces based on different criteria. The process is divided into the following three steps.

381 Filter vertices based on decoding constraints

First, if the distance between consecutive decoded points is larger than a maximum threshold in the x or z dimension, then these are removed. Second, in order to avoid falsely decoded vertices due to camera noise (especially in the parts of the images where light does not hit directly), a minimal modulation threshold needs to be exceeded, or else the associated decoded point is discarded. Finally, if the decoded vertices lie outside a margin defined in accordance with the image dimensions, then these are removed as well.


382 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum and maximum values that each decoded point can have in the z dimension. These values are read from the XML file. The long triangles shown in Figure 313b that either extend far into the picture or, on the other hand, come close to the camera are all removed in this stage. The resulting 3D model after being filtered with the two previously described criteria is shown in Figure 314a.

383 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length criterion. Initially, the length of every edge contained in the model is calculated. This is followed by determining a new set of edges L that contains the longest edge in each face. After this operation, the mean length value for the longest-edge set is calculated. Finally, only faces whose longest edge value is less than seven times the mean value, i.e., L < 7 × mean(L), are kept. Figure 314b shows the result after this operation.
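A sketch of this criterion is given below; the flat vertex/face layout and all names are assumptions made for illustration, not code from the application.

#include <math.h>

/* Length of the edge between vertices a and b. */
static float edge_len(const float *x, const float *y, const float *z, int a, int b)
{
    float dx = x[a] - x[b], dy = y[a] - y[b], dz = z[a] - z[b];
    return sqrtf(dx * dx + dy * dy + dz * dz);
}

/* Keep only faces whose longest edge is below 7 times the mean longest edge.
   faces[3*i .. 3*i+2] hold the vertex indices of face i; longest[] is scratch
   space with num_faces entries. Returns the new number of faces. */
static int filter_long_edges(int *faces, int num_faces,
                             const float *x, const float *y, const float *z,
                             float *longest)
{
    float mean = 0.0f;
    for (int i = 0; i < num_faces; i++) {
        int a = faces[3*i], b = faces[3*i + 1], c = faces[3*i + 2];
        float e = fmaxf(edge_len(x, y, z, a, b),
                  fmaxf(edge_len(x, y, z, b, c), edge_len(x, y, z, c, a)));
        longest[i] = e;
        mean += e;
    }
    mean /= (float)num_faces;

    int kept = 0;
    for (int i = 0; i < num_faces; i++) {
        if (longest[i] < 7.0f * mean) {         /* L < 7 * mean(L) */
            faces[3*kept]     = faces[3*i];
            faces[3*kept + 1] = faces[3*i + 1];
            faces[3*kept + 2] = faces[3*i + 2];
            kept++;
        }
    }
    return kept;
}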

Figure 314 Resulting 3D models after various filtering steps: (a) after the filtering steps described in Subsections 381 and 382, (b) after the filtering step described in Subsection 383, (c) after the filtering step described in Section 39

39 Hole filling

In the last processing step of the 3D face scanner application, two actions are performed. The first one is concerned with an algorithm that takes care of filling undesirable holes that appear due to the removal of vertices and faces that were part of the face surface. This is accomplished by adding a vertex in the middle of the hole and then connecting every surrounding edge with this point. The second action refers to another filtering step of vertices and faces. In this last part of the application, the program removes all but the largest group of connected faces. The final 3D model is shown in Figure 314c.

310 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes but not for the overall goal of the 3D mask sizing project, this process was not included as part of the 3D face scanner application. This is also the reason why it is not shown in Figure 31. Nevertheless, this section provides a brief explanation of the smoothing process that is currently used, along with an example.

A complete explanation of the algorithm that is being used to achieve the smoothing effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian operator that diffuses the vertices along the surface. An example of the resulting model before and after applying the smoothing process is shown in Figure 315.

(a) The 3D model before smoothing (b) The 3D model after smoothing

Figure 315 Forehead of the 3D model before and after applying the smoothing process

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as

two different domains but rather as two complementary parts of a whole There are two

important trends that have made such unified view possible First integrated circuit

(IC) technology has evolved to the point where multiple processors of different types

coexist in a single IC Second the increasing complexity and average size of programs

added to the evolution of compiler technologies raised C compilers (and even C++ or

Java in some cases) to become commonplace in the development of embedded systems

[35]

This chapter discusses the embedded hardware and software implementation of the 3D

face scanner A brief account of the hardware and software tools that were used during

the development of the application is presented first Subsequently the first stage of the

development process is described which consists mainly of translating the algorithms

and methods described in Chapter 3 into a different programming language more suitable

for embedded systems Finally a preview of the developed visualization module that

displays the 3D reconstructed face is presented along with a brief description of its

functionality

41 Development tools

This section describes the set of tools used in the development of the embedded applica-

tion First an overview of the hardware is presented highlighting the most important

aspects that are of interest to the 3D face scanner application This is then followed by

a list of the software tools along with a short motivation for their selection A so called

remote development methodology was used for the compilation process The idea is to


run an integrated development environment (IDE) on a client system for the creation of

the project editing of the files and usage of code assistance features in the same manner

as done with local projects However when the project is built run or debugged the

process runs on a remote server with output and input transferred to the client system

411 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as

development platforms SBCs combine most features of a conventional desktop computer

into a single board which can be as small as a credit card One or more processors of

different types memory on-board peripherals for multiple USB devices single or dual

gigabit Ethernet connections integrated graphics and audio capabilities amongst others

are common features included in these devices But perhaps what is most interesting

for embedded developers is the availability of several SBCs that come under open source

hardware category [36] Such SBCs are suitable for the implementation of a wide range

of applications on the basis of open operating systems

Two different hardware environments were used in the development of the current em-

bedded application a conventional desktop personal computer (PC) with an Intel x86

architecture and a SBC that was selected according to the following survey

4111 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention

of finding the most suitable model for our application Table 41 presents a subset of the

considered models highlighting the most relevant characteristics for the 3D face scanner

application Refer to [37] for the complete survey

The model to be chosen has to comply with several requirements imposed by the 3D

face scanner application First support for both a camera and a projector had to be

offered While all of the considered models showed special support for video output

not all of them provided suitable characteristics for camera signal acquisition In fact

most of them rely on USB or Ethernet connections for this purpose The problem of

using USB technology for camera acquisition is that it is highly resource demanding On

the other hand Ethernet connections imply streaming video in formats such as MPEG

which require additional computational resources and buffering for decoding the video

stream. Explicit peripheral support for camera acquisition was only offered by two of the considered models: the BeagleBoard-xM and the PandaBoard.

Table 41 Single-board computer survey

BeagleBoard-xM: CPU ARM Cortex-A8, 1000 MHz; RAM 512 MB; video output DVI-D, HDMI, S-Video; GPU PowerVR SGX, OpenGL ES 2.0; camera port: yes.

Raspberry Pi Model B: CPU ARM1176, 700 MHz; RAM 256 MB; video output Composite RCA, HDMI, DSI; GPU Broadcom VideoCore IV, OpenGL ES 2.0; camera port: no.

Cotton Candy: CPU dual-core ARM Cortex-A9, 1200 MHz; RAM 1 GB; video output HDMI; GPU quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0; camera port: no.

PandaBoard: CPU dual-core ARM Cortex-A9, 1000 MHz; RAM 1 GB; video output HDMI, DVI-D, LCD; GPU PowerVR SGX540, OpenGL ES 2.0; camera port: yes.

Via APC: CPU ARM11, 800 MHz; RAM 512 MB; video output HDMI, VGA; GPU built-in 2D/3D graphics, OpenGL ES 2.0; camera port: no.

MK802: CPU ARM Cortex-A8, 1000 MHz; RAM 1 GB; video output HDMI; GPU Mali-400 MP, OpenGL ES 2.0; camera port: no.

Snowball: CPU dual-core ARM Cortex-A9, 1000 MHz; RAM 1 GB; video output HDMI, CVBS; GPU Mali-400 MP, OpenGL ES 2.0; camera port: no.

A second issue in the selection of the SBC was concerned with the project objective of

developing a module capable of visualizing the 3D reconstructed model by means of the

embedded projector It was considered that the achievement of this objective could be

greatly simplified by selecting an SBC model that offered support for rendering of 3D

computer graphics by means of an API preferably OpenGL ES Nevertheless all of the

SBC models considered in the survey featured a graphical processor unit (GPU) with

such support

Finally one last important motivation for the selection came from the experience gath-

ered through related projects The BeagleBoard-xM had been used as the embedded

computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-

able implementation effort could be saved if this option were adopted Consequently it

was the BeagleBoard-xM that was selected as the SBC model for the development of

the current project

4112 BeagleBoard-xM features

The BeagleBoard-xM (Figure 41) is an SBC produced by Texas Instruments. It is a low-power open-source hardware system that was designed specifically to address the Open Source Community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz and 512 MB of LPDDR RAM. Several open operating systems have been made

compatible with such processor including Linux FreeBSD RISC OS Symbian and

Android Moreover the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated

video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to

provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].

In addition to the previously mentioned characteristics the ARM Cortex-A8 processor

comes with a general-purpose SIMD (Single instruction Multiple data) engine known as

NEON This technology is based on a 128-bit SIMD architecture extension that provides

flexible and powerful acceleration for consumer multimedia products, as described in [39].

412 Software

The main factors involved in the selection of software tools were (i) available support by

a large development community and (ii) acquisition costs and licensing charges Open

source software was adopted where possible Moreover prior experience with the tools

was also taken into account. The software can be divided in two categories: (i) software libraries that are used within the application and are therefore necessary for its execution, and (ii) software tools used specifically for the development of the application and hence not required for its execution. In what follows, each of these is briefly described.

Figure 41 The BeagleBoard-xM offered by Texas Instruments

4121 Software libraries

The following software libraries are being used throughout the implementation of the

embedded application

libxml2 It is a software library used for parsing XML documents which was originally

developed for the Gnome project and was later made available for outside projects

as well The current application makes use of such tool for extracting the required

information from the XML file that is included for each scan

OpenCV Is an open source computer vision and machine learning software library

initiated by Intel It provides the necessary functionality to construct the Delaunay

triangulation described in Chapter 3 Though it was used in the initial versions of

the application later optimizations replaced OpenCV implementations

CGAL Consists of a software library that aims to provide access to algorithms in

computational geometry It is being used in the current application as a means

to simplify the resulting mesh surface ie to reduce the number of faces used to

represent the surface while keeping the overall shape of the reconstructed model

OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-

cally for embedded systems It consists of a cross-language multi-platform Appli-

cation Programming Interface (API) for rendering 2D and 3D computer graphics


It is used in the current application as the means to visualize the 3D reconstructed

model

GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL

used to create windows andor frame buffers It is being used in the visualization

module of the application as well

4122 Software development tools

The following list presents a description of the most important software tools used for

the development of the embedded application

GNU toolchain It refers to a collection of programming tools produced by the GNU

Project that provide developing facilities for applications and operating systems

Among the several projects that comprise the GNU toolchain the following were

used

GNU Make It is a utility that automates the building process of executable

programs by reading the so-called makefiles which specify how to create the

target program

GCC It is the official compiler of the GNU operating system and has been

adopted as standard by most modern Unix-like computer operating systems

GNU Binutils Involves a set of programming tools that are used in the develop-

ment process of creating and managing programs object files libraries profile

data and assembly source code The commands as (assembler) ld (linker)

and gprof (profiler) were used among the complete set of binutil commands

GNU Project debugger It is the standard debugger for the GNU operating

system which was made available for the development of applications outside

this project as well

Valgrind It is a programming tool that can automatically detect memory management

errors It also provides the functionality of a profiler

Ubuntu A Linux based operating system that is distributed as free and open source

software It was installed in both the desktop PC and the SBC


42 MATLAB to C code translation

This section describes the first stage of the embedded application development that

involves the translation of a series of algorithms originally written in MATLAB code to

C

Despite the fact that there are a number of available tools that automatically translate

MATLAB code to C language such as MATLAB Coder by MathWorks MATLAB-to-

C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx these have a number

of pitfalls that compromise their applicability specially when the performance aspect

is of ultimate importance Perhaps what is most concerning is that each one of these

tools only supports a subset of the MATLAB language and functions meaning that

the complete functionality of MATLAB is immediately constrained by this requirement

In many cases this would imply a modification to the MATLAB code prior to the

translation process in order to filter out any feature or function not included in the

subset which adds overhead to the development process Examples of features not

supported by automatic translation tools are amongst others objects cell arrays nested

functions visualization or trycatch statements The use of an automatic translation

tool was discarded for this project taking into account that several of these unsupported

features are present in the MATLAB code

421 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular pro-

gramming languages used for the development of embedded systems The first is that

C language lies in an intermediate point between higher and lower level languages pro-

viding suitable characteristics for embedded system development from both sides The

problem with higher level languages lies in the fact that they do not provide suitable

characteristics for optimizing performance of the applications such as low-level memory

manipulation Furthermore unlike many of these higher level programming languages

C provides deterministic resource use which is an important feature when the target de-

vices contain limited resources On the other hand C outperforms lower level languages

in a number of aspects such as scalability and maintainability Two final motivations

for using C are (i) C compilers are available for almost all embedded devices which are

supported by a large pool of experienced C programmers and (ii) the vast majority of

hardware APIs/drivers are written in C.


422 Translation approach

As mentioned earlier a manual translation approach of the code was chosen over the

use of automatic translation tools A key part in the process of manually translating

MATLAB to C code is the verification process There are two major techniques used

to achieve such verification The first one consists of a systematic method of converting

the translated C code into a compiled MEX-file that can be merged into the original

MATLAB project Then by comparing the results generated by the MATLAB project

containing the C implementation wrapped in a MEX-file with those generated by the

original MATLAB project one should be able to verify the correctness of the translation

The second approach consists of writing corresponding intermediate results of both the

MATLAB and C implementations to external files and then using a file comparison tool

such as diff for Linux environments in order to validate equality of both results It was

the latter approach that was chosen for the development of the current application for

the following reason The former approach requires the C implementation to be wrapped

in a so called MEX wrapper which takes care of the communication between MATLAB

and C This task is considered to be error prone since crashes segmentation violations

or incorrect results can easily occur if the MEX wrapper does not allocate and access

the data properly as reported by Marc Barberis in [40] from Catalytic Inc

A number of pitfalls that add complexity to the manual translation process were iden-

tified throughout the development of this stage The most important are

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e., the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in performance of the resulting code (see the sketch after this list).

• MATLAB is an interpreted language, i.e., data types and variable dimensions are only known at run-time and thus cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions and sizes. On the contrary, C language requires all variables to be cast to a specific data type (or declared, as known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C and hence requires the programmer to implement them while relying on structure constructs of primitive types.
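As an illustration of the column-major versus row-major pitfall mentioned in the list above, the following sketch shows the loop ordering that preserves spatial locality for a row-major C array; the function name and layout are only examples, not code from the application.

/* In C, image[y * width + x] walks memory contiguously when x is the inner
   loop. Translating column-major MATLAB code naively (y innermost) jumps
   through memory with a stride of width elements and hurts cache behavior. */
static void scale_in_place(float *image, int width, int height, float factor)
{
    for (int y = 0; y < height; y++)        /* rows: outer loop    */
        for (int x = 0; x < width; x++)     /* columns: inner loop */
            image[y * width + x] *= factor;
}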

43 Visualization

This section describes the different steps involved in the visualization module developed

to display the reconstructed 3D models by means of the embedded projector contained

in the hand-held device Figure 42 extends the general overview of the application

presented in 31 by incorporating the visualization module This figure shows that a

resulting 3D model of the face reconstruction process consists of 4 different elements a

set of vertices a set of faces a set of UV coordinates and a texture image

Figure 42 Simplified diagram of the 3D face scanner application

Vertices and faces describe the geometry of the reconstructed model. Each face consists of three index values that determine the vertices that form a triangle. On the other hand, UV coordinates together with the texture image describe the texture of the model. Figure 43 shows how UV coordinates are used to map portions of the texture image


to individual parts of the model. Each vertex is associated with a UV coordinate. When a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image and place it on top of the triangle.

Figure 43 UV coordinate system

Figure 44 presents an overview of the visualization module The first step of the process

is to simplify the 3D model ie to reduce the number of triangles (and vertices) used

to represent the surface Note that while a high resolution is needed for the algorithms

that determine the fit quality of the different mask models a much lower resolution can

be used for visualization purposes In fact due to the limited available resources in

embedded systems such simplification becomes necessary to avoid lag when zooming

rotating or panning the model Edge collapse is a common term used for the simpli-

fication process which is shown in Figure 44 Input vertices and faces of this block

are converted into a smaller set denoted as New vertices and New faces on the diagram

However since the new set of vertices and faces do not have a one-to-one correspondence

to the original set of UV coordinates such coordinates have to be updated as well The

manner in which this is accomplished is by using the Nearest Neighbor algorithm Every

new vertex is assigned the UV coordinate of its closest original vertex

The next stage of the process is to format the new set of vertices faces and UV co-

ordinates together with the texture 1 image such that OpenGL can render the model


Subsequently, normal vectors are calculated for every triangle; these are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex based on the normal vectors of the triangles that are connected to it. Moreover, a cross-product multiplication is used to calculate the normal vector of each triangle. Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state where the model is redrawn every time a timer expires or when an interactive operation is sent to the program.
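The per-triangle cross product can be sketched as follows; the structure and function names are illustrative, and the per-vertex averaging of the normals of the connected triangles follows the same pattern.

#include <math.h>

typedef struct { float x, y, z; } vec3;

/* Normal of triangle (a, b, c): normalized cross product of two of its edges. */
static vec3 triangle_normal(vec3 a, vec3 b, vec3 c)
{
    vec3 u = { b.x - a.x, b.y - a.y, b.z - a.z };
    vec3 v = { c.x - a.x, c.y - a.y, c.z - a.z };
    vec3 n = { u.y * v.z - u.z * v.y,
               u.z * v.x - u.x * v.z,
               u.x * v.y - u.y * v.x };
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len > 0.0f) {
        n.x /= len; n.y /= len; n.z /= len;
    }
    return n;
}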

Figure 44 Diagram of the visualization module

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner

application ranging from high-level optimizations such as modification of the algo-

rithms to low-level optimizations such as the implementation of time-consuming parts

in assembly language

In order to verify that the achieved optimizations were valid in general and not for

specific cases 10 scans of different persons were used for profiling the performance of the

application Every profile consisted of running the application 10 times for each scan and

then averaging the results in order to reduce the influence that external factors might

have in the measured times Figure 51 presents an example of the graphs that will be

used throughout this and the following chapters to represent the changes in performance

Here each bar is divided into different colors that represent the distribution of the total

execution time among the various stages of the application described in Chapter 3 and

summarized in Figure 31

The translation from MATLAB to C code corresponds to the first optimization per-

formed The top two bars in Figure 51 show that the C implementation resulted in

a speedup of approximately 15 times over the MATLAB implementation running on

a desktop computer On the other hand the bottom two bars reflect the difference

in execution time after running the C implementation in two different platforms The

much more limited resources available in the BeagleBoard-xM have a clear impact on

the execution time. The C code was compiled with GCC's O2 optimization level.

The bottom bar in Figure 51 represents the starting point for a set of optimization

procedures that will be described in the following sections The order in which these are

presented corresponds to the same order in which they were applied to the application

Figure 51 Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM

51 Double to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C implementations was necessary to compare both results in each step of the translation process. The original C implementation used double-precision format because this is the format used in the MATLAB code. Taking into account that the additional precision offered by double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double- to single-precision format was made. Figure 52 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec.

Figure 52 Difference in execution time when double-precision format is changed to single-precision

52 Tuned compiler flags

While the previous versions of the C code were compiled with the O2 optimization level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 53 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution

-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp

Figure 53 Execution time before and after tuning GCC's compiler options

53 Modified memory layout

A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality of the program. As noted in Section 33, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e., first and second, third and fourth, fifth and sixth, and so on. Data of the camera frames were placed in memory in a manner such that corresponding pixels between frame pairs lay next to each other in memory. The procedure is shown in Figure 54. However, this modification yielded no improvement on the execution time of the application, as can be seen from Figure 55.
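A sketch of such an interleaving step is shown below, assuming the frames arrive as separate planar buffers; the actual buffer handling in the application may differ.

/* Interleave two consecutive frames so that corresponding pixels of the
   pair lie next to each other in memory: a0 b0 a1 b1 a2 b2 ... */
static void interleave_pair(const float *frame_a, const float *frame_b,
                            float *dst, int num_pixels)
{
    for (int i = 0; i < num_pixels; i++) {
        dst[2 * i]     = frame_a[i];
        dst[2 * i + 1] = frame_b[i];
    }
}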

54 Reimplementation of C's standard power function

The generation of the Texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the power of 0.85. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process. Taking into account that the

high accuracy offered by such a function was not required, and that the overhead involved in validating the input could be removed, a different implementation of such function was adopted.

Figure 54 Modification of the memory layout of the camera frames. The blue, red, green and purple circles represent pixels of the first, second, third and fourth frames, respectively

Figure 55 The execution time of the program did not change with a different memory layout for the camera frames

A novel approach was proposed by Ian Stephenson in [42], explained as follows. The power function is usually implemented using logarithms as

\mathrm{pow}(a, b) = x^{\log_x(a) \cdot b}

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a,

a = M \cdot 2^E

where M is the mantissa and E is the exponent. Taking the base-2 logarithm of both sides gives

\log_2(a) = \log_2(M) + E

and since M is normalized, \log_2(M) is always small, therefore

\log_2(a) \approx E

This new implementation of the power function provides the improvement of the execution time shown in Figure 56.
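A sketch of this idea is given below; it follows the bit-level approximation described above but is not a verbatim copy of the implementation used in the application, and it is only valid for strictly positive inputs.

#include <stdint.h>

/* Approximate log2(a) for a > 0: reinterpreting the IEEE 754 bits as an
   integer and rescaling yields the (biased) exponent plus a linear
   approximation of log2 of the mantissa. 8388608 = 2^23 mantissa steps. */
static inline float fast_log2(float a)
{
    union { float f; uint32_t i; } u = { a };
    return (float)u.i * (1.0f / 8388608.0f) - 127.0f;
}

/* Approximate 2^x by applying the inverse mapping. */
static inline float fast_pow2(float x)
{
    union { float f; uint32_t i; } u;
    u.i = (uint32_t)((x + 127.0f) * 8388608.0f);
    return u.f;
}

/* pow(a, b) = 2^(log2(a) * b); used here with b = 0.85 for gamma correction. */
static inline float fast_pow(float a, float b)
{
    return fast_pow2(fast_log2(a) * b);
}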

Figure 56 Difference in execution time before and after reimplementing C's standard power function

55 Reduced memory accesses

The original order of execution was modified to reduce the amount of memory access and

to increase the temporal locality of the program Temporal locality is a principle stating

that referenced memory locations will tend to be referenced again soon. Moreover, the reordering allowed floating-point calculations to be replaced with integer calculations in the modulation stage, which typically execute faster on ARM processors. Figure 57 shows the order in which the algorithms are executed before and after this

optimization By moving the calculation of the modular frame to the preprocessing

stage the values of the camera frames do not have to be re-read Moreover the processes

of discarding cropping and scaling frames are now being performed in an alternating

fashion together with the calculation of the modular frame This loop merging improves

the locality of data and reduces loop overhead Figure 58 shows the change in execution

time of the application for this optimization step

Figure 57 Order of execution before (a) and after (b) the optimization

Figure 58 Difference in execution time before and after reordering the preprocessing stage

56 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the applica-

tion was presented in Chapter 3 Figure 38 shows the different stages of this process

However this figure does not reflect the manner in which the GMC was initially imple-

mented in the MATLAB code In fact this figure describes the GMC implementation

after being modified with the optimization described in this section A more detailed

picture of the original GMC implementation is given in Figure 59 Previous research

found that optimal results were achieved when GMC is applied in the y direction only

The manner in which this was implemented was by estimating GMC for both directions

but only performing the shift in the y direction The optimization consisted in removing

all unnecessary calculations related to the estimation of GMC in the x direction This

optimization provides the improvement of the execution time shown in Figure 510

Figure 59 Flow diagram for the GMC process as implemented in the MATLAB code

Figure 510 Difference in execution time before and after modifying the GMC stage

57 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation A series of examples available

in [43] were used as references for our implementation Despite the fact that OpenCV

constructs the triangulation while abstracting the complete algorithm from the pro-

grammer a not so straightforward approach is required to extract the triangles from

a so called subdivision OpenCV offers a series of functions that can be used to nav-

igate through the edges that form the triangulation It is therefore the responsibility

of the programmer to extract each of the triangles while stepping through these edges

Moreover care must be taken to avoid repeated triangles in the final set An error was

detected at this point of the optimization process in the mechanism that was being used

to avoid repeated triangles Figure 511 shows the increase in execution time after this

bug was resolved

Figure 511 Execution time of the application increased after fixing an error in the tessellation stage

58 Modified line shifting in GMC stage

A series of optimizations performed to the original line shifting mechanism in the GMC

stage are explained in this section The MATLAB implementation uses the circular shift

function to perform the alignment of the frames (last step in Figure 38) Given that

there is no justification for applying a circular shift a regular shift was implemented

instead in which the last line of a frame is discarded rather than copied to the opposite

border Initially this was implemented using a for loop Later this was optimized even

further by replacing such for loop with the more optimized memcpy function available

in the standard C library This in turn led to a faster execution time

A further optimization was obtained in the GMC stage which yielded better memory

usage and faster execution time The original shifting approach used two equally sized

portions of memory in order to avoid overwriting the frame that was being shifted The


need for a second portion of memory was removed by adding some extra logic to the

shifting process A conditional statement was included in order to determine if the shift

has to be performed in the positive or negative direction In case the shift is negative ie

upwards the shifting operation traverses the image from top to bottom while copying

each line a certain number of rows above it In case the shift is positive ie downwards

the shifting operation traverses the image from bottom to top while copying each line a

certain number of rows below it The result of this set of optimizations is presented in

Figure 512
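A sketch of this in-place shift is given below; the identifiers are illustrative and the handling of the rows left undefined at the border is omitted for brevity.

#include <string.h>

/* Shift a width x height frame vertically by `shift` rows, in place.
   For a negative (upward) shift the rows are copied from top to bottom,
   for a positive (downward) shift from bottom to top, so a destination
   row is never read after it has already been overwritten. */
static void shift_frame_y(float *frame, int width, int height, int shift)
{
    size_t row_bytes = (size_t)width * sizeof(float);

    if (shift < 0) {
        for (int y = 0; y < height + shift; y++)
            memcpy(&frame[y * width], &frame[(y - shift) * width], row_bytes);
    } else if (shift > 0) {
        for (int y = height - 1; y >= shift; y--)
            memcpy(&frame[y * width], &frame[(y - shift) * width], row_bytes);
    }
}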

Figure 512 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage

59 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is

presented by Rippa [44] who proves that such triangulation minimizes the roughness of

the resulting model Nevertheless an important characteristic of the decoding process

used in our application allows the adoption of a different triangulation mechanism that

improved the execution time significantly while sacrificing smoothness in a very small

amount This characteristic refers to the fact that the resulting set of vertices from

the decoding stage are sorted in an increasing manner This in turn removes the need

to search for the nearest vertices and therefore allows the triangulation to be greatly

simplified More specifically the vertices are ordered in increasing order from left to

right and bottom to top in the plane Moreover they are equally spaced along the y

dimension which simplifies even further the algorithm needed to connect such vertices

into triangles

The developed algorithm traverses the set of vertices row by row from bottom to top

creating triangles between every pair of consecutive rows Moreover each pair of con-

secutive rows is traversed from left to right while connecting the vertices into triangles


The algorithm is presented in Algorithm 1 Note that for each pair of rows this algo-

rithm describes the connection of vertices until the moment in which the last vertex of

either row is reached The unconnected vertices that remain in the other longer row

are connected with the last vertex of the shorter row in a later step (not included in

Algorithm 1)

Algorithm 1 New tessellation algorithm

1:  for all pairs of rows do
2:      find the left-most vertices in both rows and store them in vertex row A and vertex row B
3:      while the last vertex in either row has not been reached do
4:          if vertex row A is more to the left than vertex row B then
5:              connect vertex row A with the next vertex on the same row and with vertex row B
6:              change vertex row A to the next vertex on the same row
7:          else
8:              connect vertex row B with the next vertex on the same row and with vertex row A
9:              change vertex row B to the next vertex on the same row
10:         end if
11:     end while
12: end for

Figure 513 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 514. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 12.5 times over OpenCV's Delaunay triangulation implementation.

Figure 513 The Delaunay triangulation (a) was replaced with a different algorithm (b) that takes advantage of the fact that vertices are sorted

510 Modified decoding stage

A major improvement was achieved in the execution time of the application after op-

timizing several time-consuming parts of the decoding stage As a first step two fre-

quently called functions of the standard math C library namely ceil() and floor()

Figure 514 Execution times of the application before and after replacing the Delaunay triangulation with the new approach

were replaced with faster implementations that used pre-processor directives to avoid the function call overhead. Moreover, the time spent in validating the input was also avoided since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase performance to a greater extent was the fact that such functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions was further simplified.
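A sketch of such simplified replacements is shown below; the macro names are illustrative and the versions used in the application may differ. Note that the macro arguments are evaluated more than once, which is acceptable here because they are plain index expressions.

/* floor() and ceil() for non-negative values only: truncation towards zero
   already equals floor, and ceil only needs to add 1 when a fractional part
   is present. No input validation is performed. */
#define FAST_FLOOR(x) ((int)(x))
#define FAST_CEIL(x)  ((int)(x) + ((float)(int)(x) < (x)))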

A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while ensuring that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster since stack memory can be addressed more quickly.

The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. Such tasks were present in the application because several alternatives were implemented for achieving a common goal during the algorithmic design stage; after assessing and choosing the best option, the others were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

Figure 5.15: Execution times of the application before and after optimizing the decoding stage.

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, such recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
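The reuse pattern can be sketched as follows; the frame dimensions and the function names (sum_columns() and the omitted SAD minimization) are placeholders chosen for illustration rather than the application's actual interfaces.

    #include <string.h>

    enum { W = 640, H = 480, N_FRAMES = 16 };            /* assumed dimensions */

    /* Sum the pixels of each column of one frame. */
    static void sum_columns(const float frame[H][W], float colsum[W])
    {
        memset(colsum, 0, W * sizeof *colsum);
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x)
                colsum[x] += frame[y][x];
    }

    /* Each frame's column-sum vector is computed once and reused when the frame
       reappears as the first element of the next pair of consecutive frames. */
    static void gmc(float frames[N_FRAMES][H][W])
    {
        float prev[W], cur[W];

        sum_columns(frames[0], prev);
        for (int n = 1; n < N_FRAMES; ++n) {
            sum_columns(frames[n], cur);
            /* ... minimize the SAD between prev and cur and shift frames[n]
               accordingly (omitted; see Figure 5.16) ... */
            memcpy(prev, cur, sizeof prev);              /* reuse instead of recomputing */
        }
    }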

5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations, where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization and global motion compensation stages are data independent and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrate NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology; (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions while letting the compiler do some of the work associated with writing assembly instructions; and (iv) directly writing NEON assembly instructions linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and it was therefore this approach that was adopted.

[Flow diagram: the normalized frame sequence enters the GMC stage; for the first pair of consecutive frames, the columns of frame 1 and frame 2 are summed, the SAD is minimized and frame 2 is shifted; for every remaining pair of consecutive frames (from n = 3 to n = N), the column-sum vector of frame n-1 is reused, only the columns of frame n are summed, the SAD is minimized and frame n is shifted.]

Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums.

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.

Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8 or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers, where each of the 8 elements would have 16 bits, or as an operation on 2 D registers, where each of the 8 elements would be 8 bits wide.

[Diagram labels: elements, operation, source registers, destination register, lane.]

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology.

An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
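The following sketch illustrates this kind of 8-wide processing. It is written with NEON C intrinsics for readability, whereas the actual implementation uses hand-written NEON assembly; the buffer names and the exact arithmetic are simplified assumptions (for instance, the guard against a zero denominator and the final scaling are omitted).

    #include <arm_neon.h>

    /* Process 8 pixels of the two pattern frames v1 and v2 at a time:
       texture 2 = v1 + v2, and a normalized value (v1 - v2) / (v1 + v2). */
    static void process8(const uint8_t *v1, const uint8_t *v2,
                         uint16_t *tex2, float *norm /* low 4 lanes only */)
    {
        uint8x8_t  a = vld1_u8(v1);                     /* 8 pixels of frame v1 */
        uint8x8_t  b = vld1_u8(v2);                     /* 8 pixels of frame v2 */

        uint16x8_t sum  = vaddl_u8(a, b);               /* widening add: v1 + v2 */
        int16x8_t  diff = vsubq_s16(vreinterpretq_s16_u16(vmovl_u8(a)),
                                    vreinterpretq_s16_u16(vmovl_u8(b)));
        vst1q_u16(tex2, sum);                           /* texture 2 result */

        /* Convert the low 4 lanes to 32-bit floats and normalize
           (the high 4 lanes are handled analogously). */
        float32x4_t fsum  = vcvtq_f32_u32(vmovl_u16(vget_low_u16(sum)));
        float32x4_t fdiff = vcvtq_f32_s32(vmovl_s16(vget_low_s16(diff)));
        float32x4_t recip = vrecpeq_f32(fsum);                    /* ~ 1 / (v1 + v2) */
        recip = vmulq_f32(recip, vrecpsq_f32(fsum, recip));       /* one Newton-Raphson step */
        vst1q_f32(norm, vmulq_f32(fdiff, recip));
    }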

Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain this increment. First, note that the stage of the application that contributed most to the increase in time was the reading of the binary file. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was indeed probably running in parallel, using resources of the board and hence affecting the performance of the application.

Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage. The first step of this process is to find the smallest and largest values of every camera-frame pixel along the time dimension by means of if statements. When this task is implemented in conventional C language, the processor makes use of its branch prediction mechanism to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, without any benefit from the branch prediction mechanism.
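For reference, the per-pixel minimum and maximum search can be expressed with NEON's dedicated min/max operations, as in the hedged intrinsics sketch below (variable names are illustrative); every pack of 8 values is processed unconditionally, which is exactly the behaviour described above.

    #include <arm_neon.h>

    /* Update the running minimum and maximum of corresponding pixels across the
       frame sequence, 8 pixels at a time (num_pixels assumed to be a multiple of 8). */
    static void update_min_max(const uint8_t *frame, uint8_t *min_img,
                               uint8_t *max_img, int num_pixels)
    {
        for (int i = 0; i < num_pixels; i += 8) {
            uint8x8_t p = vld1_u8(frame + i);
            vst1_u8(min_img + i, vmin_u8(vld1_u8(min_img + i), p));
            vst1_u8(max_img + i, vmax_u8(vld1_u8(max_img + i), p));
        }
    }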

5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2^(n+1) - 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
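For the real 8-bit case, such a table could be built and used as sketched below; the table name, its initialization with powf() and the direct use of the raw average are assumptions for illustration, the essential point being that the sum pixel A + pixel B serves directly as the index.

    #include <math.h>

    /* For 8-bit pixels the sum pixel A + pixel B ranges from 0 to 510, giving
       2^(n+1) - 1 = 511 distinct averages for n = 8. */
    #define LUT_SIZE (2 * 255 + 1)

    static float gamma_avg_lut[LUT_SIZE];

    /* Fill the table once: entry s holds (s / 2) raised to the 0.85 power. */
    static void init_gamma_avg_lut(void)
    {
        for (int s = 0; s < LUT_SIZE; ++s)
            gamma_avg_lut[s] = powf(s / 2.0f, 0.85f);
    }

    /* The sum is already available from the texture 2 stage, so averaging and
       gamma correction reduce to a single table lookup per pixel. */
    static inline float average_gamma(unsigned sum_of_two_pixels)
    {
        return gamma_avg_lut[sum_of_two_pixels];
    }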

As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2 and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. The merging of these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes.

[Flow diagram elements: Parse XML file; for camera frames 1, 2, ..., 15, 16 (for each row, for each vector): Modulation (step 1), Scale, Texture 2 (v1 + v2), Scale, Normalize ((v1 - v2) / (v1 + v2)), Crop row; for camera frames 15 and 16: Modulation (step 2), Scale, Texture 1; Rest of program.]

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

pixel A:         0      1      2      3
pixel B:         0      1      2      3
average:         0    0.5      1    1.5      2    2.5      3
average^0.85:    0  0.555      1  1.411  1.803  2.179  2.544
(the table is indexed by pixel A + pixel B)

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.

These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. Moreover, the resulting improvement of the execution time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.

[Flow diagram elements: Parse XML file; 5x5 mean filter; for camera frames 1, 2, ..., 13, 14 (for each row, for each vector): Modulation (step 1), Scale, Texture 2 (v1 + v2), Scale, Normalize ((v1 - v2) / (v1 + v2)), Crop row; for camera frames 15 and 16 (for each row, for each vector): Modulation (steps 1 and 2), Scale, Texture 2 (v1 + v2), Scale, Normalize ((v1 - v2) / (v1 + v2)), Crop row, Average & Gamma correction; Rest of program.]

Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.

6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).

On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, (bottom) the C implementation on the BeagleBoard-xM.

6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model has approximately 2% of the faces contained in the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.

6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations described in Chapter 5 were used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times.

Figure 6.2: Example of the visualization module developed: (a) high-resolution 3D model with texture (63743 faces); (b) high-resolution 3D model wireframe (63743 faces); (c) low-resolution 3D model with texture (1229 faces); (d) low-resolution 3D model wireframe (1229 faces).

Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which provide an idea of the speedup achieved for each individual stage.

[Bar chart: per-stage execution-time breakdown of successive versions of the application: No optimizations; Doubles to floats; Tuned compiler flags; Modified memory layout; pow func. reimplemented; Reduced memory accesses; GMC in Y dir. only; Delaunay bug; Line shifting in GMC; New tessellation algorithm; Modified decoding stage; No recalculations in GMC; ASM + NEON implem. 1; ASM + NEON implem. 2.]

Figure 6.3: Performance evolution of the 3D face scanner's C implementation.

[Per-stage bar charts (execution time before and after the complete optimization process): (a) Read binary file, (b) Preprocessing, (c) Normalization, (d) GMC, (e) Decoding, (f) Tessellation, (g) Calibration, (h) Vertex filtering, (i) Hole filling.]

Figure 6.4: Execution time for each stage of the application before and after the complete optimization process.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, presented an increase of the execution time by a factor of 27 times, i.e., an increase of approximately 14 seconds.

A wide range of optimizations were performed to reduce the execution time of the application. These include high-level optimizations such as modifications to the algorithms and reordering of the execution flow, middle-level optimizations such as avoiding redundant calculations and function call overhead, and low-level optimizations such as reimplementing sections of code with NEON assembly instructions.

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing such a process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment.

Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5x5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.

According to several observations, there is a high probability that the calibration stage can be optimized considerably. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected in stages that involve matrix operations (MATLAB usually performs well with this kind of operation), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.

Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application; therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that, as part of the current project, a few of the other options were tested, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.

by ARM which are known to produce fast running code

Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: the Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, ISSN: 0162-8828. DOI: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. DOI: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, ISSN: 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, ISSN: 0162-8828. DOI: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, ISSN: 0031-3203. DOI: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, ISSN: 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings., 10th International Conference on, vol. i, Jun. 1990, pp. 309–313. DOI: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, ISSN: 0932-8092. DOI: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, ISSN: 0018-9162. DOI: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. DOI: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. DOI: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. DOI: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, ISSN: 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. DOI: 10.1364/OE.15.012318 (cit. on p. 14).

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14).

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, ISBN: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and Universitat de Girona, Departament d'Electrònica, Informàtica i Automàtica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals, Universitat de Girona. Universitat de Girona, 1998, ISBN: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, ISSN: 0882-4967. DOI: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, ISSN: 0162-8828. DOI: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1–4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, ISBN: 0-201-48560-5. DOI: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, ISBN: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, December, p. 81, 2009 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization, an example," Tech. Rep., 2009 (cit. on p. 54).


Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic, (c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel, (h) Simplicity, (i) ComfortGel.

amongst others. A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.

Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.

1.1 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop some technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.

2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.

3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.

4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.

The focus of this thesis project is on the first step.

As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique was already developed and is the base for the present project. Figure 1.2a shows the hardware setup of this device. In short, the scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. This picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently performed offline and are mostly implemented in MATLAB.

1.2 Objectives

Figure 1.2: A 3D hand-held scanner developed at Philips Research: (a) hardware; (b) 3D model example.

The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a way that extra-functional requirements are taken into account.

These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application; specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to ARM's Cortex-A8, the new implementation should be targeted to this processor in particular, making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.

1.3 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.

Matrix4x4() if(mshSelected) target=mshSelected var trans=targettransform var parent=targetparent while(parenttransform) build local to world transformation matrix transmultiplyInPlace(parenttransform) also build world to local back-transformation matrix backtransmultiplyInPlace(parenttransforminversetranspose) parent=parentparent backtranstransposeInPlace() else try target=scenenodesgetByName(Clipping Plane) catch(e) var ndcnt=scenenodescount target=scenecreateClippingPlane() if(ndcnt=scenenodescount) targetremove() target=null if(target) return switch(echaracterCode) case 30tilt up tiltTarget(target -MathPI900) break case 31tilt down tiltTarget(target MathPI900) break case 28spin right spinTarget(target -MathPI900) break case 29spin left spinTarget(target MathPI900) break case 120 x translateTarget(target new Vector3(100) e) break case 121 y translateTarget(target new Vector3(010) e) break case 122 z translateTarget(target new Vector3(001) e) break case 88 shift + x translateTarget(target new Vector3(-100) e) break case 89 shift + y translateTarget(target new Vector3(0-10) e) break case 90 shift + z translateTarget(target new Vector3(00-1) e) break case 115 s scaleTarget(target 1 e) break case 83 shift + s scaleTarget(target -1 e) break if(mshSelected) targettransformmultiplyInPlace(backtrans)runtimeaddEventHandler(keyEventHandler)function tiltTarget(ta) var centre=new Vector3() if(mshSelected) centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) else centreset(ttransformtranslation) var rotVec=ttransformtransformDirection(new Vector3(010)) rotVecnormalize() ttransformtranslateInPlace(centrescale(-1)) ttransformrotateAboutVectorInPlace(a rotVec) ttransformtranslateInPlace(centre)function spinTarget(ta) var centre=new Vector3() var rotVec=new Vector3(001) if(mshSelected) centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) rotVecset(ttransformtransformDirection(rotVec)) rotVecnormalize() else centreset(ttransformtranslation) ttransformtranslateInPlace(centrescale(-1)) ttransformrotateAboutVectorInPlace(a rotVec) ttransformtranslateInPlace(centre)translates object by amount calculated based on Canvas sizefunction translateTarget(t d e) var cam=scenecamerasgetByIndex(0) if(camprojectionType==camTYPE_PERSPECTIVE) var scale=Mathtan(camfov2) camtargetPositionsubtract(camposition)length Mathmin(ecanvasPixelWidthecanvasPixelHeight) else var scale=camviewPlaneSize2 Mathmin(ecanvasPixelWidthecanvasPixelHeight) ttransformtranslateInPlace(dscale(scale))scales object by amount calculated based on Canvas sizefunction scaleTarget(t d e) if(mshSelected) var bbox=tcomputeBoundingBox() var diag=new Vector3(bboxmaxx bboxmaxy bboxmaxz) diagsubtractInPlace(bboxmin) var dlen=diaglength var cam=scenecamerasgetByIndex(0) if(camprojectionType==camTYPE_PERSPECTIVE) var scale=Mathtan(camfov2) camtargetPositionsubtract(camposition)length dlen Mathmin(ecanvasPixelWidthecanvasPixelHeight) else var scale=camviewPlaneSize2 dlen Mathmin(ecanvasPixelWidthecanvasPixelHeight) var centre=new Vector3() centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) ttransformtranslateInPlace(centrescale(-1)) ttransformscaleInPlace(1+dscale) ttransformtranslateInPlace(centre) function addremoveClipPlane(chk) var clip=scenecreateClippingPlane() if(chk) add Clipping Plane and place its center either into the camera target position or into the centre of the currently selected mesh node var centre=new Vector3() if(mshSelected) local to parent transformation matrix var trans=mshSelectedtransform 
build local to world transformation matrix by recursively multiplying the parents transf matrix on the right var parent=mshSelectedparent while(parenttransform) trans=transmultiply(parenttransform) parent=parentparent get the centre of the mesh (local coordinates) centreset(mshSelectedcomputeBoundingBox()center) transform the local coordinates to world coords centreset(transtransformPosition(centre)) mshSelected=null else centreset(scenecamerasgetByIndex(0)targetPosition) cliptransformsetView( new Vector3(000) new Vector3(100) new Vector3(010)) cliptransformtranslateInPlace(centre) else clipremove() function to store current transformation matrix of all mesh nodes in the scenefunction getCurTrans() var nc=scenemeshescount var tA=new Array(nc) for(var i=0 iltnc i++) var cm=scenemeshesgetByIndex(i) tA[cmname]=new Matrix4x4(cmtransform) return tAfunction to restore transformation matrices given as argfunction restoreTrans(tA) for(var i=0 ilttAlength i++) var msh=scenemeshesgetByIndex(i) mshtransformset(tA[mshname]) store original transformation matrix of all mesh nodes in the scenevar origtrans=getCurTrans()set initial state of Cross Section menu entrycameraEventHandleronEvent(1)hostconsoleclear()

Chapter 2

Literature study

This chapter presents a selective analysis of the state-of-the-art in the field of surface

reconstruction placing special emphasis on structured lighting techniques A brief

overview of the three main underlying technologies used for depth estimation is pre-

sented first This is followed by an example of stereo analysis which serves as the basis

for the more specific structured lighting techniques Moreover this example helps to

illustrate why stereo analysis is considered less preferable for 3D face reconstruction

applications when compared with the structured lighting techniques Special emphasis

is placed on the scientific principles underlying structured lighting techniques Further-

more a classification of the different types of pattern coding strategies available in the

literature is given along with an analysis of their suitability for our application Fi-

nally the chapter concludes with a brief discussion of camera calibration and its most

representative techniques

21 Surface reconstruction

Surface reconstruction has a wide range of practical applications such as computer mod-

eling of 3D objects (such as those found in areas like architecture mechanical engi-

neering or surgery) distance measurements for vehicle control surface inspections for

quality control approximate or exact estimates of the location of 3D objects for auto-

mated assembly and fast location of obstacles for efficient navigation [4]

Technologies for surface reconstruction include contact and non-contact techniques the

latter being our principal interest Non-contact techniques may be further categorized

as echo-metric reflecto-metric and stereo-metric as proposed in [5] Echo-metric tech-

niques use time-of-flight measurements to determine the distance to an object ie they



are based on the time it takes for a wave (acoustic micro electromagnetic) to reflect

from an object's surface through a given medium. Reflecto-metric techniques process

one or more images of the object to determine its surface orientation and consequently

its shape. Finally, stereo-metric techniques determine the location of the object's surface

by triangulating each point with its corresponding projections in two or more images

Echo-metric techniques suffer from a number of drawbacks Systems employing such

techniques are heavily affected by environmental parameters such as temperature and

humidity [6]. These parameters affect the velocity at which waves travel through a

given medium thus introducing errors in depth measurement On the other hand

both reflecto-metric and stereo-metric techniques are less affected by environmental

parameters However reflecto-metric techniques entail a major difficulty ie they

require an estimation of the model of the environment. In the remainder of this section

we will limit the discussion to the stereo-metric category and focus on the structured

lighting techniques

211 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded

as an extension of the more general stereo-vision technique an introductory example of

stereo analysis is presented in this section This example intends to show why the use

of structured lighting becomes essential for our application This example is presented

in [4]

Surface reconstruction can be achieved by means of the visual disparity that results

when an object is observed from different camera viewpoints In its simplest form two

cameras can be used for this purpose Triangulation between a point in the object and

its respective projection in each of the camera projection planes can be used to calculate

the depth at which this point lies from a certain reference Note however that in order

to calculate the triangulation more parameters are required These parameters refer for

example to the distance at which the cameras are located from one another (extrinsic

parameter) or to the focal length of each of the cameras (intrinsic parameter)

Figure 21 illustrates the so-called standard stereo geometry [4] of two cameras In this

model the origin of the XYZ-coordinate system O = (0 0 0) is located at the focal

point of the left camera The focal point of the right camera lies at a distance b along

the X-axis from the left camera ie at the point (b 0 0) Both cameras are assumed

to have the same focal length f As a consequence the images of both cameras are

located in the same image plane The Z-axis coincides with the optical axis of the

left camera Moreover the optical axes of both cameras are parallel to each other and


oriented towards the scene objects Also note that because the x-axes of both images

are identically oriented rows with same row-number in the two different images lie on

the same straight line

[Figure 2.1: Standard stereo geometry — left and right image planes separated by the base distance b, with the optical axes of both cameras and the image coordinates xleft, xright and row y indicated.]

In this model a scene point $P = (X, Y, Z)$ is projected onto two corresponding image points

$$p_{left} = (x_{left}, y_{left}) \quad \text{and} \quad p_{right} = (x_{right}, y_{right})$$

in the left and right images respectively, assuming that the scene point is visible from both camera viewpoints. The disparity with respect to $p_{left}$ is a vector given by

$$\Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\ y_{left} - y_{right})^{T} \qquad (2.1)$$

between two corresponding image points.

In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points $P$ onto image points $p$ according to a central projection given by

$$p = (x, y) = \left( \frac{f \cdot X}{Z},\ \frac{f \cdot Y}{Z} \right) \qquad (2.2)$$

assuming that $Z > f$.

According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that $y = y_{left} = y_{right}$. Therefore, for the left camera the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera. Furthermore, given the displacement of the right camera by $b$ along the X-axis, the central projection equation is given by

$$(x_{right}, y) = \left( \frac{f \cdot (X - b)}{Z},\ \frac{f \cdot Y}{Z} \right)$$

Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points with respect to $p_{left}$ is given by

$$\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}$$

However, because rows with the same row numbers in the two images have the same $y$ value, the scalar disparity of a pair of corresponding points reduces to

$$\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \qquad (2.3)$$

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map $\Delta(x, y)$ is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example because of occlusion), the value "undefined" is recorded.

Finally, in order to come up with the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

$$Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}$$

and therefore

$$X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}$$

Using the previous equation it follows that

$$Z = \frac{b \cdot f}{x_{left} - x_{right}}$$

By substituting this result into the projection equation for $y$ it follows that

$$Y = \frac{b \cdot y}{x_{left} - x_{right}}$$

The last three equations allow the reconstruction of the coordinates of the projected points $P$ within the three-dimensional XYZ-space, assuming that the parameters $f$ and $b$ are known and that the disparity map $\Delta(x, y)$ was measured for each pair of corresponding points in the two images. Note that a variety of methods exist to calibrate different types of camera configuration systems, i.e. to determine their intrinsic and extrinsic parameters. More on these calibration procedures is discussed in Section 2.2.
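As an aside, the three reconstruction equations are straightforward to evaluate in code. The following C sketch computes X, Y and Z for a single pair of corresponding points; the function name and the guard against zero disparity are illustrative additions, not part of the original derivation.

#include <math.h>

/* Reconstruct the 3D coordinates of a scene point from its projections in the
 * standard stereo geometry. f is the focal length, b the base distance between
 * the cameras, (x_left, y) and x_right the corresponding image coordinates.
 * Returns 0 on success, or -1 when the disparity is zero ("undefined" depth). */
static int reconstruct_point(double f, double b,
                             double x_left, double y, double x_right,
                             double *X, double *Y, double *Z)
{
    double disparity = x_left - x_right;          /* Equation (2.3) */
    if (fabs(disparity) < 1e-12)
        return -1;
    *X = (b * x_left) / disparity;
    *Y = (b * y) / disparity;
    *Z = (b * f) / disparity;
    return 0;
}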

The process of determining corresponding point pairs is known as the correspondence

problem A wide variety of techniques are used to solve the correspondence problem in

stereo image analysis Such techniques generally involve the extraction and matching

of features between two or more images These features are typically corners or edges

contained within the images Although these techniques are found to be appropriate for

a certain number of applications it turns out that they present a number of drawbacks

that make their applicability unfeasible for many others The main drawbacks are (i)

feature extraction and matching is generally computationally expensive (ii) features

might not be available depending on the nature of the environment or the placement

of the cameras and (iii) low lighting conditions generally increase the complexity of the

matching procedure thus making the system more error prone Such problems in solving

the correspondence problem can generally be overcome by resorting to a different but

similar type of techniques known by the name of structured lighting techniques While

structured lighting techniques involve a complete different methodology on how to solve

the correspondence problem they share large part of the theory presented in this section

regarding the depth reconstruction process

212 Structured lighting

Structured lighting methods can be thought of as a modification of the previously de-

scribed stereo analysis approach where one of the cameras is replaced by a light source

which projects a light pattern actively into the scene The location of an object in space

can then be determined by analyzing the deformation of the projected light pattern

The idea behind this modification is to simplify the complexity of the correspondence

analysis by actively manipulating the scene

It is important to note that stereoscopic based systems do not assume complex require-

ments for image acquisition since they mostly rely on theoretical mathematical and

algorithmic analyses to solve the reconstruction problem On the other hand the idea

behind structured lighting methods is to shift this complexity to another level such as

the engineering prerequisites of the overall system [4]

A wide variety of light patterns have been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured


when using the most basic of all approaches, i.e. a light spot. In Section 2.1.2.2 a

classification of the encoded patterns available is presented Nevertheless the light spot

projection technique serves as a solid starting point to introduce the main principle

underlying the depth recovery of most other encoded light patterns the triangulation

technique

2121 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring

angles formed from it to points at either end of a fixed baseline Various approaches

have been proposed for accomplishing this task An early analysis was described by Hall

et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is explained.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.

[Figure 2.2: Assumed model for triangulation as proposed in [4] — camera with projection centre O, light source at base distance b, object point P at distance d, and the angles α, β and γ.]

Note that the system can be thought of as a 2D object scene, i.e. it has no vertical dimension. As a

consequence the object light source and camera all lie in the same plane The angles

α and β are given by the calibration As in the previous example the base distance b

is assumed to be known and the origin of the coordinate system O coincides with the

projection center of the camera


The goal is to calculate the distance $d$ between the origin $O$ and the object point $P = (X_0, Z_0)$. This can be done using the law of sines as follows

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}$$

From $\gamma = \pi - (\alpha + \beta)$ and $\sin(\pi - \gamma) = \sin(\gamma)$ it holds that

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}$$

Therefore, distance $d$ is given by

$$d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}$$

which holds for any point P lying on the surface of the object
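Expressed in code, the distance computation is a one-liner; the following C sketch assumes the calibrated angles are given in radians.

#include <math.h>

/* Distance d from the camera origin O to the object point P, given the
 * calibrated angles alpha and beta (in radians) and the base distance b
 * between camera and light source. */
static double triangulate_distance(double b, double alpha, double beta)
{
    return (b * sin(alpha)) / sin(alpha + beta);
}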

2122 Pattern coding strategies

As stated earlier there is a wide variety of pattern coding strategies available in the lit-

erature that aim to fulfill all requirements found in different scenarios and applications

In coded structured light systems every coded pixel in the pattern has its own codeword

that allows direct mapping ie every codeword is mapped to the corresponding coordi-

nates of a given pixel or group of pixels in the pattern A codeword can be represented

using grey levels colors or even geometrical characteristics The following classification

of pattern coding strategies was proposed by Salvi et al in [19]

• Time-multiplexing: This is one of the most commonly used strategies. The

idea is to project a set of patterns onto the scene one after the other The

sequence of illuminated values determines the codeword for each pixel The main

advantage of this kind of pattern is that it can achieve high spatial resolution in

the measurements. However, its accuracy is highly sensitive to movement of either

the structured light system or objects in the scene during the time period when the

acquisition process takes place Previous research in this area includes the work of

[5] [7] [8] An example of this coding strategy is the binary coded pattern shown

in Figure 23a

• Spatial Neighborhood: In this strategy the codeword that is assigned to a given

pixel depends on its neighborhood Codification is done on the basis of intensity

[9]–[11], color [12] or a unique structure of the neighborhood [13]. In contrast with

time-multiplexing strategies spatial neighborhood strategies allow for all coding

information to be condensed into a single projection pattern making them highly


suitable for applications that involve timing constraints such as autonomous nav-

igation. The compromise, however, is deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding: In direct coding strategies every pixel in the pattern is labeled

by the information it represents In other words the entire codeword for a given

point is contained in a unique pixel as explained in [19] Basically there are two

ways to achieve this either by using a large range of color values [15] [16] or

by introducing periodicity [17] Although in theory this group of strategies can

be used to reconstruct objects with high resolution a major problem occurs in

practice the colors imaged by camera(s) of the system do not only depend on the

projected colors but also on the intrinsic colors of the measuring surface and light

source. The consequence is that reference images become necessary. Figure 2.3c

shows an example of a direct coding strategy proposed in [16]

[Figure 2.3: Examples of pattern coding strategies — (a) time-multiplexing, (b) spatial neighborhood, (c) direct coding.]

2123 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields such as security

forensics or even entertainment it is no surprise that special focus has been devoted

to this area by the research community over the last decades A comparative study

of three different 3D face reconstruction approaches is presented in [20] Here the

most representative techniques of three different domains are tested These domains are

binocular stereo structured lighting and photometric stereo The experimental results

show that active reconstruction techniques perform better than purely passive ones for

this application

The majority of analysis on vision based reconstruction has focused on general perfor-

mance for arbitrary scenes rather than on specific objects as reported in [20] Neverthe-

less some effort has been made on evaluating structured lighting techniques with special

focus on human face reconstruction In [21] a comparison is presented between three


structured lighting techniques (Gray Code Gray Code Shift and Stripe Boundary) to

assess 3D reconstruction for human faces by using mono and stereo systems The results

show that the Gray Code shift coding performs best given the high number of emitted

patterns it uses A further study on this topic was performed by the same author in

[22] Again it was found that time-multiplexing techniques such as binary encoding

using Gray Code provide the highest accuracy With a rather different objective than

that sought by Woodward et al in [21] and [22] Fechteler et al [23] also focus their

effort on presenting a framework that captures 3D models of faces in high resolutions

with low computational load Here the system uses a single colored stripe pattern for

the reconstruction purpose plus a picture of the face illuminated with regular white light

that is used as texture

Particular aspects of 3D human face reconstruction such as proximity size and texture

involved make structured lighting a suitable approach On the contrary other recon-

struction techniques might be less suitable when dealing with these particular aspects

For example stereoscopic approaches fail to provide positive results when the textures

involved do not contain features that can be easily extracted and matched by means of

algorithms as in the case of the human face On the other hand the concepts behind

structured lighting make it very convenient to reconstruct these kind of surfaces given

the proximity involved and the size limits of the object in question (appropriate for

projecting encoded patterns)

With regard to the suitability of the different pattern coding strategies for our application

(3D human face reconstruction by means of a hand-held scanner) there are several

factors to consider Spatial neighborhood strategies do not offer high spatial resolution

which is needed by the algorithms that assess the fit quality of the various mask models

Direct coding strategies suffer from practical problems that affect their robustness to

different scenarios This centers the attention on the time-multiplexing techniques which

are known to provide high spatial resolution The problem with such techniques is

that they are highly sensitive to movement, which is likely to be present on a hand-

held device. Fortunately, there are several approaches as to how such a problem can be

solved Consequently it is a time-multiplexing technique which is being employed in

our application

22 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement

This section presents a review of some of the most popular techniques with special focus

on those that are regarded as adequate for our application


221 Definition

Camera calibration is the process of determining a mathematical approximation of the

physical and optical behavior of an imaging system by using a set of parameters These

parameters can be estimated by means of direct or iterative methods and they are divided

in two groups On the one hand intrinsic parameters determine how light is projected

through the lens onto the image plane of the sensor The focal length projection center

and lens distortion are all examples of intrinsic parameters On the other hand extrinsic

parameters measure the position and orientation of the camera with respect to a world

coordinate system as defined in [24] To better illustrate these ideas consider Figure

24 which corresponds to the optical system for the structured pattern projection and

triangulation considered in [25]. The focal length $f_c$ and the projection center $O_c$ are examples of intrinsic parameters of the camera, while the distance $D$ between the camera and the projector corresponds to an extrinsic parameter.

[Figure 2.4: A reference framework assumed in [25] — camera and projector image planes with their projection centres and focal lengths, the reference plane, and the distance D between camera and projector.]

222 Popular techniques

In 1982 Hall et al [18] proposed a technique consisting of an implicit camera calibration

that uses a 3×4 transformation matrix which maps 3D object points to their respective

2D image projections Here the model of the camera does not consider any lens distor-

tion For a detailed description of this method refer to [18] Some years later in 1986

Faugeras improved Hall's work by proposing a technique that was based on extracting

the physical parameters of the camera from the transformation technique proposed in

[18] The description of this technique is given in [26] and [27] A non-linear explicit

camera calibration that included radial lens distortion was proposed by Salvi in his PhD


thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' lin-

ear method However a method that would become much more popular and that is still

widely used was proposed by Tsai in 1987 [29] Here the author proposes a two-step

technique that models only radial lens distortion Also worth mentioning is the model

proposed by Weng [30] in 1992 which includes three different types of lens distortion

The calibration mechanism that is currently being used in our application is based on

the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although

this mechanism focuses on stereo camera calibration it was generalized for a system

with one camera and one projector It involves imaging a controlled scene from different

positions and orientations The controlled scene consists of a rigid calibration chart with

several markers The geometric and photometric properties of such markers are known

precisely so that they can be detected After corresponding markers in the different

images are found an algorithm searches the optimal set of camera parameters for which

triangulation of all corresponding marker-point pairs gives an accurate reconstruction of

the calibration chart This calibration mechanism is discussed further in Section 37

Chapter 3

3D face scanner application

This chapter provides a general overview of the 3D face scanner application developed

by the Smart Sensing & Analysis research group and provided as a starting point for the

current project Figure 31 presents the main steps involved in the 3D reconstruction

process

[Figure 3.1: General flow diagram of the 3D face scanner application — read binary file (3.1), preprocessing (3.2), normalization (3.3), tessellation (3.4), decoding (3.5), global motion compensation (3.6), calibration (3.7), vertex filtering (3.8) and hole filling (3.9), taking the binary and XML input files to the final 3D model.]

The current scanner uses a total of 16 binary coded patterns that are sequentially pro-

jected onto the scene For each projection the scene is captured by means of the

embedded camera hence producing 16 different grayscale frames (Figure 32) that are

fed to the application in the form of a binary file This falls in line with the discussion

presented in Section 2123 of the literature study of why time-multiplexing strategies

are more suitable than spatial neighborhood or direct coding strategies for face recon-

struction applications In Sections 31 to 39 each of the steps shown in Figure 31 is

described
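As an illustration of what such a pattern set looks like in code, the following C sketch fills one binary stripe pattern and, when requested, its inverted counterpart. Plain binary column coding is assumed here purely for the example, so the stripe ordering of the patterns actually projected by the scanner may differ.

#include <stdint.h>

/* Fill one binary stripe pattern of size width x height. Bit 'bit' of the
 * column index decides whether a pixel is lit (255) or dark (0); when
 * 'inverted' is non-zero the complementary pattern is produced. */
static void fill_binary_pattern(uint8_t *img, int width, int height,
                                int bit, int inverted)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int lit = (x >> bit) & 1;
            if (inverted)
                lit = !lit;
            img[y * width + x] = lit ? 255 : 0;
        }
    }
}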



Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame

sequence is the input for the 3D face scanner application

31 Read binary file

The first step of the application is to read the binary file that contains the required

information for the 3D reconstruction The binary file is composed of two parts the

header and the actual data The header contains metadata of the acquired frames such

as the number of frames and the resolution of each one The second part contains the

actual data of the captured frames Figure 32 shows an example of such frame sequence

which from now on will be referred to as camera frames
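As an illustration, the C sketch below reads such a file into memory. The header fields shown (number of frames, width, height) follow the description above, but the exact binary layout of the real file is defined by the acquisition software and is assumed here for the example.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed header layout; the real file format is defined by the acquisition
 * software. Struct padding and endianness are ignored for brevity. */
struct scan_header {
    uint32_t num_frames;   /* number of captured frames            */
    uint32_t width;        /* horizontal resolution of each frame  */
    uint32_t height;       /* vertical resolution of each frame    */
};

/* Read the header and the raw 8-bit frame data into one memory block.
 * Returns a pointer to num_frames * width * height bytes, or NULL on error. */
static uint8_t *read_scan_file(const char *path, struct scan_header *hdr)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (fread(hdr, sizeof(*hdr), 1, fp) != 1) {
        fclose(fp);
        return NULL;
    }
    size_t size = (size_t)hdr->num_frames * hdr->width * hdr->height;
    uint8_t *data = malloc(size);
    if (data != NULL && fread(data, 1, size, fp) != size) {
        free(data);
        data = NULL;
    }
    fclose(fp);
    return data;
}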

32 Preprocessing

The preprocessing stage comprises the four steps shown in figure 33 Each of these steps

is described in the following subsections

[Figure 3.3: Flow diagram of the preprocessing stage — parse XML file, discard frames, crop frames, and scale (convert to float, range from 0 to 1).]

321 Parse XML file

In this stage the application first reads an XML file that is included for every scan

This file contains relevant information for the structured light reconstruction This


information includes (i) the type of structured light patterns that were projected when

acquiring the data (ii) the number of frames captured while structured light patterns

were being projected (iii) the image resolution of each frame to be considered and (iv)

the calibration data

322 Discard frames

Based on the number of frames value read from the XML file the application discards

extra frames that do not contain relevant information for the structured light approach

but that are provided as part of the input

323 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border

of the images Note that this operation does not imply a loss of information in this

application in particular This is because pixels near the frame borders do not contain

facial information and therefore can be safely removed

324 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is

represented by an 8-bit unsigned integer value that ranges from 0 to 255 In this stage

the data type is transformed from unsigned integer to floating point while dividing each

pixel value by 255 The new set of values range between 0 and 1
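A minimal C sketch of this conversion could look as follows (the function name is chosen for illustration only):

#include <stddef.h>
#include <stdint.h>

/* Convert an 8-bit camera frame to floating-point values in the range [0, 1]. */
static void scale_frame(const uint8_t *src, float *dst, size_t num_pixels)
{
    for (size_t i = 0; i < num_pixels; i++)
        dst[i] = (float)src[i] / 255.0f;
}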

33 Normalization

Even though this section is entitled Normalization a few more tasks are being performed

in this stage of the application as shown by the blue rectangles in Figure 34 Here wide

arrows represent flow of data whereas dashed lines represent the order of execution The

numbers inside the small data arrows pointing towards the different tasks represent the

number of frames used as input by each task The dashed line rectangle that encloses

the normalization and texture 2 tasks represents that there is not a clear sequential

execution between these two but rather that these are executed in an alternating fashion

This type of diagram will result particularly useful in Chapter 5 in order to explain the


[Figure 3.4: Flow diagram of the normalization stage — from the 16 camera frames, the normalization task produces 8 frames, the texture 2 task produces 8 frames, and the modulation and texture 1 tasks each produce 1 frame.]

modifications that were made to the application to improve its performance. Examples of the different frames that are produced in this stage are visualized in Figure 3.5. A

brief description of each of the tasks involved in this stage follows

331 Normalization

The purpose of this stage is to extract the reflectivity component (texture information)

from the camera frames while aiming at enhancing the deformed illumination patterns

in the resulting frame sequence Figure 35a illustrates the result of this process The

deformed patterns are essential for the 3D reconstruction process

In order to understand how this process takes place we need to look back at Figure

32 Here it is possible to observe that the projected patterns in the top row frames are

equal to their corresponding frame in the bottom row with the only difference being

that the values of the projected pattern are inverted For each corresponding pair a

new image frame is generated according to the following equation

$$F_{norm}(x, y) = \frac{F_{camera}(x, y, a) - F_{camera}(x, y, b)}{F_{camera}(x, y, a) + F_{camera}(x, y, b)}$$

where a and b correspond to aligned top and bottom frames in Figure 32 respectively

An example of the resulting frame sequence is shown in Figure 35a
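A possible C implementation of this per-pixel operation is sketched below; the small epsilon added to the denominator is an illustrative guard against division by zero and is not part of the original equation.

#include <stddef.h>

/* Combine a pattern frame 'a' with its inverted counterpart 'b' into one
 * normalized frame, (a - b) / (a + b). The epsilon protects pixels that
 * receive no projector light. */
static void normalize_pair(const float *a, const float *b,
                           float *out, size_t num_pixels)
{
    const float eps = 1e-6f;
    for (size_t i = 0; i < num_pixels; i++)
        out[i] = (a[i] - b[i]) / (a[i] + b[i] + eps);
}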


[Figure 3.5: Example of the 18 frames produced in the normalization stage — (a) normalized frame sequence, (b) texture 2 frame sequence, (c) modulation frame, (d) texture 1 frame.]

332 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one

used to calculate the normalized frame sequence In fact the output of this process is an

intermediate step in the calculation of the normalized frames being this the reason why

the two processes are said to be performed in an alternating fashion The mathematical

equation that describes the calculation of the texture 2 frame sequence is

$$F_{texture2}(x, y) = F_{camera}(x, y, a) + F_{camera}(x, y, b)$$

The resulting frame sequence (Figure 35b) is used later in the global motion compen-

sation stage


333 Modulation

The purpose of this stage is to find the range of measured values for each (x y) pixel of

the camera frame sequence along the time dimension This is done in two steps First

two frames are generated by finding the maximum and minimum values along the time

(t) dimension (Figure 36) for every (x y) value in a frame

[Figure 3.6: Camera frame sequence in an (x, y, t) coordinate system.]

Second, a modulation frame is produced by finding the difference between the previously generated frames, i.e.

$$F_{mod}(x, y) = F_{max}(x, y) - F_{min}(x, y)$$

Such modulation frame (Figure 35c) is required later during the decoding stage
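The two steps can be combined into a single pass over the frame sequence, as the following C sketch (assuming a frame-major memory layout) illustrates.

#include <stddef.h>

/* Compute the modulation frame: for every pixel, the difference between the
 * maximum and minimum value observed over 'num_frames' camera frames, stored
 * one frame after the other (frame-major layout). */
static void modulation_frame(const float *frames, int num_frames,
                             size_t num_pixels, float *out)
{
    for (size_t i = 0; i < num_pixels; i++) {
        float fmin = frames[i];
        float fmax = frames[i];
        for (int t = 1; t < num_frames; t++) {
            float v = frames[(size_t)t * num_pixels + i];
            if (v < fmin) fmin = v;
            if (v > fmax) fmax = v;
        }
        out[i] = fmax - fmin;
    }
}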

334 Texture 1

Finally the last task in the Normalization stage corresponds to the generation of the

texture image that will be mapped onto the final 3D model In contrast to the previous

three tasks this subprocess does not take the complete set of 16 camera frames as input

but only the 2 with finest projection patterns Figure 37 shows the four processing

steps that are applied to the input in order to generate a texture image such as the one

presented in Figure 35d

[Figure 3.7: Flow diagram for the calculation of the texture 1 image — average frames, gamma correction, 5×5 mean filter, histogram stretch.]


34 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement.

In fact if no measures are taken to correct the slight amount of movement of the scanner

or of the objects in the scene during the acquisition process the complete reconstruction

process fails Although the global motion compensation stage is only a minor part of

the mechanism that makes the entire application robust to motion it is not negligible

in the final result

Global motion compensation is an extensive field of research for which many different

approaches and methods have been contributed The approach used in this application

is amongst the simplest in level of complexity. Nevertheless, it suffices for the needs of the

current application

Figure 38 presents an overview of the algorithm used to achieve the global motion

compensation This process takes as input the normalized frame sequence introduced in

the previous section As noted at the bottom of the figure these steps are repeated for

every pair of consecutive frames As a first step the pixels in each column are added for

both frames This results in two vectors that hold the cumulative sums of each frame

The second step is to determine by how many pixels the second image is displaced with

respect to the first one In order to achieve this the sum of absolute differences between

elements of the two column-sum vectors is calculated while slowly displacing the two

vectors with respect to each other The result is a new vector containing the SAD value

for each displacement Subsequently the index of the smallest element in the SAD

values vector is searched in order to determine the number of pixels that the second

image needs to be shifted The process concludes by performing the actual shift of the

second frame
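The core of this procedure, the search for the displacement that minimizes the SAD between the two column-sum vectors, could be sketched in C as follows; the normalization by the number of overlapping columns and the parameter max_shift are illustrative choices, not taken from the original implementation.

#include <math.h>

/* Find the horizontal displacement (in pixels) between two frames by comparing
 * their column-sum vectors. The shift that minimizes the mean sum of absolute
 * differences (SAD) over the overlapping columns is returned; 'max_shift'
 * bounds the search range. */
static int estimate_column_shift(const float *col_sum_a, const float *col_sum_b,
                                 int width, int max_shift)
{
    int best_shift = 0;
    float best_sad = INFINITY;
    for (int s = -max_shift; s <= max_shift; s++) {
        float sad = 0.0f;
        int count = 0;
        for (int x = 0; x < width; x++) {
            int xb = x + s;
            if (xb < 0 || xb >= width)
                continue;
            sad += fabsf(col_sum_a[x] - col_sum_b[xb]);
            count++;
        }
        if (count > 0 && sad / count < best_sad) {
            best_sad = sad / count;
            best_shift = s;
        }
    }
    return best_shift;
}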

[Figure 3.8: Flow diagram for the global motion compensation process — for every pair of consecutive normalized frames A and B, the columns of each frame are summed, the SAD between the column-sum vectors is minimized, and frame B is shifted accordingly.]


35 Decoding

In Section 211 of the literature study the correspondence problem was defined as the

process of determining corresponding point pairs between the captured images and the

projected patterns This is exactly what is being accomplished during the decoding

stage

A novel approach has been implemented in which the identification of the projector

stripes is based not on the values of the pixels themselves (as it is typically done) but

rather on the edges formed by the transitions of the projected patterns Figure 39

illustrates the different sets of decoded values that result with each of these methods

Here it is possible to observe that the pixel-based method produces a stair-casing effect

due to the decoding of neighboring pixels that lie on the same stripe of the projected

pattern On the other hand the edge-based method removes this undesirable effect by

decoding values for only parts of the image in which a transition occurs Furthermore

this approach enables sub-pixel accuracy for the determination of the positions where the

transitions occur meaning that the overall resolution of the 3D reconstruction increases

considerably
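The following C sketch illustrates the sub-pixel localization of such transitions for a single line of normalized values, using linear interpolation around the sign change; it only conveys the idea and omits how the transitions of all patterns are combined into depth codes.

/* Locate, with sub-pixel accuracy, the positions where a line of normalized
 * values crosses zero, i.e. where a projected stripe transition occurs.
 * Linear interpolation between the two samples that bracket the sign change
 * yields the fractional position. Returns the number of transitions written
 * into 'pos' (at most 'max_pos'). */
static int find_transitions(const float *values, int length,
                            float *pos, int max_pos)
{
    int n = 0;
    for (int i = 0; i + 1 < length && n < max_pos; i++) {
        float a = values[i];
        float b = values[i + 1];
        if ((a < 0.0f && b >= 0.0f) || (a >= 0.0f && b < 0.0f)) {
            float t = a / (a - b);   /* fraction of the way from i to i+1 */
            pos[n++] = (float)i + t;
        }
    }
    return n;
}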

[Figure 3.9: Edge-based vs pixel-based decoding — decoded values plotted against pixels along the y dimension of the image. The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used.]

The decoding process results in a set of vertices each one associated with a depth code

Note however that the unit of measurement used to describe the position and depth of

each vertex is based on camera pixels and code values respectively meaning that these

vertices still do not represent the actual geometry of the face The calibration process

explained in a later section is the part of the application that translates the pixel and


code values to standard units (such as millimeters) thus recreating the actual shape of

the human face

36 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in

a manner such that no overlaps occur In computer graphics these geometric shapes

are generally chosen to be triangles, also called "faces". The reason for using triangles is that they have, by definition, their vertices on the same plane. This in turn avoids

the generation of non-simple convex polygons that are not guaranteed to be rendered

correctly A complete example illustrating this point can be found in [32]

A set of 3D vertices calculated in the decoding stage is the input to the tessellation

process Here however the third dimension does not play a role and hence the z

coordinate for each of the vertices can be thought of as being equal to 0 This implies

that the new set of vertices consist only of (x y) coordinates that lie on the same plane

as shown in Figure 310a This graph corresponds to a very close view of the nose area

in the reconstructed face example

[Figure 3.10: Close view of the vertices in the nose area before and after the tessellation process — (a) vertices before applying the Delaunay triangulation, (b) result after applying the Delaunay triangulation.]

The question that arises here is how to connect the vertices in such a way that the com-

plete surface is covered with triangles The answer is to use the Delaunay triangulation

which is probably the most common triangulation used in computer vision The main

advantages that it has over other methods is that the Delaunay triangulation avoids

"skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the

Delaunay triangulation is independent of the order in which the vertices are processed


Figure 310b shows the result of applying the Delaunay triangulation to the vertices

shown in Figure 310a

Although there exists a number of different algorithms used to achieve the Delaunay

triangulation the final outcome of each conforms to the following definition a Delaunay

triangulation for a set P of points in a plane is a triangulation DT(P) such that no

point in P is inside the circumcircle of any triangle in DT(P) [33] Such definition can

be understood by examining Figure 311

[Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33].]

37 Calibration

The set of (x y) vertices with their corresponding depth code values that result from

the decoding process do not represent standard units of measure ie these still have to

be translated into standard units such as millimeters This is precisely the objective of

the calibration process

The calibration mechanism that is used in the application is based on the work of Peter-

Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts

an offline and an online process Moreover the offline process consists of two stages

the camera calibration and the system calibration It is important to clarify that while

the offline process is performed only once (camera properties and distances within the

system do not change with every scan) the online process is carried out for every scan

instance The calibration stage referred to in Figure 31 is the latter


371 Offline process

As already mentioned the offline process comprises the two stages described below

Camera calibration This part of the process is concerned with the calculation of the

intrinsic parameters of the camera as explained in Section 22 of the literature

study In short the objective is to precisely quantify the optical properties of the

camera The manner in which the current approach accomplishes this is by imag-

ing the special calibration chart shown in Figure 312 from different orientations

and distances After corresponding markers in the different images are found an

algorithm searches the optimal set of camera parameters for which triangulation

of all corresponding marker-point pairs gives an accurate reconstruction of the

calibration chart

Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.

System calibration The second part of the calibration process refers to the camera-

projector system calibration ie the determination of the extrinsic parameters

of the system Again this part of the process images the calibration chart from

different distances However this time structured light patterns are emitted by

the projector while the acquisition process takes place The result is that each

projector code is associated with a known depth and camera position

372 Online process

The result of the offline calibration is a set of parameters that model the optical proper-

ties of the scanner system These are passed to the application inside the XML file for

every scan Such parameters represent the coefficients of a fifth-order polynomial used

for translating the set of (x y) vertices with their corresponding depth code values into


standard units of measure In other words the online process consists of evaluating a

polynomial with all the x y and depth code values calculated in the decoding stage in

order to reconstruct the geometry of the face Figure 313 shows the state of the 3D

model before and after the reconstruction process

(a) Before reconstruction (b) After reconstruction

Figure 313 The 3D model before and after the calibration process
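The online evaluation step itself is simple; the sketch below shows Horner evaluation of a fifth-order polynomial in C. The real calibration polynomial maps the (x, y) position and the depth code jointly to millimeters, so the one-dimensional form and coefficient layout used here are purely illustrative.

/* Evaluate a fifth-order polynomial with Horner's scheme. 'c' holds the six
 * coefficients of c[0] + c[1]*v + ... + c[5]*v^5. */
static double eval_poly5(const double c[6], double v)
{
    double r = c[5];
    for (int i = 4; i >= 0; i--)
        r = r * v + c[i];
    return r;
}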

38 Vertex filtering

As it can be seen from Figure 313b there are a number of extra vertices (and faces)

that have not been correctly reconstructed and therefore should be removed from the

model Vertex filtering is applied to remove all these noisy vertices and faces based on

different criteria The process is divided in the following three steps

381 Filter vertices based on decoding constraints

First if the distance between consecutive decoded points is larger than a maximum

threshold in the (x) or (z) dimensions then these are removed Second in order to

avoid false decoded vertices due to camera noise (especially in the parts of the images

where light does not hit directly) a minimal modulation threshold needs to be exceeded

or else the associated decoded point is discarded Finally if the decoded vertices lie

outside a margin defined in accordance to the image dimensions then these are removed

as well


382 Filter vertices outside the measurement range

The measurement range defined during the offline calibration refers to the minimum

and maximum values that each decoded point can have in the z dimension These values

are read from the XML file The long triangles shown in Figure 313b that either extend

far into the picture or on the other hand come close to the camera are all removed in

this stage The resulting 3D model after being filtered with the two previously described

criteria is shown in Figure 314a

383 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length

criterion Initially the length of every edge contained in the model is calculated This

is followed by determining a new set of edges L that contains the longest edge in each

face After this operation the mean length value for the longest edge set is calculated

Finally, only faces whose longest edge is less than seven times the mean value, i.e. $L < 7 \times \mathrm{mean}(L)$, are kept. Figure 3.14b shows the result after this operation.
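A C sketch of this criterion is given below; the array-of-indices mesh representation and the function name are assumptions made for the example.

#include <math.h>
#include <stdlib.h>

/* Mark the faces whose longest edge is below seven times the mean longest-edge
 * length. 'vx', 'vy' and 'vz' hold the vertex coordinates, 'faces' holds three
 * vertex indices per face, and 'keep' receives 1 for every face to retain. */
static void filter_long_edges(const double *vx, const double *vy, const double *vz,
                              const int *faces, int num_faces, int *keep)
{
    if (num_faces <= 0)
        return;
    double *longest = malloc((size_t)num_faces * sizeof(double));
    if (longest == NULL)
        return;
    double mean = 0.0;
    for (int f = 0; f < num_faces; f++) {
        double lmax = 0.0;
        for (int e = 0; e < 3; e++) {
            int i = faces[3 * f + e];
            int j = faces[3 * f + (e + 1) % 3];
            double dx = vx[i] - vx[j];
            double dy = vy[i] - vy[j];
            double dz = vz[i] - vz[j];
            double len = sqrt(dx * dx + dy * dy + dz * dz);
            if (len > lmax)
                lmax = len;
        }
        longest[f] = lmax;
        mean += lmax;
    }
    mean /= (double)num_faces;
    for (int f = 0; f < num_faces; f++)
        keep[f] = (longest[f] < 7.0 * mean) ? 1 : 0;
    free(longest);
}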

[Figure 3.14: Resulting 3D models after the various filtering steps — (a) after the filtering steps described in Subsections 3.8.1 and 3.8.2, (b) after the filtering step described in Subsection 3.8.3, (c) after the filtering step described in Section 3.9.]

39 Hole filling

In the last processing step of the 3D face scanner application two actions are performed

The first one is concerned with an algorithm that takes care of filling undesirable holes

that appear due to the removal of vertices and faces that were part of face surface This

is accomplished by adding a vertex in the middle of the hole and then connecting every

surrounding edge with this point The second action refers to another filtering step of


vertices and faces In this last part of the application the program removes all but the

largest group of connected faces The final 3D model is shown in Figure 314c

310 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes

but not for the overall goal of the 3D mask sizing project this process was not taken

into account as part of the 3D face scanner application This is also the reason why it

is not included in Figure 31 Nevertheless this section provides a brief explanation of

the smoothing process that is currently used along with an example

A complete explanation of the algorithm that is being used to achieve the smoothing

effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian

operator that diffuses the vertices along the surface An example of the resulting model

before and after applying the smoothing process is shown in Figure 315

(a) The 3D model before smoothing (b) The 3D model after smoothing

Figure 315 Forehead of the 3D model before and after applying the smoothing process
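As an illustration only, the C sketch below performs one smoothing iteration with a plain uniform-weight Laplacian; the actual algorithm of [34] uses a scale-dependent operator, so this is a simplified stand-in. The adjacency arrays and the lambda parameter are assumptions.

typedef struct { float x, y, z; } Vertex;

/* neighbors: concatenated neighbor indices of all vertices;
 * offset[i] / n_neighbors[i]: where vertex i's neighbors start and how many. */
void laplacian_smooth_step(const Vertex *verts, Vertex *out, int n_verts,
                           const int *neighbors, const int *n_neighbors,
                           const int *offset, float lambda)
{
    for (int i = 0; i < n_verts; i++) {
        float ax = 0.0f, ay = 0.0f, az = 0.0f;
        for (int k = 0; k < n_neighbors[i]; k++) {
            int j = neighbors[offset[i] + k];
            ax += verts[j].x;
            ay += verts[j].y;
            az += verts[j].z;
        }
        ax /= (float)n_neighbors[i];
        ay /= (float)n_neighbors[i];
        az /= (float)n_neighbors[i];

        /* Move each vertex a fraction lambda towards the average of its
         * neighbors, which diffuses the vertices along the surface. */
        out[i].x = verts[i].x + lambda * (ax - verts[i].x);
        out[i].y = verts[i].y + lambda * (ay - verts[i].y);
        out[i].z = verts[i].z + lambda * (az - verts[i].z);
    }
}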

Chapter 4

Embedded system development

Modern design of embedded systems requires hardware and software not to be seen as two different domains but rather as two complementary parts of a whole. Two important trends have made such a unified view possible. First, integrated circuit (IC) technology has evolved to the point where multiple processors of different types coexist in a single IC. Second, the increasing complexity and average size of programs, together with the evolution of compiler technologies, have made C compilers (and even C++ or Java in some cases) commonplace in the development of embedded systems [35].

This chapter discusses the embedded hardware and software implementation of the 3D

face scanner A brief account of the hardware and software tools that were used during

the development of the application is presented first Subsequently the first stage of the

development process is described which consists mainly of translating the algorithms

and methods described in Chapter 3 into a different programming language more suitable

for embedded systems Finally a preview of the developed visualization module that

displays the 3D reconstructed face is presented along with a brief description of its

functionality

41 Development tools

This section describes the set of tools used in the development of the embedded applica-

tion First an overview of the hardware is presented highlighting the most important

aspects that are of interest to the 3D face scanner application This is then followed by

a list of the software tools along with a short motivation for their selection. A so-called remote development methodology was used for the compilation process. The idea is to


run an integrated development environment (IDE) on a client system for the creation of

the project editing of the files and usage of code assistance features in the same manner

as done with local projects However when the project is built run or debugged the

process runs on a remote server with output and input transferred to the client system

411 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as

development platforms SBCs combine most features of a conventional desktop computer

into a single board which can be as small as a credit card One or more processors of

different types memory on-board peripherals for multiple USB devices single or dual

gigabit Ethernet connections integrated graphics and audio capabilities amongst others

are common features included in these devices But perhaps what is most interesting

for embedded developers is the availability of several SBCs that fall under the open-source hardware category [36]. Such SBCs are suitable for the implementation of a wide range

of applications on the basis of open operating systems

Two different hardware environments were used in the development of the current em-

bedded application a conventional desktop personal computer (PC) with an Intel x86

architecture and an SBC that was selected according to the following survey

4111 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention

of finding the most suitable model for our application Table 41 presents a subset of the

considered models highlighting the most relevant characteristics for the 3D face scanner

application Refer to [37] for the complete survey

The model to be chosen has to comply with several requirements imposed by the 3D

face scanner application First support for both a camera and a projector had to be

offered While all of the considered models showed special support for video output

not all of them provided suitable characteristics for camera signal acquisition In fact

most of them rely on USB or Ethernet connections for this purpose The problem of

using USB technology for camera acquisition is that it is highly resource demanding On

the other hand Ethernet connections imply streaming video in formats such as MPEG

which require additional computational resources and buffering for decoding the video

stream. Explicit peripheral support for camera acquisition was only offered by two of

the considered models the BeagleBoard-xM and the PandaBoard

Table 41 Single-board computer survey

BeagleBoard-xM
CPU: ARM Cortex-A8, 1000 MHz
RAM: 512 MB
Video output: DVI-D, HDMI, S-Video
GPU: PowerVR SGX, OpenGL ES 2.0
Camera port: Yes

Raspberry Pi Model B
CPU: ARM1176, 700 MHz
RAM: 256 MB
Video output: Composite RCA, HDMI, DSI
GPU: Broadcom VideoCore IV, OpenGL ES 2.0
Camera port: No

Cotton Candy
CPU: dual-core ARM Cortex-A9, 1200 MHz
RAM: 1 GB
Video output: HDMI
GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
Camera port: No

PandaBoard
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, DVI-D, LCD
GPU: PowerVR SGX540, OpenGL ES 2.0
Camera port: Yes

VIA APC
CPU: ARM11, 800 MHz
RAM: 512 MB
Video output: HDMI, VGA
GPU: built-in 2D/3D graphics, OpenGL ES 2.0
Camera port: No

MK802
CPU: ARM Cortex-A8, 1000 MHz
RAM: 1 GB
Video output: HDMI
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No

Snowball
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, CVBS
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No


A second issue in the selection of the SBC was concerned with the project objective of

developing a module capable of visualizing the 3D reconstructed model by means of the

embedded projector It was considered that the achievement of this objective could be

greatly simplified by selecting an SBC model that offered support for rendering of 3D

computer graphics by means of an API preferably OpenGL ES Nevertheless all of the

SBC models considered in the survey featured a graphics processing unit (GPU) with

such support

Finally one last important motivation for the selection came from the experience gath-

ered through related projects The BeagleBoard-xM had been used as the embedded

computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-

able implementation effort could be saved if this option were adopted Consequently it

was the BeagleBoard-xM that was selected as the SBC model for the development of

the current project

4112 BeagleBoard-xM features

The BeagleBoard-xM (Figure 41) is an SBC produced by Texas Instruments. It is a low-power open-source hardware system that was designed specifically to address the open-source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz, accompanied by 512 MB of LPDDR RAM. Several open operating systems have been made compatible with this processor, including Linux, FreeBSD, RISC OS, Symbian, and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU to provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].

412 Software

The main factors involved in the selection of software tools were (i) available support by

a large development community and (ii) acquisition costs and licensing charges Open

source software was adopted where possible Moreover prior experience with the tools

was also taken into account The software can be divided into two categories (i) software


Figure 41 The BeagleBoard-xM offered by Texas Instruments

libraries that are used within the application and therefore are necessary for its execution

and (ii) software tools used specifically for the development of the application and hence

are not required for its execution In what follows each of these is briefly described

4121 Software libraries

The following software libraries are being used throughout the implementation of the

embedded application

libxml2 It is a software library used for parsing XML documents which was originally

developed for the Gnome project and was later made available for outside projects

as well The current application makes use of this tool for extracting the required

information from the XML file that is included for each scan

OpenCV An open source computer vision and machine learning software library

initiated by Intel It provides the necessary functionality to construct the Delaunay

triangulation described in Chapter 3 Though it was used in the initial versions of

the application later optimizations replaced OpenCV implementations

CGAL Consists of a software library that aims to provide access to algorithms in

computational geometry It is being used in the current application as a means

to simplify the resulting mesh surface ie to reduce the number of faces used to

represent the surface while keeping the overall shape of the reconstructed model

OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-

cally for embedded systems It consists of a cross-language multi-platform Appli-

cation Programming Interface (API) for rendering 2D and 3D computer graphics


It is used in the current application as the means to visualize the 3D reconstructed

model

GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL

used to create windows and/or frame buffers It is being used in the visualization

module of the application as well

4122 Software development tools

The following list presents a description of the most important software tools used for

the development of the embedded application

GNU toolchain It refers to a collection of programming tools produced by the GNU

Project that provide developing facilities for applications and operating systems

Among the several projects that comprise the GNU toolchain the following were

used

GNU Make It is a utility that automates the building process of executable

programs by reading the so-called makefiles which specify how to create the

target program

GCC It is the official compiler of the GNU operating system and has been

adopted as standard by most modern Unix-like computer operating systems

GNU Binutils Involves a set of programming tools that are used in the develop-

ment process of creating and managing programs object files libraries profile

data and assembly source code The commands as (assembler) ld (linker)

and gprof (profiler) were used among the complete set of binutil commands

GNU Project debugger It is the standard debugger for the GNU operating

system which was made available for the development of applications outside

this project as well

Valgrind It is a programming tool that can automatically detect memory management

errors It also provides the functionality of a profiler

Ubuntu A Linux based operating system that is distributed as free and open source

software It was installed in both the desktop PC and the SBC


42 MATLAB to C code translation

This section describes the first stage of the embedded application development that

involves the translation of a series of algorithms originally written in MATLAB code to

C

Despite the fact that there are a number of available tools that automatically translate

MATLAB code to C language such as MATLAB Coder by MathWorks MATLAB-to-

C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx these have a number

of pitfalls that compromise their applicability, especially when the performance aspect

is of ultimate importance Perhaps what is most concerning is that each one of these

tools only supports a subset of the MATLAB language and functions meaning that

the complete functionality of MATLAB is immediately constrained by this requirement

In many cases this would imply a modification to the MATLAB code prior to the

translation process in order to filter out any feature or function not included in the

subset which adds overhead to the development process Examples of features not

supported by automatic translation tools are amongst others objects cell arrays nested

functions, visualization, or try/catch statements The use of an automatic translation

tool was discarded for this project taking into account that several of these unsupported

features are present in the MATLAB code

421 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular pro-

gramming languages used for the development of embedded systems The first is that

the C language lies at an intermediate point between higher- and lower-level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher-level languages lies in the fact that they do not provide suitable

characteristics for optimizing performance of the applications such as low-level memory

manipulation Furthermore unlike many of these higher level programming languages

C provides deterministic resource use which is an important feature when the target de-

vices contain limited resources On the other hand C outperforms lower level languages

in a number of aspects such as scalability and maintainability Two final motivations

for using C are (i) C compilers are available for almost all embedded devices which are

supported by a large pool of experienced C programmers and (ii) the vast majority of

hardware APIs and drivers are written in C


422 Translation approach

As mentioned earlier a manual translation approach of the code was chosen over the

use of automatic translation tools A key part in the process of manually translating

MATLAB to C code is the verification process There are two major techniques used

to achieve such verification The first one consists of a systematic method of converting

the translated C code into a compiled MEX-file that can be merged into the original

MATLAB project Then by comparing the results generated by the MATLAB project

containing the C implementation wrapped in a MEX-file with those generated by the

original MATLAB project one should be able to verify the correctness of the translation

The second approach consists of writing corresponding intermediate results of both the

MATLAB and C implementations to external files and then using a file comparison tool

such as diff for Linux environments in order to validate equality of both results It was

the latter approach that was chosen for the development of the current application for

the following reason The former approach requires the C implementation to be wrapped

in a so-called MEX wrapper which takes care of the communication between MATLAB

and C This task is considered to be error prone since crashes segmentation violations

or incorrect results can easily occur if the MEX wrapper does not allocate and access

the data properly as reported by Marc Barberis in [40] from Catalytic Inc

A number of pitfalls that add complexity to the manual translation process were iden-

tified throughout the development of this stage The most important are

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e. the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in performance of the resulting code (see the sketch after this list).

• MATLAB is an interpreted language, i.e. data types and variable dimensions are only known at run-time, and thus these cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc, or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions, and sizes. On the contrary, C language requires all variables to be declared with a specific data type before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C, and hence requires the programmer to implement them while relying on structure constructs of primitive types.
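The short C example below illustrates the ordering and allocation points mentioned in the list above; the image dimensions and function names are hypothetical.

#include <stdlib.h>

/* Row-major friendly traversal: the inner loop walks consecutive memory,
 * which is the opposite of MATLAB's column-major storage. */
void scale_image(float *img, int rows, int cols, float gain)
{
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            img[r * cols + c] *= gain;
}

int main(void)
{
    int rows = 480, cols = 640;

    /* MATLAB grows arrays implicitly; in C the memory must be allocated
     * (and freed) explicitly. */
    float *img = calloc((size_t)rows * cols, sizeof(float));
    if (img == NULL)
        return 1;

    scale_image(img, rows, cols, 0.5f);
    free(img);
    return 0;
}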

43 Visualization

This section describes the different steps involved in the visualization module developed

to display the reconstructed 3D models by means of the embedded projector contained

in the hand-held device Figure 42 extends the general overview of the application

presented in 31 by incorporating the visualization module This figure shows that a

resulting 3D model of the face reconstruction process consists of 4 different elements a

set of vertices a set of faces a set of UV coordinates and a texture image

Figure 42 Simplified diagram of the 3D face scanner application

Vertices and faces describe the geometry of the reconstructed model Each face consists

of three index values that determine the vertices that form a triangle On the other

hand UV coordinates together with the texture image describe the texture of the model

Figure 43 shows how UV coordinates are used to map portions of the texture image


to individual parts of the model Each vertex is associated with a UV coordinate

When a triangle is rendered the corresponding UV coordinates of each vertex are used

to extract a portion of the texture image to place it on top of the triangle

Figure 43 UV coordinate system

Figure 44 presents an overview of the visualization module The first step of the process

is to simplify the 3D model ie to reduce the number of triangles (and vertices) used

to represent the surface Note that while a high resolution is needed for the algorithms

that determine the fit quality of the different mask models a much lower resolution can

be used for visualization purposes In fact due to the limited available resources in

embedded systems such simplification becomes necessary to avoid lag when zooming

rotating or panning the model Edge collapse is a common term used for the simpli-

fication process which is shown in Figure 44 Input vertices and faces of this block

are converted into a smaller set denoted as New vertices and New faces on the diagram

However since the new set of vertices and faces do not have a one-to-one correspondence

to the original set of UV coordinates such coordinates have to be updated as well The

manner in which this is accomplished is by using the Nearest Neighbor algorithm Every

new vertex is assigned the UV coordinate of its closest original vertex
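A brute-force C sketch of this reassignment is shown below; the data types and names are assumptions, and the actual implementation may use a more efficient nearest-neighbor search.

#include <float.h>

typedef struct { float x, y, z; } Vertex;   /* hypothetical types */
typedef struct { float u, v; } UV;

void assign_uv_nearest(const Vertex *new_verts, int n_new,
                       const Vertex *old_verts, const UV *old_uv, int n_old,
                       UV *new_uv)
{
    for (int i = 0; i < n_new; i++) {
        float best = FLT_MAX;
        int best_j = 0;
        for (int j = 0; j < n_old; j++) {
            float dx = new_verts[i].x - old_verts[j].x;
            float dy = new_verts[i].y - old_verts[j].y;
            float dz = new_verts[i].z - old_verts[j].z;
            float d2 = dx * dx + dy * dy + dz * dz;   /* squared distance */
            if (d2 < best) { best = d2; best_j = j; }
        }
        /* Each new vertex inherits the UV coordinate of its closest
         * original vertex. */
        new_uv[i] = old_uv[best_j];
    }
}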

The next stage of the process is to format the new set of vertices faces and UV co-

ordinates together with the texture 1 image such that OpenGL can render the model


Subsequently normal vectors are calculated for every triangle which are mainly used

by OpenGL for lighting calculations Every vertex of the model has to be associated

with one normal vector To do this an average normal vector is calculated for each

vertex based on the normal vectors of the triangles that are connected to it Moreover

a cross-product multiplication is used to calculate the normal vector of each triangle

Once these four elements that characterize the 3D model are provided to OpenGL the

program enters an infinite running state where the model is redrawn every time a timer expires or when an interactive operation is sent to the program.
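The following C sketch shows one way to compute such per-vertex normals: a cross product yields the normal of every triangle, the triangle normals are accumulated at their vertices, and each accumulated vector is finally normalized. Types and names are illustrative.

#include <math.h>
#include <string.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { int v[3]; } Face;

static Vec3 cross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

void compute_vertex_normals(const Vec3 *verts, int n_verts,
                            const Face *faces, int n_faces, Vec3 *normals)
{
    memset(normals, 0, (size_t)n_verts * sizeof(Vec3));

    for (int i = 0; i < n_faces; i++) {
        Vec3 a = verts[faces[i].v[0]];
        Vec3 b = verts[faces[i].v[1]];
        Vec3 c = verts[faces[i].v[2]];
        Vec3 e1 = { b.x - a.x, b.y - a.y, b.z - a.z };
        Vec3 e2 = { c.x - a.x, c.y - a.y, c.z - a.z };
        Vec3 n = cross(e1, e2);                 /* triangle normal */

        for (int k = 0; k < 3; k++) {           /* accumulate at each vertex */
            normals[faces[i].v[k]].x += n.x;
            normals[faces[i].v[k]].y += n.y;
            normals[faces[i].v[k]].z += n.z;
        }
    }

    for (int i = 0; i < n_verts; i++) {         /* normalize the averages */
        float len = sqrtf(normals[i].x * normals[i].x +
                          normals[i].y * normals[i].y +
                          normals[i].z * normals[i].z);
        if (len > 0.0f) {
            normals[i].x /= len;
            normals[i].y /= len;
            normals[i].z /= len;
        }
    }
}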

Figure 44 Diagram of the visualization module

Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner

application ranging from high-level optimizations such as modification of the algo-

rithms to low-level optimizations such as the implementation of time-consuming parts

in assembly language

In order to verify that the achieved optimizations were valid in general and not for

specific cases 10 scans of different persons were used for profiling the performance of the

application Every profile consisted of running the application 10 times for each scan and

then averaging the results in order to reduce the influence that external factors might

have in the measured times Figure 51 presents an example of the graphs that will be

used throughout this and the following chapters to represent the changes in performance

Here each bar is divided into different colors that represent the distribution of the total

execution time among the various stages of the application described in Chapter 3 and

summarized in Figure 31

The translation from MATLAB to C code corresponds to the first optimization per-

formed The top two bars in Figure 51 show that the C implementation resulted in

a speedup of approximately 15 times over the MATLAB implementation running on

a desktop computer On the other hand the bottom two bars reflect the difference

in execution time after running the C implementation in two different platforms The

much more limited resources available in the BeagleBoard-xM have a clear impact on

the execution time The C code was compiled with GCC's O2 optimization level

The bottom bar in Figure 51 represents the starting point for a set of optimization

procedures that will be described in the following sections The order in which these are

presented corresponds to the same order in which they were applied to the application

Figure 51 Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM

51 Double to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C implementations was necessary in order to compare both results in each step of the translation process. The original C implementation used the double-precision format because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double to single-precision format was made. Figure 52 shows that with this modification the total execution time decreased from 14.53 to 12.52 seconds.

Figure 52 Difference in execution time when double-precision format is changed to single-precision

52 Tuned compiler flags

While the previous versions of the C code were compiled with the O2 optimization level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 53 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution

-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp

Figure 53 Execution time before and after tuning GCC's compiler options

53 Modified memory layout

A different memory layout for processing the camera frames was implemented to further

exploit the concept of spatial locality of the program As noted in Section 33 many of

the operations in the normalization stage involve pixels from pairs of consecutive frames

i.e. first and second, third and fourth, fifth and sixth, and so on. Data of the camera frames were placed in memory in a manner such that corresponding pixels between frame pairs lay next to each other in memory. The procedure is shown in Figure 54.

However this modification yielded no improvement on the execution time of the appli-

cation as can be seen from Figure 55

54 Reimplementation of Crsquos standard power function

The generation of Texture 1 frame in the normalization stage starts by averaging the last

two camera frames followed by a gamma correction procedure The process of gamma

correction in this application consists of elevating each pixel to the 085 power After

profiling the application it was found that the power function from the standard math

C library was taking most of the time inside this process Taking into account that the


Figure 54 Modification of the memory layout of the camera frames. The blue, red, green, and purple circles represent pixels of the first, second, third, and fourth frames, respectively

Figure 55 The execution time of the program did not change with a different memory layout for the camera frames

high accuracy offered by such function was not required and that the overhead involved

in validating the input could be removed a different implementation of such function

was adopted

A novel approach was proposed by Ian Stephenson in [42], explained as follows. The power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a,

a = M * 2^E

where M is the mantissa and E is the exponent. Taking log2 of both sides gives

log2(a) = log2(M) + E

and since M is normalized, log2(M) is always small, therefore

log2(a) ≈ E
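A possible C sketch of this idea is shown below, approximating log2() and pow2() directly from the IEEE 754 bit pattern; the constants and the amount of mantissa correction are illustrative and do not necessarily match the implementation used in the application. The approximation is only meant for strictly positive inputs, which is sufficient for pixel data.

#include <stdint.h>
#include <string.h>

static float fast_log2(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);             /* reinterpret float bits    */
    /* Exponent field minus the bias (127), plus a linear estimate of
     * log2(M) taken from the mantissa bits. */
    return (float)((int32_t)(bits >> 23) - 127) +
           (float)(bits & 0x7FFFFF) / (float)(1 << 23);
}

static float fast_pow2(float p)
{
    int32_t e = (int32_t)p;                     /* integer part -> exponent  */
    float   f = p - (float)e;                   /* fractional part           */
    int32_t bits = ((e + 127) << 23) + (int32_t)(f * (float)(1 << 23));
    float r;
    memcpy(&r, &bits, sizeof r);                /* rebuild the float         */
    return r;
}

/* pow(a, b) = 2^(log2(a) * b) */
static float fast_pow(float a, float b)
{
    return fast_pow2(fast_log2(a) * b);
}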

This new implementation of the power function provides the improvement of the execu-

tion time shown in Figure 56

Figure 56 Difference in execution time before and after reimplementing C's standard power function

55 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering made it possible to replace floating-point calculations with integer calculations in the modulation stage, which are known to typically execute faster on ARM processors.

Figure 57 shows the order in which the algorithms are executed before and after this

optimization By moving the calculation of the modular frame to the preprocessing

stage the values of the camera frames do not have to be re-read Moreover the processes

of discarding cropping and scaling frames are now being performed in an alternating

fashion together with the calculation of the modular frame This loop merging improves

the locality of data and reduces loop overhead Figure 58 shows the change in execution

time of the application for this optimization step

(a) Original order of execution
(b) Modified order of execution

Figure 57 Order of execution before and after the optimization

Figure 58 Difference in execution time before and after reordering the preprocessing stage


56 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the applica-

tion was presented in Chapter 3 Figure 38 shows the different stages of this process

However this figure does not reflect the manner in which the GMC was initially imple-

mented in the MATLAB code In fact this figure describes the GMC implementation

after being modified with the optimization described in this section A more detailed

picture of the original GMC implementation is given in Figure 59 Previous research

found that optimal results were achieved when GMC is applied in the y direction only

The manner in which this was implemented was by estimating GMC for both directions

but only performing the shift in the y direction The optimization consisted in removing

all unnecessary calculations related to the estimation of GMC in the x direction This

optimization provides the improvement of the execution time shown in Figure 510

Figure 59 Flow diagram for the GMC process as implemented in the MATLAB code

Figure 510 Difference in execution time before and after modifying the GMC stage


57 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation A series of examples available

in [43] were used as references for our implementation Despite the fact that OpenCV

constructs the triangulation while abstracting the complete algorithm from the pro-

grammer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to nav-

igate through the edges that form the triangulation It is therefore the responsibility

of the programmer to extract each of the triangles while stepping through these edges

Moreover care must be taken to avoid repeated triangles in the final set An error was

detected at this point of the optimization process in the mechanism that was being used

to avoid repeated triangles Figure 511 shows the increase in execution time after this

bug was resolved

Figure 511 Execution time of the application increased after fixing an error in the tessellation stage

58 Modified line shifting in GMC stage

A series of optimizations performed to the original line shifting mechanism in the GMC

stage are explained in this section The MATLAB implementation uses the circular shift

function to perform the alignment of the frames (last step in Figure 38) Given that

there is no justification for applying a circular shift a regular shift was implemented

instead in which the last line of a frame is discarded rather than copied to the opposite

border Initially this was implemented using a for loop Later this was optimized even

further by replacing such for loop with the more optimized memcpy function available

in the standard C library This in turn led to a faster execution time

A further optimization was obtained in the GMC stage which yielded better memory

usage and faster execution time The original shifting approach used two equally sized

portions of memory in order to avoid overwriting the frame that was being shifted The


need for a second portion of memory was removed by adding some extra logic to the

shifting process A conditional statement was included in order to determine if the shift

has to be performed in the positive or negative direction In case the shift is negative ie

upwards the shifting operation traverses the image from top to bottom while copying

each line a certain number of rows above it In case the shift is positive ie downwards

the shifting operation traverses the image from bottom to top while copying each line a

certain number of rows below it The result of this set of optimizations is presented in

Figure 512

Figure 512 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage

59 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is

presented by Rippa [44] who proves that such triangulation minimizes the roughness of

the resulting model Nevertheless an important characteristic of the decoding process

used in our application allows the adoption of a different triangulation mechanism that

improved the execution time significantly while sacrificing smoothness in a very small

amount This characteristic refers to the fact that the resulting set of vertices from

the decoding stage are sorted in an increasing manner This in turn removes the need

to search for the nearest vertices and therefore allows the triangulation to be greatly

simplified More specifically the vertices are ordered in increasing order from left to

right and bottom to top in the plane Moreover they are equally spaced along the y

dimension which simplifies even further the algorithm needed to connect such vertices

into triangles

The developed algorithm traverses the set of vertices row by row from bottom to top

creating triangles between every pair of consecutive rows Moreover each pair of con-

secutive rows is traversed from left to right while connecting the vertices into triangles


The algorithm is presented in Algorithm 1 Note that for each pair of rows this algo-

rithm describes the connection of vertices until the moment in which the last vertex of

either row is reached The unconnected vertices that remain in the other longer row

are connected with the last vertex of the shorter row in a later step (not included in

Algorithm 1)

Algorithm 1 New tessellation algorithm

1: for all pairs of rows do
2:   find the left-most vertices in both rows and store them in vertex row A and vertex row B
3:   while the last vertex in either row has not been reached do
4:     if vertex row A is more to the left than vertex row B then
5:       connect vertex row A with the next vertex on the same row and with vertex row B
6:       change vertex row A to the next vertex on the same row
7:     else
8:       connect vertex row B with the next vertex on the same row and with vertex row A
9:       change vertex row B to the next vertex on the same row
10:    end if
11:  end while
12: end for

Figure 513 shows the result of applying the two described triangulation methods to the

same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 514. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's

Delaunay triangulation implementation

(a) Delaunay triangulation
(b) Optimized triangulation

Figure 513 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted

510 Modified decoding stage

A major improvement was achieved in the execution time of the application after op-

timizing several time-consuming parts of the decoding stage As a first step two fre-

quently called functions of the standard math C library namely ceil() and floor()

Figure 514 Execution times of the application before and after replacing the Delaunay triangulation with the new approach

were replaced with faster implementations that used pre-processor directives to avoid the

function call overhead Moreover the time spent in validating the input was also avoided

since it was not required However the property that allowed the new implementations

of the ceil() and floor() functions to increase the performance to a greater extent

was the fact that such functions only operate on index values Given that index values

only assume non-negative numbers the implementation of each of these functions was

further simplified
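A minimal sketch of such simplified replacements, assuming non-negative inputs and hypothetical macro names, could look as follows.

/* Valid only for non-negative values, which holds for index calculations. */
#define FLOOR_IDX(x)  ((int)(x))
#define CEIL_IDX(x)   ((int)(x) + (((float)(int)(x) < (x)) ? 1 : 0))

As with any function-like macro, the argument of CEIL_IDX is evaluated more than once, so it should not be used with expressions that have side effects.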

A second optimization applied to the decoding stage was to replace dynamically allocated

memory on the heap with statically allocated memory on the stack while controlling that

the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster since such memory can be addressed more quickly.

The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. Such tasks were present in the application because several alternatives had been implemented for achieving a common goal during the algorithmic design stage; however, after assessing and choosing the best option, the other ones were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 515

An important reduction of approximately 1 second was achieved As a rough estimate

half of this speedup can be attributed to the removal of the nonfunctional code

511 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed to the GMC stage The algorithm

presented in Figure 38 has the following shortcoming: for every pair of consecutive

Figure 515 Execution time of the application before and after optimizing the decoding stage

frames the sum of pixels in each column is calculated for both frames This means that

the column-sum vector is calculated twice for each image except for the first and last

frame (n = 1 and n = N) By reusing the column-sum vector calculated in the previous

iteration such recalculation can be avoided An updated version of the GMC stage that

incorporates this idea is shown in Figure 516 The speedup achieved for the GMC stage

after performing this optimization was approximately 1.8 times Figure 517 shows the

execution times of the application before and after removing the redundant calculations
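The C sketch below outlines this reuse; estimate_shift() and shift_frame_y() stand for the SAD minimization and the line shifting described earlier, and all names and signatures are assumptions.

#include <string.h>

static void sum_columns(const float *frame, int rows, int cols, float *colsum)
{
    memset(colsum, 0, (size_t)cols * sizeof(float));
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            colsum[c] += frame[r * cols + c];
}

void gmc_all_frames(float **frames, int n_frames, int rows, int cols,
                    float *prev_sum, float *curr_sum)
{
    sum_columns(frames[0], rows, cols, prev_sum);   /* only for the first frame */

    for (int n = 1; n < n_frames; n++) {
        sum_columns(frames[n], rows, cols, curr_sum);

        /* int dy = estimate_shift(prev_sum, curr_sum, cols); */
        /* shift_frame_y(frames[n], rows, cols, dy);          */

        /* Reuse: the current sums become the previous sums of the next
         * iteration instead of being recalculated. */
        float *tmp = prev_sum;
        prev_sum = curr_sum;
        curr_sum = tmp;
    }
}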

512 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series proces-

sors was exploited for the last series of optimizations performed to the 3D face scanner

application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations, where the NEON technology could be applied.

The vast majority of the operations performed in the preprocessing normalization and

global motion compensation stages are data independent and therefore suitable for

being computed in parallel on the ARM NEON architecture extension

There are four major approaches to integrate NEON technology into an existing application: (i) by using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) by using existing C/C++ libraries based on NEON technology; (iii) by using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions but with the compiler doing some of the work associated with writing assembly instructions; and (iv) by directly writing NEON assembly instructions linked to the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and therefore it was this approach that was adopted.

Figure 516 Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums

Figure 517 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage


Figure 518 presents the basic principle behind the SIMD architecture extension along

with the related terminology Depending on the data type of the elements involved in

the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 518 may be interpreted either as an operation of 2 Q registers

where each of the 8 elements would have 16 bits or as an operation of 2 D registers

where each of the 8 elements would be 8 bits wide

Figure 518 NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology

An overview of the resulting execution flow of the preprocessing and normalization stages

after applying the first NEON assembly optimization is presented in Figure 519 Here

green rectangles represent stages of the application that are now calculated with NEON

technology whereas blue rectangles represent stages implemented in regular C code In

Section 32 of Chapter 3 it was mentioned that each pixel in the input camera frame

sequence is represented with an 8-bit unsigned integer value With the NEON optimiza-

tion groups of 8 pixels are packed into D registers in order to process 8 elements at a

time Note that each resulting element of the texture 2 frame is immediately reused in

the normalization process Moreover each of the 8 resulting values in both the texture

2 generation and the normalization stage are converted to a 32-bit floating point value

that ranges from 0 to 1
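For illustration, the sketch below expresses part of this idea with NEON C intrinsics instead of the hand-written assembly used in the project: 8 pixels of two camera frames are loaded into D registers, added with a widening instruction, and converted to single-precision floats scaled to the range [0, 1]. The function name and the scale factor are assumptions.

#include <arm_neon.h>
#include <stdint.h>

static void texture2_vector8(const uint8_t *frame_a, const uint8_t *frame_b,
                             float *texture2)
{
    uint8x8_t  a   = vld1_u8(frame_a);        /* 8 pixels of frame A        */
    uint8x8_t  b   = vld1_u8(frame_b);        /* 8 pixels of frame B        */
    uint16x8_t sum = vaddl_u8(a, b);          /* widening add: a + b        */

    /* Widen to 32 bits and convert both halves to single-precision floats. */
    float32x4_t lo = vcvtq_f32_u32(vmovl_u16(vget_low_u16(sum)));
    float32x4_t hi = vcvtq_f32_u32(vmovl_u16(vget_high_u16(sum)));

    /* Scale the sum of two 8-bit pixels (0..510) into the range [0, 1]. */
    float32x4_t scale = vdupq_n_f32(1.0f / 510.0f);
    vst1q_f32(texture2,     vmulq_f32(lo, scale));
    vst1q_f32(texture2 + 4, vmulq_f32(hi, scale));
}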

Figure 520 shows that the total execution time of the application actually increased

after this modification. There are two reasons that might explain this increase. First, note that the stage of the application that most contributed to

the increase in time was the read binary file The execution time of such process is

heavily affected by any other processes that might be running in parallel Moreover the

execution time of all stages other than those involved with the NEON optimization also

increased This suggests that indeed another process was probably running in parallel


using resources of the board and hence affecting the performance of the application

Nevertheless the overall time reduction for the preprocessing and normalization stages

after the optimization was small One very probable reason to explain this could be

found in the modulation stage The first step of such process is to find the smallest

and largest values for every camera frame pixel in the time dimension by means of if

statements When such task is implemented with conventional C language the proces-

sor makes use of a branch prediction mechanism in order to speed up the instruction

pipeline However the use of NEON assembly instructions forces the processor to per-

form the comparison for every single pack of 8 values ignoring the existence of the

branch prediction mechanism

513 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON

assembly instructions the possibility of applying a similar approach to other parts of

the application was analyzed The averaging and gamma correction processes involved

in the calculation of texture 1 were found to be good targets for such purpose The

absence of a NEON instruction to calculate the power of a number can be overcome

by using a lookup table (LUT) In order to explain the approach of how the LUT was

implemented a hypothetical example of camera frames with 2-bit pixels is presented in

Figure 521 Here the first two rows represent the values that corresponding pixels in

the two frames can assume The third row of the table contains the 7 possible values

that can result from averaging two pixels. The number of possible values for the general case is 2^(n+1) − 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85

power What is interesting is that the sum of the two pixels pixel A + pixel B which in

our application is already determined during the texture 2 stage can be used to index

the table
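A small C sketch of such a table for 8-bit camera pixels, indexed directly by the pixel sum, could look as follows; the names are illustrative.

#include <math.h>

#define PIXEL_MAX  255                       /* 8-bit camera pixels          */
#define LUT_SIZE   (2 * PIXEL_MAX + 1)       /* possible values of A + B     */

static float gamma_lut[LUT_SIZE];

/* Precompute ((A + B) / 2) ^ 0.85 for every possible pixel sum. */
void build_gamma_lut(void)
{
    for (int s = 0; s < LUT_SIZE; s++)
        gamma_lut[s] = powf((float)s / 2.0f, 0.85f);
}

/* Usage sketch: texture1[i] = gamma_lut[pixel_a[i] + pixel_b[i]]; */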

As a final step in the optimization process a further improvement to the execution flow

presented in Figure 519 was made From this diagram it is possible to observe that the

application has to re-read the last 2 camera frames to calculate the texture 1 frame In

order to avoid such overhead the processing of the camera frames was divided into two

different stages The first one involves the calculation of the modulation texture 2 and

normalization processes for the first 14 frames whereas the second stage additionally

calculates the averaging and gamma correction processes for the last two frames The

merging of these 5 processes for the last two frames is convenient since the addition of

corresponding pixels needed in the averaging and gamma correction stage is already

Figure 519 Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code

Figure 520 Execution times of the application before and after applying the first NEON assembly optimization

pixel A: 0, 1, 2, 3
pixel B: 0, 1, 2, 3
average = (pixel A + pixel B) / 2: 0, 0.5, 1, 1.5, 2, 2.5, 3
average^0.85 (LUT): 0, 0.555, 1, 1.411, 1.803, 2.179, 2.544

Figure 521 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels

being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 523, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. Moreover, the resulting improvement of the execution time is shown in Figure 522.

This final optimization concludes the embedded system development of the 3D face

reconstruction application

Figure 522 Execution times of the application before and after applying the second NEON assembly optimization

Figure 523 Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code

Chapter 6

Results

This chapter presents the results of the various stages involved in the implementation

of the 3D face scanner application capable of running on an embedded device The first

section focuses on the results obtained after translating the MATLAB implementation

to C language This is followed by a brief account of the visualization module devel-

oped to display the reconstructed model by means of the embedded device Finally

the last section provides a summary of the performance improvements made to the C

implementation by means of different optimization techniques

61 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C 13 different

face scans were processed with both the MATLAB and C implementations A qual-

itative comparison of the corresponding reconstructed models yielded no difference in

results Linux's diff tool was used to perform the comparison between corresponding

models with a precision of 4 decimal places

In what follows a series of graphs show the execution times for various versions of the

application Each bar corresponds to the average execution time required to process 10

scans of different people Moreover each of the different scans was run 10 times and

averaged The bars are divided into different colors that represent the distribution of the

total execution time among the various stages of the application described in Chapter 3

and summarized in Figure 31 The top and middle bar in Figure 61 corresponds to the

average execution time of the original MATLAB and C implementations respectively

after processed on a desktop computer The C implementation resulted in a speedup of

approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds)


On the other hand, the last bar in Figure 61 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's O2 optimization level.

Figure 61 Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM

62 Visualization

A visualization module was developed to display the resulting 3D models by means of the

projector contained in the embedded device Figure 62 presents an example The two

images in the top row show a high-resolution 3D model composed of 64k faces rendered

in two different modes The bottom two images show the same 3D model after being

processed with a mesh simplification mechanism that results in a much lower resolution

model (1229 faces) suitable for being rendered by means of an embedded device It is

interesting to note that even though the lower resolution model has approximately 2% of the faces contained in the high resolution model, the quality degradation is hardly

visible by comparing the two textured models

63 Performance optimizations

Figure 63 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations, described in Chapter 5, were used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times. Furthermore,

(a) High-resolution 3D model with texture (63743 faces)
(b) High-resolution 3D model wireframe (63743 faces)
(c) Low-resolution 3D model with texture (1229 faces)
(d) Low-resolution 3D model wireframe (1229 faces)

Figure 62 Example of the visualization module developed

Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which gives an idea of the speedup achieved in each individual stage.


Figure 6.3: Performance evolution of the 3D face scanner's C implementation. The horizontal axis shows time in seconds; the bars correspond, from top to bottom, to: no optimizations, doubles to floats, tuned compiler flags, modified memory layout, pow function reimplemented, reduced memory accesses, GMC in y direction only, Delaunay bug, line shifting in GMC, new tessellation algorithm, modified decoding stage, no recalculations in GMC, ASM + NEON implementation 1, and ASM + NEON implementation 2. Each bar is subdivided into the stages read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering, hole filling, and other.


Figure 6.4: Execution time for each stage of the application before and after the complete optimization process. Each panel compares the time in seconds before and after optimization for one stage: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, and (i) hole filling.

Chapter 7

Conclusions

This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e., by approximately 14 seconds.

A wide range of optimizations were performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.
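As a schematic illustration of a middle-level optimization, the fragment below shows how a redundant per-pixel division can be hoisted out of an inner loop; the function and array names are invented for the example and do not correspond to the scanner's code.

/* Before: the reciprocal of row_max[r] is effectively recomputed for every pixel. */
void normalize_rows_naive(float *out, const float *in,
                          const float *row_max, int rows, int cols)
{
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            out[r * cols + c] = in[r * cols + c] / row_max[r];
}

/* After: the division is hoisted out of the inner loop, removing a redundant
 * per-pixel operation without changing the result (up to rounding). */
void normalize_rows_hoisted(float *out, const float *in,
                            const float *row_max, int rows, int cols)
{
    for (int r = 0; r < rows; r++) {
        const float inv = 1.0f / row_max[r];
        for (int c = 0; c < cols; c++)
            out[r * cols + c] = in[r * cols + c] * inv;
    }
}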

A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing such a process to run directly on the device is a first


step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment.

Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.

7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5 × 5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions.
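As an illustration of the kind of data-level parallelism these candidates expose, the sketch below sums the columns of an 8-bit image with NEON intrinsics (rather than hand-written assembly). The function name and the assumptions that the width is a multiple of 8 and that the column sums fit in 16 bits are simplifications made for the example, not part of the application's code.

#include <arm_neon.h>   /* typically requires compiling with NEON support, e.g. -mfpu=neon */
#include <stdint.h>

/* Accumulate the sum of every column of an 8-bit image into col_sum.
 * Eight columns are processed per outer iteration using one 16-bit
 * accumulator vector; width is assumed to be a multiple of 8. */
void column_sums_u8(const uint8_t *img, uint16_t *col_sum,
                    int width, int height)
{
    for (int x = 0; x < width; x += 8) {
        uint16x8_t acc = vdupq_n_u16(0);
        for (int y = 0; y < height; y++) {
            uint8x8_t row = vld1_u8(img + y * width + x);  /* load 8 pixels   */
            acc = vaddw_u8(acc, row);                      /* widen and add   */
        }
        vst1q_u16(col_sum + x, acc);                       /* store 8 sums    */
    }
}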

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and, hence, worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration, and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
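One lightweight way to obtain such a per-stage breakdown is to accumulate the wall-clock time spent in each stage over a run, as in the sketch below; the stage list and the TIMED macro are illustrative and not part of the existing code.

#include <stdio.h>
#include <time.h>

enum { ST_DECODING, ST_CALIBRATION, ST_HOLE_FILLING, ST_COUNT };
static double stage_time[ST_COUNT];
static const char *stage_name[ST_COUNT] = { "decoding", "calibration", "hole filling" };

/* Wrap a stage call so that its wall-clock time is added to the per-stage total. */
#define TIMED(stage, call)                                        \
    do {                                                          \
        struct timespec t0, t1;                                   \
        clock_gettime(CLOCK_MONOTONIC, &t0);                      \
        call;                                                     \
        clock_gettime(CLOCK_MONOTONIC, &t1);                      \
        stage_time[stage] += (t1.tv_sec - t0.tv_sec)              \
                           + (t1.tv_nsec - t0.tv_nsec) / 1e9;     \
    } while (0)

void report_stage_times(void)
{
    for (int i = 0; i < ST_COUNT; i++)
        printf("%-14s %.3f s\n", stage_name[i], stage_time[i]);
}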

According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected in stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show such a decrease in performance. Moreover, note how the first two optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning


the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.
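To make the locality argument concrete, the fragment below contrasts a cache-unfriendly and a cache-friendly traversal of a row-major matrix; it is a generic illustration, not code taken from the calibration stage.

/* Column-major traversal of a row-major array: consecutive iterations touch
 * elements that are `cols` floats apart, so most accesses miss the cache. */
float sum_poor_locality(const float *m, int rows, int cols)
{
    float sum = 0.0f;
    for (int c = 0; c < cols; c++)
        for (int r = 0; r < rows; r++)
            sum += m[r * cols + c];
    return sum;
}

/* Row-major traversal: consecutive iterations touch adjacent memory, so each
 * cache line fetched is fully used before it is evicted. */
float sum_good_locality(const float *m, int rows, int cols)
{
    float sum = 0.0f;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            sum += m[r * cols + c];
    return sum;
}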

Finally, it is worth noting a few more ideas on how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp, and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that, as part of the current project, a few of the other options were tested, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.

Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).


[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, ISSN: 0162-8828. DOI: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. DOI: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, ISSN: 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, ISSN: 0162-8828. DOI: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, ISSN: 0031-3203. DOI: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, ISSN: 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).


[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. DOI: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, ISSN: 0932-8092. DOI: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, ISSN: 0018-9162. DOI: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. DOI: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimel'farb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: Application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. DOI: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. DOI: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, ISSN: 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12 318, 2007. DOI: 10.1364/OE.15.012318 (cit. on p. 14).


[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14).

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, ISBN: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and I. i A. Universitat de Girona Departament d'Electrònica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals, Universitat de Girona. Universitat de Girona, 1998, ISBN: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, ISSN: 0882-4967. DOI: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, ISSN: 0162-8828. DOI: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1–4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, ISBN: 0-201-48560-5. DOI: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).


[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, ISBN: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, p. 81, Dec. 2009 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. Von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series, version 3.0, programmer's guide," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization, an example," Tech. Rep., 2009 (cit. on p. 54).
