Eindhoven University of Technology
MASTER
3D face reconstruction using structured light on a hand-held device
Roa Villescas, M.
Award date: 2013
Link to publication
Disclaimer: This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.
General rights: Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.
Eindhoven University of Technology
Master Graduation Project
3D Face Reconstruction usingStructured Light on a Hand-held Device
Author
Martin Roa Villescas
Supervisors
Dr. Ir. Frank van Heesch
Prof. Dr. Ir. Gerard de Haan
A thesis submitted in fulfilment of the requirements
for the degree of Master of Embedded Systems
in the
Smart Sensors & Analysis Research Group
Philips Research
August 2013
EINDHOVEN UNIVERSITY OF TECHNOLOGY
Abstract
Department of Mathematics and Computer Science
Master of Embedded Systems
3D Face Reconstruction using Structured Light on a Hand-held Device
by Martin Roa Villescas
A 3D hand-held scanner using the structured lighting technique has been developed by the Smart Sensors & Analysis research group (SSA) in Philips Research Eindhoven. This thesis presents an embedded implementation of such a scanner. A translation of the original MATLAB implementation into the C language yielded a speedup of approximately 15 times when running on a desktop computer. However, running the new implementation on an embedded platform increased the execution time from 0.5 sec to more than 14 sec. A wide range of optimizations were proposed and applied to improve the performance of the application. A final execution time of 5.1 seconds was achieved. Moreover, a visualization module was developed to display the reconstructed 3D models by means of the projector contained in the embedded device.
Acknowledgements
I owe a debt of gratitude to the many people who helped me during my years at TU/e.

First, I would like to thank Frank van Heesch, my supervisor at Philips, an excellent professional and even better person, who showed me the way through this challenging project while encouraging me in every step of the way. He was always generous with his time and steered me in the right direction whenever I felt I needed help. He has deeply influenced every aspect of my work.

I would also like to express my sincerest gratitude to my professor, Gerard de Haan, the person who was responsible for opening Philips' doors to my life. His achievements are a constant source of motivation. Gerard is a clear demonstration of how the collaboration between industry and academia can produce unprecedented and magnificent results.

My special thanks to all my fellow students at Philips Research, who made these eight months a wonderful time of my life. Their input and advice contributed significantly to the final result of my work. In particular, I would like to thank Koen de Laat for helping me set up an automated database system to keep track of the profiling results.

Furthermore, I would like to thank Catalina Suarez, my girlfriend, for her support during this year. Your company has translated into the happiness I need to perform well in the many aspects of my life.

Finally, I would like to thank my family for their permanent love and support. It is hard to find the right words to express the immense gratitude that I feel for those persons who have given me everything so that I could be standing where I am now. Mom and dad, my achievements are the result of the infinite love that you have given me throughout my life, and I will never stop feeling grateful for that.
Contents
Abstract ii
Acknowledgements iii
List of Figures ix
1 Introduction 1
1.1 3D Mask Sizing project 3
1.2 Objectives 3
1.3 Report organization 4

2 Literature study 5
2.1 Surface reconstruction 5
2.1.1 Stereo analysis 6
2.1.2 Structured lighting 9
2.1.2.1 Triangulation technique 10
2.1.2.2 Pattern coding strategies 11
2.1.2.3 3D human face reconstruction 12
2.2 Camera calibration 13
2.2.1 Definition 14
2.2.2 Popular techniques 14

3 3D face scanner application 17
3.1 Read binary file 18
3.2 Preprocessing 18
3.2.1 Parse XML file 18
3.2.2 Discard frames 19
3.2.3 Crop frames 19
3.2.4 Scale 19
3.3 Normalization 19
3.3.1 Normalization 20
3.3.2 Texture 2 21
3.3.3 Modulation 22
3.3.4 Texture 1 22
3.4 Global motion compensation 23
3.5 Decoding 24
3.6 Tessellation 25
3.7 Calibration 26
3.7.1 Offline process 27
3.7.2 Online process 27
3.8 Vertex filtering 28
3.8.1 Filter vertices based on decoding constraints 28
3.8.2 Filter vertices outside the measurement range 29
3.8.3 Filter vertices based on a maximum edge length 29
3.9 Hole filling 29
3.10 Smoothing 30

4 Embedded system development 31
4.1 Development tools 31
4.1.1 Hardware 32
4.1.1.1 Single-board computer survey 32
4.1.1.2 BeagleBoard-xM features 34
4.1.2 Software 34
4.1.2.1 Software libraries 35
4.1.2.2 Software development tools 36
4.2 MATLAB to C code translation 37
4.2.1 Motivation for developing in C language 37
4.2.2 Translation approach 38
4.3 Visualization 39

5 Performance optimizations 43
5.1 Double to single-precision floating-point numbers 44
5.2 Tuned compiler flags 44
5.3 Modified memory layout 45
5.4 Reimplementation of C's standard power function 45
5.5 Reduced memory accesses 47
5.6 GMC in y dimension only 49
5.7 Error in Delaunay triangulation 50
5.8 Modified line shifting in GMC stage 50
5.9 New tessellation algorithm 51
5.10 Modified decoding stage 52
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage 53
5.12 NEON assembly optimization 1 54
5.13 NEON assembly optimization 2 57

6 Results 61
6.1 MATLAB to C code translation 61
6.2 Visualization 62
6.3 Performance optimizations 62

7 Conclusions 67
7.1 Future work 68

Bibliography 71
List of Figures
1.1 A subset of the CPAP masks offered by Philips 2
1.2 A 3D hand-held scanner developed in Philips Research 4

2.1 Standard stereo geometry 7
2.2 Assumed model for triangulation as proposed in [4] 10
2.3 Examples of pattern coding strategies 12
2.4 A reference framework assumed in [25] 14

3.1 General flow diagram of the 3D face scanner application 17
3.2 Example of the 16 frames that are captured by the hand-held scanner 18
3.3 Flow diagram of the preprocessing stage 18
3.4 Flow diagram of the normalization stage 20
3.5 Example of the 18 frames produced in the normalization stage 21
3.6 Camera frame sequence in a coordinate system 22
3.7 Flow diagram for the calculation of the texture 1 image 22
3.8 Flow diagram for the global motion compensation process 23
3.9 Difference between pixel-based and edge-based decoding 24
3.10 Vertices before and after the tessellation process 25
3.11 The Delaunay tessellation with all the circumcircles and their centers [33] 26
3.12 The calibration chart 27
3.13 The 3D model before and after the calibration process 28
3.14 3D resulting models after various filtering steps 29
3.15 Forehead of the 3D model before and after applying the smoothing process 30

4.1 The BeagleBoard-xM offered by Texas Instruments 35
4.2 Simplified diagram of the 3D face scanner application 39
4.3 UV coordinate system 40
4.4 Diagram of the visualization module 41

5.1 Execution times of the MATLAB and C implementations when run on different platforms 44
5.3 Execution time before and after tuning GCC's compiler options 45
5.4 Modification of the memory layout of the camera frames 46
5.5 Execution time with a different memory layout 46
5.6 Execution time before and after reimplementing C's standard power function 47
5.7 Order of execution before and after the optimization 48
5.8 Difference in execution time before and after reordering the preprocessing stage 48
5.9 Flow diagram for the GMC process as implemented in the MATLAB code 49
5.10 Difference in execution time before and after modifying the GMC stage 49
5.11 Execution time of the application after fixing an error in the tessellation stage 50
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage 51
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted 52
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach 53
5.15 Execution time of the application before and after optimizing the decoding stage 54
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums 55
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage 55
5.18 NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology 56
5.19 Execution flow after first NEON assembly optimization 58
5.20 Execution times of the application before and after applying the first NEON assembly optimization 59
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels 59
5.22 Execution times of the application before and after applying the second NEON assembly optimization 59
5.23 Final execution flow after second NEON assembly optimization 60

6.1 Execution times of the MATLAB and C implementations when run on different platforms 62
6.2 Example of the visualization module developed 63
6.3 Performance evolution of the 3D face scanner's C implementation 64
6.4 Execution times for each stage of the application 65
Dedicated to my grandmother
Chapter 1
Introduction
The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.
Within the wide range of products offered by Philips, there is a specific group, categorized under the name of sleep solutions, that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]. It reported that sleep apnea was prevalent in more than 18 million Americans, i.e., 6.62% of the country's population.
While aiming to meet the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors. These include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, and diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia,
Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic, (c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel, (h) Simplicity, (i) ComfortGel.
amongst others. A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.
Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.
1.1 3D Mask Sizing project
The 3D Mask Sizing project is based on the initiative of Philips to develop technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods, and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.

2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.

3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.

4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.

The focus of this thesis project is on the first step.
As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique was already developed, and it is the basis for the present project. Figure 1.2a shows the hardware setup of such a device. In short, this scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently being performed offline and are mostly implemented in MATLAB.
1.2 Objectives
The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a
Figure 1.2: A 3D hand-held scanner developed in Philips Research: (a) hardware, (b) 3D model example.
way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application. Specifically, the 3D reconstruction should be running on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to ARM's Cortex-A8, the new implementation should be targeted to this processor in particular, by making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.
1.3 Report organization
This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module, and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.
Chapter 2
Literature study
This chapter presents a selective analysis of the state-of-the-art in the field of surface reconstruction, placing special emphasis on structured lighting techniques. A brief overview of the three main underlying technologies used for depth estimation is presented first. This is followed by an example of stereo analysis, which serves as the basis for the more specific structured lighting techniques. Moreover, this example helps to illustrate why stereo analysis is considered less preferable for 3D face reconstruction applications when compared with structured lighting techniques. Special emphasis is placed on the scientific principles underlying structured lighting techniques. Furthermore, a classification of the different types of pattern coding strategies available in the literature is given, along with an analysis of their suitability for our application. Finally, the chapter concludes with a brief discussion of camera calibration and its most representative techniques.
2.1 Surface reconstruction
Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (such as those found in areas like architecture, mechanical engineering, or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].
Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e., they are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in depth measurement. On the other hand, both reflecto-metric and stereo-metric techniques are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they require an estimation of the model of the environment. In the remainder of this section, we will limit the discussion to the stereo-metric category and focus on structured lighting techniques.
2.1.1 Stereo analysis
Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis, taken from [4], is presented in this section. This example intends to show why the use of structured lighting becomes essential for our application.
Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point in the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies from a certain reference. Note, however, that in order to calculate the triangulation, more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (extrinsic parameter) or to the focal length of each of the cameras (intrinsic parameter).
Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and oriented towards the scene objects. Also note that, because the x-axes of both images are identically oriented, rows with the same row number in the two different images lie on the same straight line.
Figure 2.1: Standard stereo geometry.
In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

$$p_{left} = (x_{left}, y_{left}) \qquad \text{and} \qquad p_{right} = (x_{right}, y_{right})$$

in the left and right images, respectively, assuming that the scene point is visible from both camera viewpoints. The disparity with respect to $p_{left}$ is a vector given by

$$\Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\; y_{left} - y_{right})^T \tag{2.1}$$

between two corresponding image points.
In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

$$p = (x, y) = \left(\frac{f \cdot X}{Z},\; \frac{f \cdot Y}{Z}\right) \tag{2.2}$$

assuming that $Z > f$.
According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that $y = y_{left} = y_{right}$. Therefore, for the left camera, the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X-axis, the central projection equation is given by

$$(x_{right}, y) = \left(\frac{f \cdot (X - b)}{Z},\; \frac{f \cdot Y}{Z}\right)$$
Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points in each one of the images with respect to $p_{left}$ is given by

$$\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}$$

However, because rows with the same row number in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

$$\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \tag{2.3}$$

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map $\Delta(x, y)$ is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.
Finally, in order to come up with the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

$$Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}$$

and therefore

$$X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}$$

Using the previous equation, it follows that

$$Z = \frac{b \cdot f}{x_{left} - x_{right}}$$

By substituting this result into the projection equation for y, it follows that

$$Y = \frac{b \cdot y}{x_{left} - x_{right}}$$
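To make the reconstruction concrete, the three equations above translate directly into code. The following is a minimal sketch in C (illustrative only, not the thesis implementation; the names `Point3D` and `reconstruct` are hypothetical), assuming the standard stereo geometry with known focal length f and base distance b, and a positive disparity:

```c
#include <stdio.h>

/* 3D point in the XYZ-coordinate system of the left camera. */
typedef struct { double X, Y, Z; } Point3D;

/* Reconstruct a scene point from the corresponding image points
 * (x_left, y) and (x_right, y), given the focal length f and the
 * base distance b. Assumes x_left > x_right (positive disparity). */
static Point3D reconstruct(double x_left, double x_right, double y,
                           double f, double b)
{
    double disparity = x_left - x_right;   /* Equation 2.3 */
    Point3D p;
    p.X = b * x_left / disparity;
    p.Y = b * y      / disparity;
    p.Z = b * f      / disparity;
    return p;
}

int main(void)
{
    /* Hypothetical values, all in the same length unit. */
    Point3D p = reconstruct(12.0, 10.0, 4.0, 8.0, 50.0);
    printf("X = %.1f, Y = %.1f, Z = %.1f\n", p.X, p.Y, p.Z);
    return 0;   /* prints X = 300.0, Y = 100.0, Z = 200.0 */
}
```

As a quick consistency check, projecting the result back through Equation 2.2 reproduces the inputs: $x_{left} = f \cdot X / Z = 8 \cdot 300 / 200 = 12$.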
The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and b are known and that the disparity map $\Delta(x, y)$ was measured for each pair of corresponding points in the two images. Note that a variety of methods exists to calibrate different types of camera configuration systems, i.e., to determine their intrinsic and extrinsic parameters. More on these calibration procedures is discussed in Section 2.2.
The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques are used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images. These features are typically corners or edges contained within the images. Although these techniques are found to be appropriate for a certain number of applications, they present a number of drawbacks that make their applicability unfeasible for many others. The main drawbacks are that (i) feature extraction and matching is generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. Such problems in solving the correspondence problem can generally be overcome by resorting to a different but similar type of techniques, known by the name of structured lighting techniques. While structured lighting techniques involve a completely different methodology on how to solve the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.
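As an aside, the row-wise search structure that the standard stereo geometry allows can be illustrated with a minimal area-based matcher. This is a hedged sketch of one classical approach (not the feature-based methods cited above, nor a method used in this thesis; the function and parameter names are illustrative): for a pixel in the left image, the corresponding point is sought along the same row of the right image by minimizing the sum of absolute differences (SAD) over a small window.

```c
#include <stdlib.h>

/* Find the disparity of the pixel at column x_left by searching the
 * same row of the right image for the window (of 2*radius+1 pixels)
 * that minimizes the sum of absolute differences (SAD). Rows are
 * 'width' pixels wide; candidate disparities are 0..max_disp. */
static int match_row(const unsigned char *left_row,
                     const unsigned char *right_row,
                     int width, int x_left, int radius, int max_disp)
{
    int best_d = 0;
    long best_sad = -1;
    for (int d = 0; d <= max_disp && x_left - d - radius >= 0; d++) {
        long sad = 0;
        for (int k = -radius; k <= radius; k++) {
            int xl = x_left + k;          /* window pixel, left image  */
            int xr = x_left - d + k;      /* shifted pixel, right image */
            if (xl < 0 || xl >= width || xr >= width) continue;
            sad += labs((long)left_row[xl] - (long)right_row[xr]);
        }
        if (best_sad < 0 || sad < best_sad) {  /* keep the best match */
            best_sad = sad;
            best_d = d;
        }
    }
    return best_d;   /* scalar disparity x_left - x_right */
}
```

Even this simplified matcher makes drawback (i) visible: the cost grows with the window size and the disparity search range for every pixel.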
2.1.2 Structured lighting
Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source that projects a light pattern actively into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to simplify the complexity of the correspondence analysis by actively manipulating the scene.
It is important to note that stereoscopic-based systems do not assume complex requirements for image acquisition, since they mostly rely on theoretical, mathematical, and algorithmic analyses to solve the reconstruction problem. On the other hand, the idea behind structured lighting methods is to shift this complexity to another level, such as the engineering prerequisites of the overall system [4].
A wide variety of light patterns have been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2, a classification of the available encoded patterns is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.
2.1.2.1 Triangulation technique
Triangulation refers to the process of determining the location of a point by measuring angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is explained.

Figure 2.2 shows the simplified model that Klette assumes in his analysis. Note that the
Figure 2.2: Assumed model for triangulation, as proposed in [4].
system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a consequence, the object, light source, and camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known, and the origin of the coordinate system O coincides with the projection center of the camera.
The goal is to calculate the distance d between the origin O and the object point P = (X_0, Z_0). This can be done using the law of sines as follows:

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}$$

From $\gamma = \pi - (\alpha + \beta)$ and $\sin(\pi - \gamma) = \sin(\gamma)$, it holds that

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}$$

Therefore, distance d is given by

$$d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}$$

which holds for any point P lying on the surface of the object.
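This formula maps directly to code. A minimal sketch, with hypothetical calibration values for b, α, and β (the function name is illustrative):

```c
#include <math.h>
#include <stdio.h>

/* Distance d from the camera's projection center O to the object
 * point P, by the law of sines: d = b * sin(alpha) / sin(alpha + beta).
 * b is the base distance between camera and light source; alpha and
 * beta are the calibrated angles of Figure 2.2, in radians. */
static double triangulate_distance(double b, double alpha, double beta)
{
    return b * sin(alpha) / sin(alpha + beta);
}

int main(void)
{
    const double deg = acos(-1.0) / 180.0;   /* degrees to radians */
    /* Hypothetical calibration: b = 200 mm, alpha = 60, beta = 50 deg. */
    double d = triangulate_distance(200.0, 60.0 * deg, 50.0 * deg);
    printf("d = %.1f mm\n", d);   /* prints d = 184.3 mm */
    return 0;
}
```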
2.1.2.2 Pattern coding strategies
As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill the requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors, or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19]:
• Time-multiplexing. This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel (a minimal decoding sketch is given after this list). The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or objects in the scene during the time period in which the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a.
• Spatial neighborhood. In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12], or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow all coding information to be condensed into a single projection pattern, making them highly suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].
• Direct coding. In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16] or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not only depend on the projected colors, but also on the intrinsic colors of the measuring surface and light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].
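To make the time-multiplexing idea concrete, the sketch below recovers a per-pixel codeword from a sequence of binary patterns: each captured frame contributes one bit per pixel, obtained by thresholding, and the concatenated bits identify the projected stripe. This is an illustration of the general strategy under simplifying assumptions (a single global threshold, hypothetical function name), not the decoder of the scanner described later in this thesis:

```c
/* Build per-pixel codewords from a sequence of thresholded frames.
 * frames[k] points to the k-th captured image (row-major,
 * width * height 8-bit pixels). A pixel brighter than 'threshold'
 * contributes a 1-bit; the n_patterns bits concatenate into the
 * codeword identifying the projected stripe for that pixel. */
static void decode_codewords(const unsigned char *const frames[],
                             int n_patterns, int width, int height,
                             unsigned char threshold,
                             unsigned int codewords[])
{
    for (int i = 0; i < width * height; i++) {
        unsigned int code = 0;
        for (int k = 0; k < n_patterns; k++)
            code = (code << 1) | (frames[k][i] > threshold ? 1u : 0u);
        codewords[i] = code;   /* e.g., a binary or Gray code index */
    }
}
```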
Figure 2.3: Examples of pattern coding strategies: (a) time-multiplexing, (b) spatial neighborhood, (c) direct coding.
2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields, such as security,
forensics, or even entertainment, it is no surprise that special focus has been devoted
to this area by the research community over the last decades. A comparative study
of three different 3D face reconstruction approaches is presented in [20]. Here, the
most representative techniques of three different domains are tested. These domains are
binocular stereo, structured lighting, and photometric stereo. The experimental results
show that active reconstruction techniques perform better than purely passive ones for
this application.
The majority of analysis on vision-based reconstruction has focused on general performance
for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless,
some effort has been made on evaluating structured lighting techniques with special
focus on human face reconstruction. In [21], a comparison is presented between three
structured lighting techniques (Gray code, Gray code shift, and stripe boundary) to
assess 3D reconstruction for human faces by using mono and stereo systems. The results
show that the Gray code shift coding performs best, given the high number of emitted
patterns it uses. A further study on this topic was performed by the same author in
[22]. Again, it was found that time-multiplexing techniques such as binary encoding
using Gray code provide the highest accuracy. With a rather different objective than
that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] also focus their
effort on presenting a framework that captures 3D models of faces in high resolution
with low computational load. Here, the system uses a single colored stripe pattern for
the reconstruction purpose, plus a picture of the face illuminated with regular white light
that is used as texture.
Particular aspects of 3D human face reconstruction, such as the proximity, size, and
texture involved, make structured lighting a suitable approach. On the contrary, other
reconstruction techniques might be less suitable when dealing with these particular aspects.
For example, stereoscopic approaches fail to provide positive results when the textures
involved do not contain features that can be easily extracted and matched by means of
algorithms, as in the case of the human face. On the other hand, the concepts behind
structured lighting make it very convenient to reconstruct this kind of surface, given
the proximity involved and the size limits of the object in question (appropriate for
projecting encoded patterns).
With regard to the suitability of the different pattern coding strategies for our application
(3D human face reconstruction by means of a hand-held scanner), there are several
factors to consider. Spatial neighborhood strategies do not offer the high spatial resolution
that is needed by the algorithms that assess the fit quality of the various mask models.
Direct coding strategies suffer from practical problems that affect their robustness in
different scenarios. This centers the attention on the time-multiplexing techniques, which
are known to provide high spatial resolution. The problem with such techniques is
that they are highly sensitive to movement, which is likely to be present on a hand-held
device. Fortunately, there are several approaches by which this problem can be
solved. Consequently, it is a time-multiplexing technique that is employed in
our application.
2.2 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement.
This section presents a review of some of the most popular techniques, with special focus
on those that are regarded as adequate for our application.
2.2.1 Definition

Camera calibration is the process of determining a mathematical approximation of the
physical and optical behavior of an imaging system by using a set of parameters. These
parameters can be estimated by means of direct or iterative methods, and they are divided
into two groups. On the one hand, intrinsic parameters determine how light is projected
through the lens onto the image plane of the sensor. The focal length, projection center,
and lens distortion are all examples of intrinsic parameters. On the other hand, extrinsic
parameters measure the position and orientation of the camera with respect to a world
coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure
2.4, which corresponds to the optical system for the structured pattern projection and
triangulation considered in [25]. The focal length fc and the projection center Oc are
examples of intrinsic parameters of the camera, while the distance D between the camera
and the projector corresponds to an extrinsic parameter.
Figure 2.4: A reference framework assumed in [25].
2.2.2 Popular techniques

In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration
that uses a 3×4 transformation matrix which maps 3D object points to their respective
2D image projections. Here, the model of the camera does not consider any lens distortion.
For a detailed description of this method, refer to [18]. Some years later, in 1986,
Faugeras improved Hall's work by proposing a technique that was based on extracting
the physical parameters of the camera from the transformation technique proposed in
[18]. The description of this technique is given in [26] and [27]. A non-linear explicit
camera calibration that included radial lens distortion was proposed by Salvi in his PhD
thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras'
linear method. However, a method that would become much more popular and that is still
widely used was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step
technique that models only radial lens distortion. Also worth mentioning is the model
proposed by Weng [30] in 1992, which includes three different types of lens distortion.
The calibration mechanism that is currently being used in our application is based on
the work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although
this mechanism focuses on stereo camera calibration, it was generalized for a system
with one camera and one projector. It involves imaging a controlled scene from different
positions and orientations. The controlled scene consists of a rigid calibration chart with
several markers. The geometric and photometric properties of such markers are known
precisely, so that they can be detected. After corresponding markers in the different
images are found, an algorithm searches for the optimal set of camera parameters for which
triangulation of all corresponding marker-point pairs gives an accurate reconstruction of
the calibration chart. This calibration mechanism is discussed further in Section 3.7.
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed
by the Smart Sensors & Analysis research group and provided as a starting point for the
current project. Figure 3.1 presents the main steps involved in the 3D reconstruction
process.
Figure 3.1: General flow diagram of the 3D face scanner application. Starting from the
binary and XML input files, the pipeline runs through the following stages: read binary
file (3.1), preprocessing (3.2), normalization (3.3), global motion compensation (3.4),
decoding (3.5), tessellation (3.6), calibration (3.7), vertex filtering (3.8), and hole
filling (3.9), producing the final 3D model.
The current scanner uses a total of 16 binary coded patterns that are sequentially
projected onto the scene. For each projection, the scene is captured by means of the
embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are
fed to the application in the form of a binary file. This falls in line with the discussion
presented in Section 2.1.2.3 of the literature study of why time-multiplexing strategies
are more suitable than spatial neighborhood or direct coding strategies for face
reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1
is described.
Figure 3.2: Example of the 16 frames that are captured by the embedded camera
while the scene is being illuminated with binary structured light patterns. This frame
sequence is the input for the 3D face scanner application.
3.1 Read binary file

The first step of the application is to read the binary file that contains the required
information for the 3D reconstruction. The binary file is composed of two parts: the
header and the actual data. The header contains metadata of the acquired frames, such
as the number of frames and the resolution of each one. The second part contains the
actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence,
which from now on will be referred to as camera frames.
3.2 Preprocessing

The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps
is described in the following subsections.
Figure 3.3: Flow diagram of the preprocessing stage (parse XML file, discard frames,
crop frames, and scale, where the scale step converts to float in the range 0-1).
3.2.1 Parse XML file

In this stage, the application first reads an XML file that is included with every scan.
This file contains relevant information for the structured light reconstruction. This
information includes (i) the type of structured light patterns that were projected when
acquiring the data, (ii) the number of frames captured while structured light patterns
were being projected, (iii) the image resolution of each frame to be considered, and (iv)
the calibration data.
3.2.2 Discard frames

Based on the number-of-frames value read from the XML file, the application discards
extra frames that do not contain relevant information for the structured light approach
but that are provided as part of the input.
3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain
a new, more suitable resolution for the subsequent algorithms of the program (480 × 754).
This is accomplished by cropping the pixels that are close to the top border
of the images. Note that this operation does not imply a loss of information in this
application in particular. This is because pixels near the frame borders do not contain
facial information and can therefore be safely removed.
3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is
represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage,
the data type is transformed from unsigned integer to floating point while dividing each
pixel value by 255. The new set of values ranges between 0 and 1.
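As an illustration, a minimal C sketch of this conversion is shown below; the buffer
names and the use of a flat pixel array are assumptions made for the example, not
necessarily the layout used in the actual implementation.

#include <stdint.h>

/* Convert 8-bit camera data to floating point in the range [0, 1].
 * Sketch only: buffer names and layout are assumptions. */
void scale_to_unit_range(const uint8_t *in, float *out, int n_pixels)
{
    for (int i = 0; i < n_pixels; i++)
        out[i] = in[i] / 255.0f;
}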
3.3 Normalization

Even though this section is entitled Normalization, a few more tasks are performed in
this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide
arrows represent the flow of data, whereas dashed lines represent the order of execution.
The numbers inside the small data arrows pointing towards the different tasks represent
the number of frames used as input by each task. The dashed-line rectangle that encloses
the normalization and texture 2 tasks indicates that there is no strict sequential
execution between these two, but rather that they are executed in an alternating fashion.
This type of diagram will prove particularly useful in Chapter 5 to explain the
modifications that were made to the application to improve its performance. An example
of the different frames that are produced in this stage is visualized in Figure 3.5. A
brief description of each of the tasks involved in this stage follows.

Figure 3.4: Flow diagram of the normalization stage.
3.3.1 Normalization

The purpose of this stage is to extract the reflectivity component (texture information)
from the camera frames, while aiming at enhancing the deformed illumination patterns
in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The
deformed patterns are essential for the 3D reconstruction process.
In order to understand how this process takes place, we need to look back at Figure
3.2. Here it is possible to observe that the projected patterns in the top row frames are
equal to their corresponding frames in the bottom row, with the only difference being
that the values of the projected pattern are inverted. For each corresponding pair, a
new image frame is generated according to the following equation:

F_norm(x, y) = (F_camera(x, y, a) - F_camera(x, y, b)) / (F_camera(x, y, a) + F_camera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively.
An example of the resulting frame sequence is shown in Figure 3.5a.
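A minimal C sketch of this per-pixel operation is given below, assuming flat
single-precision buffers; the small guard against division by zero is an assumption
for the example, not necessarily part of the original implementation.

/* Normalize one pair of frames: (a - b) / (a + b) per pixel. */
void normalize_pair(const float *frame_a, const float *frame_b,
                    float *frame_norm, int n_pixels)
{
    const float eps = 1e-6f;   /* guard against division by zero */
    for (int i = 0; i < n_pixels; i++) {
        float sum = frame_a[i] + frame_b[i];
        frame_norm[i] = (sum > eps) ? (frame_a[i] - frame_b[i]) / sum
                                    : 0.0f;
    }
}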
(a) Normalized frame sequence
(b) Texture 2 frame sequence
(c) Modulation frame (d) Texture 1 frame
Figure 3.5: Example of the 18 frames produced in the normalization stage.
3.3.2 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one
used to calculate the normalized frame sequence. In fact, the output of this process is an
intermediate step in the calculation of the normalized frames, which is the reason why
the two processes are said to be performed in an alternating fashion. The equation
that describes the calculation of the texture 2 frame sequence is

F_texture2(x, y) = F_camera(x, y, a) + F_camera(x, y, b)

The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation
stage.
3.3.3 Modulation

The purpose of this stage is to find the range of measured values for each (x, y) pixel of
the camera frame sequence along the time dimension. This is done in two steps. First,
two frames are generated by finding the maximum and minimum values along the time
(t) dimension (Figure 3.6) for every (x, y) position.
Figure 3.6: Camera frame sequence in a coordinate system.
Second, a modulation frame is produced by finding the difference between the previously
generated frames, i.e.

F_mod(x, y) = F_max(x, y) - F_min(x, y)

Such a modulation frame (Figure 3.5c) is required later during the decoding stage.
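The two steps can be merged into a single pass over the frame sequence, as the following
sketch shows; a frame-major memory layout is assumed for the example.

/* Modulation frame: per-pixel range (max - min) over the time axis. */
void modulation(const float *frames, int n_frames, int n_pixels,
                float *frame_mod)
{
    for (int i = 0; i < n_pixels; i++) {
        float min = frames[i], max = frames[i];
        for (int t = 1; t < n_frames; t++) {
            float v = frames[t * n_pixels + i];
            if (v < min) min = v;
            if (v > max) max = v;
        }
        frame_mod[i] = max - min;   /* F_mod = F_max - F_min */
    }
}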
3.3.4 Texture 1

Finally, the last task in the normalization stage corresponds to the generation of the
texture image that will be mapped onto the final 3D model. In contrast to the previous
three tasks, this subprocess does not take the complete set of 16 camera frames as input,
but only the two with the finest projection patterns. Figure 3.7 shows the four processing
steps that are applied to the input in order to generate a texture image such as the one
presented in Figure 3.5d.
Figure 3.7: Flow diagram for the calculation of the texture 1 image (average frames,
gamma correction, 5×5 mean filter, histogram stretch).
3.4 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement.
In fact, if no measures are taken to correct the slight amount of movement of the scanner
or of the objects in the scene during the acquisition process, the complete reconstruction
process fails. Although the global motion compensation stage is only a minor part of
the mechanism that makes the entire application robust to motion, its contribution to
the final result is not negligible.

Global motion compensation is an extensive field of research to which many different
approaches and methods have been contributed. The approach used in this application
is amongst the simplest in level of complexity. Nevertheless, it suffices for the needs of
the current application.
Figure 3.8 presents an overview of the algorithm used to achieve the global motion
compensation. This process takes as input the normalized frame sequence introduced in
the previous section. As noted at the bottom of the figure, these steps are repeated for
every pair of consecutive frames. As a first step, the pixels in each column are added for
both frames. This results in two vectors that hold the cumulative sums of each frame.
The second step is to determine by how many pixels the second image is displaced with
respect to the first one. In order to achieve this, the sum of absolute differences (SAD)
between elements of the two column-sum vectors is calculated while slowly displacing the
two vectors with respect to each other. The result is a new vector containing the SAD
value for each displacement. Subsequently, the index of the smallest element in the SAD
vector is found in order to determine the number of pixels that the second image needs
to be shifted. The process concludes by performing the actual shift of the second frame.
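A sketch of the displacement search in C is shown below; the search range of ±max_shift
pixels and the handling of non-overlapping columns are assumptions made for the example.

#include <float.h>
#include <math.h>

/* Find the horizontal shift (in pixels) that minimizes the SAD
 * between the column-sum vectors of two consecutive frames. */
int estimate_shift(const float *col_sum_a, const float *col_sum_b,
                   int n_cols, int max_shift)
{
    int best_shift = 0;
    float best_sad = FLT_MAX;
    for (int s = -max_shift; s <= max_shift; s++) {
        float sad = 0.0f;
        for (int c = 0; c < n_cols; c++) {
            int cb = c + s;
            if (cb < 0 || cb >= n_cols)
                continue;               /* skip non-overlapping columns */
            sad += fabsf(col_sum_a[c] - col_sum_b[cb]);
        }
        if (sad < best_sad) {
            best_sad = sad;
            best_shift = s;
        }
    }
    return best_shift;   /* number of pixels by which frame B is shifted */
}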
Figure 3.8: Flow diagram for the global motion compensation process.
3.5 Decoding

In Section 2.1.1 of the literature study, the correspondence problem was defined as the
process of determining corresponding point pairs between the captured images and the
projected patterns. This is exactly what is being accomplished during the decoding
stage.
A novel approach has been implemented, in which the identification of the projector
stripes is based not on the values of the pixels themselves (as is typically done) but
rather on the edges formed by the transitions of the projected patterns. Figure 3.9
illustrates the different sets of decoded values that result from each of these methods.
Here it is possible to observe that the pixel-based method produces a stair-casing effect
due to the decoding of neighboring pixels that lie on the same stripe of the projected
pattern. On the other hand, the edge-based method removes this undesirable effect by
decoding values only for parts of the image in which a transition occurs. Furthermore,
this approach enables sub-pixel accuracy for the determination of the positions where the
transitions occur, meaning that the overall resolution of the 3D reconstruction increases
considerably.
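Although the exact decoding formula is not spelled out here, the principle of sub-pixel
localization can be illustrated as follows: a pattern transition in a normalized frame
shows up as a sign change between two neighboring samples, and linear interpolation
gives the fractional position of the zero crossing. The sketch below is illustrative
only, not the exact algorithm used in the application.

/* Illustrative sketch: locate the zero crossing between two
 * neighboring samples of a normalized frame with sub-pixel
 * precision. v0 and v1 are assumed to have opposite signs,
 * i.e. a pattern transition occurs between them. */
float subpixel_edge_position(float v0, float v1, int y0)
{
    float frac = v0 / (v0 - v1);   /* fraction in [0, 1] */
    return (float)y0 + frac;       /* position along the y dimension */
}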
Figure 3.9: Edge-based vs. pixel-based decoding (decoded values plotted along the y
dimension of the image). The stair-casing effect caused by pixel-based decoding is not
present when edge-based decoding is used.
The decoding process results in a set of vertices, each one associated with a depth code.
Note, however, that the unit of measurement used to describe the position and depth of
each vertex is based on camera pixels and code values, respectively, meaning that these
vertices still do not represent the actual geometry of the face. The calibration process,
explained in a later section, is the part of the application that translates the pixel and
code values to standard units (such as millimeters), thus recreating the actual shape of
the human face.
3.6 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in
a manner such that no overlaps occur. In computer graphics, these geometric shapes
are generally chosen to be triangles, also called "faces". The reason for using triangles
is that they have, by definition, their vertices on the same plane. This, in turn, avoids
the generation of non-simple convex polygons that are not guaranteed to be rendered
correctly. A complete example illustrating this point can be found in [32].
A set of 3D vertices calculated in the decoding stage is the input to the tessellation
process. Here, however, the third dimension does not play a role, and hence the z
coordinate of each vertex can be thought of as being equal to 0. This implies
that the new set of vertices consists only of (x, y) coordinates that lie on the same plane,
as shown in Figure 3.10a. This graph corresponds to a very close view of the nose area
in the reconstructed face example.
(a) Vertices before applying the Delaunay triangulation. (b) Result after applying the
Delaunay triangulation.

Figure 3.10: Close view of the vertices in the nose area before and after the tessellation
process.
The question that arises here is how to connect the vertices in such a way that the
complete surface is covered with triangles. The answer is to use the Delaunay triangulation,
which is probably the most common triangulation used in computer vision. Its main
advantage over other methods is that the Delaunay triangulation avoids
"skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the
Delaunay triangulation is independent of the order in which the vertices are processed.
Figure 3.10b shows the result of applying the Delaunay triangulation to the vertices
shown in Figure 3.10a.
Although there exist a number of different algorithms used to achieve the Delaunay
triangulation, the final outcome of each conforms to the following definition: a Delaunay
triangulation for a set P of points in a plane is a triangulation DT(P) such that no
point in P is inside the circumcircle of any triangle in DT(P) [33]. Such a definition can
be understood by examining Figure 3.11.
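As mentioned in Chapter 4, early versions of the application used OpenCV for this step.
A hedged sketch using OpenCV's legacy C API is shown below; the image-sized bounding
rectangle (which must contain all points) and the vertex arrays are assumptions made
for the example.

#include <opencv/cv.h>

/* Build a Delaunay triangulation of the decoded (x, y) vertices. */
CvSubdiv2D *triangulate(const float *vx, const float *vy, int n,
                        CvMemStorage *storage)
{
    CvSubdiv2D *subdiv =
        cvCreateSubdivDelaunay2D(cvRect(0, 0, 480, 754), storage);
    for (int i = 0; i < n; i++)
        cvSubdivDelaunay2DInsert(subdiv, cvPoint2D32f(vx[i], vy[i]));
    return subdiv;   /* triangles are recovered by walking subdiv->edges */
}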
Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33].
3.7 Calibration

The set of (x, y) vertices with their corresponding depth code values that result from
the decoding process do not represent standard units of measure, i.e., they still have to
be translated into standard units such as millimeters. This is precisely the objective of
the calibration process.

The calibration mechanism that is used in the application is based on the work of
Peter-Andre Redert as part of his PhD thesis [31]. The entire process is divided into two
parts: an offline and an online process. Moreover, the offline process consists of two stages:
the camera calibration and the system calibration. It is important to clarify that while
the offline process is performed only once (camera properties and distances within the
system do not change with every scan), the online process is carried out for every scan
instance. The calibration stage referred to in Figure 3.1 is the latter.
3.7.1 Offline process

As already mentioned, the offline process comprises the two stages described below.

Camera calibration. This part of the process is concerned with the calculation of the
intrinsic parameters of the camera, as explained in Section 2.2 of the literature
study. In short, the objective is to precisely quantify the optical properties of the
camera. The current approach accomplishes this by imaging the special calibration
chart shown in Figure 3.12 from different orientations and distances. After
corresponding markers in the different images are found, an algorithm searches for
the optimal set of camera parameters for which triangulation of all corresponding
marker-point pairs gives an accurate reconstruction of the calibration chart.
Figure 3.12: The calibration chart used to determine the intrinsic parameters of a
camera and the extrinsic parameters of a projector-scanner system. All absolute
dimensions and photometric properties of the round markers are known precisely.
System calibration. The second part of the calibration process refers to the camera-projector
system calibration, i.e., the determination of the extrinsic parameters
of the system. Again, this part of the process images the calibration chart from
different distances. However, this time structured light patterns are emitted by
the projector while the acquisition process takes place. The result is that each
projector code is associated with a known depth and camera position.
3.7.2 Online process

The result of the offline calibration is a set of parameters that model the optical properties
of the scanner system. These are passed to the application inside the XML file for
every scan. Such parameters represent the coefficients of a fifth-order polynomial used
for translating the set of (x, y) vertices with their corresponding depth code values into
standard units of measure. In other words, the online process consists of evaluating a
polynomial with all the x, y, and depth code values calculated in the decoding stage in
order to reconstruct the geometry of the face. Figure 3.13 shows the state of the 3D
model before and after the reconstruction process.
(a) Before reconstruction (b) After reconstruction

Figure 3.13: The 3D model before and after the calibration process.
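Since the exact form of the fifth-order calibration polynomial is not given here, the
following sketch only illustrates the general idea for a single variable using Horner's
scheme; in the application, the polynomial combines the x, y, and depth code values,
and the coefficient array would come from the calibration data in the XML file.

/* Evaluate a fifth-order polynomial c[0] + c[1]*v + ... + c[5]*v^5
 * using Horner's scheme. Illustrative only. */
float eval_poly5(const float c[6], float v)
{
    float r = c[5];
    for (int i = 4; i >= 0; i--)
        r = r * v + c[i];
    return r;   /* value in standard units, e.g. millimeters */
}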
3.8 Vertex filtering

As can be seen from Figure 3.13b, there are a number of extra vertices (and faces)
that have not been correctly reconstructed and therefore should be removed from the
model. Vertex filtering is applied to remove all these noisy vertices and faces based on
different criteria. The process is divided into the following three steps.
3.8.1 Filter vertices based on decoding constraints

First, if the distance between consecutive decoded points is larger than a maximum
threshold in the x or z dimension, then these points are removed. Second, in order to
avoid falsely decoded vertices due to camera noise (especially in the parts of the images
where light does not hit directly), a minimal modulation threshold needs to be exceeded,
or else the associated decoded point is discarded. Finally, if the decoded vertices lie
outside a margin defined in accordance with the image dimensions, then these are removed
as well.
3.8.2 Filter vertices outside the measurement range

The measurement range, defined during the offline calibration, refers to the minimum
and maximum values that each decoded point can have in the z dimension. These values
are read from the XML file. The long triangles shown in Figure 3.13b that either extend
far into the picture or, on the other hand, come close to the camera are all removed in
this stage. The resulting 3D model after being filtered with the two previously described
criteria is shown in Figure 3.14a.
3.8.3 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length
criterion. Initially, the length of every edge contained in the model is calculated. This
is followed by determining a new set of edges L that contains the longest edge in each
face. After this operation, the mean length value for the longest edge set is calculated.
Finally, only faces whose longest edge value is less than seven times the mean value,
i.e., L < 7 × mean(L), are kept. Figure 3.14b shows the result after this operation.
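A C sketch of this criterion is given below; the flat vertex/face array layout, the
scratch buffer, and the in-place compaction of the face list are assumptions made for
the example.

#include <math.h>

/* Length of the edge between vertices a and b (xyz triplets). */
static float edge_length(const float *v, int a, int b)
{
    float dx = v[3*a] - v[3*b];
    float dy = v[3*a+1] - v[3*b+1];
    float dz = v[3*a+2] - v[3*b+2];
    return sqrtf(dx*dx + dy*dy + dz*dz);
}

/* Keep only faces whose longest edge is below 7 times the mean
 * longest edge; returns the new number of faces. */
int filter_long_edges(const float *verts, int *faces, int n_faces,
                      float *longest /* scratch, n_faces elements */)
{
    float mean = 0.0f;
    for (int f = 0; f < n_faces; f++) {
        int i = faces[3*f], j = faces[3*f+1], k = faces[3*f+2];
        longest[f] = fmaxf(edge_length(verts, i, j),
                     fmaxf(edge_length(verts, j, k),
                           edge_length(verts, k, i)));
        mean += longest[f];
    }
    mean /= (float)n_faces;
    int kept = 0;
    for (int f = 0; f < n_faces; f++)
        if (longest[f] < 7.0f * mean) {        /* L < 7 * mean(L) */
            faces[3*kept]   = faces[3*f];
            faces[3*kept+1] = faces[3*f+1];
            faces[3*kept+2] = faces[3*f+2];
            kept++;
        }
    return kept;
}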
(a) The 3D model after the filtering steps described in Subsections 3.8.1 and 3.8.2.
(b) The 3D model after the filtering step described in Subsection 3.8.3. (c) The 3D
model after the filtering step described in Section 3.9.

Figure 3.14: Resulting 3D models after various filtering steps.
3.9 Hole filling

In the last processing step of the 3D face scanner application, two actions are performed.
The first one is concerned with an algorithm that takes care of filling undesirable holes
that appear due to the removal of vertices and faces that were part of the face surface.
This is accomplished by adding a vertex in the middle of the hole and then connecting
every surrounding edge with this point. The second action refers to another filtering step
of vertices and faces. In this last part of the application, the program removes all but the
largest group of connected faces. The final 3D model is shown in Figure 3.14c.
3.10 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes
but not for the overall goal of the 3D mask sizing project, this process was not included
as part of the 3D face scanner application. This is also the reason why it does not
appear in Figure 3.1. Nevertheless, this section provides a brief explanation of
the smoothing process that is currently used, along with an example.

A complete explanation of the algorithm that is being used to achieve the smoothing
effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian
operator that diffuses the vertices along the surface. An example of the resulting model
before and after applying the smoothing process is shown in Figure 3.15.
(a) The 3D model before smoothing (b) The 3D model after smoothing
Figure 3.15: Forehead of the 3D model before and after applying the smoothing process.
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as
two different domains, but rather as two complementary parts of a whole. There are two
important trends that have made such a unified view possible. First, integrated circuit
(IC) technology has evolved to the point where multiple processors of different types
coexist in a single IC. Second, the increasing complexity and average size of programs,
added to the evolution of compiler technologies, have made C compilers (and even C++ or
Java in some cases) commonplace in the development of embedded systems
[35].
This chapter discusses the embedded hardware and software implementation of the 3D
face scanner. A brief account of the hardware and software tools that were used during
the development of the application is presented first. Subsequently, the first stage of the
development process is described, which consists mainly of translating the algorithms
and methods described in Chapter 3 into a different programming language, more suitable
for embedded systems. Finally, a preview of the developed visualization module that
displays the 3D reconstructed face is presented, along with a brief description of its
functionality.
4.1 Development tools

This section describes the set of tools used in the development of the embedded application.
First, an overview of the hardware is presented, highlighting the most important
aspects that are of interest to the 3D face scanner application. This is then followed by
a list of the software tools, along with a short motivation for their selection. A so-called
remote development methodology was used for the compilation process. The idea is to
run an integrated development environment (IDE) on a client system for the creation of
the project, editing of the files, and usage of code assistance features in the same manner
as done with local projects. However, when the project is built, run, or debugged, the
process runs on a remote server, with output and input transferred to the client system.
4.1.1 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as
development platforms. SBCs combine most features of a conventional desktop computer
into a single board, which can be as small as a credit card. One or more processors of
different types, memory, on-board peripherals for multiple USB devices, single or dual
gigabit Ethernet connections, and integrated graphics and audio capabilities, amongst
others, are common features included in these devices. But perhaps what is most
interesting for embedded developers is the availability of several SBCs that fall under the
open-source hardware category [36]. Such SBCs are suitable for the implementation of a
wide range of applications on the basis of open operating systems.

Two different hardware environments were used in the development of the current
embedded application: a conventional desktop personal computer (PC) with an Intel x86
architecture, and an SBC that was selected according to the following survey.
4.1.1.1 Single-board computer survey

A prior survey of popular SBCs available on the market was conducted with the intention
of finding the most suitable model for our application. Table 4.1 presents a subset of the
considered models, highlighting the most relevant characteristics for the 3D face scanner
application. Refer to [37] for the complete survey.
The model to be chosen has to comply with several requirements imposed by the 3D
face scanner application. First, support for both a camera and a projector had to be
offered. While all of the considered models showed special support for video output,
not all of them provided suitable characteristics for camera signal acquisition. In fact,
most of them rely on USB or Ethernet connections for this purpose. The problem with
using USB technology for camera acquisition is that it is highly resource demanding. On
the other hand, Ethernet connections imply streaming video in formats such as MPEG,
which require additional computational resources and buffering for decoding the video
stream. Explicit peripheral support for camera acquisition was only offered by two of
the considered models: the BeagleBoard-xM and the PandaBoard.
Table 4.1: Single-board computer survey

BeagleBoard-xM
CPU: ARM Cortex-A8, 1000 MHz
RAM: 512 MB
Video output: DVI-D, HDMI, S-Video
GPU: PowerVR SGX, OpenGL ES 2.0
Camera port: Yes

Raspberry Pi Model B
CPU: ARM1176, 700 MHz
RAM: 256 MB
Video output: Composite RCA, HDMI, DSI
GPU: Broadcom VideoCore IV, OpenGL ES 2.0
Camera port: No

Cotton Candy
CPU: dual-core ARM Cortex-A9, 1200 MHz
RAM: 1 GB
Video output: HDMI
GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
Camera port: No

PandaBoard
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, DVI-D, LCD
GPU: PowerVR SGX540, OpenGL ES 2.0
Camera port: Yes

VIA APC
CPU: ARM11, 800 MHz
RAM: 512 MB
Video output: HDMI, VGA
GPU: built-in 2D/3D graphics, OpenGL ES 2.0
Camera port: No

MK802
CPU: ARM Cortex-A8, 1000 MHz
RAM: 1 GB
Video output: HDMI
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No

Snowball
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, CVBS
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No
A second issue in the selection of the SBC was concerned with the project objective of
developing a module capable of visualizing the 3D reconstructed model by means of the
embedded projector. It was considered that the achievement of this objective could be
greatly simplified by selecting an SBC model that offered support for rendering 3D
computer graphics by means of an API, preferably OpenGL ES. As it turned out, all of the
SBC models considered in the survey featured a graphics processing unit (GPU) with
such support.

Finally, one last important motivation for the selection came from the experience gathered
through related projects. The BeagleBoard-xM had been used as the embedded
computing unit in other projects [6] at Philips Research Eindhoven, and therefore
valuable implementation effort could be saved if this option were adopted. Consequently, it
was the BeagleBoard-xM that was selected as the SBC model for the development of
the current project.
4.1.1.2 BeagleBoard-xM features

The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is
a low-power, open-source hardware system that was designed specifically to address
the open source community. It measures 82.55 by 82.55 mm and offers most of the
functionality of a desktop computer. It is based on Texas Instruments' DM3730 system
on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1
GHz, accompanied by 512 MB of LPDDR RAM. Several open operating systems have been
made compatible with such a processor, including Linux, FreeBSD, RISC OS, Symbian,
and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated
video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU to
provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor
comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as
NEON. This technology is based on a 128-bit SIMD architecture extension that provides
flexible and powerful acceleration for consumer multimedia products, as described in [39].
4.1.2 Software

The main factors involved in the selection of software tools were (i) available support by
a large development community and (ii) acquisition costs and licensing charges. Open-source
software was adopted where possible. Moreover, prior experience with the tools
was also taken into account. The software can be divided into two categories: (i) software
libraries that are used within the application and therefore are necessary for its execution,
and (ii) software tools used specifically for the development of the application and hence
not required for its execution. In what follows, each of these is briefly described.

Figure 4.1: The BeagleBoard-xM offered by Texas Instruments.
4.1.2.1 Software libraries

The following software libraries are used throughout the implementation of the
embedded application.

libxml2. A software library used for parsing XML documents, which was originally
developed for the Gnome project and was later made available to outside projects
as well. The current application makes use of this tool for extracting the required
information from the XML file that is included with each scan.

OpenCV. An open-source computer vision and machine learning software library
initiated by Intel. It provides the necessary functionality to construct the Delaunay
triangulation described in Chapter 3. Though it was used in the initial versions of
the application, later optimizations replaced the OpenCV implementations.

CGAL. A software library that aims to provide access to algorithms in
computational geometry. It is used in the current application as a means
to simplify the resulting mesh surface, i.e., to reduce the number of faces used to
represent the surface while keeping the overall shape of the reconstructed model.

OpenGL ES. A subset of the more general OpenGL, designed specifically for embedded
systems. It consists of a cross-language, multi-platform application programming
interface (API) for rendering 2D and 3D computer graphics. It is used in the current
application as the means to visualize the 3D reconstructed model.

GLUT. The OpenGL Utility Toolkit, a system-independent API for OpenGL
used to create windows and/or frame buffers. It is used in the visualization
module of the application as well.
4.1.2.2 Software development tools

The following list presents a description of the most important software tools used for
the development of the embedded application.

GNU toolchain. Refers to a collection of programming tools produced by the GNU
Project that provide developing facilities for applications and operating systems.
Among the several projects that comprise the GNU toolchain, the following were
used:

GNU Make. A utility that automates the building process of executable
programs by reading so-called makefiles, which specify how to create the
target program.

GCC. The official compiler of the GNU operating system, which has been
adopted as standard by most modern Unix-like computer operating systems.

GNU Binutils. A set of programming tools used in the development
process of creating and managing programs, object files, libraries, profile
data, and assembly source code. The commands as (assembler), ld (linker),
and gprof (profiler) were used among the complete set of binutils commands.

GNU Project debugger (GDB). The standard debugger for the GNU operating
system, which was made available for the development of applications outside
the GNU project as well.

Valgrind. A programming tool that can automatically detect memory management
errors. It also provides the functionality of a profiler.

Ubuntu. A Linux-based operating system that is distributed as free and open-source
software. It was installed on both the desktop PC and the SBC.
4.2 MATLAB to C code translation

This section describes the first stage of the embedded application development, which
involves the translation of a series of algorithms originally written in MATLAB code to
C.
Despite the fact that there are a number of available tools that automatically translate
MATLAB code to C language, such as MATLAB Coder by MathWorks, MATLAB-to-C
Synthesis (MCS) by Catalytic Inc., and AccelDSP by Xilinx, these have a number
of pitfalls that compromise their applicability, especially when the performance aspect
is of ultimate importance. Perhaps what is most concerning is that each one of these
tools only supports a subset of the MATLAB language and functions, meaning that
the complete functionality of MATLAB is immediately constrained by this requirement.
In many cases, this would imply a modification of the MATLAB code prior to the
translation process in order to filter out any feature or function not included in the
subset, which adds overhead to the development process. Examples of features not
supported by automatic translation tools are, amongst others, objects, cell arrays, nested
functions, visualization, and try/catch statements. The use of an automatic translation
tool was discarded for this project, taking into account that several of these unsupported
features are present in the MATLAB code.
4.2.1 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular
programming languages used for the development of embedded systems. The first is that
C lies at an intermediate point between higher- and lower-level languages, providing
suitable characteristics for embedded system development from both sides. The
problem with higher-level languages lies in the fact that they do not provide suitable
characteristics for optimizing the performance of applications, such as low-level memory
manipulation. Furthermore, unlike many of these higher-level programming languages,
C provides deterministic resource use, which is an important feature when the target
devices contain limited resources. On the other hand, C outperforms lower-level languages
in a number of aspects, such as scalability and maintainability. Two final motivations
for using C are that (i) C compilers are available for almost all embedded devices and are
supported by a large pool of experienced C programmers, and (ii) the vast majority of
hardware APIs/drivers are written in C.
4.2.2 Translation approach

As mentioned earlier, a manual translation approach was chosen over the
use of automatic translation tools. A key part of the process of manually translating
MATLAB to C code is the verification process. There are two major techniques used
to achieve such verification. The first one consists of a systematic method of converting
the translated C code into a compiled MEX-file that can be merged into the original
MATLAB project. Then, by comparing the results generated by the MATLAB project
containing the C implementation wrapped in a MEX-file with those generated by the
original MATLAB project, one should be able to verify the correctness of the translation.
The second approach consists of writing corresponding intermediate results of both the
MATLAB and C implementations to external files and then using a file comparison tool,
such as diff for Linux environments, in order to validate the equality of both results. It was
the latter approach that was chosen for the development of the current application, for
the following reason: the former approach requires the C implementation to be wrapped
in a so-called MEX wrapper, which takes care of the communication between MATLAB
and C. This task is considered to be error prone, since crashes, segmentation violations,
or incorrect results can easily occur if the MEX wrapper does not allocate and access
the data properly, as reported by Marc Barberis in [40] from Catalytic Inc.
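The sketch below illustrates the file-based verification on the C side; the file format
and the fixed six-digit precision are assumptions made for the example. On the MATLAB
side, the same buffer can be written with, for instance, dlmwrite('frame.txt', buf(:),
'precision', '%.6f'), after which the two files are compared with diff.

#include <stdio.h>

/* Dump an intermediate buffer as one value per line so that it can
 * be compared against the MATLAB output with a tool such as diff. */
int dump_buffer(const char *path, const float *data, int n)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        fprintf(fp, "%.6f\n", data[i]);
    fclose(fp);
    return 0;
}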
A number of pitfalls that add complexity to the manual translation process were
identified throughout the development of this stage. The most important are:

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing
starts with 0. Although this does not seem like a major difference, it was found
that such a simple change could easily introduce errors (see the sketch after this
list).

• MATLAB uses column-major ordering, whereas C uses a row-major approach.
Special care must be taken to guarantee that spatial locality is maintained after
the translation process takes place, i.e., the order in which data is processed should
correspond to the order in which it is laid out in memory. Not complying with
this idea could induce a serious loss in the performance of the resulting code
(again, see the sketch after this list).

• MATLAB is an interpreted language, i.e., data types and variable dimensions are
only known at run-time, and thus these cannot be easily deduced from analyzing the
source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require
explicit allocation/reallocation/deallocation of memory using constructs such as
malloc, realloc, or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply
a large overhead in the development process if many of these functions have to be
implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial
loop constructs in C. For example, mapping MATLAB's easy-to-use
concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data
of different types, dimensions, and sizes. On the contrary, C requires all
variables to be cast to a specific data type (or declared, as known in the programming
field) before they can be used. Furthermore, MATLAB uses a wide variety
of generic types that are not available in C, and hence requires the programmer
to implement them while relying on structure constructs of primitive types.
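As announced above, the following sketch illustrates the first two pitfalls; the frame
dimensions are placeholders.

#define ROWS 480   /* placeholder dimensions */
#define COLS 754

/* C indexing starts at 0 (MATLAB's frame(1,1) becomes frame[0][0]),
 * and C arrays are row major: iterating rows in the outer loop and
 * columns in the inner loop preserves spatial locality, whereas the
 * column-major traversal that is natural in MATLAB would not. */
void scale_frame(float frame[ROWS][COLS])
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            frame[r][c] /= 255.0f;
}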
4.3 Visualization

This section describes the different steps involved in the visualization module developed
to display the reconstructed 3D models by means of the embedded projector contained
in the hand-held device. Figure 4.2 extends the general overview of the application
presented in Figure 3.1 by incorporating the visualization module. This figure shows that
a resulting 3D model of the face reconstruction process consists of four different elements:
a set of vertices, a set of faces, a set of UV coordinates, and a texture image.
Figure 4.2: Simplified diagram of the 3D face scanner application.
Vertices and faces describe the geometry of the reconstructed model. Each face consists
of three index values that determine the vertices that form a triangle. On the other
hand, UV coordinates, together with the texture image, describe the texture of the model.
Figure 4.3 shows how UV coordinates are used to map portions of the texture image
to individual parts of the model. Each vertex is associated with a UV coordinate.
When a triangle is rendered, the corresponding UV coordinates of each vertex are used
to extract a portion of the texture image and place it on top of the triangle.
Figure 4.3: UV coordinate system.
Figure 4.4 presents an overview of the visualization module. The first step of the process
is to simplify the 3D model, i.e., to reduce the number of triangles (and vertices) used
to represent the surface. Note that while a high resolution is needed for the algorithms
that determine the fit quality of the different mask models, a much lower resolution can
be used for visualization purposes. In fact, due to the limited resources available in
embedded systems, such simplification becomes necessary to avoid lag when zooming,
rotating, or panning the model. Edge collapse is a common term used for the simplification
process, which is shown in Figure 4.4. The input vertices and faces of this block
are converted into a smaller set, denoted as new vertices and new faces in the diagram.
However, since the new set of vertices and faces does not have a one-to-one correspondence
to the original set of UV coordinates, such coordinates have to be updated as well. This
is accomplished by using the nearest neighbor algorithm: every new vertex is assigned
the UV coordinate of its closest original vertex.
The next stage of the process is to format the new set of vertices, faces, and UV
coordinates, together with the texture 1 image, such that OpenGL can render the model.
Subsequently, normal vectors are calculated for every triangle; these are mainly used
by OpenGL for lighting calculations. Every vertex of the model has to be associated
with one normal vector. To do this, an average normal vector is calculated for each
vertex based on the normal vectors of the triangles that are connected to it. Moreover,
a cross product is used to calculate the normal vector of each triangle.
Once these four elements that characterize the 3D model are provided to OpenGL, the
program enters an infinite running state, where the model is redrawn every time a
timer expires or when an interactive operation is sent to the program.
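A sketch of the per-triangle normal computation is shown below; the vec3 type is an
assumption. Vertex normals are then obtained by averaging the normals of the incident
triangles, as described above.

#include <math.h>

typedef struct { float x, y, z; } vec3;

/* Unit normal of the triangle (a, b, c) via the cross product of
 * two edge vectors. */
vec3 triangle_normal(vec3 a, vec3 b, vec3 c)
{
    vec3 u = { b.x - a.x, b.y - a.y, b.z - a.z };
    vec3 v = { c.x - a.x, c.y - a.y, c.z - a.z };
    vec3 n = { u.y * v.z - u.z * v.y,      /* u x v */
               u.z * v.x - u.x * v.z,
               u.x * v.y - u.y * v.x };
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len > 0.0f) {
        n.x /= len; n.y /= len; n.z /= len;
    }
    return n;
}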
Figure 4.4: Diagram of the visualization module.
Chapter 5
Performance optimizations
This chapter presents various performance optimizations made to the 3D face scanner
application, ranging from high-level optimizations, such as modification of the
algorithms, to low-level optimizations, such as the implementation of time-consuming parts
in assembly language.

In order to verify that the achieved optimizations were valid in general, and not only for
specific cases, 10 scans of different persons were used for profiling the performance of the
application. Every profile consisted of running the application 10 times for each scan and
then averaging the results, in order to reduce the influence that external factors might
have on the measured times. Figure 5.1 presents an example of the graphs that will be
used throughout this and the following chapters to represent the changes in performance.
Here, each bar is divided into different colors that represent the distribution of the total
execution time among the various stages of the application described in Chapter 3 and
summarized in Figure 3.1.
The translation from MATLAB to C code corresponds to the first optimization
performed. The top two bars in Figure 5.1 show that the C implementation resulted in
a speedup of approximately 15 times over the MATLAB implementation running on
a desktop computer. On the other hand, the bottom two bars reflect the difference
in execution time after running the C implementation on two different platforms. The
much more limited resources available in the BeagleBoard-xM have a clear impact on
the execution time. The C code was compiled with GCC's -O2 optimization level.

The bottom bar in Figure 5.1 represents the starting point for a set of optimization
procedures that will be described in the following sections. The order in which these are
presented corresponds to the same order in which they were applied to the application.
Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop
computer, (middle) the C implementation on a desktop computer, and (bottom) the C
implementation on the BeagleBoard-xM. Each bar is broken down into the stages of the
application (read binary file, preprocessing, normalization, global motion compensation,
decoding, tessellation, calibration, vertex filtering, hole filling, and other).
5.1 Double- to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C
implementations was necessary to compare both results in each step of the translation
process. The original C implementation used the double-precision format,
because this is the format used in the MATLAB code. Taking into account that the
additional precision offered by the double-precision format over single precision was not
essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the
conversion from double- to single-precision format was made. Figure 5.2 shows that with
this modification the total execution time decreased from 14.53 to 12.52 sec.
Figure 5.2: Difference in execution time when the double-precision format is changed
to single precision.
5.2 Tuned compiler flags

While the previous versions of the C code were compiled at the -O2 performance level,
the goal of this step was to determine a combination of compiler options that would
translate into faster running code. A full list of the options supported by GCC can be
found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3
seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of
compiler flags that produced the best performance at this stage of the optimization
process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
Figure 5.3: Execution time before and after tuning GCC's compiler options.
5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented to further
exploit the concept of spatial locality in the program. As noted in Section 3.3, many of
the operations in the normalization stage involve pixels from pairs of consecutive frames,
i.e., first and second, third and fourth, fifth and sixth, and so on. The data of the camera
frames were therefore placed in memory in such a manner that corresponding pixels
between frame pairs lay next to each other. The procedure is shown in Figure 5.4.
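A sketch of the interleaving step is given below; the buffer names and the flat pixel
layout are assumptions made for the example.

/* Interleave a pair of frames so that corresponding pixels lie next
 * to each other in memory: a0 b0 a1 b1 ... */
void interleave_pair(const float *frame_a, const float *frame_b,
                     float *out, int n_pixels)
{
    for (int i = 0; i < n_pixels; i++) {
        out[2*i]     = frame_a[i];
        out[2*i + 1] = frame_b[i];
    }
}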
However this modification yielded no improvement on the execution time of the appli-
cation as can be seen from Figure 55
5.4 Reimplementation of C's standard power function

The generation of the Texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The gamma correction in this application consists of raising each pixel to the power of 0.85. After profiling the application, it was found that the power function from the standard C math library was taking most of the time inside this process.
Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green, and purple circles represent pixels of the first, second, third, and fourth frames, respectively.
Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.
Taking into account that the high accuracy offered by this function was not required, and that the overhead involved in validating the input could be removed, a different implementation of the function was adopted.
A novel approach proposed by Ian Stephenson in [42] was used, explained as follows. The power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions, which can be approximated with a few instructions. For example, log2(a) can be approximated based on the IEEE floating-point representation of a:
a = M * 2^E

where M is the mantissa and E is the exponent. Taking the logarithm of both sides gives

log2(a) = log2(M) + E

and, since M is normalized, log2(M) is always small; therefore

log2(a) ≈ E
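A minimal sketch of such an approximate implementation in C, in the spirit of [42], is shown below; the function names and the handling of edge cases are illustrative, not the code used in the application:

#include <stdint.h>

/* Approximate log2 from the IEEE 754 bit pattern: subtracting the
 * exponent bias (127 << 23) and scaling by 2^-23 yields log2(a), with
 * the mantissa bits acting as a linear interpolation term. */
static inline float fast_log2(float a)
{
    union { float f; uint32_t i; } u = { a };
    return ((int32_t)u.i - 0x3f800000) * (1.0f / 0x00800000);
}

/* Inverse operation: build the bit pattern of 2^p directly. */
static inline float fast_pow2(float p)
{
    union { float f; uint32_t i; } u;
    u.i = (uint32_t)((int32_t)(p * 0x00800000) + 0x3f800000);
    return u.f;
}

/* pow(a, b) = 2^(b * log2(a)), valid for a > 0. */
static inline float fast_pow(float a, float b)
{
    return fast_pow2(b * fast_log2(a));
}

Calling fast_pow(pixel, 0.85f) then replaces powf() at a fraction of the cost, trading accuracy for speed.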
This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.
Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.
5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is the principle that recently referenced memory locations tend to be referenced again soon. Moreover, the reordering made it possible to replace floating-point calculations with integer calculations in the modulation stage, which typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modulation frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping, and scaling frames are now performed in an alternating fashion, together with the calculation of the modulation frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.
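A simplified sketch of this kind of loop merging is shown below; the scaling step and the min/max updates stand in for the actual cropping, scaling, and modulation operations (all names are illustrative):

/* Placeholder for the real scaling operation (illustrative only). */
static inline int scale_pixel(int p) { return p >> 1; }

/* Before: two separate passes; the second pass re-reads the data. */
void separate_passes(const unsigned char *raw, unsigned char *out,
                     int *min_val, int *max_val, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = (unsigned char)scale_pixel(raw[i]);
    for (int i = 0; i < n; i++) {
        if (out[i] < *min_val) *min_val = out[i];
        if (out[i] > *max_val) *max_val = out[i];
    }
}

/* After: one merged pass keeps each value in a register while all
 * operations are applied, improving temporal locality. */
void merged_pass(const unsigned char *raw, unsigned char *out,
                 int *min_val, int *max_val, int n)
{
    for (int i = 0; i < n; i++) {
        int p = scale_pixel(raw[i]);
        out[i] = (unsigned char)p;
        if (p < *min_val) *min_val = p;
        if (p > *max_val) *max_val = p;
    }
}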
Figure 5.7: Order of execution before and after the optimization. (a) Original order: the preprocessing stage (parse XML file, discard frames, crop frames, scale) is followed by the normalization stage (texture 1, modulation, texture 2, normalize) and the rest of the program. (b) Modified order: the modulation step is moved into the preprocessing stage, so that the normalization stage only computes texture 1, texture 2, and normalize.
Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
5.6 GMC in the y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, that figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results were achieved when GMC is applied in the y direction only. This was implemented by estimating GMC in both directions but only performing the shift in the y direction. The optimization consisted of removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement of the execution time shown in Figure 5.10.
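To make the estimation step concrete, the sketch below shows how a one-dimensional shift can be estimated by minimizing the SAD of two projection profiles, here one sum per image row (all names are assumptions, not the original code):

#include <math.h>
#include <float.h>

/* Estimate the vertical shift between two frames by minimizing the
 * sum of absolute differences (SAD) of their row-sum profiles. */
int estimate_y_shift(const float *row_sum_a, const float *row_sum_b,
                     int rows, int max_shift)
{
    int best_shift = 0;
    float best_sad = FLT_MAX;
    for (int s = -max_shift; s <= max_shift; s++) {
        float sad = 0.0f;
        for (int r = 0; r < rows; r++) {
            int rb = r + s;
            if (rb < 0 || rb >= rows)
                continue;                 /* ignore rows without overlap */
            sad += fabsf(row_sum_a[r] - row_sum_b[rb]);
        }
        if (sad < best_sad) {
            best_sad = sad;
            best_shift = s;
        }
    }
    return best_shift;                    /* frame B is then shifted by this amount */
}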
Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code. For every pair of consecutive frames of the normalized frame sequence, the rows and columns of both frames are summed, the SAD is minimized in both x and y, and frame B is shifted in the y dimension only.
Figure 5.10: Difference in execution time before and after modifying the GMC stage.
5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation, and a series of examples available in [43] were used as references for our implementation. Although OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not-so-straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation. It is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. At this point of the optimization process, an error was detected in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.
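A simple and robust way to detect repeated triangles is to normalize the order of the three vertex indices before comparing them; the sketch below illustrates the general mechanism (it is not the original code):

/* Sort the three vertex indices of a triangle so that two triangles
 * listing the same vertices in different orders compare as equal. */
static void normalize_triangle(int t[3])
{
    int tmp;
    if (t[0] > t[1]) { tmp = t[0]; t[0] = t[1]; t[1] = tmp; }
    if (t[1] > t[2]) { tmp = t[1]; t[1] = t[2]; t[2] = tmp; }
    if (t[0] > t[1]) { tmp = t[0]; t[0] = t[1]; t[1] = tmp; }
}

/* Two normalized triangles are duplicates iff all three indices match. */
static int same_triangle(const int a[3], const int b[3])
{
    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2];
}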
Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.
5.8 Modified line shifting in GMC stage

This section explains a series of optimizations performed on the original line shifting mechanism in the GMC stage. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later, it was optimized even further by replacing the for loop with the more efficient memcpy function available in the standard C library, which in turn led to a faster execution time.

A further optimization was obtained in the GMC stage, yielding better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted.
The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included to determine whether the shift has to be performed in the positive or the negative direction. If the shift is negative, i.e. upwards, the shifting operation traverses the image from top to bottom, copying each line a certain number of rows above it. If the shift is positive, i.e. downwards, the shifting operation traverses the image from bottom to top, copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
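The resulting shifting mechanism can be sketched as follows (the function signature is an assumption; the two traversal directions prevent a row from being overwritten before it is copied):

#include <string.h>

/* Shift the frame in place by dy rows; rows falling off the border are
 * discarded instead of wrapping around as in a circular shift. */
void shift_frame_y(float *frame, int rows, int cols, int dy)
{
    if (dy < 0) {                  /* negative shift: traverse top to bottom */
        for (int r = -dy; r < rows; r++)
            memcpy(&frame[(size_t)(r + dy) * cols],
                   &frame[(size_t)r * cols], cols * sizeof(float));
    } else if (dy > 0) {           /* positive shift: traverse bottom to top */
        for (int r = rows - 1 - dy; r >= 0; r--)
            memcpy(&frame[(size_t)(r + dy) * cols],
                   &frame[(size_t)r * cols], cols * sizeof(float));
    }
}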
Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic is that the set of vertices resulting from the decoding stage is already sorted, which removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered from left to right and from bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which further simplifies the algorithm needed to connect the vertices into triangles.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.
The algorithm is presented in Algorithm 1. Note that, for each pair of rows, the algorithm describes the connection of vertices up to the moment the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).
Algorithm 1 New tessellation algorithm
 1: for all pairs of rows do
 2:     find the left-most vertices in both rows and store them in vertex row A and vertex row B
 3:     while the last vertex in either row has not been reached do
 4:         if vertex row A is more to the left than vertex row B then
 5:             connect vertex row A with the next vertex on the same row and with vertex row B
 6:             change vertex row A to the next vertex on the same row
 7:         else
 8:             connect vertex row B with the next vertex on the same row and with vertex row A
 9:             change vertex row B to the next vertex on the same row
10:         end if
11:     end while
12: end for
Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 12.5 times over OpenCV's Delaunay triangulation implementation.
Figure 5.13: Result of applying (a) the Delaunay triangulation and (b) the optimized triangulation to the same set of vertices. The Delaunay triangulation was replaced with an algorithm that takes advantage of the fact that the vertices are sorted.
5.10 Modified decoding stage

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage.
Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.
As a first step, two frequently called functions of the standard C math library, namely ceil() and floor(), were replaced with faster implementations that use preprocessor directives to avoid the function call overhead. Moreover, the time spent validating the input was avoided as well, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions was further simplified.
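A sketch of such simplified implementations, written as preprocessor macros valid only for non-negative inputs, is shown below (the macro names are illustrative):

/* Valid for x >= 0 only: truncation equals floor, and ceil adds one
 * whenever a fractional part is present. */
#define FAST_FLOOR(x) ((int)(x))
#define FAST_CEIL(x)  ((int)(x) + ((x) > (float)(int)(x)))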
A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while ensuring that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster because this memory can be addressed more quickly.
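As an illustration, the change can be sketched as follows (the size bound is an assumed safe upper limit, not a value from the application):

#include <stdlib.h>

#define MAX_VERTICES 4096            /* illustrative safe upper bound */

/* Before: buffer allocated dynamically on the heap. */
void process_with_heap(int num_vertices)
{
    float *buffer = malloc(num_vertices * sizeof(float));
    /* ... fill and use buffer ... */
    free(buffer);
}

/* After: buffer allocated on the stack; requires the guarantee
 * num_vertices <= MAX_VERTICES to avoid a stack overflow. */
void process_with_stack(int num_vertices)
{
    float buffer[MAX_VERTICES];
    (void)num_vertices;
    /* ... fill and use buffer ... */
    (void)buffer;
}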
The last optimization consisted of detecting and removing several tasks that were not contributing to the final result. Such tasks were present in the application because several alternatives for achieving a common goal were implemented during the algorithmic design stage; after assessing them and choosing the best option, however, the others were never entirely removed.
The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming.
Figure 5.15: Execution time of the application before and after optimizing the decoding stage.
For every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for every image except the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, this recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
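The structure of the optimized loop can be sketched as follows (the helper names and signatures are assumptions, not the original code):

/* Assumed helper signatures (illustrative):                          */
void compute_column_sums(const float *frame, int rows, int cols, float *sums);
int  minimize_sad(const float *sums_a, const float *sums_b, int cols);
void shift_frame_y(float *frame, int rows, int cols, int dy);

/* Each column-sum vector is computed exactly once: the vector of
 * frame n is kept and reused as the reference for frame n+1. */
void gmc_all_frames(float **frames, int num_frames, int rows, int cols,
                    float *sums_prev, float *sums_curr)
{
    compute_column_sums(frames[0], rows, cols, sums_prev);
    for (int n = 1; n < num_frames; n++) {
        float *tmp;
        compute_column_sums(frames[n], rows, cols, sums_curr);
        shift_frame_y(frames[n], rows, cols,
                      minimize_sad(sums_prev, sums_curr, cols));
        tmp = sums_prev;             /* swap pointers: no recalculation */
        sums_prev = sums_curr;
        sums_curr = tmp;
    }
}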
5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to identify the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization, and global motion compensation stages are data-independent and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrating NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology; (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions while letting the compiler do some of the work associated with writing assembly instructions; and (iv) directly writing NEON assembly instructions that are linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and therefore this approach was adopted.
Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums. The column sums of both frames are computed for the first pair only; for every remaining pair of consecutive frames (from n = 3 to n = N), the column-sum vector of frame n−1 is reused, so that only the column sums of frame n need to be computed before minimizing the SAD and shifting frame n.
Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.
Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on two Q registers, where each of the 8 elements is 16 bits wide, or as an operation on two D registers, where each of the 8 elements is 8 bits wide.
Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology: the elements (lanes) of two source registers are combined by an operation into the corresponding elements of a destination register.
An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
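For illustration, the following sketch expresses the core idea with NEON C intrinsics rather than the hand-written assembly that was actually used; it packs 8 unsigned 8-bit pixels into a D register and performs a widening add, as needed for the texture 2 computation (the function name is illustrative):

#include <arm_neon.h>
#include <stdint.h>

/* Add corresponding pixels of two frames, 8 at a time; vaddl_u8
 * performs a widening add so that u8 + u8 is stored in u16 lanes. */
void add_frame_rows_u8(const uint8_t *a, const uint8_t *b,
                       uint16_t *sum, int n)
{
    int i;
    for (i = 0; i + 8 <= n; i += 8) {
        uint8x8_t  va = vld1_u8(a + i);      /* load 8 pixels of frame A */
        uint8x8_t  vb = vld1_u8(b + i);      /* load 8 pixels of frame B */
        uint16x8_t vs = vaddl_u8(va, vb);    /* widening add to 16 bits  */
        vst1q_u16(sum + i, vs);
    }
    for (; i < n; i++)                       /* scalar tail for n % 8    */
        sum[i] = (uint16_t)a[i] + b[i];
}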
Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain this increase. First, note that the stage that contributed most to the increase in time was reading the binary file. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was indeed probably running in parallel,
using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason can be found in the modulation stage. The first step of this process is to find the smallest and largest values of every camera frame pixel in the time dimension by means of if statements. When this task is implemented in conventional C, the processor makes use of its branch prediction mechanism to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, bypassing the branch prediction mechanism.
5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). To explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2^(n+1) − 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the power of 0.85. Interestingly, the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
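As a sketch, the LUT for the real 8-bit case could be built as follows; since two 8-bit pixels sum to at most 510, the table has 2^(8+1) − 1 = 511 entries, and the averaging and exponentiation are both folded into the table (names are illustrative, not the original code):

#include <math.h>

#define LUT_SIZE 511         /* 2^(8+1) - 1 possible sums of two 8-bit pixels */

static float gamma_lut[LUT_SIZE];

/* Precompute (sum / 2)^0.85 for every possible sum of two pixels. */
void init_gamma_lut(void)
{
    for (int s = 0; s < LUT_SIZE; s++)
        gamma_lut[s] = powf(s / 2.0f, 0.85f);
}

/* At run time, the gamma-corrected average is a single table lookup:
 * corrected = gamma_lut[pixel_a + pixel_b];                          */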
As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2, and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. Merging these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes.
Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.
Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels (pixel A, pixel B ∈ {0, 1, 2, 3}):

pixel A + pixel B:   0      1      2      3      4      5      6
average:             0      0.5    1      1.5    2      2.5    3
average^0.85 (LUT):  0      0.555  1      1.411  1.803  2.179  2.544
These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face reconstruction application.
Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.
Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.
6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.
In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people; moreover, each of the different scans was run 10 times and the results were averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application, described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when run on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).
On the other hand, the bottom bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when run on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time required on a PC. The C code was compiled with GCC's -O2 optimization level.
Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism, resulting in a much lower resolution model (1,229 faces) suitable for rendering on an embedded device. It is interesting to note that even though the lower resolution model contains approximately 2% of the faces of the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.
6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. The wide range of optimizations described in Chapter 5 reduced the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times. Furthermore,
Figure 6.2: Example of the visualization module developed: (a) high-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model, wireframe (63,743 faces); (c) low-resolution 3D model with texture (1,229 faces); (d) low-resolution 3D model, wireframe (1,229 faces).
Figure 6.4 presents individual graphs for each stage of the process, which gives an idea of the speedup achieved for each individual stage.
Figure 6.3: Performance evolution of the 3D face scanner's C implementation. The bars show the execution time after each successive optimization step: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2.
Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e. by approximately 14 seconds.
A wide range of optimizations was performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.
Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first
step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment. Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.
7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, these optimizations open new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5×5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates for implementation using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration, and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage can be optimized significantly. First, note the significant increase in the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operation), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to lose performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e. changing the data type from double to float and tuning
the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.
Finally, it is worth noting a few more ideas on how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process, so it is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp, and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of this configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Although a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.
Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2, 5612 AZ Eindhoven, The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 6, pp. 647–651, Jun. 1993, ISSN: 0162-8828. DOI: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in IEEE Canadian Conference on Electrical and Computer Engineering, vol. 2, May 1998, pp. 874–877. DOI: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, ISSN: 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, ISSN: 0162-8828. DOI: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image and Vision Computing, pp. 445–456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, ISSN: 0031-3203. DOI: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, ISSN: 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Proceedings of the 10th International Conference on Pattern Recognition, vol. 1, Jun. 1990, pp. 309–313. DOI: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, ISSN: 0932-8092. DOI: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, ISSN: 0018-9162. DOI: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in IEEE International Conference on Multimedia and Expo, Jul. 2006, pp. 2057–2060. DOI: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimel'farb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: Application to 3D face acquisition," in Seventh Mexican International Conference on Computer Science (ENC '06), Sep. 2006, pp. 135–141. DOI: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lighting techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in IEEE International Conference on Image Processing (ICIP 2007), vol. 3, Oct. 2007, pp. III-81–III-84. DOI: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, ISSN: 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. DOI: 10.1364/OE.15.012318 (cit. on p. 14).

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14).

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, ISBN: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, Departament d'Electrònica, Informàtica i Automàtica, 1998, ISBN: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," IEEE Journal of Robotics and Automation, vol. 3, no. 4, pp. 323–344, Aug. 1987, ISSN: 0882-4967. DOI: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 10, pp. 965–980, Oct. 1992, ISSN: 0162-8828. DOI: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1, 2628 CN Delft, The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1–4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '99), New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, ISBN: 0-201-48560-5. DOI: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, ISBN: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, Dec. 2009, p. 81 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series programmer's guide, version 3.0," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization: An example," Tech. Rep., 2009 (cit. on p. 54).
List of Figures
11 A subset of the CPAP masks offered by Philips 2
12 A 3D hand-held scanner developed in Philips Research 4
21 Standard stereo geometry 7
22 Assumed model for triangulation as proposed in [4] 10
23 Examples of pattern coding strategies 12
24 A reference framework assumed in [25] 14
31 General flow diagram of the 3D face scanner application 17
32 Example of the 16 frames that are captured by the hand-held scanner 18
33 Flow diagram of the preprocessing stage 18
34 Flow diagram of the normalization stage 20
35 Example of the 18 frames produced in the normalization stage 21
36 Camera frame sequence in a coordinate system 22
37 Flow diagram for the calculation of the texture 1 image 22
38 Flow diagram for the global motion compensation process 23
39 Difference between pixel-based and edge-based decoding 24
310 Vertices before and after the tessellation process 25
311 The Delaunay tessellation with all the circumcircles and their centers [33] 26
312 The calibration chart 27
313 The 3D model before and after the calibration process 28
314 3D resulting models after various filtering steps 29
315 Forehead of the 3D model before and after applying the smoothing process 30
41 The BeagleBoard-xM offered by Texas instruments 35
42 Simplified diagram of the 3D face scanner application 39
43 UV coordinate system 40
44 Diagram of the visualization module 41
51 Execution times of the MATLAB and C implementations after run ondifferent platforms 44
53 Execution time before and after tuning GCCrsquos compiler options 45
54 Modification of the memory layout of the camera frames 46
55 Execution time with a different memory layout 46
56 Execution time before and after reimplementing Crsquos standard power func-tion 47
57 Order of execution before and after the optimization 48
58 Difference in execution time before and after reordering the preprocessingstage 48
ix
x List of Figures
5.9 Flow diagram for the GMC process as implemented in the MATLAB code  49
5.10 Difference in execution time before and after modifying the GMC stage  49
5.11 Execution time of the application after fixing an error in the tessellation stage  50
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage  51
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted  52
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach  53
5.15 Execution time of the application before and after optimizing the decoding stage  54
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's columns sum  55
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage  55
5.18 NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology  56
5.19 Execution flow after first NEON assembly optimization  58
5.20 Execution times of the application before and after applying the first NEON assembly optimization  59
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels  59
5.22 Execution times of the application before and after applying the second NEON assembly optimization  59
5.23 Final execution flow after second NEON assembly optimization  60
6.1 Execution times of the MATLAB and C implementations when run on different platforms  62
6.2 Example of the visualization module developed  63
6.3 Performance evolution of the 3D face scanner's C implementation  64
6.4 Execution times for each stage of the application  65
Dedicated to my grandmother
Chapter 1
Introduction
The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.
Within the wide range of products offered by Philips, there is a specific group, categorized under the name of sleep solutions, that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]. It reported that sleep apnea was prevalent in more than 18 million Americans, i.e., 6.62% of the country's population.
While aiming to meet the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors. These include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia,
(a) Amara (b) ComfortClassic (c) ComfortGel Blue
(d) ComfortLite 2 (e) FitLife (f) GoLife
(g) ProfileLite Gel (h) Simplicity (i) ComfortGel
Figure 1.1: A subset of the CPAP masks offered by Philips
amongst others. A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.
Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable
of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.
1.1 3D Mask Sizing project
The 3D Mask Sizing project is based on the initiative of Philips to develop technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods, and hardware prototypes is the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:
1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.
2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.
3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.
4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.
The focus of this thesis project is on the first step.
As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique was already developed and is the basis for the present project. Figure 1.2a shows the hardware setup of this device. In short, this scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently being performed offline and are mostly implemented in MATLAB.
1.2 Objectives
The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a
(a) Hardware (b) 3D model example
Figure 1.2: A 3D hand-held scanner developed in Philips Research
way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application; specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to an ARM Cortex-A8, the new implementation should be targeted to this processor in particular, by making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.
1.3 Report organization
This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module, and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.
Chapter 2
Literature study
This chapter presents a selective analysis of the state-of-the-art in the field of surface reconstruction, placing special emphasis on structured lighting techniques. A brief overview of the three main underlying technologies used for depth estimation is presented first. This is followed by an example of stereo analysis, which serves as the basis for the more specific structured lighting techniques. Moreover, this example helps to illustrate why stereo analysis is considered less preferable for 3D face reconstruction applications when compared with structured lighting techniques. Special emphasis is placed on the scientific principles underlying structured lighting techniques. Furthermore, a classification of the different types of pattern coding strategies available in the literature is given, along with an analysis of their suitability for our application. Finally, the chapter concludes with a brief discussion of camera calibration and its most representative techniques.
2.1 Surface reconstruction
Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (such as those found in areas like architecture, mechanical engineering, or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].
Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e., they
are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.
Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in depth measurement. On the other hand, both reflecto-metric and stereo-metric techniques are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they require an estimation of the model of the environment. In the remainder of this section, we will limit the discussion to the stereo-metric category and focus on the structured lighting techniques.
2.1.1 Stereo analysis
Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis is presented in this section. This example intends to show why the use of structured lighting becomes essential for our application. The example is presented in [4].
Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point in the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies from a certain reference. Note, however, that in order to calculate the triangulation, more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (extrinsic parameter) or to the focal length of each of the cameras (intrinsic parameter).
Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and
oriented towards the scene objects. Also note that, because the x-axes of both images are identically oriented, rows with the same row number in the two different images lie on the same straight line.
Figure 2.1: Standard stereo geometry (left and right image planes, the base distance b, and the parallel optical axes of the two cameras)
In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

p_left = (x_left, y_left)  and  p_right = (x_right, y_right)

in the left and right images, respectively, assuming that the scene point is visible from both camera viewpoints. The disparity between two corresponding image points with respect to p_left is a vector given by

∆(x_left, y_left) = (x_left − x_right, y_left − y_right)^T    (2.1)
In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

p = (x, y) = (f · X / Z, f · Y / Z)    (2.2)

assuming that Z > f.
According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that y = y_left = y_right. Therefore, for the left camera, the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X-axis, the
central projection equation is given by

(x_right, y) = (f · (X − b) / Z, f · Y / Z)
Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points in each one of the images with respect to p_left is given by

∆_ssg(x_left, y_left) = √((x_left − x_right)² + (y_left − y_right)²)

However, because rows with the same row numbers in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

∆_ssg(x_left, y_left) = |x_left − x_right| = x_left − x_right    (2.3)

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map ∆(x, y) is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.
Finally, in order to come up with the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

Z = f · X / x_left = f · (X − b) / x_right

and therefore

X = b · x_left / (x_left − x_right)

Using the previous equation, it follows that

Z = b · f / (x_left − x_right)

By substituting this result into the projection equation for y, it follows that

Y = b · y / (x_left − x_right)

The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and
b are known and that the disparity map ∆(x, y) was measured for each pair of corresponding points in the two images. Note that a variety of methods exists to calibrate different types of camera configuration systems, i.e., to determine their intrinsic and extrinsic parameters. More on these calibration procedures is discussed in Section 2.2.
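To make this reconstruction step concrete, the following C sketch applies the last three equations to a single pair of corresponding points. The function name and the convention of signaling an undefined disparity through the return value are assumptions made for illustration, not part of the original implementation.

/* Reconstructs a 3D point from a pair of corresponding image points in
   the standard stereo geometry. f and b are the focal length and base
   distance obtained from calibration. Returns 0 on success, or -1 when
   the disparity is undefined (e.g., occlusion, no match found). */
static int reconstruct_point(double f, double b,
                             double x_left, double x_right, double y,
                             double *X, double *Y, double *Z)
{
    double disparity = x_left - x_right;   /* Equation (2.3) */
    if (disparity <= 0.0)                   /* point not visible in both views */
        return -1;
    *X = (b * x_left) / disparity;
    *Y = (b * y) / disparity;
    *Z = (b * f) / disparity;
    return 0;
}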
The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques is used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images. These features are typically corners or edges contained within the images. Although these techniques are found to be appropriate for a certain number of applications, it turns out that they present a number of drawbacks that make their applicability unfeasible for many others. The main drawbacks are (i) feature extraction and matching are generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. Such problems in solving the correspondence problem can generally be overcome by resorting to a different but similar type of techniques, known by the name of structured lighting techniques. While structured lighting techniques involve a completely different methodology on how to solve the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.
2.1.2 Structured lighting
Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source which projects a light pattern actively into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to simplify the complexity of the correspondence analysis by actively manipulating the scene.
It is important to note that stereoscopic based systems do not assume complex requirements for image acquisition, since they mostly rely on theoretical, mathematical, and algorithmic analyses to solve the reconstruction problem. On the other hand, the idea behind structured lighting methods is to shift this complexity to another level, such as the engineering prerequisites of the overall system [4].
A wide variety of light patterns has been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured
when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2, a classification of the available encoded patterns is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.
2.1.2.1 Triangulation technique
Triangulation refers to the process of determining the location of a point by measuring angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is explained.
Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation as proposed in [4] (camera and light source separated by the base distance b, with angles α and β towards the object point P at distance d from the origin O)

Note that the
system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a consequence, the object, light source, and camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known, and the origin of the coordinate system O coincides with the projection center of the camera.
The goal is to calculate the distance d between the origin O and the object point P = (X0, Z0). This can be done using the law of sines as follows:

d / sin(α) = b / sin(γ)

From γ = π − (α + β) and sin(π − γ) = sin(γ), it holds that

d / sin(α) = b / sin(π − γ) = b / sin(α + β)

Therefore, distance d is given by

d = b · sin(α) / sin(α + β)

which holds for any point P lying on the surface of the object.
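As a small illustration of this result, the C sketch below evaluates the derived formula. The function name is an assumption, and the angles are expected in radians, as in the derivation.

#include <math.h>

/* Triangulates the distance d from the camera projection center O to the
   object point P, given the base distance b and the calibrated angles
   alpha and beta (in radians): d = b * sin(alpha) / sin(alpha + beta). */
static double triangulate_distance(double b, double alpha, double beta)
{
    return (b * sin(alpha)) / sin(alpha + beta);
}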
2.1.2.2 Pattern coding strategies
As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill all requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors, or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19]:
• Time-multiplexing. This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or objects in the scene during the time period in which the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a (a sketch of how such a pattern sequence can be generated is given after Figure 2.3).
• Spatial neighborhood. In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12], or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow for all coding information to be condensed into a single projection pattern, making them highly
suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].
• Direct coding. In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16], or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not only depend on the projected colors, but also on the intrinsic colors of the measuring surface and light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].
(a) Time-multiplexing (b) Spatial neighborhood (c) Direct coding
Figure 2.3: Examples of pattern coding strategies
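To make the time-multiplexing idea concrete, the sketch below (referred to in the first bullet above) generates a sequence of binary stripe patterns whose per-column codewords follow a binary-reflected Gray code, so that each projector column receives a unique codeword. The projector resolution, the number of patterns, and the use of one byte per pixel are assumptions made for this example; they are not the actual parameters of the scanner.

#include <stdint.h>

#define PAT_WIDTH    1024   /* assumed projector resolution */
#define PAT_HEIGHT   768
#define NUM_PATTERNS 10     /* 2^10 = 1024 distinct column codewords */

/* Fills patterns[k] with the k-th Gray-coded binary stripe pattern.
   Each projector column x is assigned the codeword gray(x) = x ^ (x >> 1);
   pattern k displays bit k of that codeword, starting from the most
   significant bit, as a white (255) or black (0) vertical stripe. */
static void generate_gray_patterns(uint8_t patterns[NUM_PATTERNS][PAT_HEIGHT][PAT_WIDTH])
{
    for (int x = 0; x < PAT_WIDTH; x++) {
        unsigned gray = (unsigned)x ^ ((unsigned)x >> 1);
        for (int k = 0; k < NUM_PATTERNS; k++) {
            int bit = (gray >> (NUM_PATTERNS - 1 - k)) & 1;
            uint8_t value = bit ? 255 : 0;
            for (int y = 0; y < PAT_HEIGHT; y++)
                patterns[k][y][x] = value;
        }
    }
}

The Gray code is preferred over plain binary here because consecutive columns differ in exactly one pattern, which makes decoding more robust to stripe-boundary errors.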
2.1.2.3 3D human face reconstruction
Given the importance of face reconstruction in a wide range of fields, such as security, forensics, or even entertainment, it is no surprise that special focus has been devoted to this area by the research community over the last decades. A comparative study of three different 3D face reconstruction approaches is presented in [20]. Here, the most representative techniques of three different domains are tested. These domains are binocular stereo, structured lighting, and photometric stereo. The experimental results show that active reconstruction techniques perform better than purely passive ones for this application.

The majority of analyses on vision based reconstruction has focused on general performance for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless, some effort has been made on evaluating structured lighting techniques with special focus on human face reconstruction. In [21], a comparison is presented between three
structured lighting techniques (Gray Code, Gray Code Shift, and Stripe Boundary) to assess 3D reconstruction for human faces by using mono and stereo systems. The results show that the Gray Code Shift coding performs best, given the high number of emitted patterns it uses. A further study on this topic was performed by the same author in [22]. Again, it was found that time-multiplexing techniques such as binary encoding using Gray Code provide the highest accuracy. With a rather different objective than that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] also focus their effort on presenting a framework that captures 3D models of faces in high resolution with low computational load. Here, the system uses a single colored stripe pattern for the reconstruction purpose, plus a picture of the face illuminated with regular white light that is used as texture.
Particular aspects of 3D human face reconstruction, such as the proximity, size, and texture involved, make structured lighting a suitable approach. On the contrary, other reconstruction techniques might be less suitable when dealing with these particular aspects. For example, stereoscopic approaches fail to provide positive results when the textures involved do not contain features that can be easily extracted and matched by means of algorithms, as in the case of the human face. On the other hand, the concepts behind structured lighting make it very convenient to reconstruct these kinds of surfaces, given the proximity involved and the size limits of the object in question (appropriate for projecting encoded patterns).
With regard to the suitability of the different pattern coding strategies for our application (3D human face reconstruction by means of a hand-held scanner), there are several factors to consider. Spatial neighborhood strategies do not offer the high spatial resolution that is needed by the algorithms that assess the fit quality of the various mask models. Direct coding strategies suffer from practical problems that affect their robustness in different scenarios. This centers the attention on the time-multiplexing techniques, which are known to provide high spatial resolution. The problem with such techniques is that they are highly sensitive to movement, which is likely to be present on a hand-held device. Fortunately, there are several approaches as to how such a problem can be solved. Consequently, it is a time-multiplexing technique that is being employed in our application.
2.2 Camera calibration
Camera calibration is a crucial ingredient in the process of metric scene measurement. This section presents a review of some of the most popular techniques, with special focus on those that are regarded as adequate for our application.
2.2.1 Definition
Camera calibration is the process of determining a mathematical approximation of the physical and optical behavior of an imaging system by using a set of parameters. These parameters can be estimated by means of direct or iterative methods, and they are divided in two groups. On the one hand, intrinsic parameters determine how light is projected through the lens onto the image plane of the sensor. The focal length, projection center, and lens distortion are all examples of intrinsic parameters. On the other hand, extrinsic parameters measure the position and orientation of the camera with respect to a world coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for the structured pattern projection and triangulation considered in [25]. The focal length fc and the projection center Oc are examples of intrinsic parameters of the camera, while the distance D between the camera and the projector corresponds to an extrinsic parameter.
Figure 2.4: A reference framework assumed in [25] (an object point is triangulated between the image planes of the camera and the projector, whose projection centers Oc and Op are separated by the distance D relative to a reference plane)
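To make the distinction between the two groups of parameters concrete, the following C sketch groups them into structures and applies them to project a world point onto the image plane. The structure layout and function name are assumptions made for illustration, and lens distortion is omitted.

typedef struct {
    double f;        /* focal length */
    double cx, cy;   /* projection center (principal point) */
} IntrinsicParams;

typedef struct {
    double R[3][3];  /* rotation: world -> camera coordinates */
    double t[3];     /* translation: world -> camera coordinates */
} ExtrinsicParams;

/* Projects a world point Pw onto the image plane of a pinhole camera.
   Returns 0 on success, or -1 if the point lies behind the camera. */
static int project_point(const IntrinsicParams *in, const ExtrinsicParams *ex,
                         const double Pw[3], double *u, double *v)
{
    double Pc[3];
    for (int i = 0; i < 3; i++)
        Pc[i] = ex->R[i][0] * Pw[0] + ex->R[i][1] * Pw[1] +
                ex->R[i][2] * Pw[2] + ex->t[i];
    if (Pc[2] <= 0.0)
        return -1;
    *u = in->f * Pc[0] / Pc[2] + in->cx;
    *v = in->f * Pc[1] / Pc[2] + in->cy;
    return 0;
}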
2.2.2 Popular techniques
In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration that uses a 3×4 transformation matrix which maps 3D object points to their respective 2D image projections. Here, the model of the camera does not consider any lens distortion. For a detailed description of this method, refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique that was based on extracting the physical parameters of the camera from the transformation technique proposed in [18]. The description of this technique is given in [26] and [27]. A non-linear explicit camera calibration that included radial lens distortion was proposed by Salvi in his PhD
thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular and that is still widely used was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.
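As an illustration of what modeling radial lens distortion amounts to, the sketch below applies the common single-coefficient radial model to normalized image coordinates. The coefficient name k1 and the use of a single term are assumptions for the example; the referenced methods differ in how many distortion terms they actually estimate.

/* Applies a single-coefficient radial distortion model to normalized
   (dimensionless) image coordinates: each point is displaced along the
   radial direction by a factor (1 + k1 * r^2). */
static void apply_radial_distortion(double k1,
                                    double xu, double yu,    /* undistorted */
                                    double *xd, double *yd)  /* distorted */
{
    double r2 = xu * xu + yu * yu;
    double factor = 1.0 + k1 * r2;
    *xd = xu * factor;
    *yd = yu * factor;
}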
The calibration mechanism that is currently being used in our application is based on the work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although this mechanism focuses on stereo camera calibration, it was generalized for a system with one camera and one projector. It involves imaging a controlled scene from different positions and orientations. The controlled scene consists of a rigid calibration chart with several markers. The geometric and photometric properties of such markers are known precisely, so that they can be detected. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart. This calibration mechanism is discussed further in Section 3.7.
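The parameter search just described can be pictured as the minimization of a reconstruction-error cost. The sketch below is a minimal illustration of such a cost function; the parameter vector layout, the hypothetical reconstruct_marker() helper, and the use of squared Euclidean distance are all assumptions, since [31] specifies its own formulation.

typedef struct { double x, y, z; } Point3;

/* Hypothetical helper, assumed to triangulate one marker from its image
   correspondences under the candidate parameter vector 'params'. */
extern Point3 reconstruct_marker(const double *params, int marker_idx);

/* Accumulates the squared distance between each reconstructed marker and
   its known position on the calibration chart; a non-linear solver would
   search for the parameter vector that minimizes this value. */
static double calibration_cost(const double *params,
                               const Point3 *chart, int num_markers)
{
    double cost = 0.0;
    for (int i = 0; i < num_markers; i++) {
        Point3 p = reconstruct_marker(params, i);
        double dx = p.x - chart[i].x;
        double dy = p.y - chart[i].y;
        double dz = p.z - chart[i].z;
        cost += dx * dx + dy * dy + dz * dz;
    }
    return cost;
}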
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed by the Smart Sensing & Analysis research group and provided as a starting point for the current project. Figure 3.1 presents the main steps involved in the 3D reconstruction process.
Figure 3.1: General flow diagram of the 3D face scanner application (read binary file 3.1, preprocessing 3.2, normalization 3.3, tessellation 3.4, decoding 3.5, global motion compensation 3.6, calibration 3.7, vertex filtering 3.8, hole filling 3.9; the input is a binary file plus an XML file, and the output is a 3D model)
The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene. For each projection, the scene is captured by means of the embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study of why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is described.
described
Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.
3.1 Read binary file

The first step of the application is to read the binary file that contains the required information for the 3D reconstruction. The binary file is composed of two parts: the header and the actual data. The header contains metadata of the acquired frames, such as the number of frames and the resolution of each one. The second part contains the actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence, which from now on will be referred to as the camera frames.
3.2 Preprocessing

The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps is described in the following subsections.
Figure 3.3: Flow diagram of the preprocessing stage: parse XML file, discard frames, crop frames, and scale (convert to float, range 0 to 1).
3.2.1 Parse XML file

In this stage, the application first reads an XML file that is included with every scan. This file contains relevant information for the structured light reconstruction. This information includes (i) the type of structured light patterns that were projected when acquiring the data, (ii) the number of frames captured while structured light patterns were being projected, (iii) the image resolution of each frame to be considered, and (iv) the calibration data.
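As an illustration, extracting one of these values with libxml2 could look as follows. This is a minimal sketch: the element name nrOfFrames is hypothetical, since the actual schema of the scan file is not reproduced here.

    #include <stdlib.h>         /* atoi */
    #include <libxml/parser.h>
    #include <libxml/tree.h>

    /* Sketch: read the number of frames from the per-scan XML file. */
    int read_frame_count(const char *path)
    {
        xmlDocPtr doc = xmlReadFile(path, NULL, 0);
        if (doc == NULL)
            return -1;

        int frames = -1;
        xmlNodePtr root = xmlDocGetRootElement(doc);
        for (xmlNodePtr cur = root ? root->children : NULL; cur != NULL; cur = cur->next) {
            if (cur->type == XML_ELEMENT_NODE &&
                xmlStrcmp(cur->name, (const xmlChar *)"nrOfFrames") == 0) {
                xmlChar *text = xmlNodeGetContent(cur);
                frames = atoi((const char *)text);
                xmlFree(text);
            }
        }
        xmlFreeDoc(doc);
        return frames;
    }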
3.2.2 Discard frames

Based on the number-of-frames value read from the XML file, the application discards the extra frames that do not contain relevant information for the structured light approach but that are provided as part of the input.
3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this particular application, because pixels near the frame borders do not contain facial information and can therefore be safely removed.
3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage, the data type is transformed from unsigned integer to floating point, while each pixel value is divided by 255. The new set of values ranges between 0 and 1.
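A minimal C sketch of this conversion (the buffer names are placeholders) is:

    /* Sketch: convert 8-bit pixels [0, 255] to floats in [0, 1]. */
    void scale_frames(const unsigned char *in, float *out, int n_values)
    {
        for (int i = 0; i < n_values; i++)
            out[i] = in[i] / 255.0f;
    }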
3.3 Normalization

Even though this section is entitled Normalization, a few more tasks are performed in this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide arrows represent the flow of data, whereas dashed lines represent the order of execution. The numbers inside the small data arrows pointing towards the different tasks represent the number of frames used as input by each task. The dashed-line rectangle that encloses the normalization and texture 2 tasks indicates that there is no strict sequential execution between these two, but rather that they are executed in an alternating fashion. This type of diagram will prove particularly useful in Chapter 5 to explain the modifications that were made to the application to improve its performance. An example of the different frames produced in this stage is visualized in Figure 3.5. A brief description of each of the tasks involved in this stage follows.

Figure 3.4: Flow diagram of the normalization stage. The 16 camera frames are input to four tasks: normalization (8 frames out), texture 2 (8 frames out), modulation (1 frame out), and texture 1 (1 frame out).
3.3.1 Normalization

The purpose of this stage is to extract the reflectivity component (texture information) from the camera frames, while aiming at enhancing the deformed illumination patterns in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The deformed patterns are essential for the 3D reconstruction process.

In order to understand how this process takes place, we need to look back at Figure 3.2. Here, it is possible to observe that the projected patterns in the top row frames are equal to their corresponding frames in the bottom row, with the only difference being that the values of the projected pattern are inverted. For each corresponding pair, a new image frame is generated according to the following equation:

F_{norm}(x, y) = \frac{F_{camera}(x, y, a) - F_{camera}(x, y, b)}{F_{camera}(x, y, a) + F_{camera}(x, y, b)}

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively. An example of the resulting frame sequence is shown in Figure 3.5a.
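As an illustration, a per-pixel C sketch of this operation might look as follows; the buffer names and the guard against division by zero are assumptions, not taken from the original code.

    /* Sketch: compute one normalized frame from an aligned pair of
       camera frames a (normal pattern) and b (inverted pattern). */
    void normalize_pair(const float *a, const float *b, float *out, int n_pixels)
    {
        for (int i = 0; i < n_pixels; i++) {
            float sum = a[i] + b[i];
            /* Guard against completely dark pixels (assumption). */
            out[i] = (sum > 0.0f) ? (a[i] - b[i]) / sum : 0.0f;
        }
    }

Note that the sum in the denominator is exactly the texture 2 frame described next, which is why the two tasks can share intermediate results and are executed in an alternating fashion.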
Figure 3.5: Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence, (b) texture 2 frame sequence, (c) modulation frame, (d) texture 1 frame.
3.3.2 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one used to calculate the normalized frame sequence. In fact, the output of this process is an intermediate step in the calculation of the normalized frames, which is the reason why the two processes are said to be performed in an alternating fashion. The mathematical equation that describes the calculation of the texture 2 frame sequence is

F_{texture2}(x, y) = F_{camera}(x, y, a) + F_{camera}(x, y, b)

The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation stage.
3.3.3 Modulation

The purpose of this stage is to find the range of measured values for each (x, y) pixel of the camera frame sequence along the time dimension. This is done in two steps. First, two frames are generated by finding the maximum and minimum values along the time (t) dimension (Figure 3.6) for every (x, y) position in a frame.

Figure 3.6: Camera frame sequence in a coordinate system (x, y, t).

Second, a modulation frame is produced by taking the difference between the previously generated frames, i.e.

F_{mod}(x, y) = F_{max}(x, y) - F_{min}(x, y)

Such a modulation frame (Figure 3.5c) is required later during the decoding stage.
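A compact C sketch of this two-step computation (buffer names are assumptions) is:

    /* Sketch: modulation frame as max - min along the time dimension.
       frames[t] points to camera frame t; each holds n_pixels floats. */
    void modulation_frame(const float *const frames[16], float *mod, int n_pixels)
    {
        for (int i = 0; i < n_pixels; i++) {
            float fmin = frames[0][i], fmax = frames[0][i];
            for (int t = 1; t < 16; t++) {
                float v = frames[t][i];
                if (v < fmin) fmin = v;
                if (v > fmax) fmax = v;
            }
            mod[i] = fmax - fmin;
        }
    }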
3.3.4 Texture 1

Finally, the last task in the normalization stage corresponds to the generation of the texture image that will be mapped onto the final 3D model. In contrast to the previous three tasks, this subprocess does not take the complete set of 16 camera frames as input, but only the 2 with the finest projection patterns. Figure 3.7 shows the four processing steps that are applied to the input in order to generate a texture image such as the one presented in Figure 3.5d.

Figure 3.7: Flow diagram for the calculation of the texture 1 image: average frames, gamma correction, 5×5 mean filter, and histogram stretch.
3.4 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement. In fact, if no measures are taken to correct the slight amount of movement of the scanner or of the objects in the scene during the acquisition process, the complete reconstruction process fails. Although the global motion compensation stage is only a minor part of the mechanism that makes the entire application robust to motion, its contribution to the final result is not negligible.

Global motion compensation is an extensive field of research to which many different approaches and methods have been contributed. The approach used in this application is amongst the simplest in level of complexity. Nevertheless, it suffices for the needs of the current application.
Figure 3.8 presents an overview of the algorithm used to achieve the global motion compensation. This process takes as input the normalized frame sequence introduced in the previous section. As noted at the bottom of the figure, these steps are repeated for every pair of consecutive frames. As a first step, the pixels in each column are added for both frames. This results in two vectors that hold the cumulative sums of each frame. The second step is to determine by how many pixels the second image is displaced with respect to the first one. In order to achieve this, the sum of absolute differences (SAD) between elements of the two column-sum vectors is calculated while slowly displacing the two vectors with respect to each other. The result is a new vector containing the SAD value for each displacement. Subsequently, the index of the smallest element in the SAD values vector is searched for in order to determine the number of pixels that the second image needs to be shifted. The process concludes by performing the actual shift of the second frame.
Figure 3.8: Flow diagram for the global motion compensation process. For every pair of consecutive frames A and B of the normalized frame sequence, the columns of both frames are summed, the SAD between the column sums is minimized, and frame B is shifted accordingly.
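As an illustration of the displacement search, a minimal C sketch follows. The search range max_shift, the normalization of the SAD by the overlap length, and all names are assumptions rather than details taken from the original implementation.

    #include <float.h>
    #include <math.h>

    /* Sketch: find the displacement that minimizes the SAD between the
       column-sum vectors of two consecutive frames. */
    int estimate_shift(const float *sums_a, const float *sums_b,
                       int n, int max_shift)
    {
        int best_shift = 0;
        float best_sad = FLT_MAX;

        for (int s = -max_shift; s <= max_shift; s++) {
            float sad = 0.0f;
            int overlap = 0;
            for (int i = 0; i < n; i++) {
                int j = i + s;
                if (j < 0 || j >= n)
                    continue;               /* skip non-overlapping entries */
                sad += fabsf(sums_a[i] - sums_b[j]);
                overlap++;
            }
            sad /= overlap;                 /* do not favor short overlaps */
            if (sad < best_sad) {
                best_sad = sad;
                best_shift = s;
            }
        }
        return best_shift;
    }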
3.5 Decoding

In Section 2.1.1 of the literature study, the correspondence problem was defined as the process of determining corresponding point pairs between the captured images and the projected patterns. This is exactly what is accomplished during the decoding stage.

A novel approach has been implemented in which the identification of the projector stripes is based not on the values of the pixels themselves (as is typically done), but rather on the edges formed by the transitions of the projected patterns. Figure 3.9 illustrates the different sets of decoded values that result from each of these methods. Here, it is possible to observe that the pixel-based method produces a stair-casing effect due to the decoding of neighboring pixels that lie on the same stripe of the projected pattern. The edge-based method, on the other hand, removes this undesirable effect by decoding values only for the parts of the image in which a transition occurs. Furthermore, this approach enables sub-pixel accuracy in determining the positions where the transitions occur, meaning that the overall resolution of the 3D reconstruction increases considerably.
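The exact sub-pixel estimator is not spelled out above; purely as an illustration, a common choice is linear interpolation between the two pixels that straddle a transition of the normalized frame, whose values change sign at a stripe edge.

    /* Illustrative sketch only: sub-pixel position of a zero crossing
       between pixels y and y+1 of one image column. */
    float subpixel_edge(const float *column, int y)
    {
        float v0 = column[y];       /* last value before the transition */
        float v1 = column[y + 1];   /* first value after the transition */
        return (float)y + v0 / (v0 - v1);
    }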
Figure 3.9: Edge-based vs. pixel-based decoding, plotted as decoded values against the pixels along the y dimension of the image. The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used.
The decoding process results in a set of vertices, each one associated with a depth code. Note, however, that the units used to describe the position and depth of each vertex are camera pixels and code values, respectively, meaning that these vertices still do not represent the actual geometry of the face. The calibration process, explained in a later section, is the part of the application that translates the pixel and code values to standard units (such as millimeters), thus recreating the actual shape of the human face.
3.6 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in such a manner that no overlaps occur. In computer graphics, these geometric shapes are generally chosen to be triangles, also called "faces". The reason for using triangles is that they have, by definition, their vertices on a same plane. This, in turn, avoids the generation of non-simple convex polygons that are not guaranteed to be rendered correctly. A complete example illustrating this point can be found in [32].

The set of 3D vertices calculated in the decoding stage is the input to the tessellation process. Here, however, the third dimension does not play a role, and hence the z coordinate of each vertex can be thought of as being equal to 0. This implies that the new set of vertices consists only of (x, y) coordinates that lie on the same plane, as shown in Figure 3.10a. This graph corresponds to a very close view of the nose area in the reconstructed face example.
Figure 3.10: Close view of the vertices in the nose area (a) before and (b) after applying the Delaunay triangulation.
The question that arises here is how to connect the vertices in such a way that the complete surface is covered with triangles. The answer is to use the Delaunay triangulation, which is probably the most common triangulation used in computer vision. The main advantage it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the Delaunay triangulation is independent of the order in which the vertices are processed. Figure 3.10b shows the result of applying the Delaunay triangulation to the vertices shown in Figure 3.10a.

Although there exist a number of different algorithms used to achieve the Delaunay triangulation, the final outcome of each conforms to the following definition: a Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [33]. Such a definition can be understood by examining Figure 3.11.
Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33].
3.7 Calibration

The set of (x, y) vertices with their corresponding depth code values that result from the decoding process does not represent standard units of measure, i.e., these still have to be translated into standard units such as millimeters. This is precisely the objective of the calibration process.

The calibration mechanism used in the application is based on the work of Peter-Andre Redert as part of his PhD thesis [31]. The entire process is divided into two parts: an offline and an online process. Moreover, the offline process consists of two stages: the camera calibration and the system calibration. It is important to clarify that while the offline process is performed only once (camera properties and distances within the system do not change with every scan), the online process is carried out for every scan instance. The calibration stage referred to in Figure 3.1 is the latter.
3.7.1 Offline process

As already mentioned, the offline process comprises the two stages described below.

Camera calibration: This part of the process is concerned with the calculation of the intrinsic parameters of the camera, as explained in Section 2.2 of the literature study. In short, the objective is to precisely quantify the optical properties of the camera. The current approach accomplishes this by imaging the special calibration chart shown in Figure 3.12 from different orientations and distances. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which the triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart.

Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.

System calibration: The second part of the calibration process refers to the camera-projector system calibration, i.e., the determination of the extrinsic parameters of the system. Again, this part of the process images the calibration chart from different distances. However, this time, structured light patterns are emitted by the projector while the acquisition process takes place. The result is that each projector code is associated with a known depth and camera position.
3.7.2 Online process

The result of the offline calibration is a set of parameters that model the optical properties of the scanner system. These are passed to the application inside the XML file for every scan. Such parameters represent the coefficients of a fifth-order polynomial used for translating the set of (x, y) vertices with their corresponding depth code values into standard units of measure. In other words, the online process consists of evaluating a polynomial with all the x, y, and depth code values calculated in the decoding stage in order to reconstruct the geometry of the face. Figure 3.13 shows the state of the 3D model before and after the reconstruction process.
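The exact form of the fifth-order polynomial is not reproduced here; as an illustration only, a one-dimensional fifth-order evaluation in Horner form could look as follows, where how x, y, and the depth code are combined into the polynomial input is an assumption.

    /* Sketch: evaluate a fifth-order polynomial in Horner form.
       c[0]..c[5] are calibration coefficients from the XML file. */
    static inline float poly5(const float c[6], float v)
    {
        return ((((c[5] * v + c[4]) * v + c[3]) * v + c[2]) * v + c[1]) * v + c[0];
    }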
Figure 3.13: The 3D model (a) before and (b) after the calibration process.
3.8 Vertex filtering

As can be seen from Figure 3.13b, there are a number of extra vertices (and faces) that have not been correctly reconstructed and therefore should be removed from the model. Vertex filtering is applied to remove all these noisy vertices and faces based on different criteria. The process is divided into the following three steps.

3.8.1 Filter vertices based on decoding constraints

First, if the distance between consecutive decoded points is larger than a maximum threshold in the x or z dimension, then these are removed. Second, in order to avoid falsely decoded vertices due to camera noise (especially in the parts of the images where light does not hit directly), a minimal modulation threshold needs to be exceeded, or else the associated decoded point is discarded. Finally, if the decoded vertices lie outside a margin defined in accordance with the image dimensions, then these are removed as well.
3.8.2 Filter vertices outside the measurement range

The measurement range, defined during the offline calibration, refers to the minimum and maximum values that each decoded point can have in the z dimension. These values are read from the XML file. The long triangles shown in Figure 3.13b, which either extend far into the picture or, on the other hand, come close to the camera, are all removed in this stage. The resulting 3D model after being filtered with the two previously described criteria is shown in Figure 3.14a.
3.8.3 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length criterion. Initially, the length of every edge contained in the model is calculated. This is followed by determining a new set of edges L that contains the longest edge in each face. After this operation, the mean length value of the longest edge set is calculated. Finally, only faces whose longest edge is shorter than seven times the mean value, i.e., L < 7 × mean(L), are kept. Figure 3.14b shows the result after this operation.
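A C sketch of this criterion (the mesh layout and the absence of error handling are assumptions) might be:

    #include <math.h>
    #include <stdlib.h>

    /* Sketch: mark faces whose longest edge is below 7x the mean
       longest-edge length; keep[f] is 1 for surviving faces. */
    void filter_long_edges(const float (*v)[3], const int (*faces)[3],
                           int n_faces, int *keep)
    {
        float *longest = malloc(n_faces * sizeof(float));
        float mean = 0.0f;

        for (int f = 0; f < n_faces; f++) {
            float lmax = 0.0f;
            for (int e = 0; e < 3; e++) {
                const float *p = v[faces[f][e]];
                const float *q = v[faces[f][(e + 1) % 3]];
                float dx = p[0] - q[0], dy = p[1] - q[1], dz = p[2] - q[2];
                float len = sqrtf(dx * dx + dy * dy + dz * dz);
                if (len > lmax)
                    lmax = len;
            }
            longest[f] = lmax;
            mean += lmax;
        }
        mean /= n_faces;

        for (int f = 0; f < n_faces; f++)
            keep[f] = longest[f] < 7.0f * mean;
        free(longest);
    }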
Figure 3.14: Resulting 3D models after various filtering steps: (a) after the filtering steps described in Subsections 3.8.1 and 3.8.2, (b) after the filtering step described in Subsection 3.8.3, (c) after the filtering step described in Section 3.9.
3.9 Hole filling

In the last processing step of the 3D face scanner application, two actions are performed. The first one concerns an algorithm that takes care of filling undesirable holes that appear due to the removal of vertices and faces that were part of the face surface. This is accomplished by adding a vertex in the middle of the hole and then connecting every surrounding edge with this point. The second action refers to another filtering step of vertices and faces, in which the program removes all but the largest group of connected faces. The final 3D model is shown in Figure 3.14c.
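As an illustration of the first action, a sketch of the triangle-fan construction follows; it assumes the boundary vertices of a hole have already been identified and ordered, and that the arrays have spare capacity, neither of which is detailed in the original description.

    /* Sketch: close a hole by adding its centroid as a new vertex and
       connecting every boundary edge to it. */
    void fill_hole(float (*v)[3], int *n_vertices,
                   int (*faces)[3], int *n_faces,
                   const int *boundary, int n_boundary)
    {
        int c = (*n_vertices)++;         /* index of the new center vertex */
        v[c][0] = v[c][1] = v[c][2] = 0.0f;
        for (int i = 0; i < n_boundary; i++)
            for (int d = 0; d < 3; d++)
                v[c][d] += v[boundary[i]][d] / n_boundary;

        for (int i = 0; i < n_boundary; i++) {   /* one triangle per edge */
            int *f = faces[(*n_faces)++];
            f[0] = boundary[i];
            f[1] = boundary[(i + 1) % n_boundary];
            f[2] = c;
        }
    }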
3.10 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes but not for the overall goal of the 3D mask sizing project, this process was not considered part of the 3D face scanner application. This is also the reason why it is not included in Figure 3.1. Nevertheless, this section provides a brief explanation of the smoothing process that is currently used, along with an example.

A complete explanation of the algorithm being used to achieve the smoothing effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian operator that diffuses the vertices along the surface. An example of the resulting model before and after applying the smoothing process is shown in Figure 3.15.
Figure 3.15: Forehead of the 3D model before and after applying the smoothing process.
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as two different domains, but rather as two complementary parts of a whole. There are two important trends that have made such a unified view possible. First, integrated circuit (IC) technology has evolved to the point where multiple processors of different types coexist in a single IC. Second, the increasing complexity and average size of programs, added to the evolution of compiler technologies, raised C compilers (and even C++ or Java in some cases) to become commonplace in the development of embedded systems [35].

This chapter discusses the embedded hardware and software implementation of the 3D face scanner. A brief account of the hardware and software tools that were used during the development of the application is presented first. Subsequently, the first stage of the development process is described, which consists mainly of translating the algorithms and methods described in Chapter 3 into a different programming language, more suitable for embedded systems. Finally, a preview of the developed visualization module that displays the 3D reconstructed face is presented, along with a brief description of its functionality.
4.1 Development tools

This section describes the set of tools used in the development of the embedded application. First, an overview of the hardware is presented, highlighting the most important aspects that are of interest to the 3D face scanner application. This is then followed by a list of the software tools, along with a short motivation for their selection. A so-called remote development methodology was used for the compilation process. The idea is to run an integrated development environment (IDE) on a client system for the creation of the project, editing of the files, and usage of code assistance features in the same manner as done with local projects. However, when the project is built, run, or debugged, the process runs on a remote server, with output and input transferred to the client system.
4.1.1 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as development platforms. SBCs combine most features of a conventional desktop computer into a single board, which can be as small as a credit card. One or more processors of different types, memory, on-board peripherals for multiple USB devices, single or dual gigabit Ethernet connections, and integrated graphics and audio capabilities, amongst others, are common features included in these devices. But perhaps what is most interesting for embedded developers is the availability of several SBCs that fall under the open source hardware category [36]. Such SBCs are suitable for the implementation of a wide range of applications on the basis of open operating systems.

Two different hardware environments were used in the development of the current embedded application: a conventional desktop personal computer (PC) with an Intel x86 architecture, and an SBC that was selected according to the following survey.
4.1.1.1 Single-board computer survey

A survey of popular SBCs available on the market was conducted with the intention of finding the most suitable model for our application. Table 4.1 presents a subset of the considered models, highlighting the most relevant characteristics for the 3D face scanner application. Refer to [37] for the complete survey.

The model to be chosen had to comply with several requirements imposed by the 3D face scanner application. First, support for both a camera and a projector had to be offered. While all of the considered models showed special support for video output, not all of them provided suitable characteristics for camera signal acquisition. In fact, most of them rely on USB or Ethernet connections for this purpose. The problem with using USB technology for camera acquisition is that it is highly resource demanding. On the other hand, Ethernet connections imply streaming video in formats such as MPEG, which require additional computational resources and buffering for decoding the video stream. Explicit peripheral support for camera acquisition was only offered by two of the considered models: the BeagleBoard-xM and the PandaBoard.
Table 4.1: Single-board computer survey

BeagleBoard-xM
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 512 MB
  Video output: DVI-D, HDMI, S-Video
  GPU: PowerVR SGX, OpenGL ES 2.0
  Camera port: Yes

Raspberry Pi Model B
  CPU: ARM1176, 700 MHz
  RAM: 256 MB
  Video output: Composite RCA, HDMI, DSI
  GPU: Broadcom VideoCore IV, OpenGL ES 2.0
  Camera port: No

Cotton Candy
  CPU: dual-core ARM Cortex-A9, 1200 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
  Camera port: No

PandaBoard
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, DVI-D, LCD
  GPU: PowerVR SGX540, OpenGL ES 2.0
  Camera port: Yes

Via APC
  CPU: ARM11, 800 MHz
  RAM: 512 MB
  Video output: HDMI, VGA
  GPU: built-in 2D/3D graphics, OpenGL ES 2.0
  Camera port: No

MK802
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No

Snowball
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, CVBS
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No
A second issue in the selection of the SBC concerned the project objective of developing a module capable of visualizing the 3D reconstructed model by means of the embedded projector. It was considered that the achievement of this objective could be greatly simplified by selecting an SBC model that offered support for rendering 3D computer graphics by means of an API, preferably OpenGL ES. Nevertheless, all of the SBC models considered in the survey featured a graphics processing unit (GPU) with such support.

Finally, one last important motivation for the selection came from the experience gathered through related projects. The BeagleBoard-xM had been used as the embedded computing unit in other projects [6] at Philips Research Eindhoven, and therefore valuable implementation effort could be saved if this option were adopted. Consequently, the BeagleBoard-xM was selected as the SBC model for the development of the current project.
4.1.1.2 BeagleBoard-xM features

The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power, open-source hardware system that was designed specifically to address the open source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz, accompanied by 512 MB of LPDDR RAM. Several open operating systems have been made compatible with this processor, including Linux, FreeBSD, RISC OS, Symbian, and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU to provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].
4.1.2 Software

The main factors involved in the selection of software tools were (i) the available support from a large development community and (ii) acquisition costs and licensing charges. Open source software was adopted where possible. Moreover, prior experience with the tools was also taken into account. The software can be divided into two categories: (i) software libraries that are used within the application and therefore are necessary for its execution, and (ii) software tools used specifically for the development of the application, which hence are not required for its execution. In what follows, each of these is briefly described.

Figure 4.1: The BeagleBoard-xM offered by Texas Instruments.
4.1.2.1 Software libraries

The following software libraries are used throughout the implementation of the embedded application.

libxml2: A software library for parsing XML documents, originally developed for the Gnome project and later made available to outside projects as well. The current application makes use of this tool for extracting the required information from the XML file that is included with each scan.

OpenCV: An open source computer vision and machine learning software library initiated by Intel. It provides the necessary functionality to construct the Delaunay triangulation described in Chapter 3. Though it was used in the initial versions of the application, later optimizations replaced the OpenCV implementations.

CGAL: A software library that aims to provide access to algorithms in computational geometry. It is used in the current application as a means to simplify the resulting mesh surface, i.e., to reduce the number of faces used to represent the surface while keeping the overall shape of the reconstructed model.

OpenGL ES: A subset of the more general OpenGL, designed specifically for embedded systems. It consists of a cross-language, multi-platform Application Programming Interface (API) for rendering 2D and 3D computer graphics. It is used in the current application as the means to visualize the 3D reconstructed model.

GLUT: The OpenGL Utility Toolkit, a system-independent API for OpenGL used to create windows and/or frame buffers. It is used in the visualization module of the application as well.
4.1.2.2 Software development tools

The following list presents a description of the most important software tools used for the development of the embedded application.

GNU toolchain: A collection of programming tools produced by the GNU Project that provides development facilities for applications and operating systems. Among the several projects that comprise the GNU toolchain, the following were used:

  GNU Make: A utility that automates the building process of executable programs by reading so-called makefiles, which specify how to create the target program.

  GCC: The official compiler of the GNU operating system, which has been adopted as standard by most modern Unix-like computer operating systems.

  GNU Binutils: A set of programming tools used in the development process of creating and managing programs, object files, libraries, profile data, and assembly source code. The commands as (assembler), ld (linker), and gprof (profiler) were used among the complete set of binutils commands.

  GNU Project debugger: The standard debugger for the GNU operating system, which was made available for the development of applications outside this project as well.

Valgrind: A programming tool that can automatically detect memory management errors. It also provides the functionality of a profiler.

Ubuntu: A Linux-based operating system that is distributed as free and open source software. It was installed on both the desktop PC and the SBC.
4.2 MATLAB to C code translation

This section describes the first stage of the embedded application development, which involves the translation of a series of algorithms originally written in MATLAB code to C.

Despite the fact that a number of tools are available that automatically translate MATLAB code to C language, such as MATLAB Coder by MathWorks, MATLAB-to-C Synthesis (MCS) by Catalytic Inc., and AccelDSP by Xilinx, these have a number of pitfalls that compromise their applicability, especially when the performance aspect is of ultimate importance. Perhaps most concerning is that each one of these tools only supports a subset of the MATLAB language and functions, meaning that the complete functionality of MATLAB is immediately constrained by this requirement. In many cases, this would imply a modification of the MATLAB code prior to the translation process in order to filter out any feature or function not included in the subset, which adds overhead to the development process. Examples of features not supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization, and try/catch statements. The use of an automatic translation tool was discarded for this project, taking into account that several of these unsupported features are present in the MATLAB code.
4.2.1 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular programming languages used for the development of embedded systems. The first is that C lies at an intermediate point between higher- and lower-level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher-level languages lies in the fact that they do not provide suitable characteristics for optimizing the performance of applications, such as low-level memory manipulation. Furthermore, unlike many of these higher-level programming languages, C provides deterministic resource use, which is an important feature when the target devices contain limited resources. On the other hand, C outperforms lower-level languages in a number of aspects, such as scalability and maintainability. Two final motivations for using C are that (i) C compilers are available for almost all embedded devices and are supported by a large pool of experienced C programmers, and (ii) the vast majority of hardware APIs and drivers are written in C.
4.2.2 Translation approach

As mentioned earlier, a manual translation approach was chosen over the use of automatic translation tools. A key part in the process of manually translating MATLAB to C code is the verification process. There are two major techniques used to achieve such verification. The first one consists of a systematic method of converting the translated C code into a compiled MEX-file that can be merged into the original MATLAB project. Then, by comparing the results generated by the MATLAB project containing the C implementation wrapped in a MEX-file with those generated by the original MATLAB project, one should be able to verify the correctness of the translation. The second approach consists of writing corresponding intermediate results of both the MATLAB and C implementations to external files and then using a file comparison tool, such as diff for Linux environments, in order to validate the equality of both results. It was the latter approach that was chosen for the development of the current application, for the following reason: the former approach requires the C implementation to be wrapped in a so-called MEX wrapper, which takes care of the communication between MATLAB and C. This task is considered to be error prone, since crashes, segmentation violations, or incorrect results can easily occur if the MEX wrapper does not allocate and access the data properly, as reported by Marc Barberis in [40] of Catalytic Inc.
A number of pitfalls that add complexity to the manual translation process were identified throughout the development of this stage. The most important are:

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e., the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in the performance of the resulting code.

• MATLAB is an interpreted language, i.e., data types and variable dimensions are only known at run-time, and thus cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation, reallocation, and deallocation of memory using constructs such as malloc, realloc, or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions, and sizes. On the contrary, C requires all variables to be cast to a specific data type (or declared, as known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C, and hence requires the programmer to implement them while relying on structure constructs of primitive types.
4.3 Visualization

This section describes the different steps involved in the visualization module developed to display the reconstructed 3D models by means of the embedded projector contained in the hand-held device. Figure 4.2 extends the general overview of the application presented in Figure 3.1 by incorporating the visualization module. This figure shows that a resulting 3D model of the face reconstruction process consists of 4 different elements: a set of vertices, a set of faces, a set of UV coordinates, and a texture image.
Figure 4.2: Simplified diagram of the 3D face scanner application. The camera frame sequence and the XML file are input to the 3D face reconstruction, which outputs the faces, vertices, UV coordinates, and texture 1 image that are passed to the visualization module.
Vertices and faces describe the geometry of the reconstructed model. Each face consists of three index values that determine the vertices that form a triangle. On the other hand, UV coordinates, together with the texture image, describe the texture of the model. Figure 4.3 shows how UV coordinates are used to map portions of the texture image to individual parts of the model. Each vertex is associated with a UV coordinate. When a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image and place it on top of the triangle.
Figure 4.3: The UV coordinate system, with the u and v axes each ranging from 0 to 1.
Figure 4.4 presents an overview of the visualization module. The first step of the process is to simplify the 3D model, i.e., to reduce the number of triangles (and vertices) used to represent the surface. Note that while a high resolution is needed for the algorithms that determine the fit quality of the different mask models, a much lower resolution can be used for visualization purposes. In fact, due to the limited resources available in embedded systems, such simplification becomes necessary to avoid lag when zooming, rotating, or panning the model. Edge collapse is a common term used for the simplification process, which is shown in Figure 4.4. The input vertices and faces of this block are converted into a smaller set, denoted as new vertices and new faces in the diagram. However, since the new set of vertices and faces does not have a one-to-one correspondence to the original set of UV coordinates, these coordinates have to be updated as well. This is accomplished by using the nearest neighbor algorithm: every new vertex is assigned the UV coordinate of its closest original vertex.

The next stage of the process is to format the new set of vertices, faces, and UV coordinates, together with the texture 1 image, such that OpenGL can render the model.
Subsequently, normal vectors are calculated for every triangle; these are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex, based on the normal vectors of the triangles that are connected to it. Moreover, a cross product is used to calculate the normal vector of each triangle. Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state in which the model is redrawn every time a timer expires or when an interactive operation is sent to the program.
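A C sketch of this normal computation (the array layout is an assumption) is:

    #include <math.h>

    /* Sketch: accumulate per-triangle normals (cross product of two
       edges) into per-vertex normals, then normalize the result. */
    void compute_normals(const float (*v)[3], const int (*faces)[3],
                         int n_faces, int n_vertices, float (*normals)[3])
    {
        for (int i = 0; i < n_vertices; i++)
            normals[i][0] = normals[i][1] = normals[i][2] = 0.0f;

        for (int f = 0; f < n_faces; f++) {
            const float *a = v[faces[f][0]];
            const float *b = v[faces[f][1]];
            const float *c = v[faces[f][2]];
            float e1[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
            float e2[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
            float n[3] = { e1[1] * e2[2] - e1[2] * e2[1],
                           e1[2] * e2[0] - e1[0] * e2[2],
                           e1[0] * e2[1] - e1[1] * e2[0] };
            for (int e = 0; e < 3; e++)
                for (int d = 0; d < 3; d++)
                    normals[faces[f][e]][d] += n[d];
        }

        for (int i = 0; i < n_vertices; i++) {
            float len = sqrtf(normals[i][0] * normals[i][0] +
                              normals[i][1] * normals[i][1] +
                              normals[i][2] * normals[i][2]);
            if (len > 0.0f)
                for (int d = 0; d < 3; d++)
                    normals[i][d] /= len;
        }
    }

Because the accumulated triangle normals are weighted by the cross product magnitude, which is proportional to triangle area, larger triangles contribute more to the averaged vertex normal.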
Figure 4.4: Diagram of the visualization module. Vertices, faces, and UV coordinates enter the mesh simplification step, where edge collapse produces new vertices and new faces, and a nearest neighbor search assigns new UV coordinates. The simplified model is then converted to OpenGL format, normals are calculated, and the resulting GL vertices, GL faces, GL UV coordinates, normals, and GL texture are handed to OpenGL.
Chapter 5
Performance optimizations
This chapter presents various performance optimizations made to the 3D face scanner application, ranging from high-level optimizations, such as modification of the algorithms, to low-level optimizations, such as the implementation of time-consuming parts in assembly language.

In order to verify that the achieved optimizations were valid in general, and not only for specific cases, 10 scans of different persons were used for profiling the performance of the application. Every profile consisted of running the application 10 times for each scan and then averaging the results, in order to reduce the influence that external factors might have on the measured times. Figure 5.1 presents an example of the graphs that will be used throughout this and the following chapters to represent the changes in performance. Here, each bar is divided into different colors that represent the distribution of the total execution time among the various stages of the application, described in Chapter 3 and summarized in Figure 3.1.

The translation from MATLAB to C code corresponds to the first optimization performed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available on the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's O2 optimization level.

The bottom bar in Figure 5.1 represents the starting point for a set of optimization procedures that will be described in the following sections. The order in which these are presented corresponds to the same order in which they were applied to the application.
Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM. Each bar is broken down into the stages read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering, hole filling, and other.
5.1 Double- to single-precision floating-point numbers

The same representation format of floating-point numbers was necessary for the MATLAB and C implementations in order to compare both results at each step of the translation process. The original C implementation used the double-precision format because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double- to single-precision format was made. Figure 5.2 shows that with this modification, the total execution time decreased from 14.53 to 12.52 sec.
Figure 5.2: Difference in execution time when the double-precision format is changed to single precision.
5.2 Tuned compiler flags

While the previous versions of the C code were compiled with the O2 performance level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
Figure 5.3: Execution time before and after tuning GCC's compiler options.
5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality of the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e., first and second, third and fourth, fifth and sixth, and so on. Data of the camera frames were placed in memory in such a manner that corresponding pixels between frame pairs lay next to each other in memory. The procedure is shown in Figure 5.4.

However, this modification yielded no improvement in the execution time of the application, as can be seen from Figure 5.5.
Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green, and purple circles represent pixels of the first, second, third, and fourth frames, respectively.

Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.

5.4 Reimplementation of C's standard power function

The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The gamma correction in this application consists of raising each pixel to the power 0.85. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process. Taking into account that the high accuracy offered by this function was not required, and that the overhead involved in validating the input could be removed, a different implementation of the function was adopted.
A novel approach proposed by Ian Stephenson in [42] was adopted, explained as follows. The power function is usually implemented using logarithms as

pow(a, b) = x^{\log_x(a) \cdot b}

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, log2(a) can be approximated based on the IEEE floating-point representation of a, which stores an exponent and a mantissa:

a = M \cdot 2^E

where M is the mantissa and E is the exponent. Taking the base-2 logarithm of both sides gives

\log_2(a) = \log_2(M) + E

and since M is normalized, \log_2(M) is always small; therefore,

\log_2(a) \approx E

This new implementation of the power function provides the improvement in execution time shown in Figure 5.6.
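A C sketch of this idea, using the bit-level layout of IEEE 754 single-precision floats, is shown below; the constants follow from that layout (0x3f800000 is the bit pattern of 1.0f, and 2^23 = 8388608 scales between the exponent field and the real-valued logarithm), but this is an approximation written for illustration, not the exact code used in the application.

    #include <stdint.h>

    static inline float fast_log2(float a)
    {
        union { float f; int32_t i; } u = { a };
        return (u.i - 0x3f800000) * (1.0f / 8388608.0f);
    }

    static inline float fast_pow2(float p)
    {
        union { float f; int32_t i; } u;
        u.i = (int32_t)(p * 8388608.0f) + 0x3f800000;
        return u.f;
    }

    /* pow(a, b) = 2^(log2(a) * b); e.g. fast_pow(pixel, 0.85f). */
    static inline float fast_pow(float a, float b)
    {
        return fast_pow2(fast_log2(a) * b);
    }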
Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.
5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that recently referenced memory locations will tend to be referenced again soon. Moreover, the reordering allowed floating-point calculations to be replaced with integer calculations in the modulation stage, which typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modulation frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping, and scaling frames are now performed in an alternating fashion together with the calculation of the modulation frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.
Figure 5.7: Order of execution (a) before and (b) after the optimization. In the modified order, the modulation calculation is moved from the normalization stage into the preprocessing stage, where it is performed in an alternating fashion with the discard, crop, and scale steps.
Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
5.6 GMC in the y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, that figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results are achieved when GMC is applied in the y direction only. This was implemented by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted of removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement in execution time shown in Figure 5.10.
Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code. For every pair of consecutive frames A and B of the normalized frame sequence, the rows and columns of both frames are summed, the SAD is minimized in both x and y, and frame B is shifted in the y dimension only.
Figure 5.10: Difference in execution time before and after modifying the GMC stage.
5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation. A series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not-so-straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation. It is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. An error was detected at this point of the optimization process in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.
Figure 5.11: The execution time of the application increased after fixing an error in the tessellation stage.
5.8 Modified line shifting in GMC stage

A series of optimizations performed on the original line shifting mechanism in the GMC stage is explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (the last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially, this was implemented using a for loop. Later, this was optimized even further by replacing the for loop with the more optimized memcpy function available in the standard C library, which in turn led to a faster execution time.

A further optimization obtained in the GMC stage yielded better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included in order to determine whether the shift has to be performed in the positive or negative direction. In case the shift is negative, i.e., upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. In case the shift is positive, i.e., downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
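A C sketch of the resulting in-place shift (the frame layout and names are assumptions) is:

    #include <string.h>

    /* Sketch: shift a frame of `rows` lines of `width` floats by
       `shift` rows in place. The traversal direction depends on the
       sign of the shift so that no line is overwritten before it has
       been copied; border lines are discarded, not wrapped around. */
    void shift_frame(float *frame, int rows, int width, int shift)
    {
        if (shift < 0) {                       /* shift upwards */
            for (int r = -shift; r < rows; r++)
                memcpy(frame + (r + shift) * width, frame + r * width,
                       width * sizeof(float));
        } else if (shift > 0) {                /* shift downwards */
            for (int r = rows - 1 - shift; r >= 0; r--)
                memcpy(frame + (r + shift) * width, frame + r * width,
                       width * sizeof(float));
        }
    }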
Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improves the execution time significantly while sacrificing smoothness by a very small amount. This characteristic refers to the fact that the set of vertices resulting from the decoding stage is sorted in an increasing manner. This, in turn, removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies the vertex-connection algorithm even further.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Moreover, each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.
The algorithm is presented in Algorithm 1. Note that for each pair of rows, the algorithm
describes the connection of vertices until the moment in which the last vertex of
either row is reached. The unconnected vertices that remain in the other, longer row
are connected with the last vertex of the shorter row in a later step (not included in
Algorithm 1).
Algorithm 1 New tessellation algorithm

1:  for all pairs of rows do
2:    find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
3:    while the last vertex in either row has not been reached do
4:      if vertex_row_A is more to the left than vertex_row_B then
5:        connect vertex_row_A with the next vertex on the same row and with vertex_row_B
6:        change vertex_row_A to the next vertex on the same row
7:      else
8:        connect vertex_row_B with the next vertex on the same row and with vertex_row_A
9:        change vertex_row_B to the next vertex on the same row
10:     end if
11:   end while
12: end for
Figure 5.13 shows the result of applying the two described triangulation methods to the
same set of vertices. The execution time of the application was reduced by approximately
1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new
triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's
Delaunay triangulation implementation.
(a) Delaunay triangulation          (b) Optimized triangulation
Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted.
5.10 Modified decoding stage
A major improvement was achieved in the execution time of the application after
optimizing several time-consuming parts of the decoding stage.
Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.
As a first step, two frequently called functions of the standard C math library, namely
ceil() and floor(), were replaced with faster implementations that use preprocessor
macros to avoid the function call overhead. Moreover, the time spent validating the
input was avoided, since it was not required. However, the property that allowed the
new implementations of ceil() and floor() to increase the performance to a greater
extent was the fact that these functions only operate on index values. Given that index
values only assume non-negative numbers, the implementation of each of these functions
could be further simplified.
A second optimization applied to the decoding stage was to replace dynamically allocated
memory on the heap with statically allocated memory on the stack, while verifying that
the amount of memory to be stored would not cause a stack overflow. Stack allocation
is usually faster because it avoids the bookkeeping overhead of the heap allocator.
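A minimal sketch of this pattern is shown below; the function and the bound MAX_POINTS are hypothetical stand-ins introduced to make the overflow check explicit:

    #include <stddef.h>

    #define MAX_POINTS 4096  /* hypothetical bound chosen to fit comfortably on the stack */

    /* scratch buffer on the stack instead of malloc()/free(); the explicit
     * bound check is what rules out a stack overflow */
    static int smooth_values(const float *in, float *out, size_t n)
    {
        if (n > MAX_POINTS)
            return -1;               /* input exceeds the static bound */
        float scratch[MAX_POINTS];   /* was: float *scratch = malloc(n * sizeof *scratch); */
        for (size_t i = 0; i < n; i++)
            scratch[i] = in[i];
        for (size_t i = 1; i + 1 < n; i++)
            out[i] = (scratch[i - 1] + scratch[i] + scratch[i + 1]) / 3.0f;
        return 0;
    }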
The last optimization consisted of the detection and removal of several tasks that did
not contribute to the final result. Such tasks were present in the application because
several alternatives had been implemented for achieving a common goal during the
algorithmic design stage; after assessing them and choosing the best option, the other
alternatives were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15.
An important reduction of approximately 1 second was achieved. As a rough estimate,
half of this speedup can be attributed to the removal of the non-functional code.
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage
This section describes the last optimization performed to the GMC stage.
Figure 5.15: Execution time of the application before and after optimizing the decoding stage.
The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of
consecutive frames, the sum of the pixels in each column is calculated for both frames.
This means that the column-sum vector is calculated twice for every image except the
first and the last (n = 1 and n = N). By reusing the column-sum vector calculated in the
previous iteration, this recalculation can be avoided. An updated version of the GMC
stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the
GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17
shows the execution times of the application before and after removing the redundant
calculations; a sketch of the restructured loop follows.
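The following sketch illustrates the reuse of the column-sum vector. sum_columns() is a straightforward implementation of the column summation, while minimize_sad_and_shift() is a hypothetical stand-in for the existing SAD minimization and line shift of Figure 3.8; all names are illustrative:

    #include <stdint.h>
    #include <string.h>

    /* sum of the pixels in each column of an 8-bit, row-major frame */
    static void sum_columns(const uint8_t *f, int w, int h, uint32_t *col)
    {
        memset(col, 0, (size_t)w * sizeof *col);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                col[x] += f[y * w + x];
    }

    /* stand-in for the existing SAD minimization and frame shift (Figure 3.8) */
    void minimize_sad_and_shift(const uint32_t *ref, const uint32_t *cur,
                                uint8_t *frame, int w, int h);

    /* each column-sum vector is now computed exactly once per frame */
    static void gmc(uint8_t **frames, int n_frames, int w, int h,
                    uint32_t *prev, uint32_t *cur)
    {
        sum_columns(frames[0], w, h, prev);
        for (int n = 1; n < n_frames; n++) {
            sum_columns(frames[n], w, h, cur);
            minimize_sad_and_shift(prev, cur, frames[n], w, h);
            memcpy(prev, cur, (size_t)w * sizeof *prev);  /* reuse in next iteration */
        }
    }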
5.12 NEON assembly optimization 1
The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors
was exploited for the last series of optimizations performed to the 3D face scanner
application. The first step was to detect the stages of the application that exhibit a rich
amount of exploitable data operations where the NEON technology could be applied.
The vast majority of the operations performed in the preprocessing, normalization, and
global motion compensation stages are data independent and therefore suitable for
being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrate NEON technology into an existing application:
(i) using a vectorizing compiler that automatically translates C/C++ code
into NEON instructions, (ii) using existing C/C++ libraries based on NEON technology,
(iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON
instructions but with the compiler doing some of the work associated with writing
assembly instructions, and (iv) directly writing NEON assembly instructions linked into
the C/C++ project in the compilation process. A detailed explanation of each of these
approaches can be found in [45]. Based on the results achieved in [46], directly writing
NEON assembly instructions outperforms the other alternatives, and it was therefore
this approach that was adopted.
Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums.
Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.
Figure 5.18 presents the basic principle behind the SIMD architecture extension, along
with the related terminology. Depending on the data type of the elements involved in
the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction.
The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15)
or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair
of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers,
where each of the 8 elements would have 16 bits, or as an operation on 2 D registers,
where each of the 8 elements would be 8 bits wide.
(Diagram terms: elements, operation, source registers, destination register, lane.)
Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology.
An overview of the resulting execution flow of the preprocessing and normalization stages
after applying the first NEON assembly optimization is presented in Figure 5.19. Here,
green rectangles represent stages of the application that are now calculated with NEON
technology, whereas blue rectangles represent stages implemented in regular C code. In
Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame
sequence is represented with an 8-bit unsigned integer value. With the NEON optimization,
groups of 8 pixels are packed into D registers in order to process 8 elements at a
time. Note that each resulting element of the texture 2 frame is immediately reused in
the normalization process. Moreover, each of the 8 resulting values in both the texture
2 generation and the normalization stage is converted to a 32-bit floating-point value
that ranges from 0 to 1.
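For readability, the following sketch expresses the same per-pack computation with NEON C intrinsics rather than the hand-written assembly actually used in this work. The function name, the scaling constant, and the absence of a guard for a zero denominator are simplifying assumptions:

    #include <arm_neon.h>

    /* texture 2 (v1 + v2) and normalization ((v1 - v2) / (v1 + v2)) for one
     * pack of 8 pixels from two complementary pattern frames */
    static void texture2_normalize_8px(const uint8_t *v1, const uint8_t *v2,
                                       float *texture2, float *normalized)
    {
        uint8x8_t a = vld1_u8(v1);
        uint8x8_t b = vld1_u8(v2);

        /* widen to 16 bit: the sum may reach 510, the difference may be negative */
        uint16x8_t sum  = vaddl_u8(a, b);
        int16x8_t  diff = vsubq_s16(vreinterpretq_s16_u16(vmovl_u8(a)),
                                    vreinterpretq_s16_u16(vmovl_u8(b)));

        for (int half = 0; half < 2; half++) {
            uint16x4_t s16 = half ? vget_high_u16(sum)  : vget_low_u16(sum);
            int16x4_t  d16 = half ? vget_high_s16(diff) : vget_low_s16(diff);
            float32x4_t s = vcvtq_f32_u32(vmovl_u16(s16));
            float32x4_t d = vcvtq_f32_s32(vmovl_s16(d16));

            /* NEON has no divide: reciprocal estimate plus one Newton-Raphson
             * step (a zero sum must be guarded against in real code) */
            float32x4_t r = vrecpeq_f32(s);
            r = vmulq_f32(r, vrecpsq_f32(s, r));

            vst1q_f32(texture2 + 4 * half, vmulq_n_f32(s, 1.0f / 510.0f)); /* scale */
            vst1q_f32(normalized + 4 * half, vmulq_f32(d, r));
        }
    }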
Figure 5.20 shows that the total execution time of the application actually increased
after this modification. There are two reasons that might explain such an increment.
First, note that the stage that contributed most to the increase in time was the read
binary file stage. The execution time of this process is heavily affected by any other
processes that might be running in parallel. Moreover, the execution time of all stages
other than those involved in the NEON optimization also increased. This suggests that
another process was indeed probably running in parallel, using resources of the board
and hence affecting the performance of the application.

Nevertheless, the overall time reduction for the preprocessing and normalization stages
after the optimization was small. One very probable reason for this can be found in
the modulation stage. The first step of that process is to find the smallest and largest
values for every camera frame pixel in the time dimension by means of if statements.
When such a task is implemented in conventional C language, the processor makes use
of a branch prediction mechanism in order to speed up the instruction pipeline. The
use of NEON assembly instructions, on the other hand, forces the processor to perform
the comparison for every single pack of 8 values, without any benefit from the branch
prediction mechanism; a branch-free formulation is sketched below.
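As an illustration, the min/max comparison itself maps directly onto NEON's vector min/max instructions, which express the operation without branches. This is a sketch rather than the assembly used in the thesis, and the function name is hypothetical:

    #include <arm_neon.h>

    /* running minimum/maximum over the time dimension for one pack of
     * 8 pixels, expressed without branches via vmin/vmax */
    static void update_min_max_8px(const uint8_t *frame_px,
                                   uint8_t *min_px, uint8_t *max_px)
    {
        uint8x8_t v = vld1_u8(frame_px);
        vst1_u8(min_px, vmin_u8(vld1_u8(min_px), v));
        vst1_u8(max_px, vmax_u8(vld1_u8(max_px), v));
    }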
5.13 NEON assembly optimization 2
After successfully implementing several stages of the application with the use of NEON
assembly instructions, the possibility of applying a similar approach to other parts of
the application was analyzed. The averaging and gamma correction processes involved
in the calculation of texture 1 were found to be good targets for such purpose. The
absence of a NEON instruction to calculate the power of a number can be overcome
by using a lookup table (LUT). In order to explain how the LUT was
implemented, a hypothetical example of camera frames with 2-bit pixels is presented in
Figure 5.21. Here, the first two rows represent the values that corresponding pixels in
the two frames can assume. The third row of the table contains the 7 possible values
that can result from averaging two pixels. The number of possible values for the general
case is 2^(n+1) - 1, where n is the number of bits used to represent a pixel. Finally, the
fourth row corresponds to the actual LUT, which is the average value raised to the power
0.85. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in
our application is already determined during the texture 2 stage, can be used to index
the table.
As a final step in the optimization process, a further improvement to the execution flow
presented in Figure 5.19 was made. From this diagram it is possible to observe that the
application has to re-read the last 2 camera frames to calculate the texture 1 frame. In
order to avoid this overhead, the processing of the camera frames was divided into two
different stages. The first involves the calculation of the modulation, texture 2, and
normalization processes for the first 14 frames, whereas the second stage additionally
calculates the averaging and gamma correction processes for the last two frames.
Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.
pixel A:              0      1      2      3
pixel B:              0      1      2      3

pixel A + pixel B:    6      5      4      3      2      1      0
average:              3      2.5    2      1.5    1      0.5    0
average^0.85 (LUT):   2.544  2.179  1.803  1.411  1      0.555  0
Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.
Merging these 5 processes for the last two frames is convenient, since the addition of
corresponding pixels needed in the averaging and gamma correction stage is already
being calculated as part of the other processes. These modifications to the order in which
the different processes are executed are illustrated in Figure 5.23, which corresponds
to the definitive execution flow diagram for the preprocessing and normalization stages.
The resulting improvement of the execution time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face
reconstruction application.
Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.
Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation
of the 3D face scanner application capable of running on an embedded device. The first
section focuses on the results obtained after translating the MATLAB implementation
to C language. This is followed by a brief account of the visualization module developed
to display the reconstructed model by means of the embedded device. Finally,
the last section provides a summary of the performance improvements made to the C
implementation by means of different optimization techniques.
6.1 MATLAB to C code translation
In order to measure the correctness of the conversion from MATLAB to C, 13 different
face scans were processed with both the MATLAB and C implementations. A qualitative
comparison of the corresponding reconstructed models yielded no difference in
results. Linux's diff tool was used to perform the comparison between corresponding
models, with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times of various versions of the
application. Each bar corresponds to the average execution time required to process 10
scans of different people. Moreover, each of the different scans was run 10 times and
averaged. The bars are divided into different colors that represent the distribution of the
total execution time among the various stages of the application described in Chapter 3
and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the
average execution times of the original MATLAB and C implementations, respectively,
when processed on a desktop computer. The C implementation resulted in a speedup of
approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).
On the other hand, the last bar in Figure 6.1 corresponds to the average execution time
of the initial C implementation when processed on the embedded device, a BeagleBoard-xM.
The execution time increased by approximately 14 seconds with respect to the time
spent when processed on a PC. The C code was compiled with GCC's -O2 optimization
level.
Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
6.2 Visualization
A visualization module was developed to display the resulting 3D models by means of the
projector contained in the embedded device. Figure 6.2 presents an example. The two
images in the top row show a high-resolution 3D model composed of 64k faces, rendered
in two different modes. The bottom two images show the same 3D model after being
processed with a mesh simplification mechanism that results in a much lower resolution
model (1229 faces), suitable for being rendered by means of an embedded device. It is
interesting to note that even though the lower resolution model has approximately 2%
of the faces contained in the high-resolution model, the quality degradation is hardly
visible by comparing the two textured models.
6.3 Performance optimizations
Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation
using a BeagleBoard-xM as the processing platform. A wide range of optimizations, described
in Chapter 5, were used to reduce the execution time of the application from 14.5
to 5.1 seconds. This translates into a speedup of approximately 2.85 times.
(a) High-resolution 3D model with texture (63743 faces). (b) High-resolution 3D model wireframe (63743 faces). (c) Low-resolution 3D model with texture (1229 faces). (d) Low-resolution 3D model wireframe (1229 faces).
Figure 6.2: Example of the visualization module developed.
Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which
provide an idea of the speedup achieved in each individual stage.
(Bars, top to bottom: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2.)
Figure 6.3: Performance evolution of the 3D face scanner's C implementation.
(Panels: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling; each shows the execution time before and after optimization.)
Figure 6.4: Execution time for each stage of the application before and after the complete optimization process.
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application
that uses the structured lighting technique. A manual translation of the algorithms in
charge of the reconstruction process was performed from MATLAB to C, using a file
comparison tool to validate the results of both implementations. Thirteen different face
scans were used to verify the correctness of the translated C implementation with respect
to the original MATLAB code; the comparison of each corresponding model yielded no
difference whatsoever. The C implementation resulted in a speedup of approximately 15
times over the original MATLAB code running on a desktop PC. However, running the
C implementation on an embedded platform, namely a BeagleBoard-xM, presented an
increase of the execution time by a factor of 27, i.e., an increase of approximately
14 seconds.
A wide range of optimizations was performed to reduce the execution time of the application.
These include high-level optimizations, such as modifications to the algorithms
and reordering of the execution flow; middle-level optimizations, such as avoiding redundant
calculations and function call overhead; and low-level optimizations, such as
reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed
3D models by means of the projector contained in the embedded device. However, given
the high resolution of the reconstructed 3D models and the limited resources available
on the embedded platform, a mesh simplification mechanism was implemented to reduce
the resolution to a point where the visualization module could be used without lag.
Although the reconstruction process is only part of a broader project that aims to
develop a technological means to assist sleep technicians in the selection of an adequate
CPAP mask model and size, allowing such a process to run directly on the device is a first
step towards the goal of creating an autonomous, self-contained mask advice system.
Moreover, the functionality of a 3D hand-held face scanner is an important topic that
can easily be extended to different application fields, such as security or entertainment.
Last but not least, the optimizations that allowed the execution time of the application
to be reduced to approximately 5 seconds when processed on an embedded platform
should serve as a reference point, not only for other parts of the application where similar
approaches can be adopted, but also for related projects where performance is of crucial
interest.
7.1 Future work
Although a significant reduction of the application's execution time was achieved with
the set of optimizations presented in this work, this is by no means the best result that
can be obtained. On the contrary, this set of optimizations opens new possibilities for
improving the application's performance, for example by applying similar approaches
to other parts of the application. The first idea that comes to mind is to extend the
use of NEON technology to other parts of the program that exhibit a high number of
independent data calculations. The 5x5 filter involved in the calculation of the texture
1 frame, together with the sum of columns and the row shifting operations included in
the GMC stage, are good candidates to implement using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small
percentage of the total execution time will not yield significant improvements to the
overall application's performance. This implies that an assessment of the distribution
of the total execution time among the different tasks of the application is necessary to
determine which parts are the current bottlenecks and hence worth optimizing. The last
profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of
the execution time is spent in three stages, namely decoding, calibration, and hole filling.
Whereas the decoding stage was analyzed and partly optimized in this work, the latter
two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage
can be optimized in an important manner. First, note the significant increase of the
execution time of this particular stage between the top and bottom profilings in Figure
6.1. Whereas such an increase is expected for stages that involve matrix operations
(MATLAB usually performs well with this kind of operations), stages based on control
structures, such as the nested for loops present in the calibration stage, are not expected
to show a decrease of performance in this manner. Moreover, note how the first two
optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning
the compiler flags, had a significant impact on this stage's performance. Considering
this series of observations, it is very probable that the current C implementation of this
stage is not utilizing the available resources of the BeagleBoard-xM in the best possible
manner. Analyzing how well this part of the program exploits spatial and temporal
locality could reveal directions for further optimizations.
Finally, it is worth noting a few more ideas of how the performance of the application
could still be improved. Tuning GCC's compiler flags was performed early in the overall
optimization process; it is probable that the combination of flags found to be optimal at
that moment is no longer optimal for the current state of the application. Therefore, a new
assessment of compiler flags should be performed. It is also important to mention that
there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point
application binary interface (ABI) to use. The permissible values are soft, softfp, and
hard. Despite the fact that a hard-float ABI is expected to produce better performance
results, the use of such a configuration was not possible in the current project. The reason
is that part of the libraries provided by the underlying operating system were compiled
with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling
this configuration is just a matter of recompiling the OS and the other libraries that are
used by the application with hard-float ABI support. Finally, it should be noted that
there is a wide range of compilers available on the market that could produce better
results than those of GCC. Although a few of the other options were tested as part of
the current project, GCC's results were always superior. However, it would
be interesting to measure how the GCC compiler compares with the compilers produced
by ARM, which are known to produce fast-running code.
Bibliography
[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829-1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204-211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, isbn: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1-17, 1982, issn: 0146-664X. doi: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).
[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521-528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647-651, Jun. 1993, issn: 0162-8828. doi: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874-877. doi: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27-40, 1995, issn: 0031-3203. doi: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14-28, Jan. 1987, issn: 0162-8828. doi: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445-456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609-616, 1992, issn: 0031-3203. doi: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337-358, 1985, issn: 0734-189X. doi: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).
[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. i, Jun. 1990, pp. 309-313. doi: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193-203, 1991, issn: 0932-8092. doi: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42-54, Dec. 1982, issn: 0018-9162. doi: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827-849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057-2060. doi: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimel'farb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135-141. doi: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195-200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81 - III-84. doi: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617-1635, 2002, issn: 0031-3203. doi: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. doi: 10.1364/OE.15.012318 (cit. on p. 14).
[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22-26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15-20 (cit. on p. 14).

[27] G. Toscani, Systemes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, isbn: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and Universitat de Girona, Departament d'Electronica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, isbn: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323-344, Aug. 1987, issn: 0882-4967. doi: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965-980, Oct. 1992, issn: 0162-8828. doi: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, isbn: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97-108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schroder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317-324, isbn: 0-201-48560-5. doi: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).
[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, isbn: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, December, p. 81, 2009 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489-497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series programmer's guide, version 3.0," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization: an example," Tech. Rep., 2009 (cit. on p. 54).
mbbuild() current camera settings var camera=scenecamerasgetByIndex(0) var res= initialize result string aperture angle of the virtual camera (perspective projection) or orthographic scale (orthographic projection) if(cameraprojectionType==cameraTYPE_PERSPECTIVE) var aac=camerafov180MathPI if(hostutilprintf(4f aac)=30) res+=hostutilprintf(n3Daac=s aac) else cameraviewPlaneSize=2mbradius() res+=hostutilprintf(n3Dortho=s 1cameraviewPlaneSize) camera roll var roll = cameraroll180MathPI if(hostutilprintf(4f roll)=0) res+=hostutilprintf(n3Droll=sroll) target to camera vector var c2c=new Vector3() c2cset(cameraposition) c2csubtractInPlace(cameratargetPosition) c2cnormalize() if((c2cx==0 ampamp c2cy==-1 ampamp c2cz==0)) res+=hostutilprintf(n3Dc2c=s s s c2cx c2cy c2cz) new camera settings bounding sphere centre --gt new camera target var coo=new Vector3() cooset((mbcenter())[0] (mbcenter())[1] (mbcenter())[2]) if(coolength) res+=hostutilprintf(n3Dcoo=s s s coox cooy cooz) radius of orbit if(cameraprojectionType==cameraTYPE_PERSPECTIVE) var roo=mbradius() Mathsin(aac MathPI 360) else orthographic projection var roo=mbradius() res+=hostutilprintf(n3Droo=s roo) update camera settings in the viewer var currol=cameraroll cameratargetPositionset(coo) camerapositionset(cooadd(c2cscale(roo))) cameraroll=currol determine background colour rgb=scenebackgroundgetColor() if((rgbr==1 ampamp rgbg==1 ampamp rgbb==1)) res+=hostutilprintf(n3Dbg=s s s rgbr rgbg rgbb) determine lighting scheme switch(scenelightScheme) case sceneLIGHT_MODE_FILE curlights=Artworkbreak case sceneLIGHT_MODE_NONE curlights=Nonebreak case sceneLIGHT_MODE_WHITE curlights=Whitebreak case sceneLIGHT_MODE_DAY curlights=Daybreak case sceneLIGHT_MODE_NIGHT curlights=Nightbreak case sceneLIGHT_MODE_BRIGHT curlights=Hardbreak case sceneLIGHT_MODE_RGB curlights=Primarybreak case sceneLIGHT_MODE_BLUE curlights=Bluebreak case sceneLIGHT_MODE_RED curlights=Redbreak case sceneLIGHT_MODE_CUBE curlights=Cubebreak case sceneLIGHT_MODE_CAD curlights=CADbreak case sceneLIGHT_MODE_HEADLAMP curlights=Headlampbreak if(curlights=Artwork) res+=hostutilprintf(n3Dlights=s curlights) determine global render mode switch(scenerenderMode) case sceneRENDER_MODE_BOUNDING_BOX currender=BoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX currender=TransparentBoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX_OUTLINE currender=TransparentBoundingBoxOutlinebreak case sceneRENDER_MODE_VERTICES currender=Verticesbreak case sceneRENDER_MODE_SHADED_VERTICES currender=ShadedVerticesbreak case sceneRENDER_MODE_WIREFRAME currender=Wireframebreak case sceneRENDER_MODE_SHADED_WIREFRAME currender=ShadedWireframebreak case sceneRENDER_MODE_SOLID currender=Solidbreak case sceneRENDER_MODE_TRANSPARENT currender=Transparentbreak case sceneRENDER_MODE_SOLID_WIREFRAME currender=SolidWireframebreak case sceneRENDER_MODE_TRANSPARENT_WIREFRAME currender=TransparentWireframebreak case sceneRENDER_MODE_ILLUSTRATION currender=Illustrationbreak case sceneRENDER_MODE_SOLID_OUTLINE currender=SolidOutlinebreak case sceneRENDER_MODE_SHADED_ILLUSTRATION currender=ShadedIllustrationbreak case sceneRENDER_MODE_HIDDEN_WIREFRAME currender=HiddenWireframebreak if(currender=Solid) res+=hostutilprintf(n3Drender=s currender) write result string to the console hostconsoleshow() hostconsoleclear() hostconsoleprintln(n Copy and paste the following text to then+ option list of includemedian + res + n)function get3Dview () var camera=scenecamerasgetByIndex(0) var coo=cameratargetPosition 
var c2c=camerapositionsubtract(coo) var roo=c2clength c2cnormalize() var res=VIEW=insert optional name heren if((coox==0 ampamp cooy==0 ampamp cooz==0)) res+=hostutilprintf( COO=s s sn coox cooy cooz) if((c2cx==0 ampamp c2cy==-1 ampamp c2cz==0)) res+=hostutilprintf( C2C=s s sn c2cx c2cy c2cz) if(roo gt 1e-9) res+=hostutilprintf( ROO=sn roo) var roll = cameraroll180MathPI if(hostutilprintf(4f roll)=0) res+=hostutilprintf( ROLL=sn roll) if(cameraprojectionType==cameraTYPE_PERSPECTIVE) var aac=camerafov 180MathPI if(hostutilprintf(4f aac)=30) res+=hostutilprintf( AAC=sn aac) else if(hostutilprintf(4f cameraviewPlaneSize)=1) res+=hostutilprintf( ORTHO=sn 1cameraviewPlaneSize) rgb=scenebackgroundgetColor() if((rgbr==1 ampamp rgbg==1 ampamp rgbb==1)) res+=hostutilprintf( BGCOLOR=s s sn rgbr rgbg rgbb) switch(scenelightScheme) case sceneLIGHT_MODE_FILE curlights=Artworkbreak case sceneLIGHT_MODE_NONE curlights=Nonebreak case sceneLIGHT_MODE_WHITE curlights=Whitebreak case sceneLIGHT_MODE_DAY curlights=Daybreak case sceneLIGHT_MODE_NIGHT curlights=Nightbreak case sceneLIGHT_MODE_BRIGHT curlights=Hardbreak case sceneLIGHT_MODE_RGB curlights=Primarybreak case sceneLIGHT_MODE_BLUE curlights=Bluebreak case sceneLIGHT_MODE_RED curlights=Redbreak case sceneLIGHT_MODE_CUBE curlights=Cubebreak case sceneLIGHT_MODE_CAD curlights=CADbreak case sceneLIGHT_MODE_HEADLAMP curlights=Headlampbreak if(curlights=Artwork) res+= LIGHTS=+curlights+n switch(scenerenderMode) case sceneRENDER_MODE_BOUNDING_BOX defaultrender=BoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX defaultrender=TransparentBoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX_OUTLINE defaultrender=TransparentBoundingBoxOutlinebreak case sceneRENDER_MODE_VERTICES defaultrender=Verticesbreak case sceneRENDER_MODE_SHADED_VERTICES defaultrender=ShadedVerticesbreak case sceneRENDER_MODE_WIREFRAME defaultrender=Wireframebreak case sceneRENDER_MODE_SHADED_WIREFRAME defaultrender=ShadedWireframebreak case sceneRENDER_MODE_SOLID defaultrender=Solidbreak case sceneRENDER_MODE_TRANSPARENT defaultrender=Transparentbreak case sceneRENDER_MODE_SOLID_WIREFRAME defaultrender=SolidWireframebreak case sceneRENDER_MODE_TRANSPARENT_WIREFRAME defaultrender=TransparentWireframebreak case sceneRENDER_MODE_ILLUSTRATION defaultrender=Illustrationbreak case sceneRENDER_MODE_SOLID_OUTLINE defaultrender=SolidOutlinebreak case sceneRENDER_MODE_SHADED_ILLUSTRATION defaultrender=ShadedIllustrationbreak case sceneRENDER_MODE_HIDDEN_WIREFRAME defaultrender=HiddenWireframebreak if(defaultrender=Solid) res+= RENDERMODE=+defaultrender+n for(var i=0iltscenemeshescounti++) var mesh=scenemeshesgetByIndex(i) var meshUTFName = for (var j=0 jltmeshnamelength j++) var theUnicode = meshnamecharCodeAt(j)toString(16) while (theUnicodelengthlt4) theUnicode = 0 + theUnicode meshUTFName += theUnicode var end=meshnamelastIndexOf() if(endgt0) var meshUserName=meshnamesubstr(0end) else var meshUserName=meshname respart= PART=+meshUserName+n respart+= UTF16NAME=+meshUTFName+n defaultvals=true if(meshvisible) respart+= VISIBLE=falsen defaultvals=false if(meshopacitylt10) respart+= OPACITY=+meshopacity+n defaultvals=false currender=defaultrender switch(meshrenderMode) case sceneRENDER_MODE_BOUNDING_BOX currender=BoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX currender=TransparentBoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX_OUTLINE currender=TransparentBoundingBoxOutlinebreak case sceneRENDER_MODE_VERTICES currender=Verticesbreak case 
sceneRENDER_MODE_SHADED_VERTICES currender=ShadedVerticesbreak case sceneRENDER_MODE_WIREFRAME currender=Wireframebreak case sceneRENDER_MODE_SHADED_WIREFRAME currender=ShadedWireframebreak case sceneRENDER_MODE_SOLID currender=Solidbreak case sceneRENDER_MODE_TRANSPARENT currender=Transparentbreak case sceneRENDER_MODE_SOLID_WIREFRAME currender=SolidWireframebreak case sceneRENDER_MODE_TRANSPARENT_WIREFRAME currender=TransparentWireframebreak case sceneRENDER_MODE_ILLUSTRATION currender=Illustrationbreak case sceneRENDER_MODE_SOLID_OUTLINE currender=SolidOutlinebreak case sceneRENDER_MODE_SHADED_ILLUSTRATION currender=ShadedIllustrationbreak case sceneRENDER_MODE_HIDDEN_WIREFRAME currender=HiddenWireframebreak case sceneRENDER_MODE_DEFAULT currender=Defaultbreak if(currender=defaultrender) respart+= RENDERMODE=+currender+n defaultvals=false if(meshtransformisEqual(origtrans[meshname])) var lvec=meshtransformtransformDirection(new Vector3(100)) var uvec=meshtransformtransformDirection(new Vector3(010)) var vvec=meshtransformtransformDirection(new Vector3(001)) respart+= TRANSFORM= +lvecx+ +lvecy+ +lvecz+ +uvecx+ +uvecy+ +uvecz+ +vvecx+ +vvecy+ +vvecz+ +meshtransformtranslationx+ +meshtransformtranslationy+ +meshtransformtranslationz+n defaultvals=false respart+= ENDn if(defaultvals) res+=respart detect existing Clipping Plane (3D Cross Section) var clip=null for(i=0 iltscenenodescount i++) if( scenenodesgetByIndex(i)name == $$$$$$ || scenenodesgetByIndex(i)name == Clipping Plane ) clip=scenenodesgetByIndex(i) if(clip) var centre=cliptransformtranslation var normal=cliptransformtransformDirection(new Vector3(001)) res+= CROSSSECTn if((centrex==0 ampamp centrey==0 ampamp centrez==0)) res+=hostutilprintf( CENTER=s s sn centrex centrey centrez) if((normalx==1 ampamp normaly==0 ampamp normalz==0)) res+=hostutilprintf( NORMAL=s s sn normalx normaly normalz) res+= ENDn res+=ENDn hostconsoleshow() hostconsoleclear() hostconsoleprintln(n Add the following VIEW section to a file ofn+ predefined views (See option 3Dviews)nn + The view may be given a name after VIEW=n + (Remove in front of =)n) hostconsoleprintln(res + n)add items to 3D context menuruntimeaddCustomMenuItem(dfltview Generate Default View default 0)runtimeaddCustomMenuItem(currview Get Current View default 0)runtimeaddCustomMenuItem(csection Cross Section checked 0)menu event handlersmenuEventHandler = new MenuEventHandler()menuEventHandleronEvent = function(e) switch(emenuItemName) case dfltview calc3Dopts() break case currview get3Dview() break case csection addremoveClipPlane(emenuItemChecked) break runtimeaddEventHandler(menuEventHandler)global variable taking reference to currently selected mesh nodevar mshSelected=nullselectionEventHandler=new SelectionEventHandler()selectionEventHandleronEvent=function(e) if(eselected ampamp enodeconstructorname==Mesh) mshSelected=enode else mshSelected=null runtimeaddEventHandler(selectionEventHandler)cameraEventHandler=new CameraEventHandler()cameraEventHandleronEvent=function(e) runtimeremoveCustomMenuItem(csection) runtimeaddCustomMenuItem(csection Cross Section checked 0) for(i=0 iltscenenodescount i++) if( scenenodesgetByIndex(i)name == $$$$$$ || scenenodesgetByIndex(i)name == Clipping Plane ) runtimeremoveCustomMenuItem(csection) runtimeaddCustomMenuItem(csection Cross Section checked 1) runtimeaddEventHandler(cameraEventHandler)key event handler for moving spinning and tilting objectskeyEventHandler=new KeyEventHandler()keyEventHandleronEvent=function(e) var target=null var backtrans=new 
Matrix4x4() if(mshSelected) target=mshSelected var trans=targettransform var parent=targetparent while(parenttransform) build local to world transformation matrix transmultiplyInPlace(parenttransform) also build world to local back-transformation matrix backtransmultiplyInPlace(parenttransforminversetranspose) parent=parentparent backtranstransposeInPlace() else try target=scenenodesgetByName(Clipping Plane) catch(e) var ndcnt=scenenodescount target=scenecreateClippingPlane() if(ndcnt=scenenodescount) targetremove() target=null if(target) return switch(echaracterCode) case 30tilt up tiltTarget(target -MathPI900) break case 31tilt down tiltTarget(target MathPI900) break case 28spin right spinTarget(target -MathPI900) break case 29spin left spinTarget(target MathPI900) break case 120 x translateTarget(target new Vector3(100) e) break case 121 y translateTarget(target new Vector3(010) e) break case 122 z translateTarget(target new Vector3(001) e) break case 88 shift + x translateTarget(target new Vector3(-100) e) break case 89 shift + y translateTarget(target new Vector3(0-10) e) break case 90 shift + z translateTarget(target new Vector3(00-1) e) break case 115 s scaleTarget(target 1 e) break case 83 shift + s scaleTarget(target -1 e) break if(mshSelected) targettransformmultiplyInPlace(backtrans)runtimeaddEventHandler(keyEventHandler)function tiltTarget(ta) var centre=new Vector3() if(mshSelected) centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) else centreset(ttransformtranslation) var rotVec=ttransformtransformDirection(new Vector3(010)) rotVecnormalize() ttransformtranslateInPlace(centrescale(-1)) ttransformrotateAboutVectorInPlace(a rotVec) ttransformtranslateInPlace(centre)function spinTarget(ta) var centre=new Vector3() var rotVec=new Vector3(001) if(mshSelected) centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) rotVecset(ttransformtransformDirection(rotVec)) rotVecnormalize() else centreset(ttransformtranslation) ttransformtranslateInPlace(centrescale(-1)) ttransformrotateAboutVectorInPlace(a rotVec) ttransformtranslateInPlace(centre)translates object by amount calculated based on Canvas sizefunction translateTarget(t d e) var cam=scenecamerasgetByIndex(0) if(camprojectionType==camTYPE_PERSPECTIVE) var scale=Mathtan(camfov2) camtargetPositionsubtract(camposition)length Mathmin(ecanvasPixelWidthecanvasPixelHeight) else var scale=camviewPlaneSize2 Mathmin(ecanvasPixelWidthecanvasPixelHeight) ttransformtranslateInPlace(dscale(scale))scales object by amount calculated based on Canvas sizefunction scaleTarget(t d e) if(mshSelected) var bbox=tcomputeBoundingBox() var diag=new Vector3(bboxmaxx bboxmaxy bboxmaxz) diagsubtractInPlace(bboxmin) var dlen=diaglength var cam=scenecamerasgetByIndex(0) if(camprojectionType==camTYPE_PERSPECTIVE) var scale=Mathtan(camfov2) camtargetPositionsubtract(camposition)length dlen Mathmin(ecanvasPixelWidthecanvasPixelHeight) else var scale=camviewPlaneSize2 dlen Mathmin(ecanvasPixelWidthecanvasPixelHeight) var centre=new Vector3() centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) ttransformtranslateInPlace(centrescale(-1)) ttransformscaleInPlace(1+dscale) ttransformtranslateInPlace(centre) function addremoveClipPlane(chk) var clip=scenecreateClippingPlane() if(chk) add Clipping Plane and place its center either into the camera target position or into the centre of the currently selected mesh node var centre=new Vector3() if(mshSelected) local to parent transformation matrix var trans=mshSelectedtransform 
build local to world transformation matrix by recursively multiplying the parents transf matrix on the right var parent=mshSelectedparent while(parenttransform) trans=transmultiply(parenttransform) parent=parentparent get the centre of the mesh (local coordinates) centreset(mshSelectedcomputeBoundingBox()center) transform the local coordinates to world coords centreset(transtransformPosition(centre)) mshSelected=null else centreset(scenecamerasgetByIndex(0)targetPosition) cliptransformsetView( new Vector3(000) new Vector3(100) new Vector3(010)) cliptransformtranslateInPlace(centre) else clipremove() function to store current transformation matrix of all mesh nodes in the scenefunction getCurTrans() var nc=scenemeshescount var tA=new Array(nc) for(var i=0 iltnc i++) var cm=scenemeshesgetByIndex(i) tA[cmname]=new Matrix4x4(cmtransform) return tAfunction to restore transformation matrices given as argfunction restoreTrans(tA) for(var i=0 ilttAlength i++) var msh=scenemeshesgetByIndex(i) mshtransformset(tA[mshname]) store original transformation matrix of all mesh nodes in the scenevar origtrans=getCurTrans()set initial state of Cross Section menu entrycameraEventHandleronEvent(1)hostconsoleclear()
Chapter 2
Literature study
This chapter presents a selective analysis of the state of the art in the field of surface
reconstruction, placing special emphasis on structured lighting techniques. A brief
overview of the three main underlying technologies used for depth estimation is presented
first. This is followed by an example of stereo analysis, which serves as the basis for the
more specific structured lighting techniques. Moreover, this example helps to illustrate
why stereo analysis is considered less suitable than structured lighting techniques for
3D face reconstruction applications. Special emphasis is placed on the scientific
principles underlying structured lighting techniques. Furthermore, a classification of the
different types of pattern coding strategies available in the literature is given, along
with an analysis of their suitability for our application. Finally, the chapter concludes
with a brief discussion of camera calibration and its most representative techniques.
2.1 Surface reconstruction
Surface reconstruction has a wide range of practical applications, such as computer
modeling of 3D objects (as found in areas like architecture, mechanical engineering or
surgery), distance measurements for vehicle control, surface inspections for quality
control, approximate or exact estimates of the location of 3D objects for automated
assembly, and fast location of obstacles for efficient navigation [4].
Technologies for surface reconstruction include contact and non-contact techniques, the
latter being our principal interest. Non-contact techniques may be further categorized
as echo-metric, reflecto-metric and stereo-metric, as proposed in [5]. Echo-metric
techniques use time-of-flight measurements to determine the distance to an object, i.e.,
they are based on the time it takes for a wave (acoustic, micro, electromagnetic) to
reflect from an object's surface through a given medium. Reflecto-metric techniques
process one or more images of the object to determine its surface orientation and,
consequently, its shape. Finally, stereo-metric techniques determine the location of the
object's surface by triangulating each point with its corresponding projections in two or
more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such
techniques are heavily affected by environmental parameters such as temperature and
humidity [6]. These parameters affect the velocity at which waves travel through a
given medium, thus introducing errors in the depth measurements. Both reflecto-metric
and stereo-metric techniques, on the other hand, are less affected by environmental
parameters. However, reflecto-metric techniques entail a major difficulty: they require
an estimation of a model of the environment. In the remainder of this section we will
limit the discussion to the stereo-metric category and focus on structured lighting
techniques.
2.1.1 Stereo analysis
Considering that surface reconstruction by means of structured lighting can be regarded
as an extension of the more general stereo-vision technique, this section presents an
introductory example of stereo analysis, taken from [4]. The example intends to show
why the use of structured lighting becomes essential for our application.
Surface reconstruction can be achieved by means of the visual disparity that results
when an object is observed from different camera viewpoints. In its simplest form, two
cameras can be used for this purpose. Triangulation between a point on the object and
its respective projection on each of the camera projection planes can be used to calculate
the depth at which this point lies from a certain reference. Note, however, that more
parameters are required in order to calculate the triangulation. These parameters refer,
for example, to the distance at which the cameras are located from one another (an
extrinsic parameter) or to the focal length of each of the cameras (an intrinsic parameter).
Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this
model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal
point of the left camera. The focal point of the right camera lies at a distance b along
the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed
to have the same focal length f. As a consequence, the images of both cameras are
located in the same image plane. The Z-axis coincides with the optical axis of the
left camera. Moreover, the optical axes of both cameras are parallel to each other and
oriented towards the scene objects. Also note that, because the x-axes of both images
are identically oriented, rows with the same row number in the two different images lie
on the same straight line.
Figure 2.1: Standard stereo geometry (left and right image planes at base distance b, with parallel optical axes).
In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image
points

p_left = (x_left, y_left) and p_right = (x_right, y_right)

in the left and right images respectively, assuming that the scene point is visible from
both camera viewpoints. The disparity between two corresponding image points, with
respect to p_left, is the vector

\Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\ y_{left} - y_{right})^T \qquad (2.1)
In the standard stereo geometry, pinhole camera models are used to represent the
considered cameras. The basic idea of a pinhole camera is that it projects scene points
P onto image points p according to the central projection

p = (x, y) = \left( \frac{f \cdot X}{Z},\ \frac{f \cdot Y}{Z} \right) \qquad (2.2)

assuming that Z > f.
According to the ideal assumptions considered in the standard stereo geometry of the
two cameras, it holds that y = y_left = y_right. Therefore, for the left camera the
central projection equation is given directly by Equation 2.2, considering that the
pinhole camera model assumes that the Z-axis is identified with the optical axis of the
camera. Furthermore, given the displacement of the right camera by b along the X-axis,
its central projection equation is given by

(x_{right}, y) = \left( \frac{f \cdot (X - b)}{Z},\ \frac{f \cdot Y}{Z} \right)
Rather than calculating a disparity vector given by Equation 2.1 for all corresponding
pairs of points in the different images, the scalar disparity proves to be sufficient under
the assumptions made in the standard stereo geometry. The scalar disparity of two
corresponding points in the two images, with respect to p_left, is given by

\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}

However, because rows with the same row number in the two images have the same y
value, the scalar disparity of a pair of corresponding points reduces to

\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \qquad (2.3)

Note that it is valid to remove the absolute value operator because of the chosen
arrangement of the cameras. A disparity map \Delta(x, y) is defined by applying
Equation 2.3 to all corresponding points in the two images. For those points that could
not be associated with a corresponding point in the other image (for example, because
of occlusion), the value "undefined" is recorded.
Finally, in order to arrive at the equations that determine the 3D location of each point
in the scene, note that from the two central projection equations of the two cameras it
follows that

Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}

and therefore

X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}

Using the previous equation, it follows that

Z = \frac{b \cdot f}{x_{left} - x_{right}}

By substituting this result into the projection equation for y, it follows that

Y = \frac{b \cdot y}{x_{left} - x_{right}}

The last three equations allow the reconstruction of the coordinates of the projected
points P within the three-dimensional XYZ-space, provided that the parameters f and
b are known and that the disparity map \Delta(x, y) was measured for each pair of
corresponding points in the two images. Note that a variety of methods exist to calibrate
different types of camera configuration systems, i.e., to determine their intrinsic and
extrinsic parameters. These calibration procedures are discussed further in Section 2.2.
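As a minimal illustration, the three reconstruction equations translate directly into C,
the language into which the scanner application was ported. The sketch below uses
names of our own choosing and is not code taken from the application.

/* Reconstruct a scene point (X, Y, Z) from a pair of corresponding image
 * points under the standard stereo geometry. b is the base distance between
 * the cameras and f their common focal length; x_left, x_right and y are
 * image coordinates, with y = y_left = y_right by assumption.
 * Returns -1 when the disparity is undefined (e.g., due to occlusion). */
static int reconstruct_point(double b, double f,
                             double x_left, double x_right, double y,
                             double *X, double *Y, double *Z)
{
    double disparity = x_left - x_right;   /* Equation 2.3 */
    if (disparity <= 0.0)
        return -1;
    *X = b * x_left / disparity;
    *Y = b * y / disparity;
    *Z = b * f / disparity;
    return 0;
}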
The process of determining corresponding point pairs is known as the correspondence
problem. A wide variety of techniques is used to solve the correspondence problem in
stereo image analysis. Such techniques generally involve the extraction and matching
of features between two or more images; these features are typically corners or edges
contained within the images. Although these techniques are found to be appropriate
for a certain number of applications, they present a number of drawbacks that make
them unfeasible for many others. The main drawbacks are that (i) feature extraction
and matching is generally computationally expensive, (ii) features might not be available
depending on the nature of the environment or the placement of the cameras, and
(iii) low lighting conditions generally increase the complexity of the matching procedure,
thus making the system more error prone. Such problems in solving the correspondence
problem can generally be overcome by resorting to a different but related family of
techniques known as structured lighting techniques. While structured lighting techniques
involve a completely different methodology for solving the correspondence problem, they
share a large part of the theory presented in this section regarding the depth
reconstruction process.
2.1.2 Structured lighting
Structured lighting methods can be thought of as a modification of the previously
described stereo analysis approach in which one of the cameras is replaced by a light
source that actively projects a light pattern onto the scene. The location of an object in
space can then be determined by analyzing the deformation of the projected light
pattern. The idea behind this modification is to reduce the complexity of the
correspondence analysis by actively manipulating the scene.
It is important to note that stereoscopy-based systems do not impose complex
requirements on image acquisition, since they mostly rely on theoretical, mathematical
and algorithmic analyses to solve the reconstruction problem. The idea behind
structured lighting methods, on the other hand, is to shift this complexity to another
level, namely the engineering prerequisites of the overall system [4].
A wide variety of light patterns has been proposed by the research community [5],
[7]-[17]. Their aim is to reduce the large number of images that would have to be
captured when using the most basic of all approaches, i.e., a single light spot. A
classification of the available encoded patterns is presented in Section 2.1.2.2.
Nevertheless, the light spot projection technique serves as a solid starting point to
introduce the main principle underlying the depth recovery of most other encoded light
patterns: the triangulation technique.
2.1.2.1 Triangulation technique
Triangulation refers to the process of determining the location of a point by measuring
the angles formed from it to points at either end of a fixed baseline. Various approaches
have been proposed for accomplishing this task. An early analysis was described by
Hall et al. [18] in 1982; Klette also presented his own analysis in [4]. In the following,
an overview of Klette's triangulation approach is given.
Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation, as proposed in [4] (camera at the origin O, light source at base distance b, object point P, and angles α, β and γ).

Note that the system can be thought of as a 2D object scene, i.e., it has no vertical
dimension. As a consequence, the object, the light source and the camera all lie in the
same plane. The angles α and β are given by the calibration. As in the previous
example, the base distance b is assumed to be known, and the origin of the coordinate
system O coincides with the projection center of the camera.
The goal is to calculate the distance d between the origin O and the object point
P = (X_0, Z_0). This can be done using the law of sines as follows:

\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}

From \gamma = \pi - (\alpha + \beta) and \sin(\pi - \gamma) = \sin(\gamma) it holds that

\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}

Therefore, the distance d is given by

d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}

which holds for any point P lying on the surface of the object.
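This result reduces to a one-line function in C; the sketch below uses names of our own
choosing.

#include <math.h>

/* Distance d from the camera's projection center O to the object point P,
 * given the base distance b and the calibration angles alpha and beta
 * (both in radians), following the law-of-sines derivation above. */
static double triangulate_distance(double b, double alpha, double beta)
{
    return b * sin(alpha) / sin(alpha + beta);
}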
2.1.2.2 Pattern coding strategies
As stated earlier, a wide variety of pattern coding strategies is available in the literature,
aiming to fulfill the requirements found in different scenarios and applications. In coded
structured light systems, every coded pixel in the pattern has its own codeword that
allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates
of a given pixel or group of pixels in the pattern. A codeword can be represented using
grey levels, colors or even geometrical characteristics. The following classification of
pattern coding strategies was proposed by Salvi et al. in [19].
• Time-multiplexing: This is one of the most commonly used strategies. The idea is to
project a set of patterns onto the scene, one after the other. The sequence of illuminated
values determines the codeword for each pixel. The main advantage of this kind of
pattern is that it can achieve high spatial resolution in the measurements. However, its
accuracy is highly sensitive to movement of either the structured light system or the
objects in the scene during the period in which the acquisition takes place. Previous
research in this area includes the work of [5], [7], [8]. An example of this coding strategy
is the binary coded pattern shown in Figure 2.3a.
• Spatial neighborhood: In this strategy, the codeword that is assigned to a given pixel
depends on its neighborhood. Codification is done on the basis of intensity [9]-[11],
color [12] or a unique structure of the neighborhood [13]. In contrast with
time-multiplexing strategies, spatial neighborhood strategies allow all coding information
to be condensed into a single projection pattern, making them highly suitable for
applications that involve timing constraints, such as autonomous navigation. The
compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example
of this strategy, proposed by Griffin et al. [14].
• Direct coding: In direct coding strategies, every pixel in the pattern is labeled by the
information it represents. In other words, the entire codeword for a given point is
contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve
this: either by using a large range of color values [15], [16] or by introducing periodicity
[17]. Although in theory this group of strategies can be used to reconstruct objects with
high resolution, a major problem occurs in practice: the colors imaged by the camera(s)
of the system depend not only on the projected colors but also on the intrinsic colors of
the measured surface and of the light source. As a consequence, reference images become
necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].
Figure 2.3: Examples of pattern coding strategies: (a) time-multiplexing, (b) spatial neighborhood, (c) direct coding.
2.1.2.3 3D human face reconstruction
Given the importance of face reconstruction in a wide range of fields, such as security,
forensics or even entertainment, it is no surprise that the research community has
devoted special focus to this area over the last decades. A comparative study of three
different 3D face reconstruction approaches is presented in [20], where the most
representative techniques of three different domains are tested: binocular stereo,
structured lighting and photometric stereo. The experimental results show that active
reconstruction techniques perform better than purely passive ones for this application.
The majority of analyses on vision-based reconstruction have focused on general
performance for arbitrary scenes rather than on specific objects, as reported in [20].
Nevertheless, some effort has been made on evaluating structured lighting techniques
with a special focus on human face reconstruction. In [21], a comparison is presented
between three structured lighting techniques (Gray code, Gray code shift and stripe
boundary) to assess 3D reconstruction of human faces using mono and stereo systems.
The results show that the Gray code shift coding performs best, given the high number
of emitted patterns it uses. A further study on this topic was performed by the same
author in [22]. Again, it was found that time-multiplexing techniques, such as binary
encoding using Gray code, provide the highest accuracy. With a rather different
objective than that sought by Woodward et al. in [21] and [22], Fechteler et al. [23]
focus their effort on presenting a framework that captures 3D models of faces in high
resolution with low computational load. Here, the system uses a single colored stripe
pattern for the reconstruction, plus a picture of the face illuminated with regular white
light that is used as texture.
Particular aspects of 3D human face reconstruction, such as the proximity, size and
texture involved, make structured lighting a suitable approach. In contrast, other
reconstruction techniques might be less suitable when dealing with these particular
aspects. For example, stereoscopic approaches fail to provide positive results when the
textures involved do not contain features that can be easily extracted and matched by
means of algorithms, as is the case for the human face. The concepts behind structured
lighting, on the other hand, make it very convenient to reconstruct this kind of surface,
given the proximity involved and the size limits of the object in question (appropriate
for projecting encoded patterns).
With regard to the suitability of the different pattern coding strategies for our
application (3D human face reconstruction by means of a hand-held scanner), there are
several factors to consider. Spatial neighborhood strategies do not offer the high spatial
resolution that is needed by the algorithms that assess the fit quality of the various
mask models. Direct coding strategies suffer from practical problems that affect their
robustness in different scenarios. This centers the attention on time-multiplexing
techniques, which are known to provide high spatial resolution. The problem with such
techniques is that they are highly sensitive to movement, which is likely to be present
on a hand-held device. Fortunately, there are several approaches by which this problem
can be solved. Consequently, it is a time-multiplexing technique that is employed in our
application.
2.2 Camera calibration
Camera calibration is a crucial ingredient in the process of metric scene measurement.
This section presents a review of some of the most popular techniques, with special
focus on those that are regarded as adequate for our application.
2.2.1 Definition
Camera calibration is the process of determining a mathematical approximation of the
physical and optical behavior of an imaging system by using a set of parameters. These
parameters can be estimated by means of direct or iterative methods, and they are
divided into two groups. On the one hand, intrinsic parameters determine how light is
projected through the lens onto the image plane of the sensor; the focal length,
projection center and lens distortion are all examples of intrinsic parameters. On the
other hand, extrinsic parameters measure the position and orientation of the camera
with respect to a world coordinate system, as defined in [24]. To better illustrate these
ideas, consider Figure 2.4, which corresponds to the optical system for structured
pattern projection and triangulation considered in [25]. The focal length f_c and the
projection center O_c are examples of intrinsic parameters of the camera, while the
distance D between the camera and the projector corresponds to an extrinsic parameter.
Figure 2.4: The reference framework assumed in [25], showing the camera (projection center O_c, focal length f_c), the projector (projection center O_p, focal length f_p), the reference plane and both image planes.
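For reference, the intrinsic and extrinsic parameters are commonly collected in the
textbook pinhole projection equation shown below; this is the standard formulation,
not necessarily the exact notation used in [25]:

s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} =
\begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} R & t \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}

where (u, v) is the projected image point of the world point (X, Y, Z), the 3×3 matrix
holds the intrinsic parameters (focal lengths f_x, f_y in pixel units and projection
center (c_x, c_y)), and the rotation R and translation t are the extrinsic parameters;
lens distortion is typically modeled separately.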
2.2.2 Popular techniques
In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration
that uses a 3×4 transformation matrix to map 3D object points to their respective 2D
image projections. Here, the model of the camera does not consider any lens distortion.
For a detailed description of this method, refer to [18]. Some years later, in 1986,
Faugeras improved Hall's work by proposing a technique based on extracting the
physical parameters of the camera from the transformation technique proposed in [18];
the description of this technique is given in [26] and [27]. A non-linear explicit camera
calibration that included radial lens distortion was proposed by Salvi in his PhD thesis
[28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear
method. However, a method that would become much more popular, and that is still
widely used, was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step
technique that models only radial lens distortion. Also worth mentioning is the model
proposed by Weng [30] in 1992, which includes three different types of lens distortion.
The calibration mechanism that is currently used in our application is based on the
work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although this
mechanism focuses on stereo camera calibration, it was generalized to a system with
one camera and one projector. It involves imaging a controlled scene from different
positions and orientations. The controlled scene consists of a rigid calibration chart
with several markers, whose geometric and photometric properties are known precisely
so that they can be detected. After corresponding markers in the different images are
found, an algorithm searches for the optimal set of camera parameters for which
triangulation of all corresponding marker-point pairs gives an accurate reconstruction
of the calibration chart. This calibration mechanism is discussed further in Section 3.7.
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed
by the Smart Sensing & Analysis research group and provided as a starting point for
the current project. Figure 3.1 presents the main steps involved in the 3D
reconstruction process.

Figure 3.1: General flow diagram of the 3D face scanner application. Starting from the binary and XML input files, the pipeline comprises reading the binary file (Section 3.1), preprocessing (3.2), normalization (3.3), global motion compensation (3.6), decoding (3.5), tessellation (3.4), calibration (3.7), vertex filtering (3.8) and hole filling (3.9), ending with the 3D model.
The current scanner uses a total of 16 binary coded patterns that are sequentially
projected onto the scene. For each projection, the scene is captured by means of the
embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are
fed to the application in the form of a binary file. This falls in line with the discussion
presented in Section 2.1.2.3 of the literature study on why time-multiplexing strategies
are more suitable than spatial neighborhood or direct coding strategies for face
reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1
is described.
Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.
3.1 Read binary file
The first step of the application is to read the binary file that contains the required
information for the 3D reconstruction. The binary file is composed of two parts: the
header and the actual data. The header contains metadata of the acquired frames, such
as the number of frames and the resolution of each one. The second part contains the
actual data of the captured frames. Figure 3.2 shows an example of such a frame
sequence, which from now on will be referred to as camera frames.
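A minimal sketch of this step in C is given below. The exact binary layout is not
specified here, so the header struct (frame count and resolution stored as 32-bit
integers) is an assumption for illustration only.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical header layout; the real file format may differ. */
struct scan_header {
    uint32_t num_frames;   /* number of captured frames           */
    uint32_t width;        /* horizontal resolution of each frame */
    uint32_t height;       /* vertical resolution of each frame   */
};

static uint8_t *read_scan(const char *path, struct scan_header *hdr)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) return NULL;
    if (fread(hdr, sizeof(*hdr), 1, fp) != 1) { fclose(fp); return NULL; }

    size_t n = (size_t)hdr->num_frames * hdr->width * hdr->height;
    uint8_t *frames = malloc(n);            /* one byte per pixel */
    if (frames && fread(frames, 1, n, fp) != n) { free(frames); frames = NULL; }
    fclose(fp);
    return frames;
}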
3.2 Preprocessing
The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these
steps is described in the following subsections.
Figure 3.3: Flow diagram of the preprocessing stage: parse XML file, discard frames, crop frames and scale (convert to float, range from 0 to 1).
3.2.1 Parse XML file
In this stage, the application first reads an XML file that is included with every scan.
This file contains relevant information for the structured light reconstruction, including
(i) the type of structured light patterns that were projected when acquiring the data,
(ii) the number of frames captured while structured light patterns were being projected,
(iii) the image resolution of each frame to be considered, and (iv) the calibration data.
3.2.2 Discard frames
Based on the number-of-frames value read from the XML file, the application discards
the extra frames that do not contain relevant information for the structured light
approach but that are provided as part of the input.
3.2.3 Crop frames
The original resolution of each camera frame (480 × 768) is modified in order to obtain
a new, more suitable resolution for the subsequent algorithms of the program
(480 × 754). This is accomplished by cropping the pixels that are close to the top
border of the images. Note that this operation does not imply a loss of information in
this particular application, because pixels near the frame borders do not contain facial
information and can therefore be safely removed.
3.2.4 Scale
Each pixel of the camera frame sequence (as provided by the embedded camera) is
represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage,
the data type is transformed from unsigned integer to floating point, while dividing
each pixel value by 255. The new set of values ranges between 0 and 1.
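In C this stage reduces to a single pass over the data; a sketch:

#include <stddef.h>
#include <stdint.h>

/* Convert 8-bit pixel values (0-255) to floats in the range [0, 1].
 * n is the total number of pixels in the frame sequence. */
static void scale_frames(const uint8_t *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (float)src[i] / 255.0f;
}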
3.3 Normalization
Even though this section is entitled Normalization, a few more tasks are performed in
this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide
arrows represent the flow of data, whereas dashed lines represent the order of execution.
The numbers inside the small data arrows pointing towards the different tasks represent
the number of frames used as input by each task. The dashed-line rectangle that
encloses the normalization and texture 2 tasks indicates that there is no strict sequential
execution between these two, but rather that they are executed in an alternating
fashion. This type of diagram will prove particularly useful in Chapter 5 to explain the
modifications that were made to the application to improve its performance. An
example of the different frames produced in this stage is visualized in Figure 3.5. A
brief description of each of the tasks involved in this stage follows.

Figure 3.4: Flow diagram of the normalization stage. From the 16 input camera frames, the normalization and texture 2 tasks each produce 8 output frames, while the modulation and texture 1 tasks each produce 1 output frame.
3.3.1 Normalization
The purpose of this stage is to extract the reflectivity component (texture information)
from the camera frames, while aiming at enhancing the deformed illumination patterns
in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The
deformed patterns are essential for the 3D reconstruction process.
In order to understand how this process takes place, we need to look back at Figure 3.2.
There, it is possible to observe that the projected pattern in each top-row frame is equal
to that of the corresponding bottom-row frame, with the only difference being that the
values of the projected pattern are inverted. For each corresponding pair, a new image
frame is generated according to the following equation:

F_{norm}(x, y) = \frac{F_{camera}(x, y, a) - F_{camera}(x, y, b)}{F_{camera}(x, y, a) + F_{camera}(x, y, b)}

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively.
An example of the resulting frame sequence is shown in Figure 3.5a.
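A sketch of this computation in C follows; the small epsilon guard against division by
zero is our addition and is not necessarily present in the original implementation.

#include <stddef.h>

/* Per-pixel normalization of an aligned pair of frames: a holds the
 * projected pattern and b its inverse. Output values lie in [-1, 1]. */
static void normalize_pair(const float *a, const float *b,
                           float *out, size_t n)
{
    const float eps = 1e-6f;   /* divide-by-zero guard (assumption) */
    for (size_t i = 0; i < n; ++i)
        out[i] = (a[i] - b[i]) / (a[i] + b[i] + eps);
}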
Figure 3.5: Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence, (b) texture 2 frame sequence, (c) modulation frame, (d) texture 1 frame.
3.3.2 Texture 2
The calculation of the texture 2 frame sequence follows the same procedure as the one
used to calculate the normalized frame sequence. In fact, the output of this process is
an intermediate step in the calculation of the normalized frames, which is the reason
why the two processes are said to be performed in an alternating fashion. The equation
that describes the calculation of the texture 2 frame sequence is

F_{texture2}(x, y) = F_{camera}(x, y, a) + F_{camera}(x, y, b)

The resulting frame sequence (Figure 3.5b) is used later in the global motion
compensation stage.
3.3.3 Modulation
The purpose of this stage is to find the range of measured values for each (x, y) pixel of
the camera frame sequence along the time dimension. This is done in two steps. First,
two frames are generated by finding the maximum and minimum values along the time
(t) dimension (Figure 3.6) for every (x, y) position in a frame.

Figure 3.6: The camera frame sequence in an (x, y, t) coordinate system.

Second, a modulation frame is produced by taking the difference between the two
previously generated frames, i.e.,

F_{mod}(x, y) = F_{max}(x, y) - F_{min}(x, y)

Such a modulation frame (Figure 3.5c) is required later during the decoding stage.
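Both steps can be fused into a single pass, as in the following sketch (names are ours):

#include <stddef.h>

/* Modulation frame: per-pixel range of values along the time dimension.
 * frames points to num_frames frames of n pixels each, stored contiguously. */
static void modulation(const float *frames, size_t num_frames, size_t n,
                       float *mod)
{
    for (size_t i = 0; i < n; ++i) {
        float mn = frames[i], mx = frames[i];
        for (size_t t = 1; t < num_frames; ++t) {
            float v = frames[t * n + i];
            if (v < mn) mn = v;
            if (v > mx) mx = v;
        }
        mod[i] = mx - mn;
    }
}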
3.3.4 Texture 1
Finally, the last task in the normalization stage corresponds to the generation of the
texture image that will be mapped onto the final 3D model. In contrast to the previous
three tasks, this subprocess does not take the complete set of 16 camera frames as
input, but only the 2 with the finest projection patterns. Figure 3.7 shows the four
processing steps that are applied to the input in order to generate a texture image such
as the one presented in Figure 3.5d.
Figure 3.7: Flow diagram for the calculation of the texture 1 image: average frames, gamma correction, 5×5 mean filter and histogram stretch.
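Two of these steps can be sketched compactly in C. The gamma value below is a
placeholder, since the parameters used by the application are not specified here.

#include <math.h>
#include <stddef.h>

/* Gamma correction of pixel values in [0, 1]; gamma is an assumed parameter. */
static void gamma_correct(float *img, size_t n, float gamma)
{
    for (size_t i = 0; i < n; ++i)
        img[i] = powf(img[i], 1.0f / gamma);
}

/* Linear histogram stretch of the image to the full [0, 1] range. */
static void histogram_stretch(float *img, size_t n)
{
    float mn = img[0], mx = img[0];
    for (size_t i = 1; i < n; ++i) {
        if (img[i] < mn) mn = img[i];
        if (img[i] > mx) mx = img[i];
    }
    if (mx > mn)
        for (size_t i = 0; i < n; ++i)
            img[i] = (img[i] - mn) / (mx - mn);
}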
3.4 Global motion compensation
The major drawback of time-multiplexing strategies is their high sensitivity to
movement. In fact, if no measures are taken to correct for the slight movements of the
scanner or of the objects in the scene during the acquisition process, the complete
reconstruction process fails. Although the global motion compensation stage is only a
minor part of the mechanism that makes the entire application robust to motion, its
contribution to the final result is not negligible.

Global motion compensation is an extensive field of research to which many different
approaches and methods have been contributed. The approach used in this application
is amongst the simplest in terms of complexity; nevertheless, it suffices for the needs of
the current application.
Figure 3.8 presents an overview of the algorithm used to achieve global motion
compensation. This process takes as input the normalized frame sequence introduced in
the previous section. As noted at the bottom of the figure, these steps are repeated for
every pair of consecutive frames. As a first step, the pixels in each column are added up
for both frames, which results in two vectors holding the cumulative column sums of
each frame. The second step is to determine by how many pixels the second image is
displaced with respect to the first one. In order to achieve this, the sum of absolute
differences (SAD) between elements of the two column-sum vectors is calculated while
slowly displacing the two vectors with respect to each other, resulting in a new vector
that contains the SAD value for each displacement. Subsequently, the index of the
smallest element in the SAD-values vector is searched for in order to determine the
number of pixels by which the second image needs to be shifted. The process concludes
by performing the actual shift of the second frame.
Figure 3.8: Flow diagram for the global motion compensation process. For every pair of consecutive frames A and B, the columns of each frame are summed, the SAD between the two column-sum vectors is minimized, and frame B is shifted accordingly.
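The following C sketch mirrors the algorithm just described. The names, the fixed-size
buffers and the normalization of the SAD by the overlap size are our own choices, not
details taken from the application.

#include <float.h>
#include <math.h>

#define MAX_WIDTH 1024   /* assumed upper bound on the frame width */

/* Estimate the horizontal displacement (in pixels) of frame B with respect
 * to frame A by minimizing the sum of absolute differences (SAD) between
 * their column-sum vectors, searching shifts in [-max_shift, max_shift]. */
static int estimate_shift(const float *A, const float *B,
                          int width, int height, int max_shift)
{
    float colA[MAX_WIDTH], colB[MAX_WIDTH];
    for (int x = 0; x < width; ++x) {
        colA[x] = colB[x] = 0.0f;
        for (int y = 0; y < height; ++y) {
            colA[x] += A[y * width + x];
            colB[x] += B[y * width + x];
        }
    }
    int best_shift = 0;
    float best_sad = FLT_MAX;
    for (int s = -max_shift; s <= max_shift; ++s) {
        float sad = 0.0f;
        int overlap = 0;
        for (int x = 0; x < width; ++x) {
            if (x + s < 0 || x + s >= width) continue;
            sad += fabsf(colA[x] - colB[x + s]);
            ++overlap;
        }
        sad /= (float)overlap;          /* normalize over the overlap */
        if (sad < best_sad) { best_sad = sad; best_shift = s; }
    }
    return best_shift;   /* pixels by which frame B must be shifted back */
}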
3.5 Decoding
In Section 2.1.1 of the literature study, the correspondence problem was defined as the
process of determining corresponding point pairs between the captured images and the
projected patterns. This is exactly what is accomplished during the decoding stage.
A novel approach has been implemented in which the identification of the projector
stripes is based not on the values of the pixels themselves (as is typically done), but
rather on the edges formed by the transitions of the projected patterns. Figure 3.9
illustrates the different sets of decoded values that result from each of these methods.
Here, it is possible to observe that the pixel-based method produces a stair-casing effect
due to the decoding of neighboring pixels that lie on the same stripe of the projected
pattern. The edge-based method, on the other hand, removes this undesirable effect by
decoding values only for those parts of the image in which a transition occurs.
Furthermore, this approach enables sub-pixel accuracy in the determination of the
positions where the transitions occur, meaning that the overall resolution of the 3D
reconstruction increases considerably.
Figure 3.9: Edge-based vs. pixel-based decoding, plotted as decoded values against pixels along the y dimension of the image. The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used.
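The sub-pixel localization of a pattern transition can be illustrated with a simple linear
interpolation between the two pixels that straddle the zero crossing of a normalized
scanline; this sketch conveys the principle only, not the exact decoder of the application.

/* Sub-pixel position of a pattern transition along a scanline of a
 * normalized frame. Pixels y and y+1 must straddle the zero crossing,
 * i.e., row[y] and row[y+1] have opposite signs; the crossing is then
 * located by linear interpolation. */
static double subpixel_edge(const float *row, int y)
{
    return (double)y + row[y] / (double)(row[y] - row[y + 1]);
}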
The decoding process results in a set of vertices, each one associated with a depth code. Note, however, that the unit of measurement used to describe the position and depth of each vertex is based on camera pixels and code values, respectively, meaning that these vertices still do not represent the actual geometry of the face. The calibration process, explained in a later section, is the part of the application that translates the pixel and code values to standard units (such as millimeters), thus recreating the actual shape of the human face.
3.6 Tessellation
Tessellation refers to the process of covering a plane with geometric shapes in such a manner that no overlaps occur. In computer graphics, these geometric shapes are generally chosen to be triangles, also called "faces". The reason for using triangles is that they have, by definition, their vertices on the same plane. This, in turn, avoids the generation of non-simple convex polygons, which are not guaranteed to be rendered correctly. A complete example illustrating this point can be found in [32].

The set of 3D vertices calculated in the decoding stage is the input to the tessellation process. Here, however, the third dimension does not play a role, and hence the z coordinate of each vertex can be thought of as being equal to 0. This implies that the new set of vertices consists only of (x, y) coordinates that lie on the same plane, as shown in Figure 3.10a. This graph corresponds to a very close view of the nose area in the reconstructed face example.
(a) Vertices before applying the Delaunay triangulation (zoomed-in model before tessellation)
(b) Result after applying the Delaunay triangulation (zoomed-in model after tessellation)
Figure 3.10: Close view of the vertices in the nose area before and after the tessellation process.
The question that arises here is how to connect the vertices in such a way that the complete surface is covered with triangles. The answer is to use the Delaunay triangulation, which is probably the most common triangulation used in computer vision. The main advantage that it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the Delaunay triangulation is independent of the order in which the vertices are processed.
Figure 3.10b shows the result of applying the Delaunay triangulation to the vertices shown in Figure 3.10a.

Although there exist a number of different algorithms for achieving the Delaunay triangulation, the final outcome of each conforms to the following definition: a Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [33]. Such a definition can be understood by examining Figure 3.11.
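The empty-circumcircle condition can be verified with the standard determinant predicate from computational geometry; the sketch below is textbook material and not taken from the application's code.

    /* Empty-circumcircle test underlying the Delaunay definition. For a
     * counter-clockwise triangle (a, b, c), the result is positive when
     * point d lies inside the circumcircle, negative when outside, and
     * zero when d lies exactly on it. */
    typedef struct { double x, y; } Point2D;

    static double in_circumcircle(Point2D a, Point2D b, Point2D c, Point2D d)
    {
        double ax = a.x - d.x, ay = a.y - d.y;
        double bx = b.x - d.x, by = b.y - d.y;
        double cx = c.x - d.x, cy = c.y - d.y;

        double a2 = ax * ax + ay * ay;
        double b2 = bx * bx + by * by;
        double c2 = cx * cx + cy * cy;

        /* 3x3 determinant | ax ay a2 ; bx by b2 ; cx cy c2 | */
        return ax * (by * c2 - b2 * cy)
             - ay * (bx * c2 - b2 * cx)
             + a2 * (bx * cy - by * cx);
    }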
Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33].
3.7 Calibration
The set of (x, y) vertices with their corresponding depth code values that result from the decoding process do not represent standard units of measure, i.e. these still have to be translated into standard units such as millimeters. This is precisely the objective of the calibration process.

The calibration mechanism used in the application is based on the work of Peter-Andre Redert, as part of his PhD thesis [31]. The entire process is divided into two parts: an offline and an online process. Moreover, the offline process consists of two stages: the camera calibration and the system calibration. It is important to clarify that while the offline process is performed only once (camera properties and distances within the system do not change with every scan), the online process is carried out for every scan instance. The calibration stage referred to in Figure 3.1 is the latter.
3.7.1 Offline process

As already mentioned, the offline process comprises the two stages described below.

Camera calibration: This part of the process is concerned with the calculation of the intrinsic parameters of the camera, as explained in Section 2.2 of the literature study. In short, the objective is to precisely quantify the optical properties of the camera. The manner in which the current approach accomplishes this is by imaging the special calibration chart shown in Figure 3.12 from different orientations and distances. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart.
Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.
System calibration: The second part of the calibration process refers to the camera-projector system calibration, i.e. the determination of the extrinsic parameters of the system. Again, this part of the process images the calibration chart from different distances. However, this time structured light patterns are emitted by the projector while the acquisition process takes place. The result is that each projector code is associated with a known depth and camera position.
3.7.2 Online process

The result of the offline calibration is a set of parameters that model the optical properties of the scanner system. These are passed to the application inside the XML file for every scan. Such parameters represent the coefficients of a fifth-order polynomial used for translating the set of (x, y) vertices with their corresponding depth code values into standard units of measure. In other words, the online process consists of evaluating a polynomial with all the x, y, and depth code values calculated in the decoding stage in order to reconstruct the geometry of the face. Figure 3.13 shows the state of the 3D model before and after the reconstruction process.
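As an illustration, the sketch below evaluates a fifth-order polynomial with Horner's rule. The coefficient layout and the restriction to a single variable are simplifying assumptions: the actual mapping defined by the XML coefficients also involves the x and y values of each vertex.

    /* A minimal sketch of the online calibration step: a fifth-order
     * polynomial in the depth code value is evaluated with Horner's
     * rule to obtain a depth in standard units. The coefficient array
     * c[0..5] is a hypothetical layout. */
    static float eval_poly5(const float c[6], float code)
    {
        float z = c[5];
        for (int k = 4; k >= 0; k--)
            z = z * code + c[k];   /* Horner's rule */
        return z;                  /* depth in standard units, e.g. mm */
    }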
(a) Before reconstruction (b) After reconstruction
Figure 3.13: The 3D model before and after the calibration process.
3.8 Vertex filtering
As can be seen from Figure 3.13b, there are a number of extra vertices (and faces) that have not been correctly reconstructed and therefore should be removed from the model. Vertex filtering is applied to remove all these noisy vertices and faces based on different criteria. The process is divided into the following three steps.
3.8.1 Filter vertices based on decoding constraints
First, if the distance between consecutive decoded points is larger than a maximum threshold in the x or z dimension, then these are removed. Second, in order to avoid falsely decoded vertices due to camera noise (especially in the parts of the images where light does not hit directly), a minimal modulation threshold needs to be exceeded, or else the associated decoded point is discarded. Finally, if the decoded vertices lie outside a margin defined in accordance with the image dimensions, then these are removed as well.
3.8.2 Filter vertices outside the measurement range
The measurement range defined during the offline calibration refers to the minimum and maximum values that each decoded point can have in the z dimension. These values are read from the XML file. The long triangles shown in Figure 3.13b, which either extend far into the picture or, on the other hand, come close to the camera, are all removed in this stage. The resulting 3D model after being filtered with the two previously described criteria is shown in Figure 3.14a.
3.8.3 Filter vertices based on a maximum edge length
Several steps are involved in the removal of vertices based on the maximum edge length criterion. Initially, the length of every edge contained in the model is calculated. This is followed by determining a new set of edges L that contains the longest edge in each face. After this operation, the mean length value for the longest edge set is calculated. Finally, only faces whose longest edge value is less than seven times the mean value, i.e. L < 7 × mean(L), are kept. Figure 3.14b shows the result after this operation. A sketch of this filter is given below.
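The following sketch illustrates the filter with hypothetical data structures; error handling is omitted.

    #include <math.h>
    #include <stdlib.h>

    typedef struct { float x, y, z; } Vtx;

    static float edge_len(Vtx a, Vtx b)
    {
        float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return sqrtf(dx * dx + dy * dy + dz * dz);
    }

    /* Mark faces whose longest edge exceeds 7 times the mean longest
     * edge; keep[f] is 1 for faces that survive the filter. */
    static void filter_long_edges(const Vtx *v, const int (*face)[3],
                                  int n_faces, int *keep)
    {
        float *longest = malloc(n_faces * sizeof *longest);
        float mean = 0.0f;

        for (int f = 0; f < n_faces; f++) {
            float e0 = edge_len(v[face[f][0]], v[face[f][1]]);
            float e1 = edge_len(v[face[f][1]], v[face[f][2]]);
            float e2 = edge_len(v[face[f][2]], v[face[f][0]]);
            float m = e0 > e1 ? e0 : e1;
            longest[f] = m > e2 ? m : e2;
            mean += longest[f];
        }
        mean /= n_faces;

        for (int f = 0; f < n_faces; f++)
            keep[f] = longest[f] < 7.0f * mean;   /* L < 7 * mean(L) */

        free(longest);
    }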
(a) The 3D model after the filtering steps described in Subsections 3.8.1 and 3.8.2
(b) The 3D model after the filtering step described in Subsection 3.8.3
(c) The 3D model after the filtering step described in Section 3.9
Figure 3.14: Resulting 3D models after the various filtering steps.
3.9 Hole filling
In the last processing step of the 3D face scanner application, two actions are performed. The first one is concerned with an algorithm that takes care of filling undesirable holes that appear due to the removal of vertices and faces that were part of the face surface. This is accomplished by adding a vertex in the middle of the hole and then connecting every surrounding edge with this point. The second action refers to another filtering step of vertices and faces: in this last part of the application, the program removes all but the largest group of connected faces. The final 3D model is shown in Figure 3.14c.
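The following sketch, reusing the Vtx structure from the previous sketch and assuming a hypothetical boundary loop as input, illustrates the centroid-and-fan idea; the detection of the hole boundary itself is omitted.

    /* Fill one hole described by an ordered loop of boundary vertex
     * indices: place a new vertex at the centroid of the loop and
     * create one triangle per boundary edge (a triangle fan). */
    static int fill_hole(Vtx *v, int *n_vertices,
                         const int *loop, int loop_len,
                         int (*face)[3], int *n_faces)
    {
        Vtx c = { 0.0f, 0.0f, 0.0f };
        for (int i = 0; i < loop_len; i++) {
            c.x += v[loop[i]].x;
            c.y += v[loop[i]].y;
            c.z += v[loop[i]].z;
        }
        c.x /= loop_len; c.y /= loop_len; c.z /= loop_len;

        int center = (*n_vertices)++;   /* index of the new centroid vertex */
        v[center] = c;

        for (int i = 0; i < loop_len; i++) {
            face[*n_faces][0] = loop[i];
            face[*n_faces][1] = loop[(i + 1) % loop_len];
            face[*n_faces][2] = center;
            ++*n_faces;
        }
        return center;
    }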
3.10 Smoothing
Taking into account that the smoothing process is beneficial for visualization purposes, but not for the overall goal of the 3D mask sizing project, this process was not included as part of the 3D face scanner application. This is also the reason why it is not included in Figure 3.1. Nevertheless, this section provides a brief explanation of the smoothing process that is currently used, along with an example.

A complete explanation of the algorithm that is used to achieve the smoothing effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian operator that diffuses the vertices along the surface. An example of the resulting model before and after applying the smoothing process is shown in Figure 3.15.
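For intuition, the sketch below performs one iteration of the simpler uniform (umbrella) Laplacian smoothing that the scale-dependent operator of [34] refines; the adjacency representation and the omission of the scale-dependent weights are assumptions made for brevity, again reusing the Vtx structure from the earlier sketches.

    /* One iteration of uniform Laplacian smoothing: each vertex moves a
     * fraction lambda toward the average of its neighbors. adj_start is
     * a prefix array into the flat neighbor list adj. Note that the
     * in-place update lets already-smoothed neighbors influence later
     * vertices; a separate output buffer would avoid this. */
    static void laplacian_smooth(Vtx *v, int n_vertices,
                                 const int *adj, const int *adj_start,
                                 float lambda)
    {
        for (int i = 0; i < n_vertices; i++) {
            int deg = adj_start[i + 1] - adj_start[i];
            if (deg == 0)
                continue;
            Vtx avg = { 0.0f, 0.0f, 0.0f };
            for (int k = adj_start[i]; k < adj_start[i + 1]; k++) {
                avg.x += v[adj[k]].x;
                avg.y += v[adj[k]].y;
                avg.z += v[adj[k]].z;
            }
            v[i].x += lambda * (avg.x / deg - v[i].x);
            v[i].y += lambda * (avg.y / deg - v[i].y);
            v[i].z += lambda * (avg.z / deg - v[i].z);
        }
    }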
(a) The 3D model before smoothing (b) The 3D model after smoothing
Figure 3.15: Forehead of the 3D model before and after applying the smoothing process.
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as two different domains, but rather as two complementary parts of a whole. There are two important trends that have made such a unified view possible. First, integrated circuit (IC) technology has evolved to the point where multiple processors of different types coexist in a single IC. Second, the increasing complexity and average size of programs, added to the evolution of compiler technologies, raised C compilers (and even C++ or Java in some cases) to become commonplace in the development of embedded systems [35].
This chapter discusses the embedded hardware and software implementation of the 3D face scanner. A brief account of the hardware and software tools that were used during the development of the application is presented first. Subsequently, the first stage of the development process is described, which consists mainly of translating the algorithms and methods described in Chapter 3 into a different programming language, more suitable for embedded systems. Finally, a preview of the developed visualization module that displays the 3D reconstructed face is presented, along with a brief description of its functionality.
4.1 Development tools
This section describes the set of tools used in the development of the embedded application. First, an overview of the hardware is presented, highlighting the most important aspects that are of interest to the 3D face scanner application. This is then followed by a list of the software tools, along with a short motivation for their selection. A so-called remote development methodology was used for the compilation process. The idea is to run an integrated development environment (IDE) on a client system for the creation of the project, editing of the files, and usage of code assistance features in the same manner as done with local projects. However, when the project is built, run, or debugged, the process runs on a remote server, with output and input transferred to the client system.
4.1.1 Hardware
A current trend in the embedded world is the use of single-board computers (SBCs) as development platforms. SBCs combine most features of a conventional desktop computer into a single board, which can be as small as a credit card. One or more processors of different types, memory, on-board peripherals for multiple USB devices, single or dual gigabit Ethernet connections, and integrated graphics and audio capabilities, amongst others, are common features included in these devices. But perhaps what is most interesting for embedded developers is the availability of several SBCs that fall under the open source hardware category [36]. Such SBCs are suitable for the implementation of a wide range of applications on the basis of open operating systems.

Two different hardware environments were used in the development of the current embedded application: a conventional desktop personal computer (PC) with an Intel x86 architecture, and an SBC that was selected according to the following survey.
4.1.1.1 Single-board computer survey
A prior survey of popular SBCs available on the market was conducted with the intention of finding the most suitable model for our application. Table 4.1 presents a subset of the considered models, highlighting the most relevant characteristics for the 3D face scanner application. Refer to [37] for the complete survey.

The model to be chosen has to comply with several requirements imposed by the 3D face scanner application. First, support for both a camera and a projector had to be offered. While all of the considered models showed special support for video output, not all of them provided suitable characteristics for camera signal acquisition. In fact, most of them rely on USB or Ethernet connections for this purpose. The problem with using USB technology for camera acquisition is that it is highly resource demanding. On the other hand, Ethernet connections imply streaming video in formats such as MPEG, which require additional computational resources and buffering for decoding the video stream. Explicit peripheral support for camera acquisition was only offered by two of the considered models: the BeagleBoard-xM and the PandaBoard.
Table 4.1: Single-board computer survey
BeagleBoard-xM
CPU: ARM Cortex-A8, 1000 MHz
RAM: 512 MB
Video output: DVI-D, HDMI, S-Video
GPU: PowerVR SGX, OpenGL ES 2.0
Camera port: Yes

Raspberry Pi Model B
CPU: ARM1176, 700 MHz
RAM: 256 MB
Video output: Composite RCA, HDMI, DSI
GPU: Broadcom VideoCore IV, OpenGL ES 2.0
Camera port: No

Cotton Candy
CPU: dual-core ARM Cortex-A9, 1200 MHz
RAM: 1 GB
Video output: HDMI
GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
Camera port: No

PandaBoard
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, DVI-D, LCD
GPU: PowerVR SGX540, OpenGL ES 2.0
Camera port: Yes

Via APC
CPU: ARM11, 800 MHz
RAM: 512 MB
Video output: HDMI, VGA
GPU: built-in 2D/3D graphics, OpenGL ES 2.0
Camera port: No

MK802
CPU: ARM Cortex-A8, 1000 MHz
RAM: 1 GB
Video output: HDMI
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No

Snowball
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, CVBS
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No
A second issue in the selection of the SBC was concerned with the project objective of developing a module capable of visualizing the 3D reconstructed model by means of the embedded projector. It was considered that the achievement of this objective could be greatly simplified by selecting an SBC model that offered support for rendering 3D computer graphics by means of an API, preferably OpenGL ES. Nevertheless, all of the SBC models considered in the survey featured a graphical processing unit (GPU) with such support.

Finally, one last important motivation for the selection came from the experience gathered through related projects. The BeagleBoard-xM had been used as the embedded computing unit in other projects [6] at Philips Research Eindhoven, and therefore valuable implementation effort could be saved if this option were adopted. Consequently, the BeagleBoard-xM was selected as the SBC model for the development of the current project.
4.1.1.2 BeagleBoard-xM features
The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power, open-source hardware system that was designed specifically to address the open source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz, accompanied by 512 MB of LPDDR RAM. Several open operating systems have been made compatible with this processor, including Linux, FreeBSD, RISC OS, Symbian, and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU that provides accelerated 2D and 3D rendering with support for OpenGL ES 2.0 [38].

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].
4.1.2 Software
The main factors involved in the selection of software tools were (i) available support by a large development community, and (ii) acquisition costs and licensing charges. Open source software was adopted where possible. Moreover, prior experience with the tools was also taken into account. The software can be divided into two categories: (i) software libraries that are used within the application, and therefore are necessary for its execution, and (ii) software tools used specifically for the development of the application, and hence not required for its execution. In what follows, each of these is briefly described.

Figure 4.1: The BeagleBoard-xM offered by Texas Instruments.
4.1.2.1 Software libraries
The following software libraries are used throughout the implementation of the embedded application.

libxml2: A software library for parsing XML documents, originally developed for the Gnome project and later made available to outside projects as well. The current application makes use of this tool to extract the required information from the XML file that is included with each scan.

OpenCV: An open source computer vision and machine learning software library initiated by Intel. It provides the necessary functionality to construct the Delaunay triangulation described in Chapter 3. Though it was used in the initial versions of the application, later optimizations replaced the OpenCV implementations.

CGAL: A software library that aims to provide access to algorithms in computational geometry. It is used in the current application as a means to simplify the resulting mesh surface, i.e. to reduce the number of faces used to represent the surface while keeping the overall shape of the reconstructed model.

OpenGL ES: A subset of the more general OpenGL designed specifically for embedded systems. It consists of a cross-language, multi-platform Application Programming Interface (API) for rendering 2D and 3D computer graphics. It is used in the current application as the means to visualize the 3D reconstructed model.

GLUT: The OpenGL Utility Toolkit, a system-independent API for OpenGL used to create windows and/or frame buffers. It is used in the visualization module of the application as well.
4.1.2.2 Software development tools
The following list presents a description of the most important software tools used for the development of the embedded application.

GNU toolchain: A collection of programming tools produced by the GNU Project that provide development facilities for applications and operating systems. Among the several projects that comprise the GNU toolchain, the following were used:

GNU Make: A utility that automates the building process of executable programs by reading so-called makefiles, which specify how to create the target program.

GCC: The official compiler of the GNU operating system, which has been adopted as standard by most modern Unix-like computer operating systems.

GNU Binutils: A set of programming tools used in the process of creating and managing programs, object files, libraries, profile data, and assembly source code. The commands as (assembler), ld (linker), and gprof (profiler) were used among the complete set of binutils commands.

GNU Project debugger: The standard debugger for the GNU operating system, which has been made available for the development of applications outside this project as well.

Valgrind: A programming tool that can automatically detect memory management errors. It also provides the functionality of a profiler.

Ubuntu: A Linux-based operating system that is distributed as free and open source software. It was installed on both the desktop PC and the SBC.
4.2 MATLAB to C code translation
This section describes the first stage of the embedded application development, which involves the translation of a series of algorithms originally written in MATLAB code into C.
Despite the fact that there are a number of available tools that automatically translate MATLAB code to C, such as MATLAB Coder by MathWorks, MATLAB-to-C Synthesis (MCS) by Catalytic Inc., and AccelDSP by Xilinx, these have a number of pitfalls that compromise their applicability, especially when the performance aspect is of ultimate importance. Perhaps most concerning is that each of these tools only supports a subset of the MATLAB language and functions, meaning that the complete functionality of MATLAB is immediately constrained by this requirement. In many cases this would imply a modification of the MATLAB code prior to the translation process, in order to filter out any feature or function not included in the subset, which adds overhead to the development process. Examples of features not supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization, and try/catch statements. The use of an automatic translation tool was discarded for this project, taking into account that several of these unsupported features are present in the MATLAB code.
4.2.1 Motivation for developing in C language
There are a number of reasons that explain why C is among the most popular programming languages used for the development of embedded systems. The first is that C lies at an intermediate point between higher- and lower-level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher-level languages lies in the fact that they do not provide suitable characteristics for optimizing the performance of applications, such as low-level memory manipulation. Furthermore, unlike many of these higher-level programming languages, C provides deterministic resource use, which is an important feature when the target devices contain limited resources. On the other hand, C outperforms lower-level languages in a number of aspects, such as scalability and maintainability. Two final motivations for using C are that (i) C compilers are available for almost all embedded devices and are supported by a large pool of experienced C programmers, and (ii) the vast majority of hardware APIs and drivers are written in C.
4.2.2 Translation approach
As mentioned earlier, a manual translation approach was chosen over the use of automatic translation tools. A key part of the process of manually translating MATLAB to C code is the verification process. There are two major techniques used to achieve such verification. The first one consists of a systematic method of converting the translated C code into a compiled MEX-file that can be merged into the original MATLAB project. Then, by comparing the results generated by the MATLAB project containing the C implementation wrapped in a MEX-file with those generated by the original MATLAB project, one should be able to verify the correctness of the translation. The second approach consists of writing corresponding intermediate results of both the MATLAB and C implementations to external files, and then using a file comparison tool, such as diff for Linux environments, in order to validate the equality of both results. It was the latter approach that was chosen for the development of the current application, for the following reason: the former approach requires the C implementation to be wrapped in a so-called MEX wrapper, which takes care of the communication between MATLAB and C. This task is considered to be error prone, since crashes, segmentation violations, or incorrect results can easily occur if the MEX wrapper does not allocate and access the data properly, as reported by Marc Barberis in [40] from Catalytic Inc. A minimal example of the dump-and-compare scheme is sketched below.
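On the C side, intermediate arrays can be written with a fixed format and compared against the corresponding MATLAB dumps; the file name and format below are illustrative assumptions, and the MATLAB side would use the same fprintf format to make the files byte-identical when the results match.

    #include <stdio.h>

    /* Dump an intermediate result to a text file, one value per line,
     * for external comparison, e.g. with:
     *     diff texture2_c.txt texture2_matlab.txt                    */
    static void dump_array(const char *path, const float *data, int n)
    {
        FILE *f = fopen(path, "w");
        if (!f)
            return;
        for (int i = 0; i < n; i++)
            fprintf(f, "%.6f\n", data[i]);   /* fixed precision on both sides */
        fclose(f);
    }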
A number of pitfalls that add complexity to the manual translation process were identified throughout the development of this stage. The most important are:

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e. the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in performance of the resulting code. (Both this pitfall and the previous one are illustrated in the sketch after this list.)

• MATLAB is an interpreted language, i.e. data types and variable dimensions are only known at run-time, and thus cannot easily be deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc, or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions, and sizes. On the contrary, C requires all variables to be cast to a specific data type (or declared, as known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C, which requires the programmer to implement them using structure constructs of primitive types.
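The sketch below illustrates the first two pitfalls with a hypothetical matrix operation: 1-based, column-major MATLAB code and its 0-based, row-major C translation.

    /* The MATLAB original (1-based indices, column-major storage):
     *
     *     for j = 1:cols
     *         for i = 1:rows
     *             B(i, j) = 2 * A(i, j);
     *         end
     *     end
     *
     * The C translation indexes from 0 and must keep the row index in
     * the outer loop so that the row-major layout is traversed in
     * memory order, preserving spatial locality. */
    void scale_matrix(const float *A, float *B, int rows, int cols)
    {
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                B[i * cols + j] = 2.0f * A[i * cols + j];
    }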
4.3 Visualization
This section describes the different steps involved in the visualization module developed to display the reconstructed 3D models by means of the embedded projector contained in the hand-held device. Figure 4.2 extends the general overview of the application presented in Figure 3.1 by incorporating the visualization module. This figure shows that a resulting 3D model of the face reconstruction process consists of 4 different elements: a set of vertices, a set of faces, a set of UV coordinates, and a texture image.
Figure 4.2: Simplified diagram of the 3D face scanner application.
Vertices and faces describe the geometry of the reconstructed model. Each face consists of three index values that determine the vertices that form a triangle. On the other hand, UV coordinates, together with the texture image, describe the texture of the model. Figure 4.3 shows how UV coordinates are used to map portions of the texture image to individual parts of the model. Each vertex is associated with a UV coordinate. When a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image to place on top of the triangle.
Figure 4.3: UV coordinate system.
Figure 4.4 presents an overview of the visualization module. The first step of the process is to simplify the 3D model, i.e. to reduce the number of triangles (and vertices) used to represent the surface. Note that while a high resolution is needed for the algorithms that determine the fit quality of the different mask models, a much lower resolution can be used for visualization purposes. In fact, due to the limited resources available in embedded systems, such simplification becomes necessary to avoid lag when zooming, rotating, or panning the model. Edge collapse is a common term used for the simplification process, which is shown in Figure 4.4. The input vertices and faces of this block are converted into a smaller set, denoted as New vertices and New faces in the diagram. However, since the new set of vertices and faces does not have a one-to-one correspondence to the original set of UV coordinates, such coordinates have to be updated as well. The manner in which this is accomplished is by using the Nearest Neighbor algorithm: every new vertex is assigned the UV coordinate of its closest original vertex.
The next stage of the process is to format the new set of vertices, faces, and UV coordinates, together with the texture 1 image, such that OpenGL can render the model. Subsequently, normal vectors are calculated for every triangle, which are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex, based on the normal vectors of the triangles that are connected to it. Moreover, a cross-product multiplication is used to calculate the normal vector of each triangle. Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state, where the model is redrawn every time a timer expires or when an interactive operation is sent to the program.
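The sketch below shows the cross-product computation of a single face normal; the data structures are illustrative, and the per-vertex averaging described above would accumulate these face normals for all incident triangles before normalizing the sum.

    #include <math.h>

    typedef struct { float x, y, z; } Vec3;

    /* Unit normal of the triangle (a, b, c) via the cross product of
     * two of its edge vectors. */
    static Vec3 face_normal(Vec3 a, Vec3 b, Vec3 c)
    {
        Vec3 u = { b.x - a.x, b.y - a.y, b.z - a.z };
        Vec3 v = { c.x - a.x, c.y - a.y, c.z - a.z };
        Vec3 n = { u.y * v.z - u.z * v.y,     /* cross product u x v */
                   u.z * v.x - u.x * v.z,
                   u.x * v.y - u.y * v.x };
        float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
        if (len > 0.0f) {
            n.x /= len; n.y /= len; n.z /= len;
        }
        return n;
    }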
Figure 4.4: Diagram of the visualization module.
Chapter 5
Performance optimizations
This chapter presents various performance optimizations made to the 3D face scanner application, ranging from high-level optimizations, such as modification of the algorithms, to low-level optimizations, such as the implementation of time-consuming parts in assembly language.

In order to verify that the achieved optimizations were valid in general, and not only for specific cases, 10 scans of different persons were used for profiling the performance of the application. Every profile consisted of running the application 10 times for each scan and then averaging the results, in order to reduce the influence that external factors might have on the measured times. Figure 5.1 presents an example of the graphs that will be used throughout this and the following chapters to represent the changes in performance. Here, each bar is divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1.
The translation from MATLAB to C code corresponds to the first optimization performed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available in the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's O2 optimization level.

The bottom bar in Figure 5.1 represents the starting point for a set of optimization procedures that will be described in the following sections. The order in which these are presented corresponds to the same order in which they were applied to the application.
Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
5.1 Double to single-precision floating-point numbers
The same representation format of floating-point numbers was necessary for the MATLAB and C implementations in order to compare both results at each step of the translation process. The original C implementation used the double-precision format, because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double to single-precision format was made. Figure 5.2 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec.
Figure 5.2: Difference in execution time when the double-precision format is changed to single-precision.
5.2 Tuned compiler flags
While the previous versions of the C code were compiled with the O2 performance level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
Figure 5.3: Execution time before and after tuning GCC's compiler options.
5.3 Modified memory layout
A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality of the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e. first and second, third and fourth, fifth and sixth, and so on. Data of the camera frames were placed in memory in such a manner that corresponding pixels between frame pairs lay next to each other in memory. The procedure is shown in Figure 5.4.

However, this modification yielded no improvement in the execution time of the application, as can be seen in Figure 5.5.

Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green, and purple circles represent pixels of the first, second, third, and fourth frames, respectively.

Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.
5.4 Reimplementation of C's standard power function
The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the power 0.85. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process. Taking into account that the high accuracy offered by this function was not required, and that the overhead involved in validating the input could be removed, a different implementation of the function was adopted.
A novel approach proposed by Ian Stephenson in [42] was used, explained as follows. The power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) · b)

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a:
a = M · 2^E

where M is the mantissa and E is the exponent. Taking the base-2 logarithm of both sides gives

log2(a) = log2(M) + E

and since M is normalized, log2(M) is always small; therefore

log2(a) ≈ E
This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.
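A sketch of this idea in C is shown below; it follows the general exponent-trick technique rather than the exact code of the application, and it is only a coarse approximation, which is acceptable here because high accuracy is not required for the gamma correction.

    #include <stdint.h>
    #include <string.h>

    /* Approximate log2(a) for a > 0: reinterpreting the IEEE 754 bits
     * as an integer and rescaling yields the exponent plus a linear
     * approximation of log2 of the mantissa. */
    static float fast_log2(float a)
    {
        uint32_t bits;
        memcpy(&bits, &a, sizeof bits);
        return (float)bits / (float)(1 << 23) - 127.0f;
    }

    /* Approximate 2^p by building the IEEE 754 bit pattern directly. */
    static float fast_pow2(float p)
    {
        uint32_t bits = (uint32_t)((p + 127.0f) * (float)(1 << 23));
        float result;
        memcpy(&result, &bits, sizeof result);
        return result;
    }

    /* pow(a, b) = 2^(log2(a) * b), valid here for a > 0 only. */
    static float fast_pow(float a, float b)
    {
        return fast_pow2(b * fast_log2(a));
    }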
Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.
5.5 Reduced memory accesses
The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering allowed replacing floating-point calculations with integer calculations in the modulation stage, which typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modular frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping, and scaling frames are now performed in an alternating fashion, together with the calculation of the modular frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.
(a) Original order of execution: the preprocessing stage (Parse XML file, Discard frames, Crop frames, Scale) is followed by the normalization stage (Texture 1, Modulation, Texture 2, Normalize) and the rest of the program.
(b) Modified order of execution: the Modulation step is moved into the preprocessing stage; the normalization stage then computes Texture 1, Texture 2, and Normalize.
Figure 5.7: Order of execution before and after the optimization.
Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
5.6 GMC in y dimension only
A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, this figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results are achieved when GMC is applied in the y direction only. This was implemented by estimating GMC for both directions, but only performing the shift in the y direction. The optimization consisted of removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement of the execution time shown in Figure 5.10.
Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code.
Figure 5.10: Difference in execution time before and after modifying the GMC stage.
5.7 Error in Delaunay triangulation
OpenCV was used to compute the Delaunay triangulation. A series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation. It is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. An error was detected at this point of the optimization process in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.
Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.
5.8 Modified line shifting in GMC stage
A series of optimizations performed on the original line-shifting mechanism in the GMC stage is explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later, this was optimized even further by replacing the for loop with the more optimized memcpy function available in the standard C library. This in turn led to a faster execution time.

A further optimization was obtained in the GMC stage, which yielded better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included in order to determine whether the shift has to be performed in the positive or negative direction. In case the shift is negative, i.e. upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. In case the shift is positive, i.e. downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
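A sketch of the in-place shift is shown below, assuming a hypothetical row-major 8-bit frame; the traversal direction guarantees that no source row is overwritten before it has been copied.

    #include <string.h>

    /* Shift a width x height frame vertically by `shift` rows in place.
     * Rows shifted past the border are discarded, not wrapped around. */
    static void shift_frame_y(unsigned char *frame, int width, int height,
                              int shift)
    {
        if (shift < 0) {                       /* shift upwards */
            for (int y = 0; y < height + shift; y++)
                memcpy(frame + y * width,
                       frame + (y - shift) * width, width);
        } else if (shift > 0) {                /* shift downwards */
            for (int y = height - 1; y >= shift; y--)
                memcpy(frame + y * width,
                       frame + (y - shift) * width, width);
        }
    }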
Figure 5.12: Execution times of the application before and after optimizing the line-shifting mechanism in the GMC stage.
5.9 New tessellation algorithm
A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improves the execution time significantly while sacrificing smoothness by only a very small amount. This characteristic refers to the fact that the set of vertices resulting from the decoding stage is sorted in increasing order. This, in turn, removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies even further the algorithm needed to connect the vertices into triangles.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Moreover, each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.
The algorithm is presented in Algorithm 1. Note that for each pair of rows, this algorithm describes the connection of vertices until the moment in which the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1). A C sketch of the core loop follows the listing.
Algorithm 1: New tessellation algorithm

1:  for all pairs of consecutive rows do
2:      find the left-most vertices in both rows and store them in vertex row A and vertex row B
3:      while the last vertex in either row has not been reached do
4:          if vertex row A is more to the left than vertex row B then
5:              connect vertex row A with the next vertex on the same row and with vertex row B
6:              change vertex row A to the next vertex on the same row
7:          else
8:              connect vertex row B with the next vertex on the same row and with vertex row A
9:              change vertex row B to the next vertex on the same row
10:         end if
11:     end while
12: end for
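The C sketch below stitches one pair of rows; the data layout and the handling of the leftover vertices are simplified assumptions.

    typedef struct { float x, y; } Vertex;

    /* Connect two sorted rows of vertices into triangles, following
     * Algorithm 1. baseA and baseB are the global index offsets of the
     * two rows inside the vertex array; faces receives index triples. */
    static void stitch_rows(const Vertex *rowA, int lenA,
                            const Vertex *rowB, int lenB,
                            int baseA, int baseB,
                            int (*faces)[3], int *n_faces)
    {
        int a = 0, b = 0;
        while (a < lenA - 1 && b < lenB - 1) {
            if (rowA[a].x < rowB[b].x) {
                /* triangle: A[a], A[a+1], B[b]; advance in row A */
                faces[*n_faces][0] = baseA + a;
                faces[*n_faces][1] = baseA + a + 1;
                faces[*n_faces][2] = baseB + b;
                ++*n_faces;
                ++a;
            } else {
                /* triangle: B[b], B[b+1], A[a]; advance in row B */
                faces[*n_faces][0] = baseB + b;
                faces[*n_faces][1] = baseB + b + 1;
                faces[*n_faces][2] = baseA + a;
                ++*n_faces;
                ++b;
            }
        }
        /* leftover vertices of the longer row are fanned to the last
         * vertex of the shorter row in a later step (as in Algorithm 1) */
    }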
Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 12.5 times over OpenCV's Delaunay triangulation implementation.

Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.
(a) Delaunay triangulation
(b) Optimized triangulation
Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that the vertices are sorted.
5.10 Modified decoding stage
A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard math C library, namely ceil() and floor(), were replaced with faster implementations that use preprocessor directives to avoid the function-call overhead. Moreover, the time spent validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions was further simplified, as sketched below.
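The simplified functions might look as follows; the macro names are illustrative, and the usual macro caveat of double argument evaluation applies.

    /* For x >= 0, truncation to int already implements floor, and ceil
     * can be derived by adding 1 whenever a fractional part remains. */
    #define FAST_FLOOR(x) ((int)(x))
    #define FAST_CEIL(x)  ((int)(x) + ((x) > (int)(x)))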
A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while controlling that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since stack memory is more quickly addressable.

The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. The reason why such tasks were present in the application is that several alternatives were implemented for achieving a common goal during the algorithmic design stage; however, after assessing and choosing the best option, the other ones were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.

Figure 5.15: Execution time of the application before and after optimizing the decoding stage.
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage
This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for every image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, such recalculation can be avoided, as sketched below. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
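The following sketch shows the buffer-swapping idea, with hypothetical helper functions standing in for the column summation, SAD minimization, and shifting steps described above.

    /* Assumed helpers, standing in for the steps of Figure 5.16. */
    void sum_columns(const unsigned char *frame, float *colsum);
    int  minimize_sad(const float *a, const float *b, int width);
    void shift_frame(unsigned char *frame, int shift);

    /* Each frame's column-sum vector is computed exactly once: after
     * the SAD minimization, the two buffers are swapped so that the
     * current vector becomes the previous one of the next iteration. */
    void compensate_motion(unsigned char **frames, int n_frames,
                           float *buf_a, float *buf_b, int width)
    {
        float *prev = buf_a, *curr = buf_b;

        sum_columns(frames[0], prev);
        for (int n = 1; n < n_frames; n++) {
            sum_columns(frames[n], curr);
            shift_frame(frames[n], minimize_sad(prev, curr, width));
            float *tmp = prev; prev = curr; curr = tmp;   /* reuse */
        }
    }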
5.12 NEON assembly optimization 1
The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization, and global motion compensation stages are data independent, and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrating NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology; (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions while the compiler does some of the work associated with writing assembly instructions; and (iv) directly writing NEON assembly instructions linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and therefore this approach was adopted.
Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums.
Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.
Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers, where each of the 8 elements would have 16 bits, or as an operation on 2 D registers, where each of the 8 elements would be 8 bits wide.
Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology.
An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3, it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
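The thesis implements this stage in hand-written NEON assembly; purely for readability, the sketch below expresses a similar per-vector computation with NEON C intrinsics, processing the low 4 of the 8 widened lanes (the high half is analogous) and using a reciprocal estimate for the division. The function name and the exact scaling are illustrative assumptions.

    #include <arm_neon.h>

    /* For 8 pixel pairs of two pattern frames: widen to 16 bits, form
     * the sum (texture 2) and the difference, convert the low 4 lanes
     * to float, and normalize as (v1 - v2) / (v1 + v2). vrecpeq_f32 is
     * only an estimate; a Newton-Raphson step with vrecpsq_f32 would
     * refine it, and a zero sum must be guarded against in real code. */
    void texture2_and_normalize4(const uint8_t *f1, const uint8_t *f2,
                                 float *texture2, float *normalized)
    {
        uint8x8_t  a  = vld1_u8(f1);
        uint8x8_t  b  = vld1_u8(f2);
        uint16x8_t aw = vmovl_u8(a);
        uint16x8_t bw = vmovl_u8(b);

        uint16x8_t sum  = vaddq_u16(aw, bw);
        int16x8_t  diff = vsubq_s16(vreinterpretq_s16_u16(aw),
                                    vreinterpretq_s16_u16(bw));

        float32x4_t sf = vcvtq_f32_u32(vmovl_u16(vget_low_u16(sum)));
        float32x4_t df = vcvtq_f32_s32(vmovl_s16(vget_low_s16(diff)));

        vst1q_f32(texture2,   sf);
        vst1q_f32(normalized, vmulq_f32(df, vrecpeq_f32(sf)));
    }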
Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that may explain this increment. First, note that the stage of the application that contributed most to the increase in time was the reading of the binary file. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that indeed another process was probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this could be found in the modulation stage. The first step of this process is to find the smallest and largest values of every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C, the processor makes use of a branch prediction mechanism in order to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, ignoring the existence of the branch prediction mechanism.
5.13 NEON assembly optimization 2
After successfully implementing several stages of the application with NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2^(n+1) - 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
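A sketch of how such a LUT could be built and used for the real 8-bit case is shown below (the 0.85 exponent comes from the gamma correction described above; names and the unscaled float output are assumptions, not the actual implementation):

    #include <math.h>
    #include <stdint.h>

    /* Possible sums of two 8-bit pixels: 0..510, i.e. 2^(n+1) - 1 = 511
     * entries for n = 8. */
    #define LUT_SIZE (2 * 255 + 1)

    static float gamma_lut[LUT_SIZE];

    /* Precompute (average)^0.85 for every possible pixel sum. */
    void build_gamma_lut(void)
    {
        for (int sum = 0; sum < LUT_SIZE; sum++)
            gamma_lut[sum] = powf((float)sum / 2.0f, 0.85f);
    }

    /* The sum pixel_a + pixel_b, already available from the texture 2
     * computation, indexes the table directly. */
    static inline float texture1_pixel(uint8_t pixel_a, uint8_t pixel_b)
    {
        return gamma_lut[(unsigned)pixel_a + pixel_b];
    }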
As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2, and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. The merging of these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes.
Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code. (Flow diagram: parse XML file; for camera frames 1-16, for each row and each vector: crop row, modulation (step 1), scale, texture 2 (v1 + v2), scale, normalize (v1 - v2)/(v1 + v2); then, for camera frames 15-16 only: modulation (step 2), scale, and texture 1; rest of program.)
Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization. (Bar chart, 0-6 seconds, broken down into: read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering, hole filling, other.)
Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.

    pixel A:              0      1      2      3
    pixel B:              0      1      2      3
    average:              0    0.5      1    1.5      2    2.5      3
    average^0.85 (LUT):   0  0.555      1  1.411  1.803  2.179  2.544

The LUT is indexed by the sum pixel A + pixel B (0 to 6).
These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22. This final optimization concludes the embedded system development of the 3D face reconstruction application.
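In outline, the reordered processing could be pictured as follows (C-style pseudocode with hypothetical helper names; the real implementation interleaves these steps per row and per vector, as Figure 5.23 shows):

    /* Hypothetical helpers standing in for the per-frame processing chains. */
    extern void process_frame_basic(int frame);         /* modulation, texture 2, normalize */
    extern void process_frame_with_texture1(int frame); /* same, plus average & gamma corr. */

    void process_all_frames(void)
    {
        /* Stage 1: frames 1..14 need no texture 1 contribution. */
        for (int f = 0; f < 14; f++)
            process_frame_basic(f);

        /* Stage 2: frames 15..16 reuse the pixel sums computed for texture 2,
         * so the frames are not read a second time for the averaging step. */
        for (int f = 14; f < 16; f++)
            process_frame_with_texture1(f);
    }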
Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization. (Bar chart, 0-6 seconds, broken down into: read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering, hole filling, other.)
Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code. (Flow diagram: parse XML file; stage 1, for camera frames 1-14, for each row and each vector: crop row, modulation (step 1), scale, texture 2 (v1 + v2), scale, normalize (v1 - v2)/(v1 + v2); 5x5 mean filter; stage 2, for camera frames 15-16, the same per-row steps plus modulation (step 2) and average & gamma correction; rest of program.)
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.
6.1 MATLAB to C code translation
In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models, with a precision of 4 decimal places.
In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application, described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when run on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).
On the other hand, the bottom bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when run on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processing on a PC. The C code was compiled with GCC's -O2 optimization level.
Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM. (Bar chart, 0-15 seconds, broken down into: read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering, hole filling, other.)
6.2 Visualization
A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism, which results in a much lower resolution model (1,229 faces) suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model contains approximately 2% of the faces of the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.
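As an indication of what the rendering core of such a module involves, the sketch below shows a minimal OpenGL ES 2.0-style draw call for an indexed triangle mesh (buffer handles, the attribute location, and the function name are assumptions, not the actual module code):

    #include <GLES2/gl2.h>

    /* Draws a triangulated face model from vertex and index buffers. */
    void draw_mesh(GLuint vbo, GLuint ibo, GLint pos_loc, GLsizei n_indices)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);

        /* 3 floats (x, y, z) per vertex, tightly packed. */
        glEnableVertexAttribArray(pos_loc);
        glVertexAttribPointer(pos_loc, 3, GL_FLOAT, GL_FALSE,
                              3 * sizeof(GLfloat), (const void *)0);

        /* One triangle per 3 indices; 1,229 faces give 3,687 indices. */
        glDrawElements(GL_TRIANGLES, n_indices, GL_UNSIGNED_SHORT,
                       (const void *)0);
    }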
6.3 Performance optimizations
Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. The wide range of optimizations described in Chapter 5 reduced the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times.
Figure 6.2: Example of the visualization module developed: (a) high-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model wireframe (63,743 faces); (c) low-resolution 3D model with texture (1,229 faces); (d) low-resolution 3D model wireframe (1,229 faces).
Furthermore, Figure 6.4 presents an individual graph for each stage of the process, which gives an idea of the speedup achieved in each stage.
Figure 6.3: Performance evolution of the 3D face scanner's C implementation. (Bar chart, 0-15 seconds; one bar per optimization step, top to bottom: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2. Bars broken down into: read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering, hole filling, other.)
Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e., by approximately 14 seconds.
A wide range of optimizations were performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.
Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment.
Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.
7.1 Future work
Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5x5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates for implementation with NEON assembly instructions; a sketch of the column-sum case follows.
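For instance, the column sums could accumulate eight 8-bit pixels per instruction by widening into 16-bit lanes. This is a sketch under stated assumptions (the width is a multiple of 8 and the per-column sum fits in 16 bits, i.e., at most 257 rows; names are illustrative), not the actual GMC code:

    #include <arm_neon.h>
    #include <stdint.h>

    /* Sums each image column over all rows, 8 columns per iteration.
     * Assumes width % 8 == 0 and rows * 255 fits in 16 bits. */
    void column_sums(const uint8_t *img, int width, int rows, uint16_t *sums)
    {
        for (int c = 0; c < width; c += 8) {
            uint16x8_t acc = vdupq_n_u16(0);
            for (int r = 0; r < rows; r++)
                acc = vaddw_u8(acc, vld1_u8(img + r * width + c)); /* widen+add */
            vst1q_u16(sums + c, acc);
        }
    }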
Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration, and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage can be optimized significantly. First, note the large increase in the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operation), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to lose performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.
Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp, and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support, as sketched below.
results than those of GCC Despite the fact that as part of the current project a few of
the other options were tested GCCrsquos results were always superior However it would
be interesting to measure how the GCC compiler compares with the compilers produced
by ARM which are known to produce fast running code
Bibliography
[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829-1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204-211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, isbn: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1-17, 1982, issn: 0146-664X. doi: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).
[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521-528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647-651, Jun. 1993, issn: 0162-8828. doi: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874-877. doi: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27-40, 1995, issn: 0031-3203. doi: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14-28, Jan. 1987, issn: 0162-8828. doi: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image and Vision Computing, pp. 445-456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609-616, 1992, issn: 0031-3203. doi: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337-358, 1985, issn: 0734-189X. doi: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).
[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309-313. doi: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193-203, 1991, issn: 0932-8092. doi: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42-54, Dec. 1982, issn: 0018-9162. doi: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827-849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057-2060. doi: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimel'farb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: Application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135-141. doi: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195-200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81 - III-84. doi: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617-1635, 2002, issn: 0031-3203. doi: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. doi: 10.1364/OE.15.012318 (cit. on p. 14).
[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22-26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15-20 (cit. on p. 14).

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, isbn: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and Universitat de Girona, Departament d'Electrònica, Informàtica i Automàtica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, isbn: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323-344, Aug. 1987, issn: 0882-4967. doi: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965-980, Oct. 1992, issn: 0162-8828. doi: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, isbn: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97-108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317-324, isbn: 0-201-48560-5. doi: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).
[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, isbn: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, Dec. 2009, p. 81 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489-497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization. An example," Tech. Rep., 2009 (cit. on p. 54).
Chapter 2
Literature study
This chapter presents a selective analysis of the state of the art in the field of surface
reconstruction, placing special emphasis on structured lighting techniques. A brief
overview of the three main underlying technologies used for depth estimation is presented
first. This is followed by an example of stereo analysis, which serves as the basis
for the more specific structured lighting techniques. Moreover, this example helps to
illustrate why stereo analysis is considered less preferable for 3D face reconstruction
applications than structured lighting techniques. Special emphasis
is placed on the scientific principles underlying structured lighting techniques. Furthermore,
a classification of the different types of pattern coding strategies available in the
literature is given, along with an analysis of their suitability for our application. Finally,
the chapter concludes with a brief discussion of camera calibration and its most
representative techniques.
2.1 Surface reconstruction
Surface reconstruction has a wide range of practical applications, such as computer modeling
of 3D objects (such as those found in areas like architecture, mechanical engineering,
or surgery), distance measurements for vehicle control, surface inspections for
quality control, approximate or exact estimates of the location of 3D objects for automated
assembly, and fast location of obstacles for efficient navigation [4].
Technologies for surface reconstruction include contact and non-contact techniques, the
latter being our principal interest. Non-contact techniques may be further categorized
as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric techniques
use time-of-flight measurements to determine the distance to an object, i.e., they
are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect
from an object's surface through a given medium. Reflecto-metric techniques process
one or more images of the object to determine its surface orientation and, consequently,
its shape. Finally, stereo-metric techniques determine the location of the object's surface
by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such
techniques are heavily affected by environmental parameters such as temperature and
humidity [6]. These parameters affect the velocity at which waves travel through a
given medium, thus introducing errors in depth measurement. On the other hand,
both reflecto-metric and stereo-metric techniques are less affected by environmental
parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they
require an estimation of the model of the environment. In the remainder of this section,
we will limit the discussion to the stereo-metric category and focus on structured
lighting techniques.
2.1.1 Stereo analysis
Considering that surface reconstruction by means of structured lighting can be regarded
as an extension of the more general stereo-vision technique, an introductory example of
stereo analysis is presented in this section. This example, taken from [4], intends to
show why the use of structured lighting becomes essential for our application.
Surface reconstruction can be achieved by means of the visual disparity that results
when an object is observed from different camera viewpoints. In its simplest form, two
cameras can be used for this purpose. Triangulation between a point in the object and
its respective projection in each of the camera projection planes can be used to calculate
the depth at which this point lies from a certain reference. Note, however, that in order
to calculate the triangulation, more parameters are required. These parameters refer, for
example, to the distance at which the cameras are located from one another (extrinsic
parameter) or to the focal length of each of the cameras (intrinsic parameter).
Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this
model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal
point of the left camera. The focal point of the right camera lies at a distance b along
the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed
to have the same focal length f. As a consequence, the images of both cameras are
located in the same image plane. The Z-axis coincides with the optical axis of the
left camera. Moreover, the optical axes of both cameras are parallel to each other and
oriented towards the scene objects. Also note that, because the x-axes of both images
are identically oriented, rows with the same row number in the two different images lie on
the same straight line.
Figure 2.1: Standard stereo geometry (left and right image planes with rows y, base distance b, and the parallel optical axes of the two cameras).
In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image
points
p_left = (x_left, y_left) and p_right = (x_right, y_right)
in the left and right images, respectively, assuming that the scene point is visible from
both camera viewpoints. The disparity between two corresponding image points with
respect to p_left is a vector given by
∆(x_left, y_left) = (x_left − x_right, y_left − y_right)^T.    (2.1)
In the standard stereo geometry, pinhole camera models are used to represent the considered
cameras. The basic idea of a pinhole camera is that it projects scene points P
onto image points p according to a central projection given by
p = (x, y) = (f·X/Z, f·Y/Z),    (2.2)
assuming that Z > f.
According to the ideal assumptions considered in the standard stereo geometry of the
two cameras, it holds that y = y_left = y_right. Therefore, for the left camera, the central
projection equation is given directly by Equation 2.2, considering that the pinhole
camera model assumes the Z-axis to be the optical axis of the camera.
Furthermore, given the displacement of the right camera by b along the X-axis, its
central projection equation is given by
(x_right, y) = (f·(X − b)/Z, f·Y/Z).
Rather than calculating a disparity vector as given by Equation 2.1 for all corresponding
pairs of points in the two images, a scalar disparity proves to be sufficient under
the assumptions made in the standard stereo geometry. The scalar disparity of two
corresponding points with respect to p_left is given by
∆_ssg(x_left, y_left) = √((x_left − x_right)² + (y_left − y_right)²).
However, because rows with the same row number in the two images have the same y value,
the scalar disparity of a pair of corresponding points reduces to
∆_ssg(x_left, y_left) = |x_left − x_right| = x_left − x_right.    (2.3)
Note that it is valid to remove the absolute value operator because of the chosen arrangement
of the cameras. A disparity map ∆(x, y) is defined by applying Equation 2.3 to all
corresponding points in the two images. For those points that could not be associated
with a corresponding point in the other image (for example, because of occlusion), the
value "undefined" is recorded.
Finally, in order to arrive at the equations that determine the 3D location of each
point in the scene, note that from the central projection equations of the two cameras
it follows that
Z = f·X/x_left = f·(X − b)/x_right
and therefore
X = b·x_left/(x_left − x_right).
Using the previous equation, it follows that
Z = b·f/(x_left − x_right).
By substituting this result into the projection equation for y, it follows that
Y = b·y/(x_left − x_right).
The last three equations allow the reconstruction of the coordinates of the projected
points P within the three-dimensional XYZ-space, assuming that the parameters f and
b are known and that the disparity map ∆(x, y) was measured for each pair of corresponding
points in the two images. Note that a variety of methods exist to calibrate
different types of camera configuration systems, i.e., to determine their intrinsic and extrinsic
parameters. These calibration procedures are discussed further in Section 2.2.
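To make the reconstruction step concrete, the following minimal C sketch evaluates the three equations above for a single pair of corresponding points. The function name and interface are illustrative, not part of the scanner code; a defined (nonzero) disparity is assumed.

#include <stdio.h>

/* Reconstruct a 3D scene point from a corresponding point pair in the
 * standard stereo geometry. f is the focal length and b the base
 * distance; x_left, x_right, and y are image coordinates. Assumes the
 * disparity x_left - x_right is nonzero (i.e., not "undefined"). */
static void reconstruct_point(double f, double b,
                              double x_left, double x_right, double y,
                              double *X, double *Y, double *Z)
{
    double disparity = x_left - x_right;   /* Equation 2.3 */
    *X = b * x_left / disparity;
    *Y = b * y / disparity;
    *Z = b * f / disparity;
}

int main(void)
{
    double X, Y, Z;
    reconstruct_point(50.0, 100.0, 12.0, 8.0, 5.0, &X, &Y, &Z);
    printf("X = %.2f, Y = %.2f, Z = %.2f\n", X, Y, Z);
    return 0;
}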
The process of determining corresponding point pairs is known as the correspondence
problem. A wide variety of techniques is used to solve the correspondence problem in
stereo image analysis. Such techniques generally involve the extraction and matching
of features between two or more images. These features are typically corners or edges
contained within the images. Although these techniques are found to be appropriate for
a certain number of applications, they present a number of drawbacks
that make them unfeasible for many others. The main drawbacks are that (i)
feature extraction and matching is generally computationally expensive, (ii) features
might not be available depending on the nature of the environment or the placement
of the cameras, and (iii) low lighting conditions generally increase the complexity of the
matching procedure, thus making the system more error prone. Such problems in solving
the correspondence problem can generally be overcome by resorting to a different but
similar family of techniques known as structured lighting techniques. While
structured lighting techniques involve a completely different methodology for solving
the correspondence problem, they share a large part of the theory presented in this section
regarding the depth reconstruction process.
2.1.2 Structured lighting
Structured lighting methods can be thought of as a modification of the previously described
stereo analysis approach, where one of the cameras is replaced by a light source
that projects a light pattern actively into the scene. The location of an object in space
can then be determined by analyzing the deformation of the projected light pattern.
The idea behind this modification is to reduce the complexity of the correspondence
analysis by actively manipulating the scene.
It is important to note that stereoscopy-based systems do not impose complex requirements
on image acquisition, since they mostly rely on theoretical, mathematical, and
algorithmic analyses to solve the reconstruction problem. The idea behind structured
lighting methods, on the other hand, is to shift this complexity to another level, namely
the engineering prerequisites of the overall system [4].
A wide variety of light patterns has been proposed by the research community [5], [7]–
[17]. Their aim is to reduce the large number of images that would have to be captured
when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2, a
classification of the available encoded patterns is presented. Nevertheless, the light spot
projection technique serves as a solid starting point to introduce the main principle
underlying the depth recovery of most other encoded light patterns: the triangulation
technique.
2.1.2.1 Triangulation technique
Triangulation refers to the process of determining the location of a point by measuring
the angles formed from it to points at either end of a fixed baseline. Various approaches
have been proposed for accomplishing this task. An early analysis was described by Hall
et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an
overview of Klette's triangulation approach is given.
Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation as proposed in [4] (camera at the origin O, light source at base distance b, object point P, angles α, β, and γ, and distance d).

Note that the system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a
consequence, the object, light source, and camera all lie in the same plane. The angles
α and β are given by the calibration. As in the previous example, the base distance b
is assumed to be known, and the origin of the coordinate system O coincides with the
projection center of the camera.
The goal is to calculate the distance d between the origin O and the object point
P = (X₀, Z₀). This can be done using the law of sines as follows:
d/sin(α) = b/sin(γ).
From γ = π − (α + β) and sin(π − γ) = sin(γ), it holds that
d/sin(α) = b/sin(π − γ) = b/sin(α + β).
Therefore, the distance d is given by
d = b·sin(α)/sin(α + β),
which holds for any point P lying on the surface of the object.
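As a brief illustration, the distance equation translates directly into code. The following C fragment is a hypothetical helper, not part of the scanner implementation; the angles are assumed to be in radians and obtained from the calibration.

#include <math.h>

/* Distance d from the camera origin O to the object point P, by active
 * triangulation (law of sines), following the derivation above. */
static double triangulate_distance(double b, double alpha, double beta)
{
    return b * sin(alpha) / sin(alpha + beta);
}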
2.1.2.2 Pattern coding strategies
As stated earlier, there is a wide variety of pattern coding strategies available in the literature
that aim to fulfill the requirements found in different scenarios and applications.
In coded structured light systems, every coded pixel in the pattern has its own codeword
that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates
of a given pixel or group of pixels in the pattern. A codeword can be represented
using grey levels, colors, or even geometrical characteristics. The following classification
of pattern coding strategies was proposed by Salvi et al. in [19].
• Time-multiplexing: This is one of the most commonly used strategies. The
idea is to project a set of patterns onto the scene, one after the other. The
sequence of illuminated values determines the codeword for each pixel. The main
advantage of this kind of pattern is that it can achieve high spatial resolution in
the measurements. However, its accuracy is highly sensitive to movement of either
the structured light system or objects in the scene during the time period when the
acquisition process takes place. Previous research in this area includes the work of
[5], [7], [8]. An example of this coding strategy is the binary coded pattern shown
in Figure 2.3a.
• Spatial neighborhood: In this strategy, the codeword assigned to a given
pixel depends on its neighborhood. Codification is done on the basis of intensity
[9]–[11], color [12], or a unique structure of the neighborhood [13]. In contrast with
time-multiplexing strategies, spatial neighborhood strategies allow all coding
information to be condensed into a single projection pattern, making them highly
suitable for applications that involve timing constraints, such as autonomous navigation.
The compromise, however, is a deterioration in spatial resolution. Figure
2.3b is an example of this strategy, proposed by Griffin et al. [14].
• Direct coding: In direct coding strategies, every pixel in the pattern is labeled
by the information it represents. In other words, the entire codeword for a given
point is contained in a unique pixel, as explained in [19]. Basically, there are two
ways to achieve this: either by using a large range of color values [15], [16] or
by introducing periodicity [17]. Although in theory this group of strategies can
be used to reconstruct objects with high resolution, a major problem occurs in
practice: the colors imaged by the camera(s) of the system depend not only on the
projected colors but also on the intrinsic colors of the measured surface and the light
source. The consequence is that reference images become necessary. Figure 2.3c
shows an example of a direct coding strategy proposed in [16].
(a) Time-multiplexing; (b) spatial neighborhood; (c) direct coding.
Figure 2.3: Examples of pattern coding strategies.
2.1.2.3 3D human face reconstruction
Given the importance of face reconstruction in a wide range of fields, such as security,
forensics, or even entertainment, it is no surprise that special focus has been devoted
to this area by the research community over the last decades. A comparative study
of three different 3D face reconstruction approaches is presented in [20]. Here, the
most representative techniques of three different domains are tested. These domains are
binocular stereo, structured lighting, and photometric stereo. The experimental results
show that active reconstruction techniques perform better than purely passive ones for
this application.
The majority of analyses of vision-based reconstruction have focused on general performance
for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless,
some effort has been made to evaluate structured lighting techniques with special
focus on human face reconstruction. In [21], a comparison is presented between three
structured lighting techniques (Gray code, Gray code shift, and stripe boundary) to
assess 3D reconstruction for human faces by using mono and stereo systems. The results
show that the Gray code shift coding performs best, given the high number of emitted
patterns it uses. A further study on this topic was performed by the same author in
[22]. Again, it was found that time-multiplexing techniques such as binary encoding
using Gray code provide the highest accuracy. With a rather different objective than
that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] focus their
effort on presenting a framework that captures 3D models of faces in high resolution
with low computational load. Here, the system uses a single colored stripe pattern for
the reconstruction, plus a picture of the face illuminated with regular white light
that is used as texture.
Particular aspects of 3D human face reconstruction, such as the proximity, size, and texture
involved, make structured lighting a suitable approach. Other reconstruction
techniques, on the contrary, may be less suitable when dealing with these particular aspects.
For example, stereoscopic approaches fail to provide positive results when the textures
involved do not contain features that can be easily extracted and matched algorithmically,
as is the case for the human face. The concepts behind structured lighting, on the
other hand, make it very convenient for reconstructing this kind of surface, given
the proximity involved and the size limits of the object in question (appropriate for
projecting encoded patterns).
With regard to the suitability of the different pattern coding strategies for our application
(3D human face reconstruction by means of a hand-held scanner), there are several
factors to consider. Spatial neighborhood strategies do not offer the high spatial resolution
that is needed by the algorithms that assess the fit quality of the various mask models.
Direct coding strategies suffer from practical problems that affect their robustness in
different scenarios. This centers the attention on the time-multiplexing techniques, which
are known to provide high spatial resolution. The problem with such techniques is
that they are highly sensitive to movement, which is likely to be present on a hand-held
device. Fortunately, there are several approaches to solving this problem.
Consequently, it is a time-multiplexing technique that is employed in
our application.
2.2 Camera calibration
Camera calibration is a crucial ingredient in the process of metric scene measurement.
This section presents a review of some of the most popular techniques, with special focus
on those that are regarded as adequate for our application.
2.2.1 Definition
Camera calibration is the process of determining a mathematical approximation of the
physical and optical behavior of an imaging system by using a set of parameters. These
parameters can be estimated by means of direct or iterative methods, and they are divided
into two groups. On the one hand, intrinsic parameters determine how light is projected
through the lens onto the image plane of the sensor. The focal length, projection center,
and lens distortion are all examples of intrinsic parameters. On the other hand, extrinsic
parameters measure the position and orientation of the camera with respect to a world
coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure
2.4, which corresponds to the optical system for structured pattern projection and
triangulation considered in [25]. The focal length f_c and the projection center O_c are
examples of intrinsic parameters of the camera, while the distance D between the camera
and the projector corresponds to an extrinsic parameter.
Figure 2.4: A reference framework assumed in [25] (camera with projection center O_c and focal length f_c, projector with projection center O_p and focal length f_p, their image planes, a reference plane, and the object).
2.2.2 Popular techniques
In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration
that uses a 3×4 transformation matrix to map 3D object points to their respective
2D image projections. Here, the model of the camera does not consider any lens distortion.
For a detailed description of this method, refer to [18]. Some years later, in 1986,
Faugeras improved Hall's work by proposing a technique based on extracting
the physical parameters of the camera from the transformation technique proposed in
[18]. A description of this technique is given in [26] and [27]. A non-linear explicit
camera calibration that included radial lens distortion was proposed by Salvi in his PhD
thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras'
linear method. However, a method that would become much more popular, and that is still
widely used, was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step
technique that models only radial lens distortion. Also worth mentioning is the model
proposed by Weng [30] in 1992, which includes three different types of lens distortion.
The calibration mechanism that is currently being used in our application is based on
the work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although
this mechanism focuses on stereo camera calibration, it was generalized for a system
with one camera and one projector. It involves imaging a controlled scene from different
positions and orientations. The controlled scene consists of a rigid calibration chart with
several markers. The geometric and photometric properties of such markers are known
precisely, so that they can be detected. After corresponding markers in the different
images are found, an algorithm searches for the optimal set of camera parameters for which
triangulation of all corresponding marker-point pairs gives an accurate reconstruction of
the calibration chart. This calibration mechanism is discussed further in Section 3.7.
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed
by the Smart Sensing & Analysis research group and provided as a starting point for the
current project. Figure 3.1 presents the main steps involved in the 3D reconstruction
process.
Figure 3.1: General flow diagram of the 3D face scanner application. The binary and XML input files are read (3.1), after which the application performs preprocessing (3.2), normalization (3.3), global motion compensation (3.6), decoding (3.5), tessellation (3.4), calibration (3.7), vertex filtering (3.8), and hole filling (3.9), producing the final 3D model.
The current scanner uses a total of 16 binary coded patterns that are sequentially projected
onto the scene. For each projection, the scene is captured by means of the
embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are
fed to the application in the form of a binary file. This falls in line with the discussion
presented in Section 2.1.2.3 of the literature study on why time-multiplexing strategies
are more suitable than spatial neighborhood or direct coding strategies for face reconstruction
applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is
described.
Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.
3.1 Read binary file
The first step of the application is to read the binary file that contains the required
information for the 3D reconstruction. The binary file is composed of two parts: the
header and the actual data. The header contains metadata of the acquired frames, such
as the number of frames and the resolution of each one. The second part contains the
actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence,
which from now on will be referred to as camera frames.
3.2 Preprocessing
The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps
is described in the following subsections.
Figure 3.3: Flow diagram of the preprocessing stage (parse XML file → discard frames → crop frames → scale, i.e., convert to float with a range from 0 to 1).
3.2.1 Parse XML file
In this stage, the application first reads an XML file that is included with every scan.
This file contains relevant information for the structured light reconstruction. This
information includes (i) the type of structured light patterns that were projected when
acquiring the data, (ii) the number of frames captured while structured light patterns
were being projected, (iii) the image resolution of each frame to be considered, and (iv)
the calibration data.
3.2.2 Discard frames
Based on the number-of-frames value read from the XML file, the application discards
the extra frames that do not contain relevant information for the structured light approach
but that are provided as part of the input.
3.2.3 Crop frames
The original resolution of each camera frame (480 × 768) is modified in order to obtain
a new, more suitable resolution for the subsequent algorithms of the program (480 ×
754). This is accomplished by cropping the pixels that are close to the top border
of the images. Note that this operation does not imply a loss of information in this
particular application, because pixels near the frame borders do not contain
facial information and can therefore be safely removed.
3.2.4 Scale
Each pixel of the camera frame sequence (as provided by the embedded camera) is
represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage,
the data type is transformed from unsigned integer to floating point while dividing each
pixel value by 255. The new set of values ranges between 0 and 1.
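This conversion amounts to a single pass over the pixel buffer. A minimal sketch (the buffer names are illustrative, not taken from the actual implementation):

/* Convert an 8-bit camera frame to floating point values in [0, 1]. */
static void scale_frame(const unsigned char *src, float *dst, int n_pixels)
{
    for (int i = 0; i < n_pixels; i++)
        dst[i] = (float)src[i] / 255.0f;
}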
3.3 Normalization
Even though this section is entitled Normalization, a few more tasks are performed
in this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide
arrows represent the flow of data, whereas dashed lines represent the order of execution. The
numbers inside the small data arrows pointing towards the different tasks represent the
number of frames used as input by each task. The dashed-line rectangle that encloses
the normalization and texture 2 tasks indicates that there is no strict sequential
execution between these two, but rather that they are executed in an alternating fashion.
This type of diagram will prove particularly useful in Chapter 5 to explain the
modifications that were made to the application to improve its performance.

Figure 3.4: Flow diagram of the normalization stage. The 16 camera frames are the input to the normalization (8 frames out), texture 2 (8 frames out), modulation (1 frame out), and texture 1 (1 frame out) tasks.

An example of the different frames that are produced in this stage is visualized in Figure 3.5. A
brief description of each of the tasks involved in this stage follows.
3.3.1 Normalization
The purpose of this stage is to extract the reflectivity component (texture information)
from the camera frames while enhancing the deformed illumination patterns
in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The
deformed patterns are essential for the 3D reconstruction process.
In order to understand how this process takes place, we need to look back at Figure
3.2. There it is possible to observe that the projected patterns in the top-row frames are
equal to their corresponding frames in the bottom row, with the only difference being
that the values of the projected pattern are inverted. For each corresponding pair, a
new image frame is generated according to the following equation:
F_norm(x, y) = (F_camera(x, y, a) − F_camera(x, y, b)) / (F_camera(x, y, a) + F_camera(x, y, b)),
where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively.
An example of the resulting frame sequence is shown in Figure 3.5a.
(a) Normalized frame sequence; (b) texture 2 frame sequence; (c) modulation frame; (d) texture 1 frame.
Figure 3.5: Example of the 18 frames produced in the normalization stage.
3.3.2 Texture 2
The calculation of the texture 2 frame sequence follows the same procedure as the one
used to calculate the normalized frame sequence. In fact, the output of this process is an
intermediate step in the calculation of the normalized frames, which is the reason why
the two processes are said to be performed in an alternating fashion. The mathematical
equation that describes the calculation of the texture 2 frame sequence is
F_texture2(x, y) = F_camera(x, y, a) + F_camera(x, y, b).
The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation
stage.
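Because the sum of each aligned frame pair is an intermediate result of the normalization, both outputs can be produced in one pass. The following C sketch illustrates this for one frame pair; the epsilon guard against division by zero is an assumption of this sketch, not taken from the original implementation.

/* Per-pixel normalization and texture 2 computation for one aligned
 * frame pair: a holds the pattern frame, b its inverted counterpart.
 * Frames are float images in [0, 1], as produced by the scale step. */
static void normalize_pair(const float *a, const float *b,
                           float *norm, float *texture2, int n_pixels)
{
    const float eps = 1e-6f;        /* guards against division by zero */
    for (int i = 0; i < n_pixels; i++) {
        float sum = a[i] + b[i];    /* texture 2 frame (intermediate) */
        texture2[i] = sum;
        norm[i] = (a[i] - b[i]) / (sum + eps);
    }
}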
3.3.3 Modulation
The purpose of this stage is to find the range of measured values for each (x, y) pixel of
the camera frame sequence along the time dimension. This is done in two steps. First,
two frames are generated by finding the maximum and minimum values along the time
(t) dimension (Figure 3.6) for every (x, y) position in a frame.
Figure 3.6: Camera frame sequence in an (x, y, t) coordinate system.
Second, a modulation frame is produced by finding the difference between the previously
generated frames, i.e.,
F_mod(x, y) = F_max(x, y) − F_min(x, y).
Such a modulation frame (Figure 3.5c) is required later during the decoding stage.
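Expressed in code, the two steps collapse into a single loop over the pixels, keeping a running minimum and maximum along t. A sketch with illustrative names:

#include <float.h>

/* Modulation frame: per-pixel range of the camera frame sequence along
 * the time dimension. frames[t] points to the t-th frame. */
static void modulation_frame(const float *const *frames, int n_frames,
                             float *mod, int n_pixels)
{
    for (int i = 0; i < n_pixels; i++) {
        float fmin = FLT_MAX, fmax = -FLT_MAX;
        for (int t = 0; t < n_frames; t++) {
            float v = frames[t][i];
            if (v < fmin) fmin = v;
            if (v > fmax) fmax = v;
        }
        mod[i] = fmax - fmin;       /* F_mod(x, y) = F_max - F_min */
    }
}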
3.3.4 Texture 1
Finally, the last task in the normalization stage corresponds to the generation of the
texture image that will be mapped onto the final 3D model. In contrast to the previous
three tasks, this subprocess does not take the complete set of 16 camera frames as input,
but only the 2 with the finest projection patterns. Figure 3.7 shows the four processing
steps that are applied to the input in order to generate a texture image such as the one
presented in Figure 3.5d.
Figure 3.7: Flow diagram for the calculation of the texture 1 image (average frames → gamma correction → 5×5 mean filter → histogram stretch).
3.4 Global motion compensation
The major drawback of time-multiplexing strategies is their high sensitivity to movement.
In fact, if no measures are taken to correct the slight amount of movement of the scanner
or of the objects in the scene during the acquisition process, the complete reconstruction
process fails. Although the global motion compensation stage is only a minor part of
the mechanism that makes the entire application robust to motion, its contribution
to the final result is not negligible.

Global motion compensation is an extensive field of research to which many different
approaches and methods have been contributed. The approach used in this application
is amongst the simplest in terms of complexity. Nevertheless, it suffices for the needs of the
current application.
Figure 3.8 presents an overview of the algorithm used to achieve the global motion
compensation. This process takes as input the normalized frame sequence introduced in
the previous section. As noted at the bottom of the figure, these steps are repeated for
every pair of consecutive frames. As a first step, the pixels in each column are added for
both frames. This results in two vectors that hold the cumulative sums of each frame.
The second step is to determine by how many pixels the second image is displaced with
respect to the first one. In order to achieve this, the sum of absolute differences (SAD) between
elements of the two column-sum vectors is calculated while slowly displacing the two
vectors with respect to each other. The result is a new vector containing the SAD value
for each displacement. Subsequently, the index of the smallest element in the SAD
vector is searched for in order to determine the number of pixels that the second
image needs to be shifted. The process concludes by performing the actual shift of the
second frame.
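The core of this procedure, the SAD minimization over the column-sum vectors, can be sketched in C as follows. The search range bound and the sign convention of the returned shift are assumptions of this sketch, not details taken from the original implementation:

#include <math.h>

/* Estimate the horizontal displacement between two frames by minimizing
 * the sum of absolute differences (SAD) between their column-sum
 * vectors. Only the overlapping columns contribute to each SAD value. */
static int estimate_shift(const float *col_sum_a, const float *col_sum_b,
                          int n_cols, int max_shift)
{
    int best_shift = 0;
    float best_sad = INFINITY;
    for (int s = -max_shift; s <= max_shift; s++) {
        float sad = 0.0f;
        for (int c = 0; c < n_cols; c++) {
            int cb = c + s;
            if (cb < 0 || cb >= n_cols)
                continue;               /* skip non-overlapping columns */
            sad += fabsf(col_sum_a[c] - col_sum_b[cb]);
        }
        if (sad < best_sad) {
            best_sad = sad;
            best_shift = s;
        }
    }
    return best_shift;    /* frame B is subsequently shifted accordingly */
}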
Figure 3.8: Flow diagram for the global motion compensation process. For every pair of consecutive frames of the normalized frame sequence, the columns of frames A and B are summed, the SAD between the column-sum vectors is minimized, and frame B is shifted accordingly.
3.5 Decoding
In Section 2.1.1 of the literature study, the correspondence problem was defined as the
process of determining corresponding point pairs between the captured images and the
projected patterns. This is exactly what is being accomplished during the decoding
stage.
A novel approach has been implemented in which the identification of the projector
stripes is based not on the values of the pixels themselves (as is typically done) but
rather on the edges formed by the transitions of the projected patterns. Figure 3.9
illustrates the different sets of decoded values that result from each of these methods.
Here, it is possible to observe that the pixel-based method produces a stair-casing effect
due to the decoding of neighboring pixels that lie on the same stripe of the projected
pattern. The edge-based method, on the other hand, removes this undesirable effect by
decoding values only for those parts of the image in which a transition occurs. Furthermore,
this approach enables sub-pixel accuracy in determining the positions where the
transitions occur, meaning that the overall resolution of the 3D reconstruction increases
considerably.
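The sub-pixel localization of a transition can be done by linear interpolation between the two pixels that bracket it. The following C sketch finds the transitions of a single normalized column; it is a simplified stand-in for the actual decoder, which combines the transitions of all pattern frames into depth codes:

/* Find sub-pixel positions where a normalized pattern signal changes
 * sign (a stripe transition), using linear interpolation. Returns the
 * number of transitions written to positions[]. */
static int find_transitions(const float *column, int n_rows,
                            float *positions, int max_out)
{
    int count = 0;
    for (int y = 0; y + 1 < n_rows && count < max_out; y++) {
        float v0 = column[y], v1 = column[y + 1];
        if ((v0 < 0.0f) != (v1 < 0.0f)) {
            /* Zero crossing between rows y and y+1. */
            positions[count++] = y + v0 / (v0 - v1);
        }
    }
    return count;
}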
Figure 3.9: Decoded values along the y dimension of the image for edge-based and pixel-based decoding. The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used.
The decoding process results in a set of vertices, each one associated with a depth code.
Note, however, that the units of measurement used to describe the position and depth of
each vertex are camera pixels and code values, respectively, meaning that these
vertices still do not represent the actual geometry of the face. The calibration process,
explained in a later section, is the part of the application that translates the pixel and
code values into standard units (such as millimeters), thus recreating the actual shape of
the human face.
3.6 Tessellation
Tessellation refers to the process of covering a plane using different geometric shapes in
such a manner that no overlaps occur. In computer graphics, these geometric shapes
are generally chosen to be triangles, also called "faces". The reason for using triangles
is that their vertices, by definition, lie on the same plane. This, in turn, avoids
the generation of non-simple convex polygons that are not guaranteed to be rendered
correctly. A complete example illustrating this point can be found in [32].
A set of 3D vertices calculated in the decoding stage is the input to the tessellation
process. Here, however, the third dimension does not play a role, and hence the z
coordinate of each vertex can be thought of as being equal to 0. This implies
that the new set of vertices consists only of (x, y) coordinates that lie on the same plane,
as shown in Figure 3.10a. This graph corresponds to a very close view of the nose area
in the reconstructed face example.
(a) Vertices before applying the Delaunay triangulation; (b) result after applying the Delaunay triangulation.
Figure 3.10: Close view of the vertices in the nose area before and after the tessellation process.
The question that arises here is how to connect the vertices in such a way that the complete
surface is covered with triangles. The answer is to use the Delaunay triangulation,
which is probably the most common triangulation used in computer vision. The main
advantage that it has over other methods is that the Delaunay triangulation avoids
"skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the
Delaunay triangulation is independent of the order in which the vertices are processed.
Figure 3.10b shows the result of applying the Delaunay triangulation to the vertices
shown in Figure 3.10a.

Although there exist a number of different algorithms used to achieve the Delaunay
triangulation, the final outcome of each conforms to the following definition: a Delaunay
triangulation for a set P of points in a plane is a triangulation DT(P) such that no
point in P is inside the circumcircle of any triangle in DT(P) [33]. Such a definition can
be understood by examining Figure 3.11.
Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33].
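The circumcircle condition in this definition corresponds to a well-known determinant test. The following C fragment is shown only to illustrate the definition (robust geometric predicates are needed in practice); it returns a positive value exactly when point d lies inside the circumcircle of the counterclockwise triangle (a, b, c):

typedef struct { double x, y; } Point2D;

/* In-circumcircle test: positive if d is inside the circumcircle of the
 * counterclockwise triangle (a, b, c), negative if outside, zero if on it. */
static double in_circumcircle(Point2D a, Point2D b, Point2D c, Point2D d)
{
    double ax = a.x - d.x, ay = a.y - d.y;
    double bx = b.x - d.x, by = b.y - d.y;
    double cx = c.x - d.x, cy = c.y - d.y;
    return (ax * ax + ay * ay) * (bx * cy - cx * by)
         - (bx * bx + by * by) * (ax * cy - cx * ay)
         + (cx * cx + cy * cy) * (ax * by - bx * ay);
}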
3.7 Calibration
The set of (x, y) vertices with their corresponding depth code values that results from
the decoding process does not represent standard units of measure, i.e., these values still have to
be translated into standard units such as millimeters. This is precisely the objective of
the calibration process.
The calibration mechanism that is used in the application is based on the work of Peter-Andre
Redert as part of his PhD thesis [31]. The entire process is divided into two parts:
an offline and an online process. Moreover, the offline process consists of two stages:
the camera calibration and the system calibration. It is important to clarify that while
the offline process is performed only once (camera properties and distances within the
system do not change with every scan), the online process is carried out for every scan
instance. The calibration stage referred to in Figure 3.1 is the latter.
3.7.1 Offline process
As already mentioned, the offline process comprises the two stages described below.
Camera calibration: This part of the process is concerned with the calculation of the
intrinsic parameters of the camera, as explained in Section 2.2 of the literature
study. In short, the objective is to precisely quantify the optical properties of the
camera. The current approach accomplishes this by imaging
the special calibration chart shown in Figure 3.12 from different orientations
and distances. After corresponding markers in the different images are found, an
algorithm searches for the optimal set of camera parameters for which triangulation
of all corresponding marker-point pairs gives an accurate reconstruction of the
calibration chart.

Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.
System calibration: The second part of the calibration process refers to the camera-projector
system calibration, i.e., the determination of the extrinsic parameters
of the system. Again, this part of the process images the calibration chart from
different distances. However, this time structured light patterns are emitted by
the projector while the acquisition process takes place. The result is that each
projector code is associated with a known depth and camera position.
3.7.2 Online process
The result of the offline calibration is a set of parameters that model the optical properties
of the scanner system. These are passed to the application inside the XML file for
every scan. Such parameters represent the coefficients of a fifth-order polynomial used
for translating the set of (x, y) vertices with their corresponding depth code values into
standard units of measure. In other words, the online process consists of evaluating a
polynomial with all the x, y, and depth code values calculated in the decoding stage in
order to reconstruct the geometry of the face. Figure 3.13 shows the state of the 3D
model before and after the reconstruction process.
(a) Before reconstruction; (b) after reconstruction.
Figure 3.13: The 3D model before and after the calibration process.
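The exact multivariate form of the calibration polynomial and its coefficients come from the XML file, so no attempt is made to reproduce it here. As a sketch of the evaluation scheme only, a fifth-order polynomial in one variable can be evaluated efficiently with Horner's method:

/* Horner evaluation of a fifth-order polynomial with coefficients
 * coeff[0] + coeff[1]*t + ... + coeff[5]*t^5. */
static double eval_poly5(const double coeff[6], double t)
{
    double r = coeff[5];
    for (int i = 4; i >= 0; i--)
        r = r * t + coeff[i];
    return r;
}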
3.8 Vertex filtering
As can be seen in Figure 3.13b, there are a number of extra vertices (and faces)
that have not been correctly reconstructed and therefore should be removed from the
model. Vertex filtering is applied to remove all these noisy vertices and faces based on
different criteria. The process is divided into the following three steps.
3.8.1 Filter vertices based on decoding constraints
First, if the distance between consecutive decoded points is larger than a maximum
threshold in the x or z dimension, then these points are removed. Second, in order to
avoid falsely decoded vertices due to camera noise (especially in the parts of the images
where light does not hit directly), a minimal modulation threshold needs to be exceeded,
or else the associated decoded point is discarded. Finally, if the decoded vertices lie
outside a margin defined in accordance with the image dimensions, then these are removed
as well.
3.8.2 Filter vertices outside the measurement range
The measurement range, defined during the offline calibration, refers to the minimum
and maximum values that each decoded point can have in the z dimension. These values
are read from the XML file. The long triangles shown in Figure 3.13b that either extend
far into the picture or, on the other hand, come close to the camera are all removed in
this stage. The resulting 3D model after being filtered with the two previously described
criteria is shown in Figure 3.14a.
3.8.3 Filter vertices based on a maximum edge length
Several steps are involved in the removal of vertices based on the maximum edge length
criterion. Initially, the length of every edge contained in the model is calculated. This
is followed by determining a new set of edges L that contains the longest edge in each
face. After this operation, the mean length value for the longest edge set is calculated.
Finally, only the faces whose longest edge is shorter than seven times the mean value,
i.e., L < 7 × mean(L), are kept. Figure 3.14b shows the result after this operation.
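In code, this criterion takes two passes over the faces: one to compute the mean of the longest edges and one to apply the 7 × mean threshold. The data structures below are illustrative, not those of the actual implementation:

#include <math.h>

typedef struct { double x, y, z; } Vertex;
typedef struct { int v[3]; } Face;   /* indices into the vertex array */

static double edge_len(Vertex a, Vertex b)
{
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return sqrt(dx * dx + dy * dy + dz * dz);
}

static double longest_edge(const Vertex *v, const Face *f)
{
    double e0 = edge_len(v[f->v[0]], v[f->v[1]]);
    double e1 = edge_len(v[f->v[1]], v[f->v[2]]);
    double e2 = edge_len(v[f->v[2]], v[f->v[0]]);
    double m = e0 > e1 ? e0 : e1;
    return m > e2 ? m : e2;
}

/* Mark the faces to keep: those whose longest edge is shorter than
 * seven times the mean longest-edge length. keep[i] is set to 1 or 0. */
static void filter_long_faces(const Vertex *verts, const Face *faces,
                              int n_faces, int *keep)
{
    double mean = 0.0;
    for (int i = 0; i < n_faces; i++)
        mean += longest_edge(verts, &faces[i]);
    mean /= n_faces;
    for (int i = 0; i < n_faces; i++)
        keep[i] = longest_edge(verts, &faces[i]) < 7.0 * mean;
}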
(a) The 3D model after the filtering steps described in Subsections 3.8.1 and 3.8.2; (b) the 3D model after the filtering step described in Subsection 3.8.3; (c) the 3D model after the filtering step described in Section 3.9.
Figure 3.14: Resulting 3D models after various filtering steps.
3.9 Hole filling
In the last processing step of the 3D face scanner application, two actions are performed.
The first one concerns an algorithm that takes care of filling undesirable holes
that appear due to the removal of vertices and faces that were part of the face surface. This
is accomplished by adding a vertex in the middle of each hole and then connecting every
surrounding edge with this point. The second action refers to another filtering step for
vertices and faces. In this last part of the application, the program removes all but the
largest group of connected faces. The final 3D model is shown in Figure 3.14c.
3.10 Smoothing
Taking into account that the smoothing process is beneficial for visualization purposes
but not for the overall goal of the 3D mask sizing project, this process was not considered
part of the 3D face scanner application. This is also the reason why it
is not included in Figure 3.1. Nevertheless, this section provides a brief explanation of
the smoothing process that is currently used, along with an example.
A complete explanation of the algorithm that is used to achieve the smoothing
effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian
operator that diffuses the vertices along the surface. An example of the resulting model
before and after applying the smoothing process is shown in Figure 3.15.
(a) The 3D model before smoothing; (b) the 3D model after smoothing.
Figure 3.15: Forehead of the 3D model before and after applying the smoothing process.
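The scale-dependent operator of [34] is beyond the scope of this overview, but the underlying idea can be conveyed with the simpler uniform-weight (umbrella) Laplacian step shown below: each vertex moves a fraction lambda towards the centroid of its neighbors. The mesh representation (a flattened adjacency list) is an assumption of this sketch:

typedef struct { double x, y, z; } Vertex3;

/* One iteration of uniform Laplacian smoothing. neighbors[] holds the
 * concatenated neighbor indices of all vertices; the neighbors of
 * vertex i are neighbors[offsets[i]] .. neighbors[offsets[i+1]-1]. */
static void laplacian_smooth(const Vertex3 *in, Vertex3 *out, int n_verts,
                             const int *neighbors, const int *offsets,
                             double lambda)
{
    for (int i = 0; i < n_verts; i++) {
        int deg = offsets[i + 1] - offsets[i];
        if (deg == 0) { out[i] = in[i]; continue; }
        double cx = 0.0, cy = 0.0, cz = 0.0;
        for (int k = offsets[i]; k < offsets[i + 1]; k++) {
            cx += in[neighbors[k]].x;
            cy += in[neighbors[k]].y;
            cz += in[neighbors[k]].z;
        }
        out[i].x = in[i].x + lambda * (cx / deg - in[i].x);
        out[i].y = in[i].y + lambda * (cy / deg - in[i].y);
        out[i].z = in[i].z + lambda * (cz / deg - in[i].z);
    }
}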
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as
two different domains, but rather as two complementary parts of a whole. There are two
important trends that have made such a unified view possible. First, integrated circuit
(IC) technology has evolved to the point where multiple processors of different types
coexist in a single IC. Second, the increasing complexity and average size of programs,
added to the evolution of compiler technologies, have made C compilers (and even C++ or
Java in some cases) commonplace in the development of embedded systems
[35].
This chapter discusses the embedded hardware and software implementation of the 3D
face scanner. A brief account of the hardware and software tools that were used during
the development of the application is presented first. Subsequently, the first stage of the
development process is described, which consists mainly of translating the algorithms
and methods described in Chapter 3 into a different programming language, more suitable
for embedded systems. Finally, a preview of the developed visualization module that
displays the reconstructed 3D face is presented, along with a brief description of its
functionality.
4.1 Development tools
This section describes the set of tools used in the development of the embedded application.
First, an overview of the hardware is presented, highlighting the most important
aspects that are of interest to the 3D face scanner application. This is then followed by
a list of the software tools, along with a short motivation for their selection. A so-called
remote development methodology was used for the compilation process. The idea is to
run an integrated development environment (IDE) on a client system for the creation of
the project, the editing of the files, and the usage of code assistance features, in the same manner
as with local projects. However, when the project is built, run, or debugged, the
process runs on a remote server, with output and input transferred to the client system.
4.1.1 Hardware
A current trend in the embedded world is the use of single-board computers (SBCs) as
development platforms. SBCs combine most features of a conventional desktop computer
into a single board, which can be as small as a credit card. One or more processors of
different types, memory, on-board peripherals for multiple USB devices, single or dual
gigabit Ethernet connections, and integrated graphics and audio capabilities, amongst others,
are common features included in these devices. But perhaps what is most interesting
for embedded developers is the availability of several SBCs that fall under the open-source
hardware category [36]. Such SBCs are suitable for the implementation of a wide range
of applications on the basis of open operating systems.

Two different hardware environments were used in the development of the current embedded
application: a conventional desktop personal computer (PC) with an Intel x86
architecture, and an SBC that was selected according to the following survey.
4.1.1.1 Single-board computer survey
A survey of popular SBCs available on the market was conducted with the intention
of finding the most suitable model for our application. Table 4.1 presents a subset of the
considered models, highlighting the most relevant characteristics for the 3D face scanner
application. Refer to [37] for the complete survey.

The model to be chosen had to comply with several requirements imposed by the 3D
face scanner application. First, support for both a camera and a projector had to be
offered. While all of the considered models showed special support for video output,
not all of them provided suitable characteristics for camera signal acquisition. In fact,
most of them rely on USB or Ethernet connections for this purpose. The problem with
using USB technology for camera acquisition is that it is highly resource demanding. Ethernet
connections, on the other hand, imply streaming video in formats such as MPEG,
which require additional computational resources and buffering for decoding the video
stream. Explicit peripheral support for camera acquisition was only offered by two of
the considered models: the BeagleBoard-xM and the PandaBoard.
Table 4.1: Single-board computer survey

BeagleBoard-xM
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 512 MB
  Video output: DVI-D, HDMI, S-Video
  GPU: PowerVR SGX, OpenGL ES 2.0
  Camera port: Yes

Raspberry Pi Model B
  CPU: ARM1176, 700 MHz
  RAM: 256 MB
  Video output: Composite RCA, HDMI, DSI
  GPU: Broadcom VideoCore IV, OpenGL ES 2.0
  Camera port: No

Cotton Candy
  CPU: dual-core ARM Cortex-A9, 1200 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
  Camera port: No

PandaBoard
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, DVI-D, LCD
  GPU: PowerVR SGX540, OpenGL ES 2.0
  Camera port: Yes

VIA APC
  CPU: ARM11, 800 MHz
  RAM: 512 MB
  Video output: HDMI, VGA
  GPU: built-in 2D/3D graphics, OpenGL ES 2.0
  Camera port: No

MK802
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No

Snowball
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, CVBS
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No
A second issue in the selection of the SBC concerned the project objective of developing a module capable of visualizing the 3D reconstructed model by means of the embedded projector. It was considered that the achievement of this objective could be greatly simplified by selecting an SBC model that offered support for rendering 3D computer graphics by means of an API, preferably OpenGL ES. As it turned out, all of the SBC models considered in the survey featured a graphical processing unit (GPU) with such support.
Finally, one last important motivation for the selection came from the experience gathered through related projects. The BeagleBoard-xM had been used as the embedded computing unit in other projects [6] at Philips Research Eindhoven, and therefore valuable implementation effort could be saved if this option were adopted. Consequently, the BeagleBoard-xM was selected as the SBC model for the development of the current project.
4.1.1.2 BeagleBoard-xM features
The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power, open-source hardware system that was designed specifically to address the open source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz, accompanied by 512 MB of LPDDR RAM. Several open operating systems have been made compatible with this processor, including Linux, FreeBSD, RISC OS, Symbian and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU to provide accelerated 2D and 3D rendering with support for OpenGL ES 2.0 [38].
In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].
4.1.2 Software
The main factors involved in the selection of software tools were (i) the available support from a large development community and (ii) acquisition costs and licensing charges. Open source software was adopted where possible. Moreover, prior experience with the tools was also taken into account. The software can be divided into two categories: (i) software libraries that are used within the application and are therefore necessary for its execution, and (ii) software tools used specifically for the development of the application and hence not required for its execution. In what follows, each of these is briefly described.

Figure 4.1: The BeagleBoard-xM offered by Texas Instruments.
4.1.2.1 Software libraries
The following software libraries are used throughout the implementation of the embedded application.

libxml2: A software library for parsing XML documents, originally developed for the Gnome project and later made available to outside projects as well. The current application uses it to extract the required information from the XML file that is included with each scan.

OpenCV: An open source computer vision and machine learning software library initiated by Intel. It provides the necessary functionality to construct the Delaunay triangulation described in Chapter 3. Though it was used in the initial versions of the application, later optimizations replaced the OpenCV implementations.

CGAL: A software library that aims to provide access to algorithms in computational geometry. It is used in the current application as a means to simplify the resulting mesh surface, i.e., to reduce the number of faces used to represent the surface while keeping the overall shape of the reconstructed model.

OpenGL ES: A subset of the more general OpenGL, designed specifically for embedded systems. It consists of a cross-language, multi-platform application programming interface (API) for rendering 2D and 3D computer graphics. It is used in the current application as the means to visualize the 3D reconstructed model.

GLUT: The OpenGL Utility Toolkit, a system-independent API for OpenGL used to create windows and/or frame buffers. It is used in the visualization module of the application as well.
4.1.2.2 Software development tools
The following list presents a description of the most important software tools used for the development of the embedded application.

GNU toolchain: A collection of programming tools produced by the GNU Project that provides development facilities for applications and operating systems. Among the several projects that comprise the GNU toolchain, the following were used:

  GNU Make: A utility that automates the building process of executable programs by reading so-called makefiles, which specify how to create the target program.

  GCC: The official compiler of the GNU operating system, which has been adopted as standard by most modern Unix-like computer operating systems.

  GNU Binutils: A set of programming tools used in the development process of creating and managing programs, object files, libraries, profile data and assembly source code. The commands as (assembler), ld (linker) and gprof (profiler) were used among the complete set of binutils commands.

  GNU Project debugger: The standard debugger for the GNU operating system, which was made available for the development of applications outside this project as well.

Valgrind: A programming tool that can automatically detect memory management errors. It also provides the functionality of a profiler.

Ubuntu: A Linux-based operating system that is distributed as free and open source software. It was installed on both the desktop PC and the SBC.
4.2 MATLAB to C code translation
This section describes the first stage of the embedded application development, which involves the translation of a series of algorithms originally written in MATLAB code into C.
Although there are a number of available tools that automatically translate MATLAB code to C language, such as MATLAB Coder by MathWorks, MATLAB-to-C Synthesis (MCS) by Catalytic Inc., and AccelDSP by Xilinx, these have a number of pitfalls that compromise their applicability, especially when the performance aspect is of ultimate importance. Perhaps most concerning is that each of these tools only supports a subset of the MATLAB language and functions, meaning that the complete functionality of MATLAB is immediately constrained by this requirement. In many cases this would imply a modification of the MATLAB code prior to the translation process, in order to filter out any feature or function not included in the subset, which adds overhead to the development process. Examples of features not supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization, and try/catch statements. The use of an automatic translation tool was discarded for this project, taking into account that several of these unsupported features are present in the MATLAB code.
4.2.1 Motivation for developing in C language
There are a number of reasons that explain why C is among the most popular programming languages used for the development of embedded systems. The first is that C lies at an intermediate point between higher and lower level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher level languages lies in the fact that they do not provide suitable characteristics for optimizing the performance of applications, such as low-level memory manipulation. Furthermore, unlike many of these higher level programming languages, C provides deterministic resource use, which is an important feature when the target devices contain limited resources. On the other hand, C outperforms lower level languages in a number of aspects, such as scalability and maintainability. Two final motivations for using C are that (i) C compilers are available for almost all embedded devices and are supported by a large pool of experienced C programmers, and (ii) the vast majority of hardware APIs/drivers are written in C.
4.2.2 Translation approach
As mentioned earlier, a manual translation approach was chosen over the use of automatic translation tools. A key part of the process of manually translating MATLAB to C code is the verification process. There are two major techniques used to achieve such verification. The first consists of a systematic method of converting the translated C code into a compiled MEX-file that can be merged into the original MATLAB project. Then, by comparing the results generated by the MATLAB project containing the C implementation wrapped in a MEX-file with those generated by the original MATLAB project, one should be able to verify the correctness of the translation. The second approach consists of writing corresponding intermediate results of both the MATLAB and C implementations to external files, and then using a file comparison tool, such as diff for Linux environments, to validate the equality of both results. It was the latter approach that was chosen for the development of the current application, for the following reason. The former approach requires the C implementation to be wrapped in a so-called MEX wrapper, which takes care of the communication between MATLAB and C. This task is considered to be error prone, since crashes, segmentation violations or incorrect results can easily occur if the MEX wrapper does not allocate and access the data properly, as reported by Marc Barberis in [40] from Catalytic Inc.
A number of pitfalls that add complexity to the manual translation process were identified throughout the development of this stage. The most important are:

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e., the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in the performance of the resulting code (see the short example after this list).

• MATLAB is an interpreted language, i.e., data types and variable dimensions are only known at run-time, and thus cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc or free.
• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions and sizes. On the contrary, C requires all variables to be cast to a specific data type (or declared, as known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C, which requires the programmer to implement them by relying on structure constructs of primitive types.
4.3 Visualization
This section describes the different steps involved in the visualization module developed to display the reconstructed 3D models by means of the embedded projector contained in the hand-held device. Figure 4.2 extends the general overview of the application presented in Figure 3.1 by incorporating the visualization module. This figure shows that a resulting 3D model of the face reconstruction process consists of four different elements: a set of vertices, a set of faces, a set of UV coordinates, and a texture image.
Figure 4.2: Simplified diagram of the 3D face scanner application.
Vertices and faces describe the geometry of the reconstructed model. Each face consists of three index values that determine the vertices that form a triangle. On the other hand, UV coordinates, together with the texture image, describe the texture of the model. Figure 4.3 shows how UV coordinates are used to map portions of the texture image to individual parts of the model. Each vertex is associated with a UV coordinate. When a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image and place it on top of the triangle.
Figure 4.3: UV coordinate system.
Figure 4.4 presents an overview of the visualization module. The first step of the process is to simplify the 3D model, i.e., to reduce the number of triangles (and vertices) used to represent the surface. Note that while a high resolution is needed for the algorithms that determine the fit quality of the different mask models, a much lower resolution can be used for visualization purposes. In fact, due to the limited resources available in embedded systems, such simplification becomes necessary to avoid lag when zooming, rotating or panning the model. Edge collapse is a common term used for the simplification process, which is shown in Figure 4.4. The input vertices and faces of this block are converted into a smaller set, denoted as New vertices and New faces in the diagram. However, since the new set of vertices and faces does not have a one-to-one correspondence to the original set of UV coordinates, such coordinates have to be updated as well. This is accomplished by using the nearest neighbor algorithm: every new vertex is assigned the UV coordinate of its closest original vertex.
The next stage of the process is to format the new set of vertices, faces and UV coordinates, together with the texture 1 image, such that OpenGL can render the model. Subsequently, normal vectors are calculated for every triangle; these are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex, based on the normal vectors of the triangles that are connected to it. Moreover, a cross-product multiplication is used to calculate the normal vector of each triangle. Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state in which the model is redrawn every time a timer expires or when an interactive operation is sent to the program.
Figure 4.4: Diagram of the visualization module.
Chapter 5
Performance optimizations
This chapter presents the various performance optimizations made to the 3D face scanner application, ranging from high-level optimizations, such as modification of the algorithms, to low-level optimizations, such as the implementation of time-consuming parts in assembly language.
In order to verify that the achieved optimizations were valid in general and not only for specific cases, 10 scans of different persons were used for profiling the performance of the application. Every profile consisted of running the application 10 times for each scan and then averaging the results, in order to reduce the influence that external factors might have on the measured times. Figure 5.1 presents an example of the graphs that will be used throughout this and the following chapters to represent the changes in performance. Here, each bar is divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1.
The translation from MATLAB to C code corresponds to the first optimization performed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available on the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's -O2 optimization level.
The bottom bar in Figure 5.1 represents the starting point for a set of optimization procedures that will be described in the following sections. The order in which these are presented corresponds to the order in which they were applied to the application.
Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
5.1 Double- to single-precision floating-point numbers
The same representation format of floating-point numbers for the MATLAB and C implementations was necessary in order to compare both results in each step of the translation process. The original C implementation therefore used the double-precision format, because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double- to single-precision format was made. Figure 5.2 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec.
Figure 5.2: Difference in execution time when double-precision format is changed to single-precision.
5.2 Tuned compiler flags
While the previous versions of the C code were compiled with the -O2 optimization level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
Figure 5.3: Execution time before and after tuning GCC's compiler options.
5.3 Modified memory layout
A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality of the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e., first and second, third and fourth, fifth and sixth, and so on. The data of the camera frames was therefore placed in memory in such a manner that corresponding pixels between frame pairs lay next to each other. The procedure is shown in Figure 5.4, and a sketch of the idea follows below.
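In essence, corresponding pixels of a frame pair are interleaved. A minimal sketch of this idea is shown below; the buffer names are hypothetical and the pixels are assumed to be 8-bit values.

#include <stdint.h>

/* Interleave corresponding pixels of two consecutive frames so that
   each pixel pair that is operated on together lies next to each
   other in memory. */
static void interleave_pair(const uint8_t *frame_a, const uint8_t *frame_b,
                            uint8_t *interleaved, int n_pixels)
{
    for (int i = 0; i < n_pixels; i++) {
        interleaved[2 * i]     = frame_a[i];   /* pixel i of frame A */
        interleaved[2 * i + 1] = frame_b[i];   /* pixel i of frame B */
    }
}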
However, this modification yielded no improvement in the execution time of the application, as can be seen in Figure 5.5.
5.4 Reimplementation of C's standard power function
The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the power of 0.85. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process.
Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green and purple circles represent pixels of the first, second, third and fourth frames, respectively.
Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.
Taking into account that the high accuracy offered by this function was not required, and that the overhead involved in validating the input could be removed, a different implementation of the power function was adopted.
A novel approach, proposed by Ian Stephenson in [42], is explained as follows. The power function is usually implemented using logarithms as

    pow(a, b) = x^(log_x(a) * b),

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions, which can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a,

    a = M * 2^E,

where M is the mantissa and E is the exponent. Taking the logarithm of both sides gives

    log2(a) = log2(M) + E,

and since M is normalized, log2(M) is always small, therefore

    log2(a) ≈ E.
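A minimal C sketch of this idea is given below. It is an illustration of Stephenson's approach rather than the exact code used in the application: the bit-level approximations assume IEEE-754 single-precision floats, require a > 0, and perform no input validation.

#include <stdint.h>

/* log2(a) approximated from the IEEE-754 bit pattern: the integer view
   of a positive float equals (E + 127) * 2^23 plus the mantissa bits. */
static inline float fast_log2(float a)
{
    union { float f; uint32_t i; } v = { a };
    return (float)v.i * (1.0f / 8388608.0f) - 127.0f;   /* 8388608 = 2^23 */
}

/* Inverse operation: build the bit pattern of 2^p directly. */
static inline float fast_pow2(float p)
{
    union { float f; uint32_t i; } v;
    v.i = (uint32_t)((p + 127.0f) * 8388608.0f);
    return v.f;
}

/* pow(a, b) = 2^(log2(a) * b), valid for a > 0. */
static inline float fast_pow(float a, float b)
{
    return fast_pow2(fast_log2(a) * b);
}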
This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.
Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.
5.5 Reduced memory accesses
The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering allowed replacing floating-point calculations with integer calculations in the modulation stage, which typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modular frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping and scaling frames are now performed in an alternating fashion, together with the calculation of the modular frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in the execution time of the application for this optimization step.
(a) Original order of execution. (b) Modified order of execution.
Figure 5.7: Order of execution before and after the optimization.
Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
5.6 GMC in y dimension only
A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, that figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results are achieved when GMC is applied in the y direction only. This was originally implemented by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted of removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement of the execution time shown in Figure 5.10.
Figure 5.9: Flow diagram of the GMC process as implemented in the MATLAB code.
Figure 5.10: Difference in execution time before and after modifying the GMC stage.
5.7 Error in Delaunay triangulation
OpenCV was used to compute the Delaunay triangulation, and a series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation. It is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. At this point of the optimization process, an error was detected in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.
Figure 5.11: The execution time of the application increased after fixing an error in the tessellation stage.
5.8 Modified line shifting in GMC stage
A series of optimizations performed on the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later, it was optimized even further by replacing the for loop with the more optimized memcpy function available in the standard C library, which in turn led to a faster execution time.

A further optimization was obtained in the GMC stage, which yielded better memory usage and a faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted.
The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included to determine whether the shift has to be performed in the positive or negative direction. In case the shift is negative, i.e., upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. In case the shift is positive, i.e., downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
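A minimal sketch of this direction-aware, in-place shift is given below; it assumes 8-bit pixels and row-major storage, and leaves the rows shifted in from the border untouched, so it is an illustration of the idea rather than the exact code of the application.

#include <stdint.h>
#include <string.h>

/* Shift an h x w frame vertically by 'shift' rows, in place.
   shift < 0: move content upwards, traversing from top to bottom.
   shift > 0: move content downwards, traversing from bottom to top.
   The traversal order guarantees that no source row is overwritten
   before it has been copied. */
static void shift_frame_y(uint8_t *img, int w, int h, int shift)
{
    if (shift < 0) {
        for (int y = 0; y < h + shift; y++)
            memcpy(&img[y * w], &img[(y - shift) * w], w);
    } else if (shift > 0) {
        for (int y = h - 1; y >= shift; y--)
            memcpy(&img[y * w], &img[(y - shift) * w], w);
    }
}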
Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
5.9 New tessellation algorithm
A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improves the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic refers to the fact that the set of vertices resulting from the decoding stage is sorted in an increasing manner. This in turn removes the need to search for the nearest vertices, and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered in increasing order from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies even further the algorithm needed to connect the vertices into triangles.
The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Moreover, each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.
The algorithm is presented in Algorithm 1. Note that for each pair of rows, this algorithm describes the connection of vertices until the moment in which the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).
Algorithm 1 New tessellation algorithm

1:  for all pairs of rows do
2:      find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
3:      while the last vertex in either row has not been reached do
4:          if vertex_row_A is further to the left than vertex_row_B then
5:              connect vertex_row_A with the next vertex on the same row and with vertex_row_B
6:              advance vertex_row_A to the next vertex on the same row
7:          else
8:              connect vertex_row_B with the next vertex on the same row and with vertex_row_A
9:              advance vertex_row_B to the next vertex on the same row
10:         end if
11:     end while
12: end for
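A compact C sketch of the zipper step described in Algorithm 1 is shown below. The names and the flat triangle array are illustrative, and the trailing step that fans the leftover vertices to the last vertex of the shorter row is omitted, as in the pseudocode.

/* Connect two sorted vertex rows into triangles. row_a and row_b hold
   indices into the vertex array, sorted by x; vx gives the x coordinate
   of each vertex; triangles are appended to tri starting at index t.
   Returns the new number of triangles. */
static int zip_rows(const int *row_a, int len_a,
                    const int *row_b, int len_b,
                    const float *vx, int (*tri)[3], int t)
{
    int a = 0, b = 0;
    while (a < len_a - 1 && b < len_b - 1) {
        if (vx[row_a[a]] < vx[row_b[b]]) {
            /* connect A's current and next vertex with B's current one */
            tri[t][0] = row_a[a]; tri[t][1] = row_a[a + 1]; tri[t][2] = row_b[b];
            a++;
        } else {
            tri[t][0] = row_b[b]; tri[t][1] = row_b[b + 1]; tri[t][2] = row_a[a];
            b++;
        }
        t++;
    }
    return t;   /* leftover vertices are fanned in a later step */
}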
Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's Delaunay triangulation implementation.
(a) Delaunay triangulation. (b) Optimized triangulation.
Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that the vertices are sorted.
5.10 Modified decoding stage
Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard math C library, namely ceil() and floor(), were replaced with faster implementations that use pre-processor directives to avoid the function call overhead. Moreover, the time spent validating the input was avoided as well, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions could be simplified even further, as illustrated in the sketch below.
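Because the inputs are guaranteed to be non-negative index values, truncation toward zero is exact for floor, and ceil reduces to a truncation plus a comparison. The sketch below shows the kind of macros this enables; the names are hypothetical, the argument is evaluated more than once, and no NaN or negative-value handling is performed, since none is needed here.

/* floor() for non-negative values: truncation toward zero is exact. */
#define FLOOR_IDX(x) ((int)(x))

/* ceil() for non-negative values: add 1 when truncation lost a
   fractional part. Note that (x) is evaluated more than once. */
#define CEIL_IDX(x)  ((int)(x) + ((float)(int)(x) < (x)))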
A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while making sure that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since such memory can be addressed more quickly.
The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. The reason why such tasks were present in the application is that several alternatives were implemented for achieving a common goal during the algorithmic design stage; after assessing and choosing the best option, however, the other alternatives were never entirely removed.
The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.

Figure 5.15: Execution time of the application before and after optimizing the decoding stage.
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage
This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, such recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
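The structure of the optimized loop can be sketched as follows; sum_columns(), minimize_sad() and the frame-shifting helper are hypothetical names standing in for the application's own routines.

#include <stdint.h>
#include <string.h>

#define W_MAX 1024   /* assumed upper bound on the frame width */

/* Hypothetical helpers, assumed to exist elsewhere in the application. */
void sum_columns(const uint8_t *frame, int w, int h, int *col_sum);
int  minimize_sad(const int *a, const int *b, int w);
void shift_frame_y(uint8_t *frame, int w, int h, int shift);

/* Column sums are computed once per frame and reused in the next
   iteration, instead of twice per pair of consecutive frames. */
void compensate_motion(uint8_t **frames, int n_frames, int w, int h)
{
    int prev[W_MAX], cur[W_MAX];

    sum_columns(frames[0], w, h, prev);
    for (int n = 1; n < n_frames; n++) {
        sum_columns(frames[n], w, h, cur);     /* the new frame only */
        int dy = minimize_sad(prev, cur, w);   /* vertical displacement */
        shift_frame_y(frames[n], w, h, dy);    /* align frame n */
        memcpy(prev, cur, w * sizeof cur[0]);  /* reuse next iteration */
    }
}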
5.12 NEON assembly optimization 1
The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations, where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization and global motion compensation stages are data independent, and therefore suitable for being computed in parallel on the ARM NEON architecture extension.
There are four major approaches to integrate NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions, (ii) using existing C/C++ libraries based on NEON technology, (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions while letting the compiler do some of the work associated with writing assembly instructions, and (iv) directly writing NEON assembly instructions, linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and therefore this approach was adopted.
Figure 5.16: Flow diagram of the optimized GMC process that avoids the recalculation of the image's column sums.
Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.
Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8 or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each Q register maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on two Q registers, where each of the 8 elements is 16 bits wide, or as an operation on two D registers, where each of the 8 elements is 8 bits wide.
Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology.
An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
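Although the application uses hand-written NEON assembly, the packing scheme is easier to illustrate with the equivalent intrinsics. The sketch below computes the per-pixel sum of two frames, 8 pixels at a time, widening from 8 to 16 bits; it assumes an ARM target with NEON support and a pixel count that is a multiple of 8.

#include <arm_neon.h>
#include <stdint.h>

/* Per-pixel sum of two 8-bit frames, 8 pixels per iteration;
   vaddl_u8 widens the result to 16 bits so that no overflow occurs. */
static void sum_frames_u8(const uint8_t *a, const uint8_t *b,
                          uint16_t *sum, int n_pixels)
{
    for (int i = 0; i < n_pixels; i += 8) {
        uint8x8_t va = vld1_u8(a + i);         /* 8 pixels of frame A */
        uint8x8_t vb = vld1_u8(b + i);         /* 8 pixels of frame B */
        vst1q_u16(sum + i, vaddl_u8(va, vb));  /* widening add, store */
    }
}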
Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that may explain what caused such an increment. First, note that the stage of the application that most contributed to the increase in time was the reading of the binary file. The execution time of that process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was probably indeed running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage. The first step of that process is to find the smallest and largest values for every camera frame pixel in the time dimension, by means of if statements. When such a task is implemented in conventional C language, the processor makes use of a branch prediction mechanism in order to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, ignoring the existence of the branch prediction mechanism.
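For completeness, the branchless form that NEON imposes on this min/max search can be sketched with intrinsics as follows; the buffer names are hypothetical and the pixel count is again assumed to be a multiple of 8.

#include <arm_neon.h>
#include <stdint.h>

/* Update per-pixel running minima and maxima with one camera frame;
   vmin_u8/vmax_u8 replace the if statements of the scalar version. */
static void update_min_max(const uint8_t *frame,
                           uint8_t *min8, uint8_t *max8, int n_pixels)
{
    for (int i = 0; i < n_pixels; i += 8) {
        uint8x8_t f = vld1_u8(frame + i);
        vst1_u8(min8 + i, vmin_u8(vld1_u8(min8 + i), f));
        vst1_u8(max8 + i, vmax_u8(vld1_u8(max8 + i), f));
    }
}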
5.13 NEON assembly optimization 2
After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2^(n+1) − 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the power of 0.85. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
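For 8-bit pixels this results in a table of 2^9 − 1 = 511 entries, indexed by the pixel sum. A minimal sketch of how such a table could be built is shown below; the names are hypothetical, and the gamma-corrected value is assumed to be kept as a float, so that powf() is only needed once per entry at start-up.

#include <math.h>

static float gamma_lut[511];   /* one entry per possible 8-bit pixel sum */

/* Precompute gamma-corrected averages of two 8-bit pixels, indexed by
   their sum (0..510); avoids calling powf() for every pixel. */
static void init_gamma_lut(void)
{
    for (int s = 0; s <= 510; s++)
        gamma_lut[s] = powf(s / 2.0f, 0.85f);
}

/* Usage: texture1_value = gamma_lut[pixel_a + pixel_b]; */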
As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last two camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2 and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. Merging these five processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already
Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.
pixel A:            0, 1, 2, 3
pixel B:            0, 1, 2, 3
average:            0, 0.5, 1, 1.5, 2, 2.5, 3
average^0.85 (LUT): 0, 0.555, 1, 1.411, 1.803, 2.179, 2.544
Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.
being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22.
This final optimization concludes the embedded system development of the 3D face reconstruction application.
Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.
Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.
6.1 MATLAB to C code translation
In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models, with a precision of 4 decimal places.
In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and the results were averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).
On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's -O2 optimization level.
Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
6.2 Visualization
A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism, which results in a much lower resolution model (1229 faces) suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model contains approximately 2% of the faces of the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.
6.3 Performance optimizations
Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation, using a BeagleBoard-xM as the processing platform. The wide range of optimizations described in Chapter 5 reduced the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times.
(a) High-resolution 3D model with texture (63,743 faces). (b) High-resolution 3D model wireframe (63,743 faces). (c) Low-resolution 3D model with texture (1229 faces). (d) Low-resolution 3D model wireframe (1229 faces).
Figure 6.2: Example of the visualization module developed.
Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which provide an idea of the speedup achieved for each individual stage.
(bars, top to bottom: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2)
Figure 6.3: Performance evolution of the 3D face scanner's C implementation.
(a) Read binary file. (b) Preprocessing. (c) Normalization. (d) GMC. (e) Decoding. (f) Tessellation. (g) Calibration. (h) Vertex filtering. (i) Hole filling.
Figure 6.4: Execution time for each stage of the application before and after the complete optimization process.
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each pair of corresponding models yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e., by approximately 14 seconds.
A wide range of optimizations was performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.
Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing such a process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment.
Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.
7.1 Future work
Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, these optimizations open new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5 × 5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates for implementation with NEON assembly instructions.
Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall performance of the application. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks, and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage
can be optimized significantly. First, note the significant increase in the
execution time of this particular stage between the top and bottom profilings in Figure
6.1. Whereas such an increase is expected in stages that involve matrix operations
(MATLAB usually performs well with this kind of operation), stages based on control
structures, such as the nested for loops present in the calibration stage, are not expected
to show a decrease in performance in this manner. Moreover, note how the first two
optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning
the compiler flags, had a significant impact on this stage's performance. Considering
this series of observations, it is very probable that the current C implementation of this
stage is not utilizing the available resources of the BeagleBoard-xM in the best possible
manner. Analyzing how well this part of the program exploits spatial and temporal
locality could reveal directions for further optimizations.
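As a concrete example of what such an analysis might look for, the following sketch
(with a hypothetical matrix size) contrasts a traversal of a row-major C array along its
columns, which touches a new cache line on almost every access, with the cache-friendly
traversal along its rows.

/* Locality sketch: the same reduction over a row-major matrix,
 * first with poor spatial locality, then with a cache-friendly order. */
#define N 1024
static float m[N][N];

float sum_column_major(void)      /* poor spatial locality */
{
    float s = 0.0f;
    for (int col = 0; col < N; col++)
        for (int row = 0; row < N; row++)
            s += m[row][col];     /* stride of N floats between accesses */
    return s;
}

float sum_row_major(void)         /* cache-friendly traversal */
{
    float s = 0.0f;
    for (int row = 0; row < N; row++)
        for (int col = 0; col < N; col++)
            s += m[row][col];     /* consecutive memory accesses */
    return s;
}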
Finally, it is worth noting a few more ideas on how the performance of the application
could still be improved. Tuning GCC's compiler flags was performed early in the overall
optimization process. It is probable that the combination of flags found to be optimal at
that moment is no longer optimal for the current state of the application. Therefore, a new
assessment of compiler flags should be performed. It is also important to mention that
there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point
application binary interface (ABI) to use. The permissible values are soft, softfp, and
hard. Despite the fact that a hard-float ABI is expected to produce better performance
results, the use of such a configuration was not possible in the current project. The reason
is that part of the libraries provided by the underlying operating system were compiled
with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling
this configuration is just a matter of recompiling the OS and the other libraries that are
used by the application with hard-float ABI support. Finally, it should be noted that
there is a wide range of compilers available on the market that could produce better
results than those of GCC. Although a few of the other options were tested as part of
the current project, GCC's results were always superior. However, it would be interesting
to measure how the GCC compiler compares with the compilers produced by ARM,
which are known to produce fast-running code.
Bibliography
[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B.
D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association
of sleep-disordered breathing, sleep apnea, and hypertension in a large community-
based study," JAMA: The Journal of the American Medical Association, vol. 283,
no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).
[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering
from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013.
[Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).
[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and
Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available:
http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).
[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data
from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).
[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected
beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17,
1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).
[6] M. Rocque, "3D map creation using the structured light technique for obstacle
avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2
- 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).
[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object
recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).
[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of
light for depth measurement," Trans. Institute of Electronics and Communication
Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).
[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random
cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15,
no. 6, pp. 647–651, Jun. 1993, ISSN: 0162-8828. DOI: 10.1109/34.216735 (cit. on pp. 9, 11).
[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique
for surface reconstruction of the human trunk," in Electrical and Computer Engineering,
1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877.
DOI: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).
[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method
for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40,
1995, ISSN: 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).
[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active
ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol.
PAMI-9, no. 1, pp. 14–28, Jan. 1987, ISSN: 0162-8828. DOI: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).
[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using
color structured lighting and stereo vision," Image Vision Comput., pp. 445–456,
1997 (cit. on pp. 9, 11).
[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded
light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6,
pp. 609–616, 1992, ISSN: 0031-3203. DOI: 10.1016/0031-3203(92)90078-W.
[Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).
[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor,"
Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358,
1985, ISSN: 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).
[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in
Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1,
Jun. 1990, pp. 309–313. DOI: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).
[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection,"
Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, ISSN:
0932-8092. DOI: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).
[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for
robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, ISSN: 0018-9162.
DOI: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).
[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light
systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).
[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D
facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International
Conference on, Jul. 2006, pp. 2057–2060. DOI: 10.1109/ICME.2006.262619
(cit. on p. 12).
[21] D. An, A. Woodward, P. Delmas, G. Gimel'farb, and J. Morris, "Comparison of
active structure lighting mono and stereo camera systems: Application to 3D face
acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International
Conference on, Sep. 2006, pp. 135–141. DOI: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).
[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured
lightning techniques with a view for facial reconstruction," in Proc. Image and
Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200.
[Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).
[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning,"
in Image Processing, 2007. ICIP 2007. IEEE International Conference on,
vol. 3, Oct. 2007, pp. III-81–III-84. DOI: 10.1109/ICIP.2007.4379251 (cit. on p. 13).
[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating
methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635,
2002, ISSN: 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online].
Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).
[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite
pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007.
DOI: 10.1364/OE.15.012318 (cit. on p. 14).
[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings
CVPR '86 (IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5,
IEEE, 1986, pp. 15–20 (cit. on p. 14).
[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle.
Institut de recherche en informatique et en automatique, 1987, ISBN:
9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).
[28] J. Mas and Universitat de Girona, Departament d'Electrònica, Informàtica i Automàtica,
An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis
doctorals. Universitat de Girona, 1998, ISBN: 9788495138118.
[Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).
[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine
vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation,
IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, ISSN: 0882-4967. DOI:
10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).
[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models
and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE
Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, ISSN: 0162-8828. DOI:
10.1109/34.159901 (cit. on p. 15).
[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis,
Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands,
2000 (cit. on pp. 15, 26).
[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The
Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-
Wesley Longman Publishing Co., Inc., 1999, ISBN: 0201604582 (cit. on p. 25).
[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1–4,
pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).
[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular
meshes using diffusion and curvature flow," in Proceedings of the 26th Annual
Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99,
New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324,
ISBN: 0-201-48560-5. DOI: 10.1145/311535.311576. [Online]. Available:
http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).
[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction.
Wiley India Pvt. Limited, 2006, ISBN: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).
[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics
For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).
[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech.
Rep., Jan. 2013 (cit. on p. 32).
[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, December,
p. 81, 2009 (cit. on p. 34).
[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).
[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech.
Rep., 2008 (cit. on p. 38).
[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).
[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005
(cit. on p. 46).
[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV
Library. O'Reilly, 2008 (cit. on p. 50).
[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer
Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).
[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54).
[46] N. Pipenbrinck, "ARM NEON optimization: An example," Tech. Rep., 2009 (cit. on p. 54).
Contents
Abstract ii
Acknowledgements iii
List of Figures ix
1 Introduction 1
1.1 3D Mask Sizing project 3
1.2 Objectives 3
1.3 Report organization 4
2 Literature study 5
2.1 Surface reconstruction 5
2.1.1 Stereo analysis 6
2.1.2 Structured lighting 9
2.1.2.1 Triangulation technique 10
2.1.2.2 Pattern coding strategies 11
2.1.2.3 3D human face reconstruction 12
2.2 Camera calibration 13
2.2.1 Definition 14
2.2.2 Popular techniques 14
3 3D face scanner application 17
3.1 Read binary file 18
3.2 Preprocessing 18
3.2.1 Parse XML file 18
3.2.2 Discard frames 19
3.2.3 Crop frames 19
3.2.4 Scale 19
3.3 Normalization 19
3.3.1 Normalization 20
3.3.2 Texture 2 21
3.3.3 Modulation 22
3.3.4 Texture 1 22
3.4 Global motion compensation 23
3.5 Decoding 24
3.6 Tessellation 25
3.7 Calibration 26
3.7.1 Offline process 27
3.7.2 Online process 27
3.8 Vertex filtering 28
3.8.1 Filter vertices based on decoding constraints 28
3.8.2 Filter vertices outside the measurement range 29
3.8.3 Filter vertices based on a maximum edge length 29
3.9 Hole filling 29
3.10 Smoothing 30
4 Embedded system development 31
4.1 Development tools 31
4.1.1 Hardware 32
4.1.1.1 Single-board computer survey 32
4.1.1.2 BeagleBoard-xM features 34
4.1.2 Software 34
4.1.2.1 Software libraries 35
4.1.2.2 Software development tools 36
4.2 MATLAB to C code translation 37
4.2.1 Motivation for developing in C language 37
4.2.2 Translation approach 38
4.3 Visualization 39
5 Performance optimizations 43
5.1 Double to single-precision floating-point numbers 44
5.2 Tuned compiler flags 44
5.3 Modified memory layout 45
5.4 Reimplementation of C's standard power function 45
5.5 Reduced memory accesses 47
5.6 GMC in y dimension only 49
5.7 Error in Delaunay triangulation 50
5.8 Modified line shifting in GMC stage 50
5.9 New tessellation algorithm 51
5.10 Modified decoding stage 52
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage 53
5.12 NEON assembly optimization 1 54
5.13 NEON assembly optimization 2 57
6 Results 61
6.1 MATLAB to C code translation 61
6.2 Visualization 62
6.3 Performance optimizations 62
7 Conclusions 67
7.1 Future work 68
Bibliography 71
List of Figures
1.1 A subset of the CPAP masks offered by Philips 2
1.2 A 3D hand-held scanner developed in Philips Research 4
2.1 Standard stereo geometry 7
2.2 Assumed model for triangulation as proposed in [4] 10
2.3 Examples of pattern coding strategies 12
2.4 A reference framework assumed in [25] 14
3.1 General flow diagram of the 3D face scanner application 17
3.2 Example of the 16 frames that are captured by the hand-held scanner 18
3.3 Flow diagram of the preprocessing stage 18
3.4 Flow diagram of the normalization stage 20
3.5 Example of the 18 frames produced in the normalization stage 21
3.6 Camera frame sequence in a coordinate system 22
3.7 Flow diagram for the calculation of the texture 1 image 22
3.8 Flow diagram for the global motion compensation process 23
3.9 Difference between pixel-based and edge-based decoding 24
3.10 Vertices before and after the tessellation process 25
3.11 The Delaunay tessellation with all the circumcircles and their centers [33] 26
3.12 The calibration chart 27
3.13 The 3D model before and after the calibration process 28
3.14 3D resulting models after various filtering steps 29
3.15 Forehead of the 3D model before and after applying the smoothing process 30
4.1 The BeagleBoard-xM offered by Texas Instruments 35
4.2 Simplified diagram of the 3D face scanner application 39
4.3 UV coordinate system 40
4.4 Diagram of the visualization module 41
5.1 Execution times of the MATLAB and C implementations after run on different platforms 44
5.3 Execution time before and after tuning GCC's compiler options 45
5.4 Modification of the memory layout of the camera frames 46
5.5 Execution time with a different memory layout 46
5.6 Execution time before and after reimplementing C's standard power function 47
5.7 Order of execution before and after the optimization 48
5.8 Difference in execution time before and after reordering the preprocessing stage 48
5.9 Flow diagram for the GMC process as implemented in the MATLAB code 49
5.10 Difference in execution time before and after modifying the GMC stage 49
5.11 Execution time of the application after fixing an error in the tessellation stage 50
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage 51
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted 52
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach 53
5.15 Execution time of the application before and after optimizing the decoding stage 54
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's columns sum 55
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage 55
5.18 NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology 56
5.19 Execution flow after first NEON assembly optimization 58
5.20 Execution times of the application before and after applying the first NEON assembly optimization 59
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels 59
5.22 Execution times of the application before and after applying the second NEON assembly optimization 59
5.23 Final execution flow after second NEON assembly optimization 60
6.1 Execution times of the MATLAB and C implementations after run on different platforms 62
6.2 Example of the visualization module developed 63
6.3 Performance evolution of the 3D face scanner's C implementation 64
6.4 Execution times for each stage of the application 65
Dedicated to my grandmother
Chapter 1
Introduction
The potential of science and technology to improve every aspect of life seems to be
boundless, or at least this is what the innovations of the previous centuries suggest.
Among the many different interests that advocate the development of science and
technology, human healthcare has always been an important stimulant. New technologies
are constantly being developed by leading companies all around the world to improve the
quality of people's lives. A clear example is the case of the Dutch multinational Royal
Philips Electronics, which devotes special interest to the development and introduction
of meaningful innovations that improve people's lives.
Within the wide range of products offered by Philips, there is a specific group, categorized
under the name of sleep solutions, that aims at improving the sleep quality of
people. A well-known family of products contained within this category are the so-called
CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily
in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing
or instances of very low breathing during sleep [1]. According to a recent study conducted
by Philips in collaboration with the University of Twente, 6.4% of the surveyed
population was found to suffer from this disorder [2]. A total number of 4206 people,
comprising women and men of different ages and levels of education, took part in the
2-year study. A similar survey was undertaken by the National Institutes of Health in
the United States of America [3]. It reported that sleep apnea was prevalent in more
than 18 million Americans, i.e., 6.62% of the country's population.
While aiming to meet the large demand for CPAP masks, Philips has designed and
introduced a wide variety of mask models that seek to fulfill the different needs and
constraints that arise due to several factors. These include the large diversity of size
and shape of human faces, inclination towards breathing through the mouth or nose,
diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia,
amongst others.
Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic,
(c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel,
(h) Simplicity, (i) ComfortGel.
A subset of these models is shown in Figure 1.1. It is important to
mention that a poor selection of a CPAP mask might cause undesirable side effects to the
patient, such as marks or even pressure ulcers. Consequently, the physical dimensions
of each patient's face play a crucial role in the selection of the most appropriate CPAP
mask.
Unfortunately, the current practices used to assess the adequacy of CPAP masks based
on facial dimensions are quite error prone. They rely on trial-and-error procedures in
which the patient tries on different mask models and selects the one he thinks is the
most comfortable. In order to alleviate this problem, Philips Research launched the
3D Mask Sizing project, which aims to develop an automated embedded system capable
of assisting sleep technicians in prescribing the most appropriate CPAP mask for each
patient.
1.1 3D Mask Sizing project
The 3D Mask Sizing project is based on the initiative of Philips to develop some technological
means that can assist sleep technicians in the selection of a proper CPAP mask
model for each patient. A series of algorithms, methods, and hardware prototypes are the
result of several years of research carried out by the Smart Sensing & Analysis research
group in Philips Research Eindhoven. The resulting automated mask advising system
comprises four main parts:
1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.
2. The extraction of facial landmarks from the reconstructed model by means of
computer vision algorithms.
3. The actual fit-quality assessment, by virtually fitting a series of 3D mask models
to the reconstructed face.
4. The creation of a custom cushion that optimizes for uniform pressure along the
cushion contour.
The focus of this thesis project is on the first step.
As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven,
a first prototype of a 3D hand-held scanner using the structured lighting technique
was already developed and is the basis for the present project. Figure 1.2a shows the
hardware setup of such a device. In short, this scanner is capable of capturing a picture
sequence of a patient's face while illuminating it with specific structured light patterns.
Such a picture sequence is processed by means of a series of algorithms in order to
reconstruct a 3D model of the face. An example of a resulting 3D model is presented in
Figure 1.2b. The reconstruction process and all other calculations are currently being
performed offline and are mostly implemented in MATLAB.
Figure 1.2: A 3D hand-held scanner developed in Philips Research: (a) hardware;
(b) 3D model example.
1.2 Objectives
The main objective of this thesis project is to extend the functionality of the mentioned
scanner such that the 3D reconstruction is computed locally on the embedded platform.
This implies transforming the already developed methods and algorithms in such a
way that extra-functional requirements are taken into account. These extra-functional
requirements involve an optimal use of the available computational resources. Highest
priority should be given to the execution time of the application. Specifically, the 3D
reconstruction should run on the embedded device in less than 5 seconds on
average. Because the embedded processor contained in the final product will be similar
to an ARM Cortex-A8, the new implementation should be targeted to this processor
in particular, by making proper use of the specific features it provides. Moreover, the
visualization of the reconstructed face model should be made possible by means of the
embedded projector contained in the device.
1.3 Report organization
This report is organized as follows. Chapter 2 presents the basic principles that underlie
different technologies for surface reconstruction, placing special emphasis on structured
lighting techniques. In Chapter 3, an overview of the 3D face scanner application is
provided, which functions as the starting point for the current project. Chapter 4
details the most relevant aspects that pertain to the implementation of the 3D face
scanner application on an embedded device. In Chapter 5, a series of optimizations
used to reduce the execution time of the application is described. Chapter 6 highlights
the most important results of the development process, namely the MATLAB to C
translation, the visualization module, and the set of optimizations. Finally, Chapter 7
concludes the thesis while delineating paths for further improvements of the presented
work.
Chapter 2
Literature study
This chapter presents a selective analysis of the state-of-the-art in the field of surface
reconstruction, placing special emphasis on structured lighting techniques. A brief
overview of the three main underlying technologies used for depth estimation is presented
first. This is followed by an example of stereo analysis, which serves as the basis
for the more specific structured lighting techniques. Moreover, this example helps to
illustrate why stereo analysis is considered less preferable for 3D face reconstruction
applications when compared with structured lighting techniques. Special emphasis
is placed on the scientific principles underlying structured lighting techniques. Furthermore,
a classification of the different types of pattern coding strategies available in the
literature is given, along with an analysis of their suitability for our application. Finally,
the chapter concludes with a brief discussion of camera calibration and its most
representative techniques.
2.1 Surface reconstruction
Surface reconstruction has a wide range of practical applications, such as computer modeling
of 3D objects (such as those found in areas like architecture, mechanical engineering,
or surgery), distance measurements for vehicle control, surface inspections for
quality control, approximate or exact estimates of the location of 3D objects for automated
assembly, and fast location of obstacles for efficient navigation [4].
Technologies for surface reconstruction include contact and non-contact techniques, the
latter being our principal interest. Non-contact techniques may be further categorized
as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric techniques
use time-of-flight measurements to determine the distance to an object, i.e., they
are based on the time it takes for a wave (acoustic, microwave, electromagnetic) to reflect
from an object's surface through a given medium. Reflecto-metric techniques process
one or more images of the object to determine its surface orientation and, consequently,
its shape. Finally, stereo-metric techniques determine the location of the object's surface
by triangulating each point with its corresponding projections in two or more images.
Echo-metric techniques suffer from a number of drawbacks. Systems employing such
techniques are heavily affected by environmental parameters, such as temperature and
humidity [6]. These parameters affect the velocity at which waves travel through a
given medium, thus introducing errors in the depth measurement. On the other hand,
both reflecto-metric and stereo-metric techniques are less affected by environmental
parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they
require an estimation of the model of the environment. In the remainder of this section,
we will limit the discussion to the stereo-metric category and focus on the structured
lighting techniques.
2.1.1 Stereo analysis
Considering that surface reconstruction by means of structured lighting can be regarded
as an extension of the more general stereo-vision technique, an introductory example of
stereo analysis is presented in this section. This example intends to show why the use
of structured lighting becomes essential for our application. The example is presented
in [4].
Surface reconstruction can be achieved by means of the visual disparity that results
when an object is observed from different camera viewpoints. In its simplest form, two
cameras can be used for this purpose. Triangulation between a point in the object and
its respective projection in each of the camera projection planes can be used to calculate
the depth at which this point lies from a certain reference. Note, however, that in order
to calculate the triangulation, more parameters are required. These parameters refer, for
example, to the distance at which the cameras are located from one another (extrinsic
parameter), or to the focal length of each of the cameras (intrinsic parameter).
Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this
model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal
point of the left camera. The focal point of the right camera lies at a distance b along
the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed
to have the same focal length f. As a consequence, the images of both cameras are
located in the same image plane. The Z-axis coincides with the optical axis of the
left camera. Moreover, the optical axes of both cameras are parallel to each other and
oriented towards the scene objects. Also note that, because the x-axes of both images
are identically oriented, rows with the same row number in the two different images lie on
the same straight line.
Figure 2.1: Standard stereo geometry.
In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image
points

p_{left} = (x_{left}, y_{left}) \quad \text{and} \quad p_{right} = (x_{right}, y_{right})

in the left and right images, respectively, assuming that the scene point is visible from
both camera viewpoints. The disparity between two corresponding image points, with
respect to p_{left}, is a vector given by

\Delta(x_{left}, y_{left}) = (x_{left} - x_{right}, \; y_{left} - y_{right})^T \qquad (2.1)
In the standard stereo geometry, pinhole camera models are used to represent the considered
cameras. The basic idea of a pinhole camera is that it projects scene points P
onto image points p according to a central projection given by

p = (x, y) = \left( \frac{f \cdot X}{Z}, \; \frac{f \cdot Y}{Z} \right) \qquad (2.2)

assuming that Z > f.
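As a small illustration, the following helper (an assumed sketch, not code from the thesis
implementation) applies the central projection of Equation 2.2 to a scene point:

/* Central projection of Equation (2.2): maps a scene point
 * P = (X, Y, Z) to its image point p = (f*X/Z, f*Y/Z).
 * Valid under the pinhole model assumption Z > f. */
typedef struct { double x, y; } Point2D;

Point2D central_projection(double f, double X, double Y, double Z)
{
    Point2D p = { f * X / Z, f * Y / Z };
    return p;
}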
According to the ideal assumptions considered in the standard stereo geometry of the
two cameras, it holds that y = y_{left} = y_{right}. Therefore, for the left camera the central
projection equation is given directly by Equation 2.2, considering that the pinhole
camera model assumes that the Z-axis is identified to be the optical axis of the camera.
Furthermore, given the displacement of the right camera by b along the X-axis, the
central projection equation is given by

(x_{right}, y) = \left( \frac{f \cdot (X - b)}{Z}, \; \frac{f \cdot Y}{Z} \right)
Rather than calculating a disparity vector given by Equation 2.1 for all corresponding
pairs of points in the different images, the scalar disparity proves to be sufficient under
the assumptions made in the standard stereo geometry. The scalar disparity of two
corresponding points in each one of the images, with respect to p_{left}, is given by

\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}

However, because rows with the same row numbers in the two images have the same y value,
the scalar disparity of a pair of corresponding points reduces to

\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \qquad (2.3)

Note that it is valid to remove the absolute value operator because of the chosen arrangement
of the cameras. A disparity map \Delta(x, y) is defined by applying Equation 2.3 to all
corresponding points in the two images. For those points that could not be associated
with a corresponding point in the other image (for example, because of occlusion), the
value "undefined" is recorded.
Finally, in order to come up with the equations that determine the 3D location of each
point in the scene, note that from the two central projection equations of the two cameras
it follows that

Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}

and therefore

X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}

Using the previous equation, it follows that

Z = \frac{b \cdot f}{x_{left} - x_{right}}

By substituting this result into the projection equation for y, it follows that

Y = \frac{b \cdot y}{x_{left} - x_{right}}
The last three equations allow the reconstruction of the coordinates of the projected
points P within the three-dimensional XYZ-space assuming that the parameters f and
Chapter 2 Literature study 9
b are known and that the disparity map ∆(x y) was measured for each pair of corre-
sponding points in the two images Note that a variety of methods exists to calibrate
different types of camera configuration systems ie to determine their intrinsic and ex-
trinsic parameters More on these calibration procedures is further discussed in Section
22
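To make the depth recovery concrete, the following minimal C sketch applies the last three equations to a single pair of corresponding image points. The function and type names are illustrative, not part of the scanner code; f and b are assumed known from calibration.

/* Reconstructs a scene point (X, Y, Z) from a pair of corresponding
 * image points in the standard stereo geometry. */
typedef struct { double X, Y, Z; } Point3D;

int reconstruct_point(double x_left, double x_right, double y,
                      double f, double b, Point3D *out)
{
    double disparity = x_left - x_right;    /* Equation 2.3 */
    if (disparity <= 0.0)
        return -1;                          /* no valid correspondence */
    out->X = (b * x_left) / disparity;
    out->Y = (b * y) / disparity;
    out->Z = (b * f) / disparity;
    return 0;
}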
The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques are used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images; these features are typically corners or edges contained within the images. Although these techniques are found to be appropriate for a certain number of applications, they present a number of drawbacks that make their applicability infeasible for many others. The main drawbacks are: (i) feature extraction and matching is generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. Such problems in solving the correspondence problem can generally be overcome by resorting to a different but similar type of techniques, known by the name of structured lighting techniques. While structured lighting techniques involve a completely different methodology on how to solve the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.
2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source which projects a light pattern actively into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to simplify the complexity of the correspondence analysis by actively manipulating the scene.

It is important to note that stereoscopic based systems do not assume complex requirements for image acquisition, since they mostly rely on theoretical, mathematical, and algorithmic analyses to solve the reconstruction problem. On the other hand, the idea behind structured lighting methods is to shift this complexity to another level, such as the engineering prerequisites of the overall system [4].
A wide variety of light patterns have been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2 a classification of the available encoded patterns is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.
2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982; Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is given.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation as proposed in [4]

Note that the system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a consequence, the object, light source, and camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known and the origin of the coordinate system O coincides with the projection center of the camera.
The goal is to calculate the distance d between the origin O and the object point $P = (X_0, Z_0)$. This can be done using the law of sines as follows:

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}$$

From $\gamma = \pi - (\alpha + \beta)$ and $\sin(\pi - \gamma) = \sin(\gamma)$, it holds that

$$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}$$

Therefore, distance d is given by

$$d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}$$

which holds for any point P lying on the surface of the object.
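In code, Klette's result reduces to a one-line computation; the following illustrative C helper (not taken from the scanner implementation) assumes the calibrated angles α and β are given in radians and b is the base distance:

#include <math.h>

/* Distance from the projection center O to the object point P,
 * following the triangulation model of Figure 2.2. */
double triangulate_distance(double alpha, double beta, double b)
{
    return (b * sin(alpha)) / sin(alpha + beta);
}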
2.1.2.2 Pattern coding strategies

As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill the requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors, or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19].
• Time-multiplexing: This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or objects in the scene during the time period in which the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a.
• Spatial neighborhood: In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12], or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow all coding information to be condensed into a single projection pattern, making them highly suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].
• Direct coding: In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16] or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not only depend on the projected colors, but also on the intrinsic colors of the measuring surface and light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].
(a) Time-multiplexing (b) Spatial neighborhood (c) Direct coding

Figure 2.3: Examples of pattern coding strategies
2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields, such as security, forensics, or even entertainment, it is no surprise that special focus has been devoted to this area by the research community over the last decades. A comparative study of three different 3D face reconstruction approaches is presented in [20]. Here, the most representative techniques of three different domains are tested: binocular stereo, structured lighting, and photometric stereo. The experimental results show that active reconstruction techniques perform better than purely passive ones for this application.
The majority of analyses on vision based reconstruction have focused on general performance for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless, some effort has been made on evaluating structured lighting techniques with special focus on human face reconstruction. In [21] a comparison is presented between three structured lighting techniques (Gray Code, Gray Code Shift, and Stripe Boundary) to assess 3D reconstruction for human faces by using mono and stereo systems. The results show that the Gray Code Shift coding performs best, given the high number of emitted patterns it uses. A further study on this topic was performed by the same author in [22]. Again, it was found that time-multiplexing techniques such as binary encoding using Gray Code provide the highest accuracy. With a rather different objective than that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] focus their effort on presenting a framework that captures 3D models of faces in high resolution with low computational load. Here, the system uses a single colored stripe pattern for the reconstruction purpose, plus a picture of the face illuminated with regular white light that is used as texture.
Particular aspects of 3D human face reconstruction, such as the proximity, size, and texture involved, make structured lighting a suitable approach. On the contrary, other reconstruction techniques might be less suitable when dealing with these particular aspects. For example, stereoscopic approaches fail to provide positive results when the textures involved do not contain features that can be easily extracted and matched by means of algorithms, as in the case of the human face. On the other hand, the concepts behind structured lighting make it very convenient to reconstruct this kind of surface, given the proximity involved and the size limits of the object in question (appropriate for projecting encoded patterns).
With regard to the suitability of the different pattern coding strategies for our application (3D human face reconstruction by means of a hand-held scanner), there are several factors to consider. Spatial neighborhood strategies do not offer the high spatial resolution that is needed by the algorithms that assess the fit quality of the various mask models. Direct coding strategies suffer from practical problems that affect their robustness in different scenarios. This centers the attention on the time-multiplexing techniques, which are known to provide high spatial resolution. The problem with such techniques is that they are highly sensitive to movement, which is likely to be present on a hand-held device. Fortunately, there are several approaches as to how such a problem can be solved. Consequently, it is a time-multiplexing technique that is employed in our application.
2.2 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement. This section presents a review of some of the most popular techniques, with special focus on those that are regarded as adequate for our application.
2.2.1 Definition

Camera calibration is the process of determining a mathematical approximation of the physical and optical behavior of an imaging system by using a set of parameters. These parameters can be estimated by means of direct or iterative methods, and they are divided into two groups. On the one hand, intrinsic parameters determine how light is projected through the lens onto the image plane of the sensor; the focal length, projection center, and lens distortion are all examples of intrinsic parameters. On the other hand, extrinsic parameters measure the position and orientation of the camera with respect to a world coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for the structured pattern projection and triangulation considered in [25]. The focal length fc and the projection center Oc are examples of intrinsic parameters of the camera, while the distance D between the camera and the projector corresponds to an extrinsic parameter.
Figure 2.4: A reference framework assumed in [25]
2.2.2 Popular techniques

In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration that uses a 3×4 transformation matrix which maps 3D object points to their respective 2D image projections. Here, the model of the camera does not consider any lens distortion; for a detailed description of this method, refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique that was based on extracting the physical parameters of the camera from the transformation technique proposed in [18]. The description of this technique is given in [26] and [27]. A non-linear explicit camera calibration that included radial lens distortion was proposed by Salvi in his PhD thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular, and that is still widely used, was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.
The calibration mechanism that is currently being used in our application is based on the work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although this mechanism focuses on stereo camera calibration, it was generalized for a system with one camera and one projector. It involves imaging a controlled scene from different positions and orientations. The controlled scene consists of a rigid calibration chart with several markers. The geometric and photometric properties of such markers are known precisely, so that they can be detected. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart. This calibration mechanism is discussed further in Section 3.7.
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed by the Smart Sensing & Analysis research group and provided as a starting point for the current project. Figure 3.1 presents the main steps involved in the 3D reconstruction process.
Figure 3.1: General flow diagram of the 3D face scanner application. Starting from the binary and XML input, the pipeline comprises: read binary file (3.1), preprocessing (3.2), normalization (3.3), tessellation (3.4), decoding (3.5), global motion compensation (3.6), calibration (3.7), vertex filtering (3.8), and hole filling (3.9), producing the final 3D model.
The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene. For each projection, the scene is captured by means of the embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study of why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is described.
Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.
3.1 Read binary file

The first step of the application is to read the binary file that contains the required information for the 3D reconstruction. The binary file is composed of two parts: the header and the actual data. The header contains metadata of the acquired frames, such as the number of frames and the resolution of each one. The second part contains the actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence, which from now on will be referred to as camera frames.
3.2 Preprocessing

The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps is described in the following subsections.

Figure 3.3: Flow diagram of the preprocessing stage (parse XML file → discard frames → crop frames → scale, converting to float values in the range 0–1)
3.2.1 Parse XML file

In this stage, the application first reads an XML file that is included with every scan. This file contains relevant information for the structured light reconstruction. This information includes: (i) the type of structured light patterns that were projected when acquiring the data, (ii) the number of frames captured while structured light patterns were being projected, (iii) the image resolution of each frame to be considered, and (iv) the calibration data.
3.2.2 Discard frames

Based on the number-of-frames value read from the XML file, the application discards extra frames that do not contain relevant information for the structured light approach but that are provided as part of the input.
3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this particular application, because pixels near the frame borders do not contain facial information and can therefore be safely removed.
3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage, the data type is transformed from unsigned integer to floating point, while dividing each pixel value by 255. The new set of values ranges between 0 and 1.
3.3 Normalization

Even though this section is entitled Normalization, a few more tasks are performed in this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide arrows represent the flow of data, whereas dashed lines represent the order of execution. The numbers inside the small data arrows pointing towards the different tasks represent the number of frames used as input by each task. The dashed-line rectangle that encloses the normalization and texture 2 tasks indicates that there is not a clear sequential execution between these two, but rather that they are executed in an alternating fashion. This type of diagram will prove particularly useful in Chapter 5 in order to explain the modifications that were made to the application to improve its performance. An example of the different frames that are produced in this stage is visualized in Figure 3.5. A brief description of each of the tasks involved in this stage follows.

Figure 3.4: Flow diagram of the normalization stage. From the 16 input camera frames, normalization and texture 2 each produce 8 output frames, while modulation and texture 1 each produce a single output frame.
3.3.1 Normalization

The purpose of this stage is to extract the reflectivity component (texture information) from the camera frames, while aiming at enhancing the deformed illumination patterns in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The deformed patterns are essential for the 3D reconstruction process.

In order to understand how this process takes place, we need to look back at Figure 3.2. Here it is possible to observe that the projected patterns in the top-row frames are equal to their corresponding frames in the bottom row, with the only difference being that the values of the projected pattern are inverted. For each corresponding pair, a new image frame is generated according to the following equation:

$$F_{norm}(x, y) = \frac{F_{camera}(x, y, a) - F_{camera}(x, y, b)}{F_{camera}(x, y, a) + F_{camera}(x, y, b)}$$

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively. An example of the resulting frame sequence is shown in Figure 3.5a.
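As an illustration, a minimal C sketch of this step is shown below, assuming the frames are stored as row-major float buffers in the range 0 to 1 (as produced by the scale stage); the names and layout are illustrative rather than taken from the actual implementation.

/* Computes one normalized frame from a pattern frame a and its
 * inverted counterpart b, each containing n pixels. */
void normalize_pair(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i++) {
        float sum = a[i] + b[i];
        /* guard against division by zero in completely dark regions */
        out[i] = (sum > 0.0f) ? (a[i] - b[i]) / sum : 0.0f;
    }
}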
(a) Normalized frame sequence (b) Texture 2 frame sequence (c) Modulation frame (d) Texture 1 frame

Figure 3.5: Example of the 18 frames produced in the normalization stage
3.3.2 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one used to calculate the normalized frame sequence. In fact, the output of this process is an intermediate step in the calculation of the normalized frames, which is the reason why the two processes are said to be performed in an alternating fashion. The mathematical equation that describes the calculation of the texture 2 frame sequence is

$$F_{texture2}(x, y) = F_{camera}(x, y, a) + F_{camera}(x, y, b)$$

The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation stage.
3.3.3 Modulation

The purpose of this stage is to find the range of measured values for each (x, y) pixel of the camera frame sequence along the time dimension. This is done in two steps. First, two frames are generated by finding the maximum and minimum values along the time (t) dimension (Figure 3.6) for every (x, y) position in a frame.

Figure 3.6: Camera frame sequence in a coordinate system

Second, a modulation frame is produced by finding the difference between the previously generated frames, i.e.,

$$F_{mod}(x, y) = F_{max}(x, y) - F_{min}(x, y)$$

Such a modulation frame (Figure 3.5c) is required later during the decoding stage.
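A sketch of this computation in C could look as follows, assuming the 16 frames are stored consecutively as float buffers of n pixels each (an illustrative layout, not necessarily the application's actual data structure):

/* Produces the modulation frame: the per-pixel range of values
 * observed across the frame sequence, F_mod = F_max - F_min. */
void modulation(const float *frames, int num_frames, int n, float *mod)
{
    for (int i = 0; i < n; i++) {
        float min = frames[i], max = frames[i];
        for (int t = 1; t < num_frames; t++) {
            float v = frames[t * n + i];
            if (v < min) min = v;
            if (v > max) max = v;
        }
        mod[i] = max - min;
    }
}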
3.3.4 Texture 1

Finally, the last task in the normalization stage corresponds to the generation of the texture image that will be mapped onto the final 3D model. In contrast to the previous three tasks, this subprocess does not take the complete set of 16 camera frames as input, but only the 2 with the finest projection patterns. Figure 3.7 shows the four processing steps that are applied to the input in order to generate a texture image such as the one presented in Figure 3.5d.

Figure 3.7: Flow diagram for the calculation of the texture 1 image (average frames → gamma correction → 5×5 mean filter → histogram stretch)
3.4 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement. In fact, if no measures are taken to correct the slight amount of movement of the scanner or of the objects in the scene during the acquisition process, the complete reconstruction process fails. Although the global motion compensation stage is only a minor part of the mechanism that makes the entire application robust to motion, its contribution to the final result is not negligible.

Global motion compensation is an extensive field of research to which many different approaches and methods have been contributed. The approach used in this application is amongst the simplest in level of complexity. Nevertheless, it satisfies the needs of the current application.
Figure 3.8 presents an overview of the algorithm used to achieve the global motion compensation. This process takes as input the normalized frame sequence introduced in the previous section. As noted at the bottom of the figure, these steps are repeated for every pair of consecutive frames. As a first step, the pixels in each column are added for both frames. This results in two vectors that hold the cumulative sums of each frame. The second step is to determine by how many pixels the second image is displaced with respect to the first one. In order to achieve this, the sum of absolute differences (SAD) between elements of the two column-sum vectors is calculated while slowly displacing the two vectors with respect to each other. The result is a new vector containing the SAD value for each displacement. Subsequently, the index of the smallest element in the SAD values vector is searched in order to determine the number of pixels that the second image needs to be shifted. The process concludes by performing the actual shift of the second frame.
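The core of this algorithm, the SAD minimization over the two column-sum vectors, can be sketched in C as follows; the function name, the bounded search range, and the border handling are illustrative assumptions rather than details taken from the implementation.

/* Returns the horizontal shift (in pixels) that minimizes the sum of
 * absolute differences between the column-sum vectors of two
 * consecutive frames. Frame B is subsequently shifted by this amount. */
int find_shift(const float *col_a, const float *col_b,
               int width, int max_shift)
{
    int best_shift = 0;
    float best_sad = -1.0f;
    for (int s = -max_shift; s <= max_shift; s++) {
        float sad = 0.0f;
        for (int x = 0; x < width; x++) {
            int xb = x + s;
            if (xb < 0 || xb >= width)
                continue;              /* skip non-overlapping columns */
            float d = col_a[x] - col_b[xb];
            sad += (d < 0.0f) ? -d : d;
        }
        if (best_sad < 0.0f || sad < best_sad) {
            best_sad = sad;
            best_shift = s;
        }
    }
    return best_shift;
}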
Figure 3.8: Flow diagram for the global motion compensation process
3.5 Decoding

In Section 2.1.1 of the literature study, the correspondence problem was defined as the process of determining corresponding point pairs between the captured images and the projected patterns. This is exactly what is being accomplished during the decoding stage.
A novel approach has been implemented in which the identification of the projector stripes is based not on the values of the pixels themselves (as is typically done) but rather on the edges formed by the transitions of the projected patterns. Figure 3.9 illustrates the different sets of decoded values that result from each of these methods. Here it is possible to observe that the pixel-based method produces a stair-casing effect due to the decoding of neighboring pixels that lie on the same stripe of the projected pattern. On the other hand, the edge-based method removes this undesirable effect by decoding values only for the parts of the image in which a transition occurs. Furthermore, this approach enables sub-pixel accuracy in determining the positions where the transitions occur, meaning that the overall resolution of the 3D reconstruction increases considerably.
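The text does not spell out how the sub-pixel position itself is estimated, but a common way to localize a transition with sub-pixel accuracy is to linearly interpolate the zero crossing of the normalized signal between two neighboring pixels. The following sketch illustrates that general idea and is not necessarily the exact method implemented:

/* Given a normalized scanline whose values change sign around a
 * pattern transition between pixels y and y+1, returns the sub-pixel
 * position of the zero crossing by linear interpolation. */
float subpixel_edge(const float *row, int y)
{
    float v0 = row[y], v1 = row[y + 1];
    if (v0 == v1)
        return (float)y;          /* degenerate case: no transition */
    return (float)y + v0 / (v0 - v1);
}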
Figure 3.9: Decoded values plotted along the y dimension of the image for edge-based and pixel-based decoding. The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used.
The decoding process results in a set of vertices, each one associated with a depth code. Note, however, that the unit of measurement used to describe the position and depth of each vertex is based on camera pixels and code values, respectively, meaning that these vertices still do not represent the actual geometry of the face. The calibration process, explained in a later section, is the part of the application that translates the pixel and code values to standard units (such as millimeters), thus recreating the actual shape of the human face.
3.6 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in a manner such that no overlaps occur. In computer graphics, these geometric shapes are generally chosen to be triangles, also called "faces". The reason for using triangles is that they have, by definition, their vertices on the same plane. This, in turn, avoids the generation of non-simple convex polygons that are not guaranteed to be rendered correctly. A complete example illustrating this point can be found in [32].

The set of 3D vertices calculated in the decoding stage is the input to the tessellation process. Here, however, the third dimension does not play a role, and hence the z coordinate of each vertex can be thought of as being equal to 0. This implies that the new set of vertices consists only of (x, y) coordinates that lie on the same plane, as shown in Figure 3.10a. This graph corresponds to a very close view of the nose area in the reconstructed face example.
(a) Vertices before applying the Delaunay triangulation (b) Result after applying the Delaunay triangulation

Figure 3.10: Close view of the vertices in the nose area before and after the tessellation process
The question that arises here is how to connect the vertices in such a way that the complete surface is covered with triangles. The answer is to use the Delaunay triangulation, which is probably the most common triangulation used in computer vision. The main advantage it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the Delaunay triangulation is independent of the order in which the vertices are processed. Figure 3.10b shows the result of applying the Delaunay triangulation to the vertices shown in Figure 3.10a.
Although a number of different algorithms exist to achieve the Delaunay triangulation, the final outcome of each conforms to the following definition: a Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [33]. This definition can be understood by examining Figure 3.11.

Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33]
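The defining in-circumcircle test can be made concrete with the standard determinant predicate below. This illustrative function is not part of the application (which relied on OpenCV for the triangulation itself); it returns a positive value when point d lies inside the circumcircle of the counterclockwise-ordered triangle (a, b, c):

/* Positive: d inside the circumcircle of ccw triangle (a, b, c);
 * zero: on the circle; negative: outside. */
double in_circumcircle(double ax, double ay, double bx, double by,
                       double cx, double cy, double dx, double dy)
{
    double adx = ax - dx, ady = ay - dy;
    double bdx = bx - dx, bdy = by - dy;
    double cdx = cx - dx, cdy = cy - dy;
    double ad = adx * adx + ady * ady;
    double bd = bdx * bdx + bdy * bdy;
    double cd = cdx * cdx + cdy * cdy;
    return adx * (bdy * cd - bd * cdy)
         - ady * (bdx * cd - bd * cdx)
         + ad  * (bdx * cdy - bdy * cdx);
}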
3.7 Calibration

The set of (x, y) vertices with their corresponding depth code values that result from the decoding process do not represent standard units of measure, i.e., these still have to be translated into standard units such as millimeters. This is precisely the objective of the calibration process.

The calibration mechanism that is used in the application is based on the work of Peter-Andre Redert as part of his PhD thesis [31]. The entire process is divided into two parts: an offline and an online process. Moreover, the offline process consists of two stages: the camera calibration and the system calibration. It is important to clarify that while the offline process is performed only once (camera properties and distances within the system do not change with every scan), the online process is carried out for every scan instance. The calibration stage referred to in Figure 3.1 is the latter.
3.7.1 Offline process

As already mentioned, the offline process comprises the two stages described below.
Camera calibration: This part of the process is concerned with the calculation of the intrinsic parameters of the camera, as explained in Section 2.2 of the literature study. In short, the objective is to precisely quantify the optical properties of the camera. The manner in which the current approach accomplishes this is by imaging the special calibration chart shown in Figure 3.12 from different orientations and distances. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart.

Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.
System calibration: The second part of the calibration process refers to the camera-projector system calibration, i.e., the determination of the extrinsic parameters of the system. Again, this part of the process images the calibration chart from different distances. However, this time structured light patterns are emitted by the projector while the acquisition process takes place. The result is that each projector code is associated with a known depth and camera position.
3.7.2 Online process

The result of the offline calibration is a set of parameters that model the optical properties of the scanner system. These are passed to the application inside the XML file for every scan. Such parameters represent the coefficients of a fifth-order polynomial used for translating the set of (x, y) vertices with their corresponding depth code values into standard units of measure. In other words, the online process consists of evaluating a polynomial with all the x, y, and depth code values calculated in the decoding stage in order to reconstruct the geometry of the face. Figure 3.13 shows the state of the 3D model before and after the reconstruction process.

(a) Before reconstruction (b) After reconstruction

Figure 3.13: The 3D model before and after the calibration process
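As an illustration of the evaluation itself, a fifth-order polynomial in one variable can be computed efficiently with Horner's scheme, as sketched below. Note that this is a simplified single-variable form: the application's actual calibration polynomial may combine x, y, and depth code terms differently.

/* Horner evaluation of p(v) = c[5]*v^5 + ... + c[1]*v + c[0]. */
double eval_poly5(const double c[6], double v)
{
    double r = c[5];
    for (int i = 4; i >= 0; i--)
        r = r * v + c[i];
    return r;
}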
3.8 Vertex filtering

As can be seen from Figure 3.13b, there are a number of extra vertices (and faces) that have not been correctly reconstructed and therefore should be removed from the model. Vertex filtering is applied to remove all these noisy vertices and faces based on different criteria. The process is divided into the following three steps.
3.8.1 Filter vertices based on decoding constraints

First, if the distance between consecutive decoded points is larger than a maximum threshold in the x or z dimension, then these points are removed. Second, in order to avoid falsely decoded vertices due to camera noise (especially in the parts of the images where light does not hit directly), a minimal modulation threshold needs to be exceeded, or else the associated decoded point is discarded. Finally, if the decoded vertices lie outside a margin defined in accordance with the image dimensions, then these are removed as well.
3.8.2 Filter vertices outside the measurement range

The measurement range, defined during the offline calibration, refers to the minimum and maximum values that each decoded point can have in the z dimension. These values are read from the XML file. The long triangles shown in Figure 3.13b that either extend far into the picture or, on the other hand, come close to the camera are all removed in this stage. The resulting 3D model after being filtered with the two previously described criteria is shown in Figure 3.14a.
3.8.3 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length criterion. Initially, the length of every edge contained in the model is calculated. This is followed by determining a new set of edges L that contains the longest edge in each face. After this operation, the mean length value for the longest edge set is calculated. Finally, only faces whose longest edge value is less than seven times the mean value, i.e., L < 7 × mean(L), are kept. Figure 3.14b shows the result after this operation.
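A minimal C sketch of this criterion is given below; the vertex and face types, as well as the keep-flag output, are illustrative rather than the application's actual data structures.

#include <math.h>
#include <stdlib.h>

typedef struct { float x, y, z; } Vertex;
typedef struct { int v0, v1, v2; } Face;

static float edge_len(Vertex a, Vertex b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return sqrtf(dx * dx + dy * dy + dz * dz);
}

/* Marks the faces to keep: those whose longest edge is below seven
 * times the mean of all longest edges, i.e., L < 7 * mean(L). */
void filter_by_edge_length(const Vertex *v, const Face *f, int nf,
                           int *keep)
{
    float *longest = malloc(nf * sizeof(float));
    float mean = 0.0f;
    for (int i = 0; i < nf; i++) {
        float e0 = edge_len(v[f[i].v0], v[f[i].v1]);
        float e1 = edge_len(v[f[i].v1], v[f[i].v2]);
        float e2 = edge_len(v[f[i].v2], v[f[i].v0]);
        float m = (e0 > e1) ? e0 : e1;
        longest[i] = (m > e2) ? m : e2;
        mean += longest[i];
    }
    mean /= (float)nf;
    for (int i = 0; i < nf; i++)
        keep[i] = (longest[i] < 7.0f * mean);
    free(longest);
}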
(a) The 3D model after the filtering steps described in Subsections 3.8.1 and 3.8.2 (b) The 3D model after the filtering step described in Subsection 3.8.3 (c) The 3D model after the filtering step described in Section 3.9

Figure 3.14: Resulting 3D models after various filtering steps
3.9 Hole filling

In the last processing step of the 3D face scanner application, two actions are performed. The first one is concerned with an algorithm that takes care of filling undesirable holes that appear due to the removal of vertices and faces that were part of the face surface. This is accomplished by adding a vertex in the middle of the hole and then connecting every surrounding edge with this point. The second action refers to another filtering step of vertices and faces: in this last part of the application, the program removes all but the largest group of connected faces. The final 3D model is shown in Figure 3.14c.
3.10 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes but not for the overall goal of the 3D mask sizing project, this process was not considered part of the 3D face scanner application. This is also the reason why it is not included in Figure 3.1. Nevertheless, this section provides a brief explanation of the smoothing process that is currently used, along with an example.

A complete explanation of the algorithm that is being used to achieve the smoothing effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian operator that diffuses the vertices along the surface. An example of the resulting model before and after applying the smoothing process is shown in Figure 3.15.

(a) The 3D model before smoothing (b) The 3D model after smoothing

Figure 3.15: Forehead of the 3D model before and after applying the smoothing process
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as two different domains, but rather as two complementary parts of a whole. There are two important trends that have made such a unified view possible. First, integrated circuit (IC) technology has evolved to the point where multiple processors of different types coexist in a single IC. Second, the increasing complexity and average size of programs, added to the evolution of compiler technologies, raised C compilers (and even C++ or Java in some cases) to become commonplace in the development of embedded systems [35].

This chapter discusses the embedded hardware and software implementation of the 3D face scanner. A brief account of the hardware and software tools that were used during the development of the application is presented first. Subsequently, the first stage of the development process is described, which consists mainly of translating the algorithms and methods described in Chapter 3 into a different programming language, more suitable for embedded systems. Finally, a preview of the developed visualization module that displays the 3D reconstructed face is presented, along with a brief description of its functionality.
4.1 Development tools

This section describes the set of tools used in the development of the embedded application. First, an overview of the hardware is presented, highlighting the most important aspects that are of interest to the 3D face scanner application. This is then followed by a list of the software tools, along with a short motivation for their selection. A so-called remote development methodology was used for the compilation process. The idea is to run an integrated development environment (IDE) on a client system for the creation of the project, editing of the files, and usage of code assistance features in the same manner as done with local projects. However, when the project is built, run, or debugged, the process runs on a remote server, with output and input transferred to the client system.
4.1.1 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as development platforms. SBCs combine most features of a conventional desktop computer into a single board, which can be as small as a credit card. One or more processors of different types, memory, on-board peripherals for multiple USB devices, single or dual gigabit Ethernet connections, and integrated graphics and audio capabilities, amongst others, are common features included in these devices. But perhaps what is most interesting for embedded developers is the availability of several SBCs that come under the open source hardware category [36]. Such SBCs are suitable for the implementation of a wide range of applications on the basis of open operating systems.

Two different hardware environments were used in the development of the current embedded application: a conventional desktop personal computer (PC) with an Intel x86 architecture, and an SBC that was selected according to the following survey.
4.1.1.1 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention of finding the most suitable model for our application. Table 4.1 presents a subset of the considered models, highlighting the most relevant characteristics for the 3D face scanner application. Refer to [37] for the complete survey.

The model to be chosen has to comply with several requirements imposed by the 3D face scanner application. First, support for both a camera and a projector had to be offered. While all of the considered models showed special support for video output, not all of them provided suitable characteristics for camera signal acquisition; in fact, most of them rely on USB or Ethernet connections for this purpose. The problem with using USB technology for camera acquisition is that it is highly resource demanding. On the other hand, Ethernet connections imply streaming video in formats such as MPEG, which require additional computational resources and buffering for decoding the video stream. Explicit periphery support for camera acquisition was only offered by two of the considered models: the BeagleBoard-xM and the PandaBoard.
Table 4.1: Single-board computer survey

BeagleBoard-xM
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 512 MB
  Video output: DVI-D, HDMI, S-Video
  GPU: PowerVR SGX, OpenGL ES 2.0
  Camera port: Yes

Raspberry Pi Model B
  CPU: ARM1176, 700 MHz
  RAM: 256 MB
  Video output: Composite RCA, HDMI, DSI
  GPU: Broadcom VideoCore IV, OpenGL ES 2.0
  Camera port: No

Cotton Candy
  CPU: dual-core ARM Cortex-A9, 1200 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
  Camera port: No

PandaBoard
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, DVI-D, LCD
  GPU: PowerVR SGX540, OpenGL ES 2.0
  Camera port: Yes

Via APC
  CPU: ARM11, 800 MHz
  RAM: 512 MB
  Video output: HDMI, VGA
  GPU: built-in 2D/3D graphics, OpenGL ES 2.0
  Camera port: No

MK802
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No

Snowball
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, CVBS
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No
A second issue in the selection of the SBC was concerned with the project objective of developing a module capable of visualizing the 3D reconstructed model by means of the embedded projector. It was considered that the achievement of this objective could be greatly simplified by selecting an SBC model that offered support for rendering of 3D computer graphics by means of an API, preferably OpenGL ES. Nevertheless, all of the SBC models considered in the survey featured a graphical processing unit (GPU) with such support.

Finally, one last important motivation for the selection came from the experience gathered through related projects. The BeagleBoard-xM had been used as the embedded computing unit in other projects [6] at Philips Research Eindhoven, and therefore valuable implementation effort could be saved if this option were adopted. Consequently, it was the BeagleBoard-xM that was selected as the SBC model for the development of the current project.
4.1.1.2 BeagleBoard-xM features

The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power, open-source hardware system that was designed specifically to address the open source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz, accompanied by 512 MB of LPDDR RAM. Several open operating systems have been made compatible with such a processor, including Linux, FreeBSD, RISC OS, Symbian, and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU to provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].
4.1.2 Software

The main factors involved in the selection of software tools were (i) available support by a large development community and (ii) acquisition costs and licensing charges. Open source software was adopted where possible. Moreover, prior experience with the tools was also taken into account. The software can be divided into two categories: (i) software libraries that are used within the application and therefore are necessary for its execution, and (ii) software tools used specifically for the development of the application and hence not required for its execution. In what follows, each of these is briefly described.

Figure 4.1: The BeagleBoard-xM offered by Texas Instruments
4.1.2.1 Software libraries

The following software libraries are used throughout the implementation of the embedded application.

libxml2: A software library for parsing XML documents, originally developed for the Gnome project and later made available to outside projects as well. The current application makes use of this tool to extract the required information from the XML file that is included with each scan.

OpenCV: An open source computer vision and machine learning software library initiated by Intel. It provides the necessary functionality to construct the Delaunay triangulation described in Chapter 3. Though it was used in the initial versions of the application, later optimizations replaced the OpenCV implementations.

CGAL: A software library that aims to provide access to algorithms in computational geometry. It is used in the current application as a means to simplify the resulting mesh surface, i.e., to reduce the number of faces used to represent the surface while keeping the overall shape of the reconstructed model.

OpenGL ES: A subset of the more general OpenGL, designed specifically for embedded systems. It consists of a cross-language, multi-platform Application Programming Interface (API) for rendering 2D and 3D computer graphics. It is used in the current application as the means to visualize the 3D reconstructed model.

GLUT: The OpenGL Utility Toolkit, a system-independent API for OpenGL used to create windows and/or frame buffers. It is used in the visualization module of the application as well.
4.1.2.2 Software development tools

The following list presents a description of the most important software tools used for the development of the embedded application.

GNU toolchain: Refers to a collection of programming tools produced by the GNU Project that provide development facilities for applications and operating systems. Among the several projects that comprise the GNU toolchain, the following were used:

GNU Make: A utility that automates the building process of executable programs by reading so-called makefiles, which specify how to create the target program.

GCC: The official compiler of the GNU operating system, which has been adopted as standard by most modern Unix-like computer operating systems.

GNU Binutils: A set of programming tools used in the process of creating and managing programs, object files, libraries, profile data, and assembly source code. The commands as (assembler), ld (linker), and gprof (profiler) were used among the complete set of binutils commands.

GNU Project debugger: The standard debugger for the GNU operating system, which was made available for the development of applications outside this project as well.

Valgrind: A programming tool that can automatically detect memory management errors. It also provides the functionality of a profiler.

Ubuntu: A Linux-based operating system that is distributed as free and open source software. It was installed on both the desktop PC and the SBC.
4.2 MATLAB to C code translation

This section describes the first stage of the embedded application development, which involves the translation of a series of algorithms originally written in MATLAB code to C.

Despite the fact that there are a number of available tools that automatically translate MATLAB code to C language, such as MATLAB Coder by MathWorks, MATLAB-to-C Synthesis (MCS) by Catalytic Inc., and AccelDSP by Xilinx, these have a number of pitfalls that compromise their applicability, especially when the performance aspect is of ultimate importance. Perhaps most concerning is that each one of these tools only supports a subset of the MATLAB language and functions, meaning that the complete functionality of MATLAB is immediately constrained by this requirement. In many cases this would imply a modification of the MATLAB code prior to the translation process, in order to filter out any feature or function not included in the subset, which adds overhead to the development process. Examples of features not supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization, or try/catch statements. The use of an automatic translation tool was discarded for this project, taking into account that several of these unsupported features are present in the MATLAB code.
4.2.1 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular programming languages used for the development of embedded systems. The first is that C lies at an intermediate point between higher and lower level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher level languages lies in the fact that they do not provide suitable characteristics for optimizing the performance of applications, such as low-level memory manipulation. Furthermore, unlike many of these higher level programming languages, C provides deterministic resource use, which is an important feature when the target devices contain limited resources. On the other hand, C outperforms lower level languages in a number of aspects, such as scalability and maintainability. Two final motivations for using C are that (i) C compilers are available for almost all embedded devices, which are supported by a large pool of experienced C programmers, and (ii) the vast majority of hardware APIs/drivers are written in C.
4.2.2 Translation approach

As mentioned earlier, a manual translation approach was chosen over the use of automatic translation tools. A key part in the process of manually translating MATLAB to C code is the verification process. There are two major techniques used to achieve such verification. The first one consists of a systematic method of converting the translated C code into a compiled MEX-file that can be merged into the original MATLAB project. Then, by comparing the results generated by the MATLAB project containing the C implementation wrapped in a MEX-file with those generated by the original MATLAB project, one should be able to verify the correctness of the translation. The second approach consists of writing corresponding intermediate results of both the MATLAB and C implementations to external files and then using a file comparison tool, such as diff for Linux environments, in order to validate the equality of both results. It was the latter approach that was chosen for the development of the current application, for the following reason: the former approach requires the C implementation to be wrapped in a so-called MEX wrapper, which takes care of the communication between MATLAB and C. This task is considered to be error prone, since crashes, segmentation violations, or incorrect results can easily occur if the MEX wrapper does not allocate and access the data properly, as reported by Marc Barberis in [40] from Catalytic Inc.
A number of pitfalls that add complexity to the manual translation process were iden-
tified throughout the development of this stage The most important are
bull Array elements in MATLAB code are indexed starting with 1 whereas C indexing
starts with 0 Although this does not seem like a major difference it was found
that such simple change could easily introduce errors
bull MATLAB uses column major ordering whereas C uses a row major approach
Special care must be taken to guarantee that spatial locality is maintained after
the translation process takes place ie the order in which data is processed should
correspond to the order in which it is laid out in memory Not complying with
this idea could induce a serious loss in performance of the resulting code
bull MATLAB is an interpreted language ie data types and variable dimensions are
only known at run-time thus these cannot be easily deduced from analyzing the
source code
bull MATLAB supports dynamic sizing of arrays whereas such operations in C require
explicit allocationreallocationdeallocation of memory using constructs such as
malloc realloc or free
Chapter 4 Embedded system development 39
bull MATLAB features a rich set of libraries that are not available in C This can imply
a large overhead in the development process if many of these functions have to be
implemented
bull Many of the vector-based operations available in MATLAB translate into nontriv-
ial loop constructs in C language For example mapping MATLABrsquos easy-to-use
concatenation operation to C involves considerable effort
• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions, and sizes. On the contrary, the C language requires all variables to be declared with a specific data type before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C, which requires the programmer to implement them using structure constructs of primitive types.
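The column-major pitfall above deserves a short illustration. The following sketch (array and variable names are hypothetical) shows the cache-friendly traversal order in C; a line-by-line translation of a MATLAB loop, where matrices are stored column by column, would traverse the image in the wrong order:

    /* In C, images are stored row by row, so the row index must be in the
       outer loop to visit pixels in the order they are laid out in memory. */
    for (int y = 0; y < height; y++)         /* rows (outer loop)    */
        for (int x = 0; x < width; x++)      /* columns (inner loop) */
            out[y * width + x] = 2 * img[y * width + x];

In MATLAB, by contrast, the equivalent cache-friendly loop iterates over columns in the outer loop, which is exactly the order that must not be carried over verbatim into C.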
4.3 Visualization

This section describes the different steps involved in the visualization module developed to display the reconstructed 3D models by means of the embedded projector contained in the hand-held device. Figure 4.2 extends the general overview of the application presented in Figure 3.1 by incorporating the visualization module. This figure shows that a resulting 3D model of the face reconstruction process consists of four different elements: a set of vertices, a set of faces, a set of UV coordinates, and a texture image.
Figure 4.2: Simplified diagram of the 3D face scanner application.
Vertices and faces describe the geometry of the reconstructed model. Each face consists of three index values that determine the vertices that form a triangle. On the other hand, UV coordinates, together with the texture image, describe the texture of the model. Figure 4.3 shows how UV coordinates are used to map portions of the texture image to individual parts of the model. Each vertex is associated with a UV coordinate. When a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image to place on top of the triangle.
Figure 4.3: UV coordinate system.
Figure 4.4 presents an overview of the visualization module. The first step of the process is to simplify the 3D model, i.e., to reduce the number of triangles (and vertices) used to represent the surface. Note that while a high resolution is needed for the algorithms that determine the fit quality of the different mask models, a much lower resolution can be used for visualization purposes. In fact, due to the limited resources available in embedded systems, such simplification becomes necessary to avoid lag when zooming, rotating, or panning the model. Edge collapse is a common term for the simplification process, which is shown in Figure 4.4. The input vertices and faces of this block are converted into a smaller set, denoted as New vertices and New faces in the diagram. However, since the new set of vertices and faces does not have a one-to-one correspondence to the original set of UV coordinates, such coordinates have to be updated as well. This is accomplished by means of the nearest-neighbor algorithm: every new vertex is assigned the UV coordinate of its closest original vertex.

The next stage of the process is to format the new set of vertices, faces, and UV coordinates, together with the texture 1 image, such that OpenGL can render the model.
Subsequently, normal vectors are calculated for every triangle; these are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex, based on the normal vectors of the triangles connected to it. Moreover, a cross product is used to calculate the normal vector of each triangle. Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state where the model is redrawn every time a timer expires or when an interactive operation is sent to the program.
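A minimal sketch of this per-vertex normal computation is shown below (types and names are illustrative, not the project's actual code): each triangle normal is obtained with a cross product, accumulated into its three vertices, and the accumulated vector is finally normalized.

    #include <math.h>

    typedef struct { float x, y, z; } Vec3;

    static Vec3 cross(Vec3 a, Vec3 b)
    {
        Vec3 c = { a.y * b.z - a.z * b.y,
                   a.z * b.x - a.x * b.z,
                   a.x * b.y - a.y * b.x };
        return c;
    }

    /* Accumulate each triangle's normal into its three vertices, then
       normalize: the result is the average normal used for lighting. */
    static void compute_vertex_normals(const Vec3 *v, const int (*tri)[3],
                                       int ntri, Vec3 *normal, int nvert)
    {
        for (int i = 0; i < nvert; i++)
            normal[i] = (Vec3){ 0.0f, 0.0f, 0.0f };
        for (int t = 0; t < ntri; t++) {
            Vec3 e1 = { v[tri[t][1]].x - v[tri[t][0]].x,
                        v[tri[t][1]].y - v[tri[t][0]].y,
                        v[tri[t][1]].z - v[tri[t][0]].z };
            Vec3 e2 = { v[tri[t][2]].x - v[tri[t][0]].x,
                        v[tri[t][2]].y - v[tri[t][0]].y,
                        v[tri[t][2]].z - v[tri[t][0]].z };
            Vec3 n = cross(e1, e2);
            for (int k = 0; k < 3; k++) {
                normal[tri[t][k]].x += n.x;
                normal[tri[t][k]].y += n.y;
                normal[tri[t][k]].z += n.z;
            }
        }
        for (int i = 0; i < nvert; i++) {
            float len = sqrtf(normal[i].x * normal[i].x +
                              normal[i].y * normal[i].y +
                              normal[i].z * normal[i].z);
            if (len > 0.0f) {
                normal[i].x /= len; normal[i].y /= len; normal[i].z /= len;
            }
        }
    }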
Figure 4.4: Diagram of the visualization module.
Chapter 5
Performance optimizations
This chapter presents various performance optimizations made to the 3D face scanner application, ranging from high-level optimizations, such as modifications of the algorithms, to low-level optimizations, such as the implementation of time-consuming parts in assembly language.

In order to verify that the achieved optimizations were valid in general, and not only for specific cases, 10 scans of different persons were used for profiling the performance of the application. Every profile consisted of running the application 10 times for each scan and then averaging the results, in order to reduce the influence that external factors might have on the measured times. Figure 5.1 presents an example of the graphs that will be used throughout this and the following chapters to represent the changes in performance. Here, each bar is divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1.
The translation from MATLAB to C code corresponds to the first optimization performed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available in the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's -O2 optimization level.

The bottom bar in Figure 5.1 represents the starting point for a set of optimization procedures that will be described in the following sections. The order in which these are presented corresponds to the order in which they were applied to the application.
Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
5.1 Double to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C implementations was necessary to compare both results in each step of the translation process. The original C implementation used the double-precision format, because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double to single-precision format was made. Figure 5.2 shows that with this modification the total execution time decreased from 14.53 to 12.52 seconds.
Figure 5.2: Difference in execution time when double-precision format is changed to single-precision.
5.2 Tuned compiler flags

While the previous versions of the C code were compiled with the -O2 performance level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
Figure 5.3: Execution time before and after tuning GCC's compiler options.
5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality in the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e., first and second, third and fourth, fifth and sixth, and so on. The data of the camera frames were therefore placed in memory in such a manner that corresponding pixels of a frame pair lie next to each other. The procedure is shown in Figure 5.4.
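A minimal sketch of such an interleaved layout is given below (buffer and function names are hypothetical): corresponding pixels of a frame pair are stored adjacently, so that pairwise operations access consecutive memory locations.

    /* Interleave a pair of frames: pixel i of frame A is immediately followed
       by pixel i of frame B, so pairwise operations read memory sequentially. */
    static void interleave_pair(const unsigned char *frame_a,
                                const unsigned char *frame_b,
                                unsigned char *out, int npixels)
    {
        for (int i = 0; i < npixels; i++) {
            out[2 * i]     = frame_a[i];
            out[2 * i + 1] = frame_b[i];
        }
    }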
However, this modification yielded no improvement in the execution time of the application, as can be seen in Figure 5.5.
5.4 Reimplementation of C's standard power function

The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the 0.85 power. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process.
Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green, and purple circles represent pixels of the first, second, third, and fourth frames, respectively.
Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.
Taking into account that the high accuracy offered by this function was not required, and that the overhead involved in validating the input could be removed, a different implementation of the power function was adopted.
A novel approach proposed by Ian Stephenson in [42] was used, explained as follows. The power function is usually implemented using logarithms as

$$\mathrm{pow}(a, b) = x^{\log_x(a) \cdot b},$$

where $x$ can be any convenient value. By choosing $x = 2$, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of $a$ (an exponent field followed by a mantissa field),

$$a = M \cdot 2^E,$$

where $M$ is the mantissa and $E$ is the exponent. Taking the logarithm of both sides gives

$$\log_2(a) = \log_2(M) + E,$$

and since $M$ is normalized, $\log_2(M)$ is always small, therefore

$$\log_2(a) \approx E.$$
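A minimal C sketch of this idea follows, assuming IEEE-754 single-precision floats (the function names are illustrative, and, as in the optimization described above, no input validation is performed):

    #include <stdint.h>

    /* log2 approximation: reinterpret the float's bits; the biased exponent
       dominates and the mantissa bits act as a linear interpolation. */
    static inline float fast_log2(float a)
    {
        union { float f; uint32_t i; } u;
        u.f = a;
        return (float)u.i * (1.0f / (float)(1 << 23)) - 127.0f;
    }

    /* Inverse operation: build the bit pattern whose log2 is roughly p. */
    static inline float fast_pow2(float p)
    {
        union { float f; uint32_t i; } u;
        u.i = (uint32_t)((p + 127.0f) * (float)(1 << 23));
        return u.f;
    }

    /* pow(a, b) = 2^(b * log2(a)); valid for a > 0 only. */
    static inline float fast_pow(float a, float b)
    {
        return fast_pow2(b * fast_log2(a));
    }

For gamma correction, a call such as fast_pow(pixel, 0.85f) then replaces the standard library call.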
This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.
Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.
5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is the principle that recently referenced memory locations tend to be referenced again soon. Moreover, the reordering made it possible to replace floating-point calculations with integer calculations in the modulation stage, which typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modular frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping, and scaling frames are now performed in an alternating fashion, together with the calculation of the modular frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.
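The loop merging can be pictured with the following sketch (all helper and buffer names are hypothetical): each camera frame is traversed once, and the cropping/scaling of the preprocessing stage and the integer min/max tracking of the modulation stage both happen while the pixel is still at hand.

    /* One pass per frame: crop/scale the pixel and update the per-pixel
       minimum and maximum used later by the modulation stage. */
    static void preprocess_and_modulate(
        const unsigned char *const *frames, unsigned char **processed,
        unsigned char *min_frame, unsigned char *max_frame,
        int num_frames, int out_w, int out_h)
    {
        for (int n = 0; n < num_frames; n++) {
            for (int y = 0; y < out_h; y++) {
                for (int x = 0; x < out_w; x++) {
                    /* crop_and_scale_pixel() is a hypothetical helper */
                    unsigned char p = crop_and_scale_pixel(frames[n], x, y);
                    int i = y * out_w + x;
                    processed[n][i] = p;
                    if (p < min_frame[i]) min_frame[i] = p;  /* modulation min */
                    if (p > max_frame[i]) max_frame[i] = p;  /* modulation max */
                }
            }
        }
    }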
Figure 5.7: Order of execution before and after the optimization. (a) Original order of execution. (b) Modified order of execution.
Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, this figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results were achieved when GMC is applied in the y direction only. This was implemented by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted of removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement of the execution time shown in Figure 5.10.
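The remaining y-direction estimation can be sketched as follows (function and variable names are hypothetical): each frame is reduced to a vector of per-row sums, and the vertical offset that minimizes the sum of absolute differences (SAD) between the two vectors is selected.

    #include <stdint.h>

    /* Find the vertical shift that minimizes the SAD between the per-row
       sum vectors of two consecutive frames. */
    static int estimate_shift_y(const uint32_t *row_sums_a,
                                const uint32_t *row_sums_b,
                                int height, int max_shift)
    {
        int best_shift = 0;
        uint64_t best_sad = UINT64_MAX;
        for (int s = -max_shift; s <= max_shift; s++) {
            uint64_t sad = 0;
            for (int y = 0; y < height; y++) {
                int yb = y + s;
                if (yb < 0 || yb >= height)
                    continue;            /* ignore rows shifted out of range */
                int64_t d = (int64_t)row_sums_a[y] - (int64_t)row_sums_b[yb];
                sad += (uint64_t)(d < 0 ? -d : d);
            }
            if (sad < best_sad) {
                best_sad = sad;
                best_shift = s;
            }
        }
        return best_shift;
    }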
Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code.
Figure 5.10: Difference in execution time before and after modifying the GMC stage.
5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation, and a series of examples available in [43] was used as a reference for our implementation. Although OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not-so-straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation; it is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. At this point of the optimization process, an error was detected in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.
Figure 5.11: The execution time of the application increased after fixing an error in the tessellation stage.
5.8 Modified line shifting in GMC stage

This section explains a series of optimizations performed on the original line shifting mechanism in the GMC stage. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (the last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially, this was implemented using a for loop. Later, this was optimized even further by replacing the for loop with the more optimized memcpy function available in the standard C library, which in turn led to a faster execution time.
A further optimization was obtained in the GMC stage, which yielded better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included to determine whether the shift has to be performed in the positive or negative direction. In case the shift is negative, i.e., upwards, the shifting operation traverses the image from top to bottom, copying each line a certain number of rows above it. In case the shift is positive, i.e., downwards, the shifting operation traverses the image from bottom to top, copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
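A minimal sketch of this in-place, direction-aware shift is shown below (names are illustrative); note how the traversal order is chosen so that no line is overwritten before it has been copied:

    #include <string.h>

    /* In-place vertical shift of an 8-bit image by `shift` rows (positive
       means downwards). Border lines are discarded, not wrapped around. */
    static void shift_frame_y(unsigned char *img, int width, int height,
                              int shift)
    {
        if (shift < 0) {                     /* shift up: top-to-bottom copy */
            for (int y = -shift; y < height; y++)
                memcpy(img + (y + shift) * width, img + y * width, width);
        } else if (shift > 0) {              /* shift down: bottom-to-top copy */
            for (int y = height - 1 - shift; y >= 0; y--)
                memcpy(img + (y + shift) * width, img + y * width, width);
        }
    }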
Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic is the fact that the set of vertices resulting from the decoding stage is sorted in an increasing manner, which removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which further simplifies the algorithm needed to connect the vertices into triangles.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Moreover, each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.
The algorithm is presented in Algorithm 1. Note that for each pair of rows, this algorithm describes the connection of vertices up to the moment in which the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).
Algorithm 1 New tessellation algorithm

1:  for all pairs of rows do
2:    find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
3:    while the last vertex in either row has not been reached do
4:      if vertex_row_A is more to the left than vertex_row_B then
5:        connect vertex_row_A with the next vertex on the same row and with vertex_row_B
6:        change vertex_row_A to the next vertex on the same row
7:      else
8:        connect vertex_row_B with the next vertex on the same row and with vertex_row_A
9:        change vertex_row_B to the next vertex on the same row
10:     end if
11:   end while
12: end for
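A compact C sketch of the inner loop of Algorithm 1 follows (the types and the emit callback are hypothetical); it assumes the vertices of each row are already sorted from left to right:

    typedef struct { float x, y; int idx; } Vertex;

    /* Connect two sorted rows of vertices into triangles; emit() receives
       the three vertex indices of each triangle produced. */
    static void connect_rows(const Vertex *rowA, int lenA,
                             const Vertex *rowB, int lenB,
                             void (*emit)(int, int, int))
    {
        int a = 0, b = 0;
        while (a + 1 < lenA && b + 1 < lenB) {
            if (rowA[a].x < rowB[b].x) {
                emit(rowA[a].idx, rowA[a + 1].idx, rowB[b].idx);
                a++;                       /* advance along row A */
            } else {
                emit(rowB[b].idx, rowB[b + 1].idx, rowA[a].idx);
                b++;                       /* advance along row B */
            }
        }
        /* Remaining vertices of the longer row are fanned to the last vertex
           of the shorter row in a later step, as noted for Algorithm 1. */
    }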
Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's Delaunay triangulation implementation.
Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that the vertices are sorted. (a) Delaunay triangulation. (b) Optimized triangulation.
5.10 Modified decoding stage

A major improvement in the execution time of the application was achieved after optimizing several time-consuming parts of the decoding stage. The first step concerned two frequently called functions of the standard math C library, namely ceil() and floor().
Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.
These were replaced with faster implementations that use preprocessor directives to avoid the function call overhead. Moreover, the time spent validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions was further simplified.
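For non-negative inputs, both functions reduce to simple truncation, so the macro replacements could look as follows (macro names are hypothetical):

    /* Valid for non-negative values only: floor is plain truncation, and
       ceil adds one whenever truncation discarded a fractional part.
       No validation is performed, avoiding the libm call overhead. */
    #define FAST_FLOOR(x) ((int)(x))
    #define FAST_CEIL(x)  ((int)(x) + ((x) > (float)(int)(x)))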
A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while verifying that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since the memory is more quickly addressable.
The last optimization consisted of the detection and removal of several tasks that did not contribute to the final result. Such tasks were present in the application because several alternatives were implemented for achieving a common goal during the algorithmic design stage; after assessing and choosing the best option, however, the other ones were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming.
Figure 5.15: Execution time of the application before and after optimizing the decoding stage.
For every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, this recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
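The reuse can be sketched as follows (helper names such as compute_sum_vector() are hypothetical; estimate_shift_y() and shift_frame_y() refer to the earlier sketches in this chapter): the sum vector computed for frame n simply becomes the reference vector of the next iteration instead of being recomputed.

    #include <stdint.h>

    /* The sum vector of frame n-1 is reused: the two buffers are swapped at
       the end of each iteration instead of recomputing the previous one. */
    static void gmc_all_frames(unsigned char **frame, int num_frames,
                               int width, int height, int max_shift,
                               uint32_t *prev_sums, uint32_t *curr_sums)
    {
        compute_sum_vector(frame[0], prev_sums);
        for (int n = 1; n < num_frames; n++) {
            compute_sum_vector(frame[n], curr_sums);
            int dy = estimate_shift_y(prev_sums, curr_sums, height, max_shift);
            shift_frame_y(frame[n], width, height, dy);
            uint32_t *tmp = prev_sums;    /* swap: current sums become the */
            prev_sums = curr_sums;        /* reference for the next pair   */
            curr_sums = tmp;
        }
    }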
5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization, and global motion compensation stages are data-independent, and therefore suitable for being computed in parallel with the ARM NEON architecture extension.

There are four major approaches to integrating NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions, (ii) using existing C/C++ libraries based on NEON technology, (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions while the compiler does some of the work associated with writing assembly instructions, and (iv) directly writing NEON assembly instructions that are linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and it was therefore this approach that was adopted.
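For illustration only, the following sketch uses NEON intrinsics rather than the hand-written assembly actually adopted in the project; it shows the basic pattern of processing 8 pixels per instruction, here averaging two rows of 8-bit pixels with the rounded halving-add instruction:

    #include <arm_neon.h>
    #include <stdint.h>

    /* Average two rows of 8-bit pixels, 8 at a time. Any remainder (n not a
       multiple of 8) would need a scalar tail loop, omitted here. */
    static void average_rows_neon(const uint8_t *a, const uint8_t *b,
                                  uint8_t *out, int n)
    {
        for (int i = 0; i + 8 <= n; i += 8) {
            uint8x8_t va = vld1_u8(a + i);        /* 8 pixels of frame A */
            uint8x8_t vb = vld1_u8(b + i);        /* 8 pixels of frame B */
            vst1_u8(out + i, vrhadd_u8(va, vb));  /* rounded (a+b+1)>>1  */
        }
    }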
Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums.
Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.
Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers, where each of the 8 elements is 16 bits wide, or as an operation on 2 D registers, where each of the 8 elements is 8 bits wide.
Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology (elements, lanes, source and destination registers).
An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two observations that may explain this increment. First, note that the stage of the application that contributed most to the increase in time was reading the binary file. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was indeed probably running in parallel,
using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage. The first step of that process is to find the smallest and largest values for every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C language, the processor makes use of a branch prediction mechanism to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, forgoing the benefit of the branch prediction mechanism.
5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). To explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is $2^{n+1} - 1$, where $n$ is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
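A minimal sketch of such a LUT for 8-bit pixels follows (names are illustrative): the table is indexed by the sum of the two pixels, so the expensive power function is evaluated only 511 times at start-up instead of once per pixel.

    #include <math.h>

    static float gamma_lut[511];   /* one entry per possible sum, 0..510 */

    static void init_gamma_lut(void)
    {
        for (int s = 0; s <= 510; s++)
            gamma_lut[s] = powf(s / 2.0f, 0.85f);  /* average raised to 0.85 */
    }

    /* Usage: texture1[i] = gamma_lut[pixel_a[i] + pixel_b[i]]; */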
As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram, it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two stages. The first involves the calculation of the modulation, texture 2, and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. Merging these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already
Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.
Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.
being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face reconstruction application.
Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.
Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.
6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models, with a precision of 4 decimal places.

In what follows, a series of graphs shows the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when run on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).
On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when run on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's -O2 optimization level.
Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model contains approximately 2% of the faces of the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.
6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. The wide range of optimizations described in Chapter 5 was used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times.
Figure 6.2: Example of the visualization module developed. (a) High-resolution 3D model with texture (63,743 faces). (b) High-resolution 3D model wireframe (63,743 faces). (c) Low-resolution 3D model with texture (1,229 faces). (d) Low-resolution 3D model wireframe (1,229 faces).
Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which gives an idea of the speedup achieved for each individual stage.
Figure 6.3: Performance evolution of the 3D face scanner's C implementation. The bars correspond, from top to bottom, to: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2.
Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each pair of corresponding models yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e., by approximately 14 seconds.
A wide range of optimizations was performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.
Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first
step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment.
Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.
7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5×5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates for implementation with NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks, and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration, and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operation), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning
the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.
Finally, it is worth noting a few more ideas on how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process; it is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp, and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.
Bibliography
[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1)

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1)

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1)

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10)

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11)

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34)
[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11)

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11)

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, ISSN: 0162-8828. DOI: 10.1109/34.216735 (cit. on pp. 9, 11)

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. DOI: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11)

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, ISSN: 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11)

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, ISSN: 0162-8828. DOI: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11)

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11)

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, ISSN: 0031-3203. DOI: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12)

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, ISSN: 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12)
[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. DOI: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12)

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, ISSN: 0932-8092. DOI: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12)

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, ISSN: 0018-9162. DOI: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14)

[19] J. Salvi, J. Pags, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12)

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. DOI: 10.1109/ICME.2006.262619 (cit. on p. 12)

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. DOI: 10.1109/ENC.2006.8 (cit. on pp. 12, 13)

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13)

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. DOI: 10.1109/ICIP.2007.4379251 (cit. on p. 13)

[24] J. Salvi, X. Armangu, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, ISSN: 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14)

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12 318, 2007. DOI: 10.1364/OE.15.012318 (cit. on p. 14)
[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14)

[27] G. Toscani, Systemes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, ISBN: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14)

[28] J. Mas and Universitat de Girona, Departament d'Electronica, Informatica i Automatica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, ISBN: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15)

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, ISSN: 0882-4967. DOI: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15)

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, ISSN: 0162-8828. DOI: 10.1109/34.159901 (cit. on p. 15)

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26)

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN: 0201604582 (cit. on p. 25)

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26)

[34] M. Desbrun, M. Meyer, P. Schroder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th annual conference on Computer graphics and interactive techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, ISBN: 0-201-48560-5. DOI: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30)
[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, ISBN: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31)

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32)

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32)

[38] G. Coley, "Beagleboard system reference manual," BeagleBoard.org, December, p. 81, 2009 (cit. on p. 34)

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34)

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38)

[41] W. Von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45)

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46)

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50)

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51)

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54)

[46] N. Pipenbrinck, "ARM NEON optimization: an example," Tech. Rep., 2009 (cit. on p. 54)
3.5 Decoding 24
3.6 Tessellation 25
3.7 Calibration 26
3.7.1 Offline process 27
3.7.2 Online process 27
3.8 Vertex filtering 28
3.8.1 Filter vertices based on decoding constraints 28
3.8.2 Filter vertices outside the measurement range 29
3.8.3 Filter vertices based on a maximum edge length 29
3.9 Hole filling 29
3.10 Smoothing 30
4 Embedded system development 31
4.1 Development tools 31
4.1.1 Hardware 32
4.1.1.1 Single-board computer survey 32
4.1.1.2 BeagleBoard-xM features 34
4.1.2 Software 34
4.1.2.1 Software libraries 35
4.1.2.2 Software development tools 36
4.2 MATLAB to C code translation 37
4.2.1 Motivation for developing in C language 37
4.2.2 Translation approach 38
4.3 Visualization 39
5 Performance optimizations 43
5.1 Double to single-precision floating-point numbers 44
5.2 Tuned compiler flags 44
5.3 Modified memory layout 45
5.4 Reimplementation of C's standard power function 45
5.5 Reduced memory accesses 47
5.6 GMC in y dimension only 49
5.7 Error in Delaunay triangulation 50
5.8 Modified line shifting in GMC stage 50
5.9 New tessellation algorithm 51
5.10 Modified decoding stage 52
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage 53
5.12 NEON assembly optimization 1 54
5.13 NEON assembly optimization 2 57
6 Results 61
6.1 MATLAB to C code translation 61
6.2 Visualization 62
6.3 Performance optimizations 62
7 Conclusions 67
7.1 Future work 68
Bibliography 71
List of Figures
1.1 A subset of the CPAP masks offered by Philips 2
1.2 A 3D hand-held scanner developed in Philips Research 4
2.1 Standard stereo geometry 7
2.2 Assumed model for triangulation as proposed in [4] 10
2.3 Examples of pattern coding strategies 12
2.4 A reference framework assumed in [25] 14
3.1 General flow diagram of the 3D face scanner application 17
3.2 Example of the 16 frames that are captured by the hand-held scanner 18
3.3 Flow diagram of the preprocessing stage 18
3.4 Flow diagram of the normalization stage 20
3.5 Example of the 18 frames produced in the normalization stage 21
3.6 Camera frame sequence in a coordinate system 22
3.7 Flow diagram for the calculation of the texture 1 image 22
3.8 Flow diagram for the global motion compensation process 23
3.9 Difference between pixel-based and edge-based decoding 24
3.10 Vertices before and after the tessellation process 25
3.11 The Delaunay tessellation with all the circumcircles and their centers [33] 26
3.12 The calibration chart 27
3.13 The 3D model before and after the calibration process 28
3.14 3D resulting models after various filtering steps 29
3.15 Forehead of the 3D model before and after applying the smoothing process 30
4.1 The BeagleBoard-xM offered by Texas Instruments 35
4.2 Simplified diagram of the 3D face scanner application 39
4.3 UV coordinate system 40
4.4 Diagram of the visualization module 41
5.1 Execution times of the MATLAB and C implementations after run on different platforms 44
5.3 Execution time before and after tuning GCC's compiler options 45
5.4 Modification of the memory layout of the camera frames 46
5.5 Execution time with a different memory layout 46
5.6 Execution time before and after reimplementing C's standard power function 47
5.7 Order of execution before and after the optimization 48
5.8 Difference in execution time before and after reordering the preprocessing stage 48
5.9 Flow diagram for the GMC process as implemented in the MATLAB code 49
5.10 Difference in execution time before and after modifying the GMC stage 49
5.11 Execution time of the application after fixing an error in the tessellation stage 50
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage 51
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted 52
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach 53
5.15 Execution time of the application before and after optimizing the decoding stage 54
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's columns sum 55
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage 55
5.18 NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology 56
5.19 Execution flow after first NEON assembly optimization 58
5.20 Execution times of the application before and after applying the first NEON assembly optimization 59
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels 59
5.22 Execution times of the application before and after applying the second NEON assembly optimization 59
5.23 Final execution flow after second NEON assembly optimization 60
6.1 Execution times of the MATLAB and C implementations after run on different platforms 62
6.2 Example of the visualization module developed 63
6.3 Performance evolution of the 3D face scanner's C implementation 64
6.4 Execution times for each stage of the application 65
Dedicated to my grandmother
Chapter 1
Introduction
The potential of science and technology to improve every aspect of life seems to be
boundless, or at least this is what the innovations of the previous centuries suggest.
Among the many different interests that advocate the development of science and tech-
nology, human healthcare has always been an important stimulant. New technologies
are constantly being developed by leading companies all around the world to improve the
quality of people's lives. A clear example is the case of the Dutch multinational Royal
Philips Electronics, which devotes special interest to the development and introduction
of meaningful innovations that improve people's lives.
Within the wide range of products offered by Philips, there is a specific group, cate-
gorized under the name of sleep solutions, that aims at improving the sleep quality of
people. A well-known family of products contained within this category are the so-called
CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily
in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing
or instances of very low breathing during sleep [1]. According to a recent study con-
ducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed
population was found to suffer from this disorder [2]. A total number of 4206 people,
comprising women and men of different ages and levels of education, took part in the
2-year study. A similar survey was undertaken by the National Institutes of Health in
the United States of America [3]. It reported that sleep apnea was prevalent in more
than 18 million Americans, i.e. 6.62% of the country's population.
While aiming to meet the large demand for CPAP masks, Philips has designed and
introduced a wide variety of mask models that seek to fulfill the different needs and
constraints that arise due to several factors. These include the large diversity of size
and shape of human faces, inclination towards breathing through the mouth or nose,
and the diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia,
Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic, (c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel, (h) Simplicity, (i) ComfortGel.
amongst others. A subset of these models is shown in Figure 1.1. It is important to
mention that a poor selection of a CPAP mask might cause undesirable side effects to the
patient, such as marks or even pressure ulcers. Consequently, the physical dimensions
of each patient's face play a crucial role in the selection of the most appropriate CPAP
mask.
Unfortunately, the current practices used to assess the adequacy of CPAP masks based
on facial dimensions are quite error prone. They rely on trial-and-error procedures in
which the patient tries on different mask models and selects the one he thinks is the
most comfortable. In order to alleviate this problem, Philips Research launched the
3D Mask Sizing project, which aims to develop an automated embedded system capable
of assisting sleep technicians in prescribing the most appropriate CPAP mask for each
patient.
1.1 3D Mask Sizing project
The 3D Mask Sizing project is based on the initiative of Philips to develop techno-
logical means that can assist sleep technicians in the selection of a proper CPAP mask
model for each patient. A series of algorithms, methods and hardware prototypes are the
result of several years of research carried out by the Smart Sensing & Analysis research
group in Philips Research Eindhoven. The resulting automated mask advising system
comprises four main parts:
1. An accurate 3D reconstruction of the patient's face dimensions and geometry
2. The extraction of facial landmarks from the reconstructed model by means of
computer vision algorithms
3. The actual fit quality assessment by virtually fitting a series of 3D mask models
to the reconstructed face
4. The creation of a custom cushion that optimizes for uniform pressure along the
cushion contour
The focus of this thesis project is on the first step.
As part of the progress made in the 3D Mask Sizing project at Philips Research Eind-
hoven, a first prototype of a 3D hand-held scanner using the structured lighting technique
was already developed and is the base for the present project. Figure 1.2a shows the
hardware setup of such a device. In short, this scanner is capable of capturing a picture
sequence of a patient's face while illuminating it with specific structured light patterns.
Such a picture sequence is processed by means of a series of algorithms in order to re-
construct a 3D model of the face. An example of a resulting 3D model is presented in
Figure 1.2b. The reconstruction process and all other calculations are currently being
performed offline and are mostly implemented in MATLAB.
1.2 Objectives
The main objective of this thesis project is to extend the functionality of the mentioned
scanner such that the 3D reconstruction is computed locally on the embedded platform.
This implies transforming the already developed methods and algorithms in such a
Figure 1.2: A 3D hand-held scanner developed in Philips Research: (a) hardware; (b) 3D model example.
way that extra-functional requirements are taken into account. These extra-functional
requirements involve an optimal use of the available computational resources. Highest
priority should be given to the execution time of the application. Specifically, the 3D
reconstruction should run on the embedded device in less than 5 seconds on
average. Because the embedded processor contained in the final product will be similar
to an ARM Cortex-A8, the new implementation should be targeted to this processor
in particular, making proper use of the specific features it provides. Moreover, the
visualization of the reconstructed face model should be made possible by means of the
embedded projector contained in the device.
1.3 Report organization
This report is organized as follows. Chapter 2 presents the basic principles that underlie
different technologies for surface reconstruction, placing special emphasis on structured
lighting techniques. In Chapter 3, an overview of the 3D face scanner application is
provided, which functions as the starting point for the current project. Chapter 4
details the most relevant aspects that pertain to the implementation of the 3D face
scanner application on an embedded device. In Chapter 5, a series of optimizations
used to reduce the execution time of the application is described. Chapter 6 highlights
the most important results of the development process, namely the MATLAB to C
translation, the visualization module and the set of optimizations. Finally, Chapter 7
concludes the thesis while delineating paths for further improvement of the presented
work.
Chapter 2
Literature study
This chapter presents a selective analysis of the state of the art in the field of surface
reconstruction, placing special emphasis on structured lighting techniques. A brief
overview of the three main underlying technologies used for depth estimation is pre-
sented first. This is followed by an example of stereo analysis, which serves as the basis
for the more specific structured lighting techniques. Moreover, this example helps to
illustrate why stereo analysis is considered less preferable for 3D face reconstruction
applications when compared with structured lighting techniques. Special emphasis
is placed on the scientific principles underlying structured lighting techniques. Further-
more, a classification of the different types of pattern coding strategies available in the
literature is given, along with an analysis of their suitability for our application. Fi-
nally, the chapter concludes with a brief discussion of camera calibration and its most
representative techniques.
2.1 Surface reconstruction
Surface reconstruction has a wide range of practical applications, such as computer mod-
eling of 3D objects (as found in areas like architecture, mechanical engi-
neering or surgery), distance measurements for vehicle control, surface inspection for
quality control, approximate or exact estimates of the location of 3D objects for auto-
mated assembly, and fast location of obstacles for efficient navigation [4].
Technologies for surface reconstruction include contact and non-contact techniques, the
latter being our principal interest. Non-contact techniques may be further categorized
as echo-metric, reflecto-metric and stereo-metric, as proposed in [5]. Echo-metric tech-
niques use time-of-flight measurements to determine the distance to an object, i.e. they
are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect
from an object's surface through a given medium. Reflecto-metric techniques process
one or more images of the object to determine its surface orientation and consequently
its shape. Finally, stereo-metric techniques determine the location of the object's surface
by triangulating each point with its corresponding projections in two or more images.
Echo-metric techniques suffer from a number of drawbacks. Systems employing such
techniques are heavily affected by environmental parameters such as temperature and
humidity [6]. These parameters affect the velocity at which waves travel through a
given medium, thus introducing errors in the depth measurements. On the other hand,
both reflecto-metric and stereo-metric techniques are less affected by environmental
parameters. However, reflecto-metric techniques entail a major difficulty, i.e. they
require an estimation of the model of the environment. In the remainder of this section,
we will limit the discussion to the stereo-metric category and focus on the structured
lighting techniques.
2.1.1 Stereo analysis
Considering that surface reconstruction by means of structured lighting can be regarded
as an extension of the more general stereo-vision technique, an introductory example of
stereo analysis is presented in this section. The example, taken from [4], intends to show
why the use of structured lighting becomes essential for our application.
Surface reconstruction can be achieved by means of the visual disparity that results
when an object is observed from different camera viewpoints. In its simplest form, two
cameras can be used for this purpose. Triangulation between a point on the object and
its respective projection in each of the camera projection planes can be used to calculate
the depth at which this point lies from a certain reference. Note, however, that in order
to calculate the triangulation, more parameters are required. These parameters refer, for
example, to the distance at which the cameras are located from one another (an extrinsic
parameter) or to the focal length of each of the cameras (an intrinsic parameter).
Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this
model, the origin of the XYZ-coordinate system $O = (0, 0, 0)$ is located at the focal
point of the left camera. The focal point of the right camera lies at a distance $b$ along
the X-axis from the left camera, i.e. at the point $(b, 0, 0)$. Both cameras are assumed
to have the same focal length $f$. As a consequence, the images of both cameras are
located in the same image plane. The Z-axis coincides with the optical axis of the
left camera. Moreover, the optical axes of both cameras are parallel to each other and
oriented towards the scene objects. Also note that, because the x-axes of both images
are identically oriented, rows with the same row number in the two different images lie on
the same straight line.
Figure 2.1: Standard stereo geometry
In this model, a scene point $P = (X, Y, Z)$ is projected onto two corresponding image points
\[ p_{left} = (x_{left}, y_{left}) \quad \text{and} \quad p_{right} = (x_{right}, y_{right}) \]
in the left and right images respectively, assuming that the scene point is visible from
both camera viewpoints. The disparity with respect to $p_{left}$ is a vector given by
\[ \Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\; y_{left} - y_{right})^{T} \tag{2.1} \]
between two corresponding image points.
In the standard stereo geometry, pinhole camera models are used to represent the con-
sidered cameras. The basic idea of a pinhole camera is that it projects scene points $P$
onto image points $p$ according to a central projection given by
\[ p = (x, y) = \left( \frac{f \cdot X}{Z},\; \frac{f \cdot Y}{Z} \right) \tag{2.2} \]
assuming that $Z > f$.
According to the ideal assumptions considered in the standard stereo geometry of the
two cameras, it holds that $y = y_{left} = y_{right}$. Therefore, for the left camera the cen-
tral projection equation is given directly by Equation 2.2, considering that the pinhole
camera model assumes that the Z-axis is identified to be the optical axis of the camera.
Furthermore, given the displacement of the right camera by $b$ along the X-axis, the
central projection equation is given by
\[ (x_{right}, y) = \left( \frac{f \cdot (X - b)}{Z},\; \frac{f \cdot Y}{Z} \right) \]
Rather than calculating a disparity vector given by Equation 2.1 for all corresponding
pairs of points in the different images, the scalar disparity proves to be sufficient under
the assumptions made in the standard stereo geometry. The scalar disparity of two
corresponding points in each one of the images with respect to $p_{left}$ is given by
\[ \Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2} \]
However, because rows with the same row numbers in the two images have the same $y$ value,
the scalar disparity of a pair of corresponding points reduces to
\[ \Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \tag{2.3} \]
Note that it is valid to remove the absolute value operator because of the chosen arrange-
ment of the cameras. A disparity map $\Delta(x, y)$ is defined by applying Equation 2.3 to all
corresponding points in the two images. For those points that could not be associated
with a corresponding point in the other image (for example because of occlusion), the
value "undefined" is recorded.
Finally, in order to come up with the equations that determine the 3D location of each
point in the scene, note that from the two central projection equations of the two cameras
it follows that
\[ Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}} \]
and therefore
\[ X = \frac{b \cdot x_{left}}{x_{left} - x_{right}} \]
Using the previous equation, it follows that
\[ Z = \frac{b \cdot f}{x_{left} - x_{right}} \]
By substituting this result into the projection equation for $y$, it follows that
\[ Y = \frac{b \cdot y}{x_{left} - x_{right}} \]
The last three equations allow the reconstruction of the coordinates of the projected
points $P$ within the three-dimensional XYZ-space, assuming that the parameters $f$ and
$b$ are known and that the disparity map $\Delta(x, y)$ was measured for each pair of corre-
sponding points in the two images. Note that a variety of methods exists to calibrate
different types of camera configuration systems, i.e. to determine their intrinsic and ex-
trinsic parameters. More on these calibration procedures is discussed in Section 2.2.
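To make the preceding derivation concrete, the three reconstruction equations can be gathered into a single routine. The following minimal C sketch is only an illustration of the math above, not part of the scanner's code base; the function name and its error convention for undefined disparities are our assumptions.

/* Illustrative sketch (assumed names, not the original implementation):
 * recover the 3D point P = (X, Y, Z) from a pair of corresponding image
 * points in the standard stereo geometry, given the focal length f and
 * base distance b known from calibration. Returns 0 on success and -1
 * when the disparity is undefined, e.g. for occluded points. */
static int reconstruct_point(double f, double b,
                             double x_left, double x_right, double y,
                             double *X, double *Y, double *Z)
{
    double disparity = x_left - x_right;   /* Equation 2.3 */
    if (disparity <= 0.0)
        return -1;                         /* no valid correspondence */
    *X = (b * x_left) / disparity;
    *Y = (b * y) / disparity;
    *Z = (b * f) / disparity;
    return 0;
}

Applying this routine to every defined entry of the disparity map yields the reconstructed point cloud.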
The process of determining corresponding point pairs is known as the correspondence
problem. A wide variety of techniques is used to solve the correspondence problem in
stereo image analysis. Such techniques generally involve the extraction and matching
of features between two or more images. These features are typically corners or edges
contained within the images. Although these techniques are found to be appropriate for
a certain number of applications, they present a number of drawbacks
that make their applicability unfeasible for many others. The main drawbacks are that (i)
feature extraction and matching is generally computationally expensive, (ii) features
might not be available depending on the nature of the environment or the placement
of the cameras, and (iii) low lighting conditions generally increase the complexity of the
matching procedure, thus making the system more error prone. Such problems in solving
the correspondence problem can generally be overcome by resorting to a different but
related family of techniques known as structured lighting techniques. While these
techniques involve a completely different methodology for solving the correspondence
problem, they share a large part of the theory presented in this section
regarding the depth reconstruction process.
2.1.2 Structured lighting
Structured lighting methods can be thought of as a modification of the previously de-
scribed stereo analysis approach, where one of the cameras is replaced by a light source
that projects a light pattern actively into the scene. The location of an object in space
can then be determined by analyzing the deformation of the projected light pattern.
The idea behind this modification is to simplify the complexity of the correspondence
analysis by actively manipulating the scene.
It is important to note that stereoscopy-based systems do not impose complex require-
ments on image acquisition, since they mostly rely on theoretical, mathematical and
algorithmic analyses to solve the reconstruction problem. The idea behind structured
lighting methods, on the other hand, is to shift this complexity to another level, namely
the engineering prerequisites of the overall system [4].
A wide variety of light patterns has been proposed by the research community [5], [7]–[17].
Their aim is to reduce the large number of images that would have to be captured
when using the most basic of all approaches, i.e. a light spot. In Section 2.1.2.2, a
classification of the available encoded patterns is presented. Nevertheless, the light spot
projection technique serves as a solid starting point to introduce the main principle
underlying the depth recovery of most other encoded light patterns: the triangulation
technique.
2.1.2.1 Triangulation technique
Triangulation refers to the process of determining the location of a point by measuring
the angles formed from it to points at either end of a fixed baseline. Various approaches
have been proposed for accomplishing this task. An early analysis was described by Hall
et al. [18] in 1982; Klette also presented his own analysis in [4]. In the following, an
overview of Klette's triangulation approach is given.
Figure 2.2 shows the simplified model that Klette assumes in his analysis.
Figure 2.2: Assumed model for triangulation as proposed in [4]
Note that the
system can be thought of as a 2D object scene, i.e. it has no vertical dimension. As a
consequence, the object, light source and camera all lie in the same plane. The angles
$\alpha$ and $\beta$ are given by the calibration. As in the previous example, the base distance $b$
is assumed to be known, and the origin of the coordinate system $O$ coincides with the
projection center of the camera.
The goal is to calculate the distance $d$ between the origin $O$ and the object point
$P = (X_0, Z_0)$. This can be done using the law of sines as follows:
\[ \frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)} \]
From $\gamma = \pi - (\alpha + \beta)$ and $\sin(\pi - \gamma) = \sin(\gamma)$, it holds that
\[ \frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)} \]
Therefore, distance $d$ is given by
\[ d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)} \]
which holds for any point $P$ lying on the surface of the object.
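The triangulation formula translates directly into code. The following sketch is purely illustrative (the function name and the radian convention for the angles are assumptions, not part of the original system):

#include <math.h>

/* Illustrative sketch of the 2D triangulation above: given the calibrated
 * angles alpha (camera side) and beta (light source side), in radians,
 * and the base distance b, return the distance d from the camera's
 * projection center O to the illuminated object point P. */
static double triangulate_distance(double alpha, double beta, double b)
{
    /* d = b * sin(alpha) / sin(alpha + beta), via the law of sines */
    return b * sin(alpha) / sin(alpha + beta);
}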
2.1.2.2 Pattern coding strategies
As stated earlier, there is a wide variety of pattern coding strategies available in the lit-
erature that aim to fulfill all requirements found in different scenarios and applications.
In coded structured light systems, every coded pixel in the pattern has its own codeword
that allows direct mapping, i.e. every codeword is mapped to the corresponding coordi-
nates of a given pixel or group of pixels in the pattern. A codeword can be represented
using grey levels, colors or even geometrical characteristics. The following classification
of pattern coding strategies was proposed by Salvi et al. in [19]:
• Time-multiplexing. This is one of the most commonly used strategies. The
idea is to project a set of patterns onto the scene, one after the other. The
sequence of illuminated values determines the codeword for each pixel. The main
advantage of this kind of pattern is that it can achieve high spatial resolution in
the measurements. However, its accuracy is highly sensitive to movement of either
the structured light system or objects in the scene during the time period when the
acquisition process takes place. Previous research in this area includes the work of
[5], [7], [8]. An example of this coding strategy is the binary coded pattern shown
in Figure 2.3a; a sketch of a Gray code pattern generator is given after Figure 2.3.
• Spatial neighborhood. In this strategy, the codeword that is assigned to a given
pixel depends on its neighborhood. Codification is done on the basis of intensity
[9]–[11], color [12] or a unique structure of the neighborhood [13]. In contrast with
time-multiplexing strategies, spatial neighborhood strategies allow for all coding
information to be condensed into a single projection pattern, making them highly
suitable for applications that involve timing constraints, such as autonomous nav-
igation. The compromise, however, is a deterioration in spatial resolution. Figure
2.3b is an example of this strategy, proposed by Griffin et al. [14].
• Direct coding. In direct coding strategies, every pixel in the pattern is labeled
by the information it represents. In other words, the entire codeword for a given
point is contained in a unique pixel, as explained in [19]. Basically, there are two
ways to achieve this: either by using a large range of color values [15], [16] or
by introducing periodicity [17]. Although in theory this group of strategies can
be used to reconstruct objects with high resolution, a major problem occurs in
practice: the colors imaged by the camera(s) of the system do not only depend on the
projected colors, but also on the intrinsic colors of the measuring surface and the light
source. The consequence is that reference images become necessary. Figure 2.3c
shows an example of a direct coding strategy proposed in [16].
Figure 2.3: Examples of pattern coding strategies: (a) time-multiplexing; (b) spatial neighborhood; (c) direct coding.
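As referenced in the time-multiplexing item above, the binary/Gray code family of patterns is simple to generate. The following C sketch is our own illustration (function names and the 8-bit, row-major image layout are assumptions, not taken from the original scanner): pattern k lights projector column c whenever bit k of the Gray code of c is set, so the sequence of observed intensities at a camera pixel decodes to the projector column used for triangulation.

#include <stdint.h>

/* Gray code of c: consecutive columns differ in exactly one bit, which
 * makes decoding robust near stripe boundaries. */
static uint32_t gray_encode(uint32_t c) { return c ^ (c >> 1); }

/* Fill one 8-bit stripe image (width x height, row-major) with the
 * vertical stripes of bit plane k. */
static void make_pattern(uint8_t *img, int width, int height, int k)
{
    for (int x = 0; x < width; ++x) {
        uint8_t v = ((gray_encode((uint32_t)x) >> k) & 1u) ? 255 : 0;
        for (int y = 0; y < height; ++y)
            img[y * width + x] = v;
    }
}

Generating ceil(log2(width)) such bit planes assigns every projector column a unique codeword.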
2.1.2.3 3D human face reconstruction
Given the importance of face reconstruction in a wide range of fields, such as security,
forensics or even entertainment, it is no surprise that this area has received special
attention from the research community over the last decades. A comparative study
of three different 3D face reconstruction approaches is presented in [20]. Here, the
most representative techniques of three different domains are tested: binocular stereo,
structured lighting and photometric stereo. The experimental results
show that active reconstruction techniques perform better than purely passive ones for
this application.
The majority of analyses on vision-based reconstruction have focused on general perfor-
mance for arbitrary scenes rather than on specific objects, as reported in [20]. Neverthe-
less, some effort has been made to evaluate structured lighting techniques with special
focus on human face reconstruction. In [21], a comparison is presented between three
structured lighting techniques (Gray Code, Gray Code Shift and Stripe Boundary) to
assess 3D reconstruction of human faces using mono and stereo systems. The results
show that the Gray Code Shift coding performs best, given the high number of emitted
patterns it uses. A further study on this topic was performed by the same author in
[22]. Again, it was found that time-multiplexing techniques such as binary encoding
using Gray Code provide the highest accuracy. With a rather different objective than
that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] focus their
effort on presenting a framework that captures 3D models of faces in high resolution
with low computational load. Here, the system uses a single colored stripe pattern for
the reconstruction, plus a picture of the face illuminated with regular white light
that is used as texture.
Particular aspects of 3D human face reconstruction, such as the proximity, size and texture
involved, make structured lighting a suitable approach, whereas other recon-
struction techniques can be less suitable when dealing with these particular aspects.
For example, stereoscopic approaches fail to provide positive results when the textures
involved do not contain features that can be easily extracted and matched algorith-
mically, as is the case for the human face. On the other hand, the concepts behind
structured lighting make it very convenient for reconstructing this kind of surface, given
the proximity involved and the size limits of the object in question (appropriate for
projecting encoded patterns).
With regard to the suitability of the different pattern coding strategies for our application (3D human face reconstruction by means of a hand-held scanner), there are several factors to consider. Spatial neighborhood strategies do not offer the high spatial resolution that is needed by the algorithms that assess the fit quality of the various mask models. Direct coding strategies suffer from practical problems that affect their robustness in different scenarios. This centers the attention on the time-multiplexing techniques, which are known to provide high spatial resolution. The problem with such techniques is that they are highly sensitive to movement, which is likely to be present on a hand-held device. Fortunately, there are several approaches as to how such a problem can be solved. Consequently, a time-multiplexing technique is employed in our application.
2.2 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement. This section presents a review of some of the most popular techniques, with special focus on those that are regarded as adequate for our application.
2.2.1 Definition

Camera calibration is the process of determining a mathematical approximation of the physical and optical behavior of an imaging system by using a set of parameters. These parameters can be estimated by means of direct or iterative methods, and they are divided into two groups. On the one hand, intrinsic parameters determine how light is projected through the lens onto the image plane of the sensor. The focal length, projection center, and lens distortion are all examples of intrinsic parameters. On the other hand, extrinsic parameters measure the position and orientation of the camera with respect to a world coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for the structured pattern projection and triangulation considered in [25]. The focal length fc and the projection center Oc are examples of intrinsic parameters of the camera, while the distance D between the camera and the projector corresponds to an extrinsic parameter.
Figure 2.4: A reference framework assumed in [25]: a camera and a projector with their image planes, optical centers Oc and Op, and focal lengths, observing an object against a reference plane; D denotes the camera-projector distance.
2.2.2 Popular techniques

In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration that uses a 3×4 transformation matrix to map 3D object points to their respective 2D image projections. Here, the model of the camera does not consider any lens distortion. For a detailed description of this method, refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique based on extracting the physical parameters of the camera from the transformation technique proposed in [18]. The description of this technique is given in [26] and [27]. A non-linear explicit camera calibration that included radial lens distortion was proposed by Salvi in his PhD thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular, and that is still widely used, was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.
The calibration mechanism that is currently being used in our application is based on the work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although this mechanism focuses on stereo camera calibration, it was generalized for a system with one camera and one projector. It involves imaging a controlled scene from different positions and orientations. The controlled scene consists of a rigid calibration chart with several markers. The geometric and photometric properties of such markers are known precisely, so that they can be detected. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart. This calibration mechanism is discussed further in Section 3.7.
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed by the Smart Sensing & Analysis research group and provided as a starting point for the current project. Figure 3.1 presents the main steps involved in the 3D reconstruction process.
Figure 3.1: General flow diagram of the 3D face scanner application. Starting from the binary and XML input files, the stages are: read binary file (3.1), preprocessing (3.2), normalization (3.3), global motion compensation (3.4), decoding (3.5), tessellation (3.6), calibration (3.7), vertex filtering (3.8), and hole filling (3.9), ending with the 3D model.
The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene. For each projection, the scene is captured by means of the embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study of why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is described.
Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.
3.1 Read binary file

The first step of the application is to read the binary file that contains the required information for the 3D reconstruction. The binary file is composed of two parts: the header and the actual data. The header contains metadata of the acquired frames, such as the number of frames and the resolution of each one. The second part contains the actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence, which from now on will be referred to as camera frames.
3.2 Preprocessing

The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps is described in the following subsections.
Figure 3.3: Flow diagram of the preprocessing stage: parse XML file, discard frames, crop frames, and scale (convert to float, range from 0 to 1).
3.2.1 Parse XML file

In this stage, the application first reads an XML file that is included with every scan. This file contains relevant information for the structured light reconstruction. This information includes (i) the type of structured light patterns that were projected when acquiring the data, (ii) the number of frames captured while structured light patterns were being projected, (iii) the image resolution of each frame to be considered, and (iv) the calibration data.
3.2.2 Discard frames

Based on the number-of-frames value read from the XML file, the application discards extra frames that do not contain relevant information for the structured light approach but that are provided as part of the input.
3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this application in particular, because pixels near the frame borders do not contain facial information and can therefore be safely removed.
3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage, the data type is transformed from unsigned integer to floating point while dividing each pixel value by 255. The new set of values ranges between 0 and 1.
3.3 Normalization

Even though this section is entitled Normalization, a few more tasks are performed in this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide arrows represent the flow of data, whereas dashed lines represent the order of execution. The numbers inside the small data arrows pointing towards the different tasks represent the number of frames used as input by each task. The dashed-line rectangle that encloses the normalization and texture 2 tasks indicates that there is no clear sequential execution between these two, but rather that they are executed in an alternating fashion. This type of diagram will prove particularly useful in Chapter 5 to explain the modifications that were made to the application to improve its performance. An example of the different frames that are produced in this stage is visualized in Figure 3.5. A brief description of each of the tasks involved in this stage follows.

Figure 3.4: Flow diagram of the normalization stage. The 16 camera frames are input to four tasks: normalization (8 frames out), texture 2 (8 frames out), modulation (1 frame out), and texture 1 (1 frame out).
3.3.1 Normalization

The purpose of this stage is to extract the reflectivity component (texture information) from the camera frames while aiming at enhancing the deformed illumination patterns in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The deformed patterns are essential for the 3D reconstruction process.

In order to understand how this process takes place, we need to look back at Figure 3.2. Here, it is possible to observe that the projected patterns in the top row frames are equal to their corresponding frames in the bottom row, with the only difference being that the values of the projected pattern are inverted. For each corresponding pair, a new image frame is generated according to the following equation:

F_norm(x, y) = (F_camera(x, y, a) - F_camera(x, y, b)) / (F_camera(x, y, a) + F_camera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively. An example of the resulting frame sequence is shown in Figure 3.5a.
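As an illustration, a minimal C sketch of this per-pixel computation follows, assuming the scaled frames are stored as consecutive pattern/inverse pairs (as described in Section 5.3); the names frames, normalized, width, and height are illustrative.

    /* Sketch of the normalization step: each pattern frame a and its
       inverted counterpart b produce one normalized frame. */
    for (int k = 0; k < 8; k++) {
        const float *a   = frames[2 * k];     /* pattern          */
        const float *b   = frames[2 * k + 1]; /* inverted pattern */
        float       *out = normalized[k];
        for (int i = 0; i < width * height; i++) {
            float sum = a[i] + b[i];
            /* guard against division by zero in dark regions */
            out[i] = (sum > 0.0f) ? (a[i] - b[i]) / sum : 0.0f;
        }
    }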
Figure 3.5: Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence, (b) texture 2 frame sequence, (c) modulation frame, (d) texture 1 frame.
3.3.2 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one used to calculate the normalized frame sequence. In fact, the output of this process is an intermediate step in the calculation of the normalized frames, which is the reason why the two processes are said to be performed in an alternating fashion. The mathematical equation that describes the calculation of the texture 2 frame sequence is:

F_texture2(x, y) = F_camera(x, y, a) + F_camera(x, y, b)

The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation stage.
3.3.3 Modulation

The purpose of this stage is to find the range of measured values for each (x, y) pixel of the camera frame sequence along the time dimension. This is done in two steps. First, two frames are generated by finding the maximum and minimum values along the time (t) dimension (Figure 3.6) for every (x, y) position in a frame.

Figure 3.6: Camera frame sequence in a coordinate system with spatial dimensions x, y and time dimension t.

Second, a modulation frame is produced by finding the difference between the previously generated frames, i.e.,

F_mod(x, y) = F_max(x, y) - F_min(x, y)

Such a modulation frame (Figure 3.5c) is required later during the decoding stage.
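A minimal sketch of the two steps in C, with illustrative names, could look as follows:

    /* Per-pixel range of the 16 camera frames along the time dimension. */
    for (int i = 0; i < width * height; i++) {
        float mn = frames[0][i];
        float mx = frames[0][i];
        for (int t = 1; t < 16; t++) {
            float v = frames[t][i];
            if (v < mn) mn = v;
            if (v > mx) mx = v;
        }
        modulation[i] = mx - mn;   /* F_mod = F_max - F_min */
    }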
3.3.4 Texture 1

Finally, the last task in the normalization stage corresponds to the generation of the texture image that will be mapped onto the final 3D model. In contrast to the previous three tasks, this subprocess does not take the complete set of 16 camera frames as input, but only the two with the finest projection patterns. Figure 3.7 shows the four processing steps that are applied to the input in order to generate a texture image such as the one presented in Figure 3.5d.

Figure 3.7: Flow diagram for the calculation of the texture 1 image: average frames, gamma correction, 5×5 mean filter, and histogram stretch.
3.4 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement. In fact, if no measures are taken to correct the slight amount of movement of the scanner or of the objects in the scene during the acquisition process, the complete reconstruction process fails. Although the global motion compensation stage is only a minor part of the mechanism that makes the entire application robust to motion, its contribution to the final result is not negligible.

Global motion compensation is an extensive field of research to which many different approaches and methods have been contributed. The approach used in this application is amongst the simplest in level of complexity. Nevertheless, it suffices for the needs of the current application.
Figure 3.8 presents an overview of the algorithm used to achieve the global motion compensation. This process takes as input the normalized frame sequence introduced in the previous section. As noted at the bottom of the figure, these steps are repeated for every pair of consecutive frames. As a first step, the pixels in each column are added for both frames. This results in two vectors that hold the cumulative sums of each frame. The second step is to determine by how many pixels the second image is displaced with respect to the first one. In order to achieve this, the sum of absolute differences (SAD) between elements of the two column-sum vectors is calculated while slowly displacing the two vectors with respect to each other. The result is a new vector containing the SAD value for each displacement. Subsequently, the index of the smallest element in the SAD-values vector is searched for in order to determine the number of pixels that the second image needs to be shifted. The process concludes by performing the actual shift of the second frame.
Figure 3.8: Flow diagram for the global motion compensation process. For every pair of consecutive normalized frames A and B, the columns of each frame are summed, the SAD between the two column-sum vectors is minimized, and frame B is shifted accordingly.
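The displacement estimation step can be sketched in C as follows: a 1-D profile (the column sums described above) of each frame is matched by exhaustively testing shifts within a search range. The function and variable names, as well as the search range parameter, are illustrative.

    #include <float.h>
    #include <math.h>

    /* Returns the shift (in pixels) of profile B with respect to profile A
       that minimizes the sum of absolute differences (SAD). */
    static int estimate_shift(const float *profA, const float *profB,
                              int len, int maxShift)
    {
        int   bestShift = 0;
        float bestSad   = FLT_MAX;
        for (int s = -maxShift; s <= maxShift; s++) {
            float sad = 0.0f;
            for (int i = 0; i < len; i++) {
                int j = i + s;
                if (j < 0 || j >= len)
                    continue;              /* ignore non-overlapping ends */
                sad += fabsf(profA[i] - profB[j]);
            }
            if (sad < bestSad) {
                bestSad   = sad;
                bestShift = s;
            }
        }
        return bestShift;
    }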
3.5 Decoding

In Section 2.1.1 of the literature study, the correspondence problem was defined as the process of determining corresponding point pairs between the captured images and the projected patterns. This is exactly what is being accomplished during the decoding stage.
A novel approach has been implemented in which the identification of the projector stripes is based not on the values of the pixels themselves (as is typically done), but rather on the edges formed by the transitions of the projected patterns. Figure 3.9 illustrates the different sets of decoded values that result with each of these methods. Here, it is possible to observe that the pixel-based method produces a stair-casing effect due to the decoding of neighboring pixels that lie on the same stripe of the projected pattern. On the other hand, the edge-based method removes this undesirable effect by decoding values only for those parts of the image in which a transition occurs. Furthermore, this approach enables sub-pixel accuracy in determining the positions where the transitions occur, meaning that the overall resolution of the 3D reconstruction increases considerably.
Figure 3.9: Edge-based vs. pixel-based decoding, plotting decoded values against pixels along the y dimension of the image. The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used.
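The exact sub-pixel rule is not spelled out here; purely as an illustration, if a transition is assumed to lie where the normalized signal changes sign between two neighboring pixels, linear interpolation places it with sub-pixel accuracy:

    /* Hypothetical sketch: v0 and v1 are normalized values of opposite
       sign at rows y0 and y0 + 1; the zero crossing between them is the
       estimated transition position. */
    static float subpixel_edge(float v0, float v1, int y0)
    {
        return (float)y0 + v0 / (v0 - v1);
    }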
The decoding process results in a set of vertices, each one associated with a depth code. Note, however, that the unit of measurement used to describe the position and depth of each vertex is based on camera pixels and code values, respectively, meaning that these vertices still do not represent the actual geometry of the face. The calibration process, explained in a later section, is the part of the application that translates the pixel and code values to standard units (such as millimeters), thus recreating the actual shape of the human face.
3.6 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in such a manner that no overlaps occur. In computer graphics, these geometric shapes are generally chosen to be triangles, also called "faces". The reason for using triangles is that they have, by definition, their vertices on the same plane. This, in turn, avoids the generation of non-simple convex polygons that are not guaranteed to be rendered correctly. A complete example illustrating this point can be found in [32].
The set of 3D vertices calculated in the decoding stage is the input to the tessellation process. Here, however, the third dimension does not play a role, and hence the z coordinate of each vertex can be thought of as being equal to 0. This implies that the new set of vertices consists only of (x, y) coordinates that lie on the same plane, as shown in Figure 3.10a. This graph corresponds to a very close view of the nose area in the reconstructed face example.
Figure 3.10: Close view of the vertices in the nose area before and after the tessellation process: (a) vertices before applying the Delaunay triangulation, (b) result after applying the Delaunay triangulation.
The question that arises here is how to connect the vertices in such a way that the complete surface is covered with triangles. The answer is to use the Delaunay triangulation, which is probably the most common triangulation used in computer vision. The main advantage that it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the Delaunay triangulation is independent of the order in which the vertices are processed. Figure 3.10b shows the result of applying the Delaunay triangulation to the vertices shown in Figure 3.10a.
Although there exists a number of different algorithms used to achieve the Delaunay triangulation, the final outcome of each conforms to the following definition: a Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [33]. Such a definition can be understood by examining Figure 3.11.
Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33].
3.7 Calibration

The set of (x, y) vertices with their corresponding depth code values that result from the decoding process do not represent standard units of measure, i.e., these still have to be translated into standard units such as millimeters. This is precisely the objective of the calibration process.

The calibration mechanism that is used in the application is based on the work of Peter-Andre Redert as part of his PhD thesis [31]. The entire process is divided into two parts: an offline and an online process. Moreover, the offline process consists of two stages: the camera calibration and the system calibration. It is important to clarify that while the offline process is performed only once (camera properties and distances within the system do not change with every scan), the online process is carried out for every scan instance. The calibration stage referred to in Figure 3.1 is the latter.
3.7.1 Offline process

As already mentioned, the offline process comprises the two stages described below.

Camera calibration: This part of the process is concerned with the calculation of the intrinsic parameters of the camera, as explained in Section 2.2 of the literature study. In short, the objective is to precisely quantify the optical properties of the camera. The current approach accomplishes this by imaging the special calibration chart shown in Figure 3.12 from different orientations and distances. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart.
Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.
System calibration: The second part of the calibration process refers to the camera-projector system calibration, i.e., the determination of the extrinsic parameters of the system. Again, this part of the process images the calibration chart from different distances. However, this time structured light patterns are emitted by the projector while the acquisition process takes place. The result is that each projector code is associated with a known depth and camera position.
3.7.2 Online process

The result of the offline calibration is a set of parameters that model the optical properties of the scanner system. These are passed to the application inside the XML file for every scan. Such parameters represent the coefficients of a fifth-order polynomial used for translating the set of (x, y) vertices with their corresponding depth code values into standard units of measure. In other words, the online process consists of evaluating a polynomial with all the x, y, and depth code values calculated in the decoding stage in order to reconstruct the geometry of the face. Figure 3.13 shows the state of the 3D model before and after the reconstruction process.
model before and after the reconstruction process
(a) Before reconstruction (b) After reconstruction
Figure 313 The 3D model before and after the calibration process
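The exact form of the calibration polynomial is not given here; purely as an illustration of the online evaluation step, a fifth-order polynomial in one variable can be evaluated efficiently with Horner's scheme (the coefficient array c is assumed to come from the XML file):

    /* Evaluate c[0] + c[1]*x + ... + c[5]*x^5 using Horner's scheme. */
    static float eval_poly5(const float c[6], float x)
    {
        float r = c[5];
        for (int i = 4; i >= 0; i--)
            r = r * x + c[i];
        return r;
    }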
3.8 Vertex filtering

As can be seen from Figure 3.13b, there are a number of extra vertices (and faces) that have not been correctly reconstructed and therefore should be removed from the model. Vertex filtering is applied to remove all these noisy vertices and faces based on different criteria. The process is divided into the following three steps.
3.8.1 Filter vertices based on decoding constraints

First, if the distance between consecutive decoded points is larger than a maximum threshold in the x or z dimensions, then these are removed. Second, in order to avoid falsely decoded vertices due to camera noise (especially in the parts of the images where light does not hit directly), a minimal modulation threshold needs to be exceeded, or else the associated decoded point is discarded. Finally, if the decoded vertices lie outside a margin defined in accordance with the image dimensions, then these are removed as well.
3.8.2 Filter vertices outside the measurement range

The measurement range, defined during the offline calibration, refers to the minimum and maximum values that each decoded point can have in the z dimension. These values are read from the XML file. The long triangles shown in Figure 3.13b that either extend far into the picture or, on the other hand, come close to the camera are all removed in this stage. The resulting 3D model after being filtered with the two previously described criteria is shown in Figure 3.14a.
3.8.3 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length criterion. Initially, the length of every edge contained in the model is calculated. This is followed by determining a new set of edges L that contains the longest edge in each face. After this operation, the mean length value for the longest-edge set is calculated. Finally, only faces whose longest edge value is less than seven times the mean value, i.e., L < 7 × mean(L), are kept. Figure 3.14b shows the result after this operation.
Figure 3.14: Resulting 3D models after various filtering steps: (a) after the filtering steps described in Subsections 3.8.1 and 3.8.2, (b) after the filtering step described in Subsection 3.8.3, (c) after the filtering step described in Section 3.9.
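A compact C sketch of the maximum-edge-length criterion follows; the mesh layout (an indexed triangle list) and all names are illustrative.

    #include <math.h>
    #include <stdlib.h>

    static float edge_len(const float *v0, const float *v1)
    {
        float dx = v0[0] - v1[0], dy = v0[1] - v1[1], dz = v0[2] - v1[2];
        return sqrtf(dx * dx + dy * dy + dz * dz);
    }

    /* verts holds xyz triplets; faces holds 3 vertex indices per triangle.
       keep[i] is set to 1 for faces that survive the filter. */
    static void filter_long_edges(const float *verts, const int (*faces)[3],
                                  int nFaces, unsigned char *keep)
    {
        float *longest = malloc(nFaces * sizeof *longest);
        float  mean    = 0.0f;
        for (int i = 0; i < nFaces; i++) {
            const float *a = verts + 3 * faces[i][0];
            const float *b = verts + 3 * faces[i][1];
            const float *c = verts + 3 * faces[i][2];
            float ab = edge_len(a, b), bc = edge_len(b, c), ca = edge_len(c, a);
            longest[i] = fmaxf(ab, fmaxf(bc, ca));   /* longest edge per face */
            mean += longest[i];
        }
        mean /= (float)nFaces;
        for (int i = 0; i < nFaces; i++)
            keep[i] = longest[i] < 7.0f * mean;      /* L < 7 * mean(L) */
        free(longest);
    }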
3.9 Hole filling

In the last processing step of the 3D face scanner application, two actions are performed. The first one is concerned with an algorithm that takes care of filling undesirable holes that appear due to the removal of vertices and faces that were part of the face surface. This is accomplished by adding a vertex in the middle of the hole and then connecting every surrounding edge with this point. The second action refers to another filtering step of vertices and faces. In this last part of the application, the program removes all but the largest group of connected faces. The final 3D model is shown in Figure 3.14c.
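A sketch of the first action follows, under the assumption that the boundary loop of a hole has already been identified and that the vertex and face arrays have room for the additions (all names are illustrative):

    /* boundary lists the vertex indices of one hole loop, in order.
       A centroid vertex is appended and connected to every boundary edge. */
    static void fill_hole(float *verts, int *nVerts,
                          int (*faces)[3], int *nFaces,
                          const int *boundary, int nBoundary)
    {
        float cx = 0.0f, cy = 0.0f, cz = 0.0f;
        for (int i = 0; i < nBoundary; i++) {
            cx += verts[3 * boundary[i] + 0];
            cy += verts[3 * boundary[i] + 1];
            cz += verts[3 * boundary[i] + 2];
        }
        int center = (*nVerts)++;                /* new vertex in the middle */
        verts[3 * center + 0] = cx / nBoundary;
        verts[3 * center + 1] = cy / nBoundary;
        verts[3 * center + 2] = cz / nBoundary;

        for (int i = 0; i < nBoundary; i++) {    /* one triangle per edge */
            faces[*nFaces][0] = boundary[i];
            faces[*nFaces][1] = boundary[(i + 1) % nBoundary];
            faces[*nFaces][2] = center;
            (*nFaces)++;
        }
    }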
3.10 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes but not for the overall goal of the 3D mask sizing project, this process was not taken into account as part of the 3D face scanner application. This is also the reason why it is not included in Figure 3.1. Nevertheless, this section provides a brief explanation of the smoothing process that is currently used, along with an example.
A complete explanation of the algorithm that is being used to achieve the smoothing effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian operator that diffuses the vertices along the surface. An example of the resulting model before and after applying the smoothing process is shown in Figure 3.15.
Figure 3.15: Forehead of the 3D model (a) before and (b) after applying the smoothing process.
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as two different domains, but rather as two complementary parts of a whole. There are two important trends that have made such a unified view possible. First, integrated circuit (IC) technology has evolved to the point where multiple processors of different types coexist in a single IC. Second, the increasing complexity and average size of programs, added to the evolution of compiler technologies, raised C compilers (and even C++ or Java in some cases) to become commonplace in the development of embedded systems [35].
This chapter discusses the embedded hardware and software implementation of the 3D face scanner. A brief account of the hardware and software tools that were used during the development of the application is presented first. Subsequently, the first stage of the development process is described, which consists mainly of translating the algorithms and methods described in Chapter 3 into a different programming language more suitable for embedded systems. Finally, a preview of the developed visualization module that displays the 3D reconstructed face is presented, along with a brief description of its functionality.
4.1 Development tools

This section describes the set of tools used in the development of the embedded application. First, an overview of the hardware is presented, highlighting the most important aspects that are of interest to the 3D face scanner application. This is then followed by a list of the software tools, along with a short motivation for their selection. A so-called remote development methodology was used for the compilation process. The idea is to run an integrated development environment (IDE) on a client system for the creation of the project, editing of the files, and usage of code assistance features in the same manner as done with local projects. However, when the project is built, run, or debugged, the process runs on a remote server, with output and input transferred to the client system.
4.1.1 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as development platforms. SBCs combine most features of a conventional desktop computer into a single board, which can be as small as a credit card. One or more processors of different types, memory, on-board peripherals for multiple USB devices, single or dual gigabit Ethernet connections, and integrated graphics and audio capabilities, amongst others, are common features included in these devices. But perhaps what is most interesting for embedded developers is the availability of several SBCs that come under the open source hardware category [36]. Such SBCs are suitable for the implementation of a wide range of applications on the basis of open operating systems.

Two different hardware environments were used in the development of the current embedded application: a conventional desktop personal computer (PC) with an Intel x86 architecture, and an SBC that was selected according to the following survey.
4.1.1.1 Single-board computer survey

A prior survey of popular SBCs available in the market was conducted with the intention of finding the most suitable model for our application. Table 4.1 presents a subset of the considered models, highlighting the most relevant characteristics for the 3D face scanner application. Refer to [37] for the complete survey.

The model to be chosen has to comply with several requirements imposed by the 3D face scanner application. First, support for both a camera and a projector had to be offered. While all of the considered models showed special support for video output, not all of them provided suitable characteristics for camera signal acquisition. In fact, most of them rely on USB or Ethernet connections for this purpose. The problem with using USB technology for camera acquisition is that it is highly resource demanding. On the other hand, Ethernet connections imply streaming video in formats such as MPEG, which require additional computational resources and buffering for decoding the video stream. Explicit peripheral support for camera acquisition was only offered by two of the considered models: the BeagleBoard-xM and the PandaBoard.
Table 4.1: Single-board computer survey

BeagleBoard-xM
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 512 MB
  Video output: DVI-D, HDMI, S-Video
  GPU: PowerVR SGX, OpenGL ES 2.0
  Camera port: Yes

Raspberry Pi Model B
  CPU: ARM1176, 700 MHz
  RAM: 256 MB
  Video output: Composite RCA, HDMI, DSI
  GPU: Broadcom VideoCore IV, OpenGL ES 2.0
  Camera port: No

Cotton Candy
  CPU: dual-core ARM Cortex-A9, 1200 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
  Camera port: No

PandaBoard
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, DVI-D, LCD
  GPU: PowerVR SGX540, OpenGL ES 2.0
  Camera port: Yes

VIA APC
  CPU: ARM11, 800 MHz
  RAM: 512 MB
  Video output: HDMI, VGA
  GPU: built-in 2D/3D graphics, OpenGL ES 2.0
  Camera port: No

MK802
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No

Snowball
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, CVBS
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No
A second issue in the selection of the SBC was concerned with the project objective of developing a module capable of visualizing the 3D reconstructed model by means of the embedded projector. It was considered that the achievement of this objective could be greatly simplified by selecting an SBC model that offered support for rendering 3D computer graphics by means of an API, preferably OpenGL ES. Nevertheless, all of the SBC models considered in the survey featured a graphics processing unit (GPU) with such support.

Finally, one last important motivation for the selection came from the experience gathered through related projects. The BeagleBoard-xM had been used as the embedded computing unit in other projects [6] at Philips Research Eindhoven, and therefore valuable implementation effort could be saved if this option were adopted. Consequently, it was the BeagleBoard-xM that was selected as the SBC model for the development of the current project.
4.1.1.2 BeagleBoard-xM features

The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power, open-source hardware system that was designed specifically to address the open source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz, accompanied by 512 MB of LPDDR RAM. Several open operating systems have been made compatible with such a processor, including Linux, FreeBSD, RISC OS, Symbian, and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU to provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].
4.1.2 Software

The main factors involved in the selection of software tools were (i) available support by a large development community and (ii) acquisition costs and licensing charges. Open source software was adopted where possible. Moreover, prior experience with the tools was also taken into account. The software can be divided into two categories: (i) software libraries that are used within the application, and therefore are necessary for its execution, and (ii) software tools used specifically for the development of the application, and hence not required for its execution. In what follows, each of these is briefly described.

Figure 4.1: The BeagleBoard-xM offered by Texas Instruments.
4.1.2.1 Software libraries

The following software libraries are used throughout the implementation of the embedded application.

libxml2: A software library used for parsing XML documents, which was originally developed for the Gnome project and was later made available for outside projects as well. The current application makes use of this tool for extracting the required information from the XML file that is included with each scan.

OpenCV: An open source computer vision and machine learning software library initiated by Intel. It provides the necessary functionality to construct the Delaunay triangulation described in Chapter 3. Though it was used in the initial versions of the application, later optimizations replaced the OpenCV implementations.

CGAL: A software library that aims to provide access to algorithms in computational geometry. It is used in the current application as a means to simplify the resulting mesh surface, i.e., to reduce the number of faces used to represent the surface while keeping the overall shape of the reconstructed model.

OpenGL ES: A subset of the more general OpenGL, designed specifically for embedded systems. It consists of a cross-language, multi-platform application programming interface (API) for rendering 2D and 3D computer graphics. It is used in the current application as the means to visualize the 3D reconstructed model.

GLUT: The OpenGL Utility Toolkit consists of a system-independent API for OpenGL used to create windows and/or frame buffers. It is used in the visualization module of the application as well.
4.1.2.2 Software development tools

The following list presents a description of the most important software tools used for the development of the embedded application.

GNU toolchain: Refers to a collection of programming tools produced by the GNU Project that provide development facilities for applications and operating systems. Among the several projects that comprise the GNU toolchain, the following were used:

GNU Make: A utility that automates the building process of executable programs by reading so-called makefiles, which specify how to create the target program.

GCC: The official compiler of the GNU operating system, which has been adopted as standard by most modern Unix-like computer operating systems.

GNU Binutils: A set of programming tools used in the development process of creating and managing programs, object files, libraries, profile data, and assembly source code. The commands as (assembler), ld (linker), and gprof (profiler) were used among the complete set of binutils commands.

GNU Project debugger: The standard debugger for the GNU operating system, which was made available for the development of applications outside this project as well.

Valgrind: A programming tool that can automatically detect memory management errors. It also provides the functionality of a profiler.

Ubuntu: A Linux-based operating system that is distributed as free and open source software. It was installed on both the desktop PC and the SBC.
4.2 MATLAB to C code translation

This section describes the first stage of the embedded application development, which involves the translation of a series of algorithms originally written in MATLAB code to C.

Despite the fact that there are a number of available tools that automatically translate MATLAB code to C language, such as MATLAB Coder by MathWorks, MATLAB-to-C Synthesis (MCS) by Catalytic Inc., and AccelDSP by Xilinx, these have a number of pitfalls that compromise their applicability, especially when the performance aspect is of ultimate importance. Perhaps what is most concerning is that each one of these tools only supports a subset of the MATLAB language and functions, meaning that the complete functionality of MATLAB is immediately constrained by this requirement. In many cases, this implies a modification of the MATLAB code prior to the translation process in order to filter out any feature or function not included in the subset, which adds overhead to the development process. Examples of features not supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization, and try/catch statements. The use of an automatic translation tool was discarded for this project, taking into account that several of these unsupported features are present in the MATLAB code.
4.2.1 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular programming languages used for the development of embedded systems. The first is that C lies at an intermediate point between higher- and lower-level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher-level languages lies in the fact that they do not provide suitable characteristics for optimizing the performance of applications, such as low-level memory manipulation. Furthermore, unlike many of these higher-level programming languages, C provides deterministic resource use, which is an important feature when the target devices contain limited resources. On the other hand, C outperforms lower-level languages in a number of aspects, such as scalability and maintainability. Two final motivations for using C are that (i) C compilers are available for almost all embedded devices, which are supported by a large pool of experienced C programmers, and (ii) the vast majority of hardware APIs/drivers are written in C.
4.2.2 Translation approach

As mentioned earlier, a manual translation approach was chosen over the use of automatic translation tools. A key part of the process of manually translating MATLAB to C code is the verification process. There are two major techniques used to achieve such verification. The first consists of a systematic method of converting the translated C code into a compiled MEX-file that can be merged into the original MATLAB project. Then, by comparing the results generated by the MATLAB project containing the C implementation wrapped in a MEX-file with those generated by the original MATLAB project, one should be able to verify the correctness of the translation. The second approach consists of writing corresponding intermediate results of both the MATLAB and C implementations to external files, and then using a file comparison tool, such as diff for Linux environments, in order to validate the equality of both results. It was the latter approach that was chosen for the development of the current application, for the following reason: the former approach requires the C implementation to be wrapped in a so-called MEX wrapper, which takes care of the communication between MATLAB and C. This task is considered to be error prone, since crashes, segmentation violations, or incorrect results can easily occur if the MEX wrapper does not allocate and access the data properly, as reported by Marc Barberis in [40] from Catalytic Inc.
A number of pitfalls that add complexity to the manual translation process were identified throughout the development of this stage. The most important are:

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e., the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in the performance of the resulting code (see the sketch after this list).

• MATLAB is an interpreted language, i.e., data types and variable dimensions are only known at run-time, and thus these cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc, or free.
• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions, and sizes. On the contrary, C language requires all variables to be cast to a specific data type (or declared, as it is known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C, and hence requires the programmer to implement them while relying on structure constructs of primitive types.
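To illustrate the ordering pitfall, the following minimal sketch shows the loop order that preserves spatial locality for a C (row-major) image buffer; process, img, width, and height are illustrative names.

    /* Row-major traversal: the inner loop walks consecutive memory
       addresses, so cache lines are fully used. Swapping the two loops
       (the natural order for MATLAB's column-major layout) would stride
       through memory and degrade performance. */
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            process(img[y * width + x]);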
4.3 Visualization

This section describes the different steps involved in the visualization module developed to display the reconstructed 3D models by means of the embedded projector contained in the hand-held device. Figure 4.2 extends the general overview of the application presented in Figure 3.1 by incorporating the visualization module. This figure shows that a resulting 3D model of the face reconstruction process consists of four different elements: a set of vertices, a set of faces, a set of UV coordinates, and a texture image.
Figure 4.2: Simplified diagram of the 3D face scanner application. The camera frame sequence and the XML file are input to the 3D face reconstruction process, which passes faces, vertices, UV coordinates, and the texture 1 image to the visualization module.
Vertices and faces describe the geometry of the reconstructed model. Each face consists of three index values that determine the vertices that form a triangle. On the other hand, UV coordinates, together with the texture image, describe the texture of the model. Figure 4.3 shows how UV coordinates are used to map portions of the texture image to individual parts of the model. Each vertex is associated with a UV coordinate. When a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image and place it on top of the triangle.
Figure 4.3: The UV coordinate system, with the u and v axes spanning the corners (0,0), (1,0), (0,1), and (1,1).
Figure 4.4 presents an overview of the visualization module. The first step of the process is to simplify the 3D model, i.e., to reduce the number of triangles (and vertices) used to represent the surface. Note that while a high resolution is needed for the algorithms that determine the fit quality of the different mask models, a much lower resolution can be used for visualization purposes. In fact, due to the limited resources available in embedded systems, such simplification becomes necessary to avoid lag when zooming, rotating, or panning the model. Edge collapse is a common term used for the simplification process, which is shown in Figure 4.4. The input vertices and faces of this block are converted into a smaller set, denoted as New vertices and New faces in the diagram. However, since the new set of vertices and faces does not have a one-to-one correspondence to the original set of UV coordinates, such coordinates have to be updated as well. This is accomplished by using the nearest neighbor algorithm: every new vertex is assigned the UV coordinate of its closest original vertex.

The next stage of the process is to format the new set of vertices, faces, and UV coordinates, together with the texture 1 image, such that OpenGL can render the model.
Subsequently, normal vectors are calculated for every triangle, which are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex based on the normal vectors of the triangles that are connected to it. Moreover, a cross-product multiplication is used to calculate the normal vector of each triangle. Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state in which the model is redrawn every time a timer expires or an interactive operation is sent to the program.
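A sketch of the normal calculation in C follows, assuming the same indexed triangle list layout used in earlier sketches; all names are illustrative.

    #include <math.h>
    #include <string.h>

    /* Face normals via cross product, accumulated per vertex and then
       renormalized, yielding the averaged vertex normals. */
    static void compute_normals(const float *verts, int nVerts,
                                const int (*faces)[3], int nFaces,
                                float *normals /* 3 * nVerts floats */)
    {
        memset(normals, 0, 3 * (size_t)nVerts * sizeof *normals);
        for (int i = 0; i < nFaces; i++) {
            const float *a = verts + 3 * faces[i][0];
            const float *b = verts + 3 * faces[i][1];
            const float *c = verts + 3 * faces[i][2];
            float u[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
            float v[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
            float n[3] = { u[1] * v[2] - u[2] * v[1],   /* n = u x v */
                           u[2] * v[0] - u[0] * v[2],
                           u[0] * v[1] - u[1] * v[0] };
            for (int k = 0; k < 3; k++)        /* accumulate on 3 vertices */
                for (int d = 0; d < 3; d++)
                    normals[3 * faces[i][k] + d] += n[d];
        }
        for (int j = 0; j < nVerts; j++) {     /* renormalize the averages */
            float *n = normals + 3 * j;
            float len = sqrtf(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
            if (len > 0.0f) { n[0] /= len; n[1] /= len; n[2] /= len; }
        }
    }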
Figure 4.4: Diagram of the visualization module. Edge collapse simplifies the input vertices and faces into new vertices and new faces; a nearest neighbor step derives the new UV coordinates; the data and the texture 1 image are converted to OpenGL format; normals are calculated; and OpenGL renders the model.
Chapter 5
Performance optimizations
This chapter presents various performance optimizations made to the 3D face scanner application, ranging from high-level optimizations, such as modification of the algorithms, to low-level optimizations, such as the implementation of time-consuming parts in assembly language.

In order to verify that the achieved optimizations were valid in general, and not only for specific cases, 10 scans of different persons were used for profiling the performance of the application. Every profile consisted of running the application 10 times for each scan and then averaging the results, in order to reduce the influence that external factors might have on the measured times. Figure 5.1 presents an example of the graphs that will be used throughout this and the following chapters to represent the changes in performance. Here, each bar is divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1.
The translation from MATLAB to C code corresponds to the first optimization performed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available in the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's -O2 optimization level.

The bottom bar in Figure 5.1 represents the starting point for the set of optimization procedures that will be described in the following sections. The order in which these are presented corresponds to the same order in which they were applied to the application.
Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM, broken down into the stages of the application.
5.1 Double- to single-precision floating-point numbers

The same representation format of floating-point numbers for the MATLAB and C implementations was necessary to compare both results in each step of the translation process. The original C implementation used the double-precision format, because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double- to single-precision format was made. Figure 5.2 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec.
Figure 5.2: Difference in execution time when double-precision format is changed to single-precision.
5.2 Tuned compiler flags

While the previous versions of the C code were compiled with the -O2 optimization level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
Figure 5.3: Execution time before and after tuning GCC's compiler options.
5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality of the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e., first and second, third and fourth, fifth and sixth, and so on. Data of the camera frames were placed in memory in such a manner that corresponding pixels between frame pairs lay next to each other in memory. The procedure is shown in Figure 5.4.

However, this modification yielded no improvement in the execution time of the application, as can be seen from Figure 5.5.
5.4 Reimplementation of C's standard power function

The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the power 0.85. After profiling the application, it was found that the power function from the standard C math library was taking most of the time inside this process. Taking into account that the
Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green, and purple circles represent pixels of the first, second, third, and fourth frames, respectively.
Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.
high accuracy offered by such a function was not required, and that the overhead involved in validating the input could be removed, a different implementation of the power function was adopted.

A novel approach was proposed by Ian Stephenson in [42], explained as follows. The power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a, which stores the exponent and the mantissa:

a = M * 2^E

where M is the mantissa and E is the exponent. Taking log2 of both sides gives

log2(a) = log2(M) + E

and since M is normalized, log2(M) is always small; therefore

log2(a) ≈ E
This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.
Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.
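A minimal C sketch along these lines follows; it illustrates the technique rather than reproducing the application's exact code. The crude linear mantissa terms (log2(1+m) ≈ m and 2^f ≈ 1+f) trade accuracy for speed, which is acceptable for gamma correction.

    #include <stdint.h>

    /* Approximate log2(a) for a > 0 from the IEEE-754 bit pattern. */
    static float fast_log2(float a)
    {
        union { float f; uint32_t i; } u = { a };
        int   e = (int)((u.i >> 23) & 0xFF) - 127;       /* exponent E */
        float m = (float)(u.i & 0x7FFFFF) / 8388608.0f;  /* mantissa fraction */
        return (float)e + m;          /* log2(M) approximated by m */
    }

    /* Approximate 2^x by building the exponent bits directly. */
    static float fast_pow2(float x)
    {
        int   e = (int)x;
        float f = x - (float)e;
        if (f < 0.0f) { f += 1.0f; e--; }      /* keep f in [0, 1) */
        union { uint32_t i; float f32; } u;
        u.i = (uint32_t)(e + 127) << 23;       /* exactly 2^e */
        return u.f32 * (1.0f + f);             /* 2^f approximated by 1 + f */
    }

    /* pow(a, b) = 2^(b * log2(a)); used here as a^0.85 per pixel. */
    static float fast_pow(float a, float b)
    {
        return fast_pow2(b * fast_log2(a));
    }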
5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is the principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering allowed floating-point calculations to be replaced with integer calculations in the modulation stage, which are known to typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modulation frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping, and scaling frames are now performed in an alternating fashion together with the calculation of the modulation frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.
Figure 5.7: Order of execution before (a) and after (b) the optimization. In the original order, preprocessing (parse XML file, discard frames, crop frames, scale) is followed by a normalization stage comprising texture 1, modulation, texture 2, and normalize; in the modified order, the modulation task is moved into the preprocessing stage.
Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
56 GMC in y dimension only
A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, that figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results were achieved when GMC is applied in the y direction only. This was implemented by estimating the motion in both directions but only performing the shift in the y direction. The optimization consisted of removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement of the execution time shown in Figure 5.10.
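As an illustration, the y-only estimate can be sketched as follows: each frame is reduced to a 1-D profile, and the vertical shift minimizing the sum of absolute differences (SAD) between the two profiles is selected. This is a hedged sketch; the function name, profile length, and search range are ours.

```c
#include <stdlib.h>
#include <limits.h>

/* Find the vertical shift dy in [-max_shift, max_shift] that minimizes
   the SAD between the 1-D profiles of two consecutive frames. */
static int estimate_dy(const long *profileA, const long *profileB,
                       int len, int max_shift)
{
    long best = LONG_MAX;
    int best_dy = 0;
    for (int dy = -max_shift; dy <= max_shift; dy++) {
        long sad = 0;
        for (int i = 0; i < len; i++) {
            int j = i + dy;
            if (j < 0 || j >= len) continue;  /* skip non-overlapping part */
            sad += labs(profileA[i] - profileB[j]);
        }
        if (sad < best) { best = sad; best_dy = dy; }
    }
    return best_dy;
}
```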
[Flow diagram: for every pair of consecutive frames A and B in the normalized frame sequence, the row and column sums of both frames are computed, the SAD is minimized in x and y, and frame B is shifted in the y dimension only.]

Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code.
[Bar chart: per-stage execution time with the original GMC versus GMC in y only; horizontal axis: time (sec).]

Figure 5.10: Difference in execution time before and after modifying the GMC stage.
5.7 Error in Delaunay triangulation
OpenCV was used to compute the Delaunay triangulation, and a series of examples available in [43] were used as references for our implementation. Although OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a less straightforward procedure is required to extract the triangles from the resulting so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation; it is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. At this point of the optimization process, an error was detected in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.
[Bar chart: per-stage execution time before and after fixing the bug; horizontal axis: time (sec).]

Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.
5.8 Modified line shifting in GMC stage
A series of optimizations performed on the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later, it was optimized even further by replacing the for loop with the more efficient memcpy function available in the standard C library, which in turn led to a faster execution time.

A further optimization was obtained in the GMC stage that yielded better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process: a conditional statement determines whether the shift has to be performed in the positive or the negative direction. If the shift is negative, i.e. upwards, the shifting operation traverses the image from top to bottom, copying each line a certain number of rows above it. If the shift is positive, i.e. downwards, the shifting operation traverses the image from bottom to top, copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
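A minimal sketch of this in-place shift is given below, assuming a row-major 8-bit image; the function name and signature are ours. The traversal direction guarantees that no source row is overwritten before it is read, which removes the need for a second buffer.

```c
#include <string.h>
#include <stdint.h>

/* In-place vertical shift by dy rows: negative dy shifts the image up,
   positive dy shifts it down; rows shifted out of the frame are discarded. */
static void shift_rows_inplace(uint8_t *img, int rows, int cols, int dy)
{
    if (dy < 0) {                              /* up: walk top to bottom   */
        for (int r = 0; r < rows + dy; r++)
            memcpy(img + (size_t)r * cols, img + (size_t)(r - dy) * cols, cols);
    } else if (dy > 0) {                       /* down: walk bottom to top */
        for (int r = rows - 1; r >= dy; r--)
            memcpy(img + (size_t)r * cols, img + (size_t)(r - dy) * cols, cols);
    }
}
```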
[Bar chart: per-stage execution time before and after the changes to GMC; horizontal axis: time (sec).]

Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
5.9 New tessellation algorithm
A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic is that the set of vertices resulting from the decoding stage is already sorted, which removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies even further the algorithm needed to connect the vertices into triangles.

The developed algorithm traverses the set of vertices row by row from bottom to top, creating triangles between every pair of consecutive rows. Each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.
The algorithm is presented in Algorithm 1. Note that, for each pair of rows, the algorithm describes the connection of vertices only until the last vertex of either row is reached; the unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).
Algorithm 1 New tessellation algorithm
1: for all pairs of rows do
2:   find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
3:   while the last vertex in either row has not been reached do
4:     if vertex_row_A is more to the left than vertex_row_B then
5:       connect vertex_row_A with the next vertex on the same row and with vertex_row_B
6:       change vertex_row_A to the next vertex on the same row
7:     else
8:       connect vertex_row_B with the next vertex on the same row and with vertex_row_A
9:       change vertex_row_B to the next vertex on the same row
10:    end if
11:  end while
12: end for
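For concreteness, a hedged C sketch of the inner loop of Algorithm 1 follows; the types, names, and the triangle-emitting callback are ours, and the tail-connection step mentioned above is omitted, as in Algorithm 1 itself.

```c
/* Connect two x-sorted rows of vertices into triangles by always
   advancing in the row whose current vertex is further to the left. */
typedef struct { float x, y; int id; } Vertex;

static void connect_rows(const Vertex *row_a, int n_a,
                         const Vertex *row_b, int n_b,
                         void (*emit)(int, int, int))
{
    int a = 0, b = 0;
    while (a + 1 < n_a && b + 1 < n_b) {
        if (row_a[a].x < row_b[b].x) {
            emit(row_a[a].id, row_a[a + 1].id, row_b[b].id);
            a++;                 /* advance in row A */
        } else {
            emit(row_b[b].id, row_b[b + 1].id, row_a[a].id);
            b++;                 /* advance in row B */
        }
    }
}
```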
Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 12.5 times over OpenCV's Delaunay triangulation implementation.
[Two plots of the same vertex patch (x from 406 to 414, y from 18 to 22): (a) Delaunay triangulation; (b) optimized triangulation.]

Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted.
5.10 Modified decoding stage
A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage.
[Bar chart: per-stage execution time with the Delaunay triangulation versus the new triangulation algorithm; horizontal axis: time (sec).]

Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.
As a first step, two frequently called functions of the standard C math library, namely ceil() and floor(), were replaced with faster implementations that use pre-processor directives to avoid the function call overhead. Moreover, the time spent validating the input was also eliminated, since such validation was not required. However, the property that allowed the new implementations of ceil() and floor() to improve performance the most is that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions was further simplified.
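A hedged sketch of such simplified replacements is shown below; the macro names are ours. For non-negative values, truncation already implements floor, and ceil reduces to a single comparison (the arguments must be free of side effects, since the macros evaluate them more than once).

```c
/* Valid only for non-negative x, which holds for index values. */
#define FAST_FLOOR(x) ((int)(x))
#define FAST_CEIL(x)  ((int)(x) + ((x) > (int)(x)))
```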
A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while verifying that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since the addressing of stack memory is simpler.

The last optimization consisted of detecting and removing several tasks that were not contributing to the final result. Such tasks were present in the application because several alternatives for achieving a common goal were implemented during the algorithmic design stage; after the best option was assessed and chosen, however, the others were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage
This section describes the last optimization performed to the GMC stage.
[Bar chart: per-stage execution time with the original versus the modified decoding stage; horizontal axis: time (sec).]

Figure 5.15: Execution time of the application before and after optimizing the decoding stage.
The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for every image except the first and the last frame (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, such recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
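A hedged sketch of the reuse pattern follows, building on the estimate_dy() and shift_rows_inplace() sketches given earlier; the dimensions and names are ours, and project() merely stands in for the column-sum computation of the GMC stage.

```c
#include <stddef.h>
#include <stdint.h>

#define ROWS 480
#define COLS 640
#define MAX_SHIFT 8

/* Stand-in for the column-sum computation: one value per image row. */
static void project(const uint8_t *img, long *profile)
{
    for (int r = 0; r < ROWS; r++) {
        profile[r] = 0;
        for (int c = 0; c < COLS; c++)
            profile[r] += img[(size_t)r * COLS + c];
    }
}

/* Two profile buffers alternate roles, so each frame is projected once
   and its profile is reused as the reference in the next iteration. */
static void gmc_pass(uint8_t frames[][ROWS * COLS], int n_frames)
{
    long profiles[2][ROWS];
    project(frames[0], profiles[0]);
    for (int n = 1; n < n_frames; n++) {
        long *prev = profiles[(n - 1) & 1]; /* computed in previous pass */
        long *cur  = profiles[n & 1];
        project(frames[n], cur);            /* only the new frame */
        int dy = estimate_dy(prev, cur, ROWS, MAX_SHIFT);
        shift_rows_inplace(frames[n], ROWS, COLS, dy);
    }
}
```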
5.12 NEON assembly optimization 1
The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization, and global motion compensation stages are data-independent and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrating NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology; (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions while the compiler does some of the work associated with writing assembly instructions; and (iv) directly writing NEON assembly instructions linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and it was therefore this approach that was adopted.
[Flow diagram: for the first pair of consecutive frames in the normalized sequence, the column sums of frames 1 and 2 are computed, the SAD is minimized, and frame 2 is shifted; for every remaining pair of consecutive frames (from n = 3 to n = N), the column vector of frame n−1 is reused, so only the column sums of frame n are computed before minimizing the SAD and shifting frame n.]

Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums.
[Bar chart: per-stage execution time with and without the recalculations; horizontal axis: time (sec).]

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.
Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0–Q15) or as thirty-two 64-bit registers (D0–D31), where each of the Q0–Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on two Q registers, where each of the 8 elements is 16 bits wide, or as an operation on two D registers, where each of the 8 elements is 8 bits wide.
[Diagram: two source registers and a destination register, each divided into lanes of elements; a single operation combines corresponding lanes.]

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology.
An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3, it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating point value that ranges from 0 to 1.
Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain this increment. First, note that the stage of the application that contributed most to the increase in time was the reading of the binary file. The execution time of that process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved with the NEON optimization also increased, which suggests that another process was indeed probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage. The first step of that process is to find the smallest and largest values for every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C, the processor makes use of its branch prediction mechanism to speed up the instruction pipeline. The use of NEON instructions, in contrast, makes the processor perform the comparison for every single pack of 8 values, without involving the branch prediction mechanism.
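For illustration, the min/max update of the modulation stage can be expressed branch-free with 8-lane NEON operations. The sketch below uses NEON intrinsics for readability, whereas the application itself used hand-written assembly; the function name and signature are ours.

```c
#include <arm_neon.h>
#include <stdint.h>

/* Update the per-pixel running min/max over time, 8 pixels per iteration;
   n is assumed to be a multiple of 8. */
static void modulation_minmax(const uint8_t *row, uint8_t *min_row,
                              uint8_t *max_row, int n)
{
    for (int i = 0; i < n; i += 8) {
        uint8x8_t v = vld1_u8(row + i);                     /* load 8 pixels */
        vst1_u8(min_row + i, vmin_u8(vld1_u8(min_row + i), v));
        vst1_u8(max_row + i, vmax_u8(vld1_u8(max_row + i), v));
    }
}
```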
5.13 NEON assembly optimization 2
After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). To explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels; the number of possible values for the general case is 2^(n+1) − 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the power 0.85. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
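A hedged C sketch of this construction for the real 8-bit case follows; the names are ours, and the 0.85 exponent is the gamma value quoted above. With 8-bit pixels the sum a + b takes 2·255 + 1 = 511 distinct values, so a 511-entry table indexed directly by the sum replaces one pow() call per pixel.

```c
#include <math.h>
#include <stdint.h>

static float gamma_lut[511];

/* Precompute average^0.85 for every possible sum of two 8-bit pixels. */
static void build_gamma_lut(void)
{
    for (int s = 0; s < 511; s++)
        gamma_lut[s] = powf(s / 2.0f, 0.85f);
}

/* The sum a + b, already produced by the texture 2 stage, indexes the LUT. */
static inline float avg_gamma(uint8_t a, uint8_t b)
{
    return gamma_lut[a + b];
}
```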
As a final step in the optimization process, a further improvement was made to the execution flow presented in Figure 5.19. From this diagram, it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages: the first involves the calculation of the modulation, texture 2, and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames.
[Flow diagram: after parsing the XML file, camera frames 1, 2, ..., 15, 16 are processed row by row and vector by vector with NEON code (modulation step 1 and scale; texture 2 (v1 + v2) and scale; normalize ((v1 − v2)/(v1 + v2)); crop row); camera frames 15 and 16 are then re-read to compute texture 1 and modulation step 2, followed by the rest of the program.]

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
[Bar chart: per-stage execution time before the optimization versus after NEON assembly optimization 1; horizontal axis: time (sec).]

Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.
[Table for the 2-bit example:
pixel A: 0, 1, 2, 3
pixel B: 0, 1, 2, 3
average: 3, 2.5, 2, 1.5, 1, 0.5, 0
average^0.85 (the LUT): 2.544, 2.179, 1.803, 1.411, 1, 0.555, 0
indexed by pixel A + pixel B.]

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.
The merging of these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The corresponding improvement of the execution time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face reconstruction application.
[Bar chart: per-stage execution time before the optimization versus after NEON assembly optimization 2; horizontal axis: time (sec).]

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.
[Flow diagram: after parsing the XML file, camera frames 1, 2, ..., 13, 14 are processed row by row and vector by vector with NEON code (modulation step 1 and scale; texture 2 (v1 + v2) and scale; normalize ((v1 − v2)/(v1 + v2)); crop row); camera frames 15 and 16 go through the same NEON pipeline extended with the averaging and gamma correction step; a 5×5 mean filter and modulation step 2 complete the stage before the rest of the program.]

Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.
6.1 MATLAB to C code translation
In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models, with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people; moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when run on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).
On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when run on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's -O2 optimization level.
[Bar chart: per-stage execution time of the MATLAB code on a PC, the C code on a PC, and the C code on the BeagleBoard; horizontal axis: time (sec).]

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
6.2 Visualization
A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device; Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model contains approximately 2% of the faces of the high resolution model, the quality degradation is hardly visible when comparing the two textured models.
6.3 Performance optimizations
Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations described in Chapter 5 were used to reduce the execution time of the application from 14.5 to 5.1 seconds, which translates into a speedup of approximately 2.85 times.
[Four renderings: (a) high-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model wireframe (63,743 faces); (c) low-resolution 3D model with texture (1,229 faces); (d) low-resolution 3D model wireframe (1,229 faces).]

Figure 6.2: Example of the visualization module developed.
Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which provide an idea of the speedup achieved for each individual stage.
[Bar chart: per-stage execution time after each optimization step — no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2. Horizontal axis: time (sec).]

Figure 6.3: Performance evolution of the 3D face scanner's C implementation.
[Nine before/after bar charts, one per stage: (a) read binary file; (b) preprocessing; (c) normalization; (d) GMC; (e) decoding; (f) tessellation; (g) calibration; (h) vertex filtering; (i) hole filling. Horizontal axes: time (sec).]

Figure 6.4: Execution time for each stage of the application before and after the complete optimization process.
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each pair of corresponding models yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e. by approximately 14 seconds.
A wide range of optimizations were performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.
Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment.
Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.
7.1 Future work
Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5×5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks, and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration, and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage can be optimized substantially. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operation), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to lose performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e. changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program is exploiting spatial and temporal locality could reveal directions for further optimizations.
Finally, it is worth noting a few more ideas for how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process; it is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application, so a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp, and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of this configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Although a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.
Bibliography
[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1)

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1)

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1)

[4] R. Klette, K. Schlüns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, isbn: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10)

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, issn: 0146-664X. doi: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11)

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34)
[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11)

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11)

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, issn: 0162-8828. doi: 10.1109/34.216735 (cit. on pp. 9, 11)

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. doi: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11)

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, issn: 0031-3203. doi: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11)

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, issn: 0162-8828. doi: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11)

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11)

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, issn: 0031-3203. doi: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12)

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, issn: 0734-189X. doi: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12)
[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. doi: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12)

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, issn: 0932-8092. doi: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12)

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, issn: 0018-9162. doi: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14)

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12)

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. doi: 10.1109/ICME.2006.262619 (cit. on p. 12)

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. doi: 10.1109/ENC.2006.8 (cit. on pp. 12, 13)

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lighting techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13)

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. doi: 10.1109/ICIP.2007.4379251 (cit. on p. 13)

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, issn: 0031-3203. doi: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14)

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. doi: 10.1364/OE.15.012318 (cit. on p. 14)
[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14)

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, isbn: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14)

[28] J. Mas and Universitat de Girona, Departament d'Electrònica, Informàtica i Automàtica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, isbn: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15)

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, issn: 0882-4967. doi: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15)

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, issn: 0162-8828. doi: 10.1109/34.159901 (cit. on p. 15)

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26)

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, isbn: 0201604582 (cit. on p. 25)

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1–4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26)

[34] M. Desbrun, M. Meyer, P. Schroder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, isbn: 0-201-48560-5. doi: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30)
[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, isbn: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31)

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32)

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32)

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, December, p. 81, 2009 (cit. on p. 34)

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34)

[40] M. Barberis and L. Semeria, "How-to MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38)

[41] W. Von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45)

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46)

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50)

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51)

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54)

[46] N. Pipenbrinck, "ARM NEON optimization. An example," Tech. Rep., 2009 (cit. on p. 54)
List of Figures
1.1 A subset of the CPAP masks offered by Philips
1.2 A 3D hand-held scanner developed in Philips Research
2.1 Standard stereo geometry
2.2 Assumed model for triangulation as proposed in [4]
2.3 Examples of pattern coding strategies
2.4 A reference framework assumed in [25]
3.1 General flow diagram of the 3D face scanner application
3.2 Example of the 16 frames that are captured by the hand-held scanner
3.3 Flow diagram of the preprocessing stage
3.4 Flow diagram of the normalization stage
3.5 Example of the 18 frames produced in the normalization stage
3.6 Camera frame sequence in a coordinate system
3.7 Flow diagram for the calculation of the texture 1 image
3.8 Flow diagram for the global motion compensation process
3.9 Difference between pixel-based and edge-based decoding
3.10 Vertices before and after the tessellation process
3.11 The Delaunay tessellation with all the circumcircles and their centers [33]
3.12 The calibration chart
3.13 The 3D model before and after the calibration process
3.14 3D resulting models after various filtering steps
3.15 Forehead of the 3D model before and after applying the smoothing process
4.1 The BeagleBoard-xM offered by Texas Instruments
4.2 Simplified diagram of the 3D face scanner application
4.3 UV coordinate system
4.4 Diagram of the visualization module
5.1 Execution times of the MATLAB and C implementations after run on different platforms
5.3 Execution time before and after tuning GCC's compiler options
5.4 Modification of the memory layout of the camera frames
5.5 Execution time with a different memory layout
5.6 Execution time before and after reimplementing C's standard power function
5.7 Order of execution before and after the optimization
5.8 Difference in execution time before and after reordering the preprocessing stage
5.9 Flow diagram for the GMC process as implemented in the MATLAB code
5.10 Difference in execution time before and after modifying the GMC stage
5.11 Execution time of the application after fixing an error in the tessellation stage
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach
5.15 Execution time of the application before and after optimizing the decoding stage
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's columns sum
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage
5.18 NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology
5.19 Execution flow after first NEON assembly optimization
5.20 Execution times of the application before and after applying the first NEON assembly optimization
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels
5.22 Execution times of the application before and after applying the second NEON assembly optimization
5.23 Final execution flow after second NEON assembly optimization
6.1 Execution times of the MATLAB and C implementations after run on different platforms
6.2 Example of the visualization module developed
6.3 Performance evolution of the 3D face scanner's C implementation
6.4 Execution times for each stage of the application
Dedicated to my grandmother
Chapter 1
Introduction
The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.
Within the wide range of products offered by Philips, there is a specific group, categorized under the name of sleep solutions, that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]; it reported that sleep apnea was prevalent in more than 18 million Americans, i.e. 6.62% of the country's population.

While aiming to attend to the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors, which include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia, amongst others.
[Nine mask photographs: (a) Amara; (b) ComfortClassic; (c) ComfortGel Blue; (d) ComfortLite 2; (e) FitLife; (f) GoLife; (g) ProfileLite Gel; (h) Simplicity; (i) ComfortGel.]

Figure 1.1: A subset of the CPAP masks offered by Philips.
A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.

Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone: they rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable
of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.
1.1 3D Mask Sizing project
The 3D Mask Sizing project is based on the initiative of Philips to develop technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods, and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.

2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.

3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.

4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.

The focus of this thesis project is on the first step.
As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique was already developed and is the basis for the present project. Figure 1.2a shows the hardware setup of this device. In short, the scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently performed offline and are mostly implemented in MATLAB.
1.2 Objectives
The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform.
[(a) Hardware setup; (b) example of a reconstructed 3D model.]

Figure 1.2: A 3D hand-held scanner developed in Philips Research.
This implies transforming the already developed methods and algorithms in such a way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application: specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to an ARM Cortex-A8, the new implementation should be targeted to this processor in particular, by making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.
1.3 Report organization
This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB-to-C translation, the visualization module, and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.
mbbuild() current camera settings var camera=scenecamerasgetByIndex(0) var res= initialize result string aperture angle of the virtual camera (perspective projection) or orthographic scale (orthographic projection) if(cameraprojectionType==cameraTYPE_PERSPECTIVE) var aac=camerafov180MathPI if(hostutilprintf(4f aac)=30) res+=hostutilprintf(n3Daac=s aac) else cameraviewPlaneSize=2mbradius() res+=hostutilprintf(n3Dortho=s 1cameraviewPlaneSize) camera roll var roll = cameraroll180MathPI if(hostutilprintf(4f roll)=0) res+=hostutilprintf(n3Droll=sroll) target to camera vector var c2c=new Vector3() c2cset(cameraposition) c2csubtractInPlace(cameratargetPosition) c2cnormalize() if((c2cx==0 ampamp c2cy==-1 ampamp c2cz==0)) res+=hostutilprintf(n3Dc2c=s s s c2cx c2cy c2cz) new camera settings bounding sphere centre --gt new camera target var coo=new Vector3() cooset((mbcenter())[0] (mbcenter())[1] (mbcenter())[2]) if(coolength) res+=hostutilprintf(n3Dcoo=s s s coox cooy cooz) radius of orbit if(cameraprojectionType==cameraTYPE_PERSPECTIVE) var roo=mbradius() Mathsin(aac MathPI 360) else orthographic projection var roo=mbradius() res+=hostutilprintf(n3Droo=s roo) update camera settings in the viewer var currol=cameraroll cameratargetPositionset(coo) camerapositionset(cooadd(c2cscale(roo))) cameraroll=currol determine background colour rgb=scenebackgroundgetColor() if((rgbr==1 ampamp rgbg==1 ampamp rgbb==1)) res+=hostutilprintf(n3Dbg=s s s rgbr rgbg rgbb) determine lighting scheme switch(scenelightScheme) case sceneLIGHT_MODE_FILE curlights=Artworkbreak case sceneLIGHT_MODE_NONE curlights=Nonebreak case sceneLIGHT_MODE_WHITE curlights=Whitebreak case sceneLIGHT_MODE_DAY curlights=Daybreak case sceneLIGHT_MODE_NIGHT curlights=Nightbreak case sceneLIGHT_MODE_BRIGHT curlights=Hardbreak case sceneLIGHT_MODE_RGB curlights=Primarybreak case sceneLIGHT_MODE_BLUE curlights=Bluebreak case sceneLIGHT_MODE_RED curlights=Redbreak case sceneLIGHT_MODE_CUBE curlights=Cubebreak case sceneLIGHT_MODE_CAD curlights=CADbreak case sceneLIGHT_MODE_HEADLAMP curlights=Headlampbreak if(curlights=Artwork) res+=hostutilprintf(n3Dlights=s curlights) determine global render mode switch(scenerenderMode) case sceneRENDER_MODE_BOUNDING_BOX currender=BoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX currender=TransparentBoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX_OUTLINE currender=TransparentBoundingBoxOutlinebreak case sceneRENDER_MODE_VERTICES currender=Verticesbreak case sceneRENDER_MODE_SHADED_VERTICES currender=ShadedVerticesbreak case sceneRENDER_MODE_WIREFRAME currender=Wireframebreak case sceneRENDER_MODE_SHADED_WIREFRAME currender=ShadedWireframebreak case sceneRENDER_MODE_SOLID currender=Solidbreak case sceneRENDER_MODE_TRANSPARENT currender=Transparentbreak case sceneRENDER_MODE_SOLID_WIREFRAME currender=SolidWireframebreak case sceneRENDER_MODE_TRANSPARENT_WIREFRAME currender=TransparentWireframebreak case sceneRENDER_MODE_ILLUSTRATION currender=Illustrationbreak case sceneRENDER_MODE_SOLID_OUTLINE currender=SolidOutlinebreak case sceneRENDER_MODE_SHADED_ILLUSTRATION currender=ShadedIllustrationbreak case sceneRENDER_MODE_HIDDEN_WIREFRAME currender=HiddenWireframebreak if(currender=Solid) res+=hostutilprintf(n3Drender=s currender) write result string to the console hostconsoleshow() hostconsoleclear() hostconsoleprintln(n Copy and paste the following text to then+ option list of includemedian + res + n)function get3Dview () var camera=scenecamerasgetByIndex(0) var coo=cameratargetPosition 
var c2c=camerapositionsubtract(coo) var roo=c2clength c2cnormalize() var res=VIEW=insert optional name heren if((coox==0 ampamp cooy==0 ampamp cooz==0)) res+=hostutilprintf( COO=s s sn coox cooy cooz) if((c2cx==0 ampamp c2cy==-1 ampamp c2cz==0)) res+=hostutilprintf( C2C=s s sn c2cx c2cy c2cz) if(roo gt 1e-9) res+=hostutilprintf( ROO=sn roo) var roll = cameraroll180MathPI if(hostutilprintf(4f roll)=0) res+=hostutilprintf( ROLL=sn roll) if(cameraprojectionType==cameraTYPE_PERSPECTIVE) var aac=camerafov 180MathPI if(hostutilprintf(4f aac)=30) res+=hostutilprintf( AAC=sn aac) else if(hostutilprintf(4f cameraviewPlaneSize)=1) res+=hostutilprintf( ORTHO=sn 1cameraviewPlaneSize) rgb=scenebackgroundgetColor() if((rgbr==1 ampamp rgbg==1 ampamp rgbb==1)) res+=hostutilprintf( BGCOLOR=s s sn rgbr rgbg rgbb) switch(scenelightScheme) case sceneLIGHT_MODE_FILE curlights=Artworkbreak case sceneLIGHT_MODE_NONE curlights=Nonebreak case sceneLIGHT_MODE_WHITE curlights=Whitebreak case sceneLIGHT_MODE_DAY curlights=Daybreak case sceneLIGHT_MODE_NIGHT curlights=Nightbreak case sceneLIGHT_MODE_BRIGHT curlights=Hardbreak case sceneLIGHT_MODE_RGB curlights=Primarybreak case sceneLIGHT_MODE_BLUE curlights=Bluebreak case sceneLIGHT_MODE_RED curlights=Redbreak case sceneLIGHT_MODE_CUBE curlights=Cubebreak case sceneLIGHT_MODE_CAD curlights=CADbreak case sceneLIGHT_MODE_HEADLAMP curlights=Headlampbreak if(curlights=Artwork) res+= LIGHTS=+curlights+n switch(scenerenderMode) case sceneRENDER_MODE_BOUNDING_BOX defaultrender=BoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX defaultrender=TransparentBoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX_OUTLINE defaultrender=TransparentBoundingBoxOutlinebreak case sceneRENDER_MODE_VERTICES defaultrender=Verticesbreak case sceneRENDER_MODE_SHADED_VERTICES defaultrender=ShadedVerticesbreak case sceneRENDER_MODE_WIREFRAME defaultrender=Wireframebreak case sceneRENDER_MODE_SHADED_WIREFRAME defaultrender=ShadedWireframebreak case sceneRENDER_MODE_SOLID defaultrender=Solidbreak case sceneRENDER_MODE_TRANSPARENT defaultrender=Transparentbreak case sceneRENDER_MODE_SOLID_WIREFRAME defaultrender=SolidWireframebreak case sceneRENDER_MODE_TRANSPARENT_WIREFRAME defaultrender=TransparentWireframebreak case sceneRENDER_MODE_ILLUSTRATION defaultrender=Illustrationbreak case sceneRENDER_MODE_SOLID_OUTLINE defaultrender=SolidOutlinebreak case sceneRENDER_MODE_SHADED_ILLUSTRATION defaultrender=ShadedIllustrationbreak case sceneRENDER_MODE_HIDDEN_WIREFRAME defaultrender=HiddenWireframebreak if(defaultrender=Solid) res+= RENDERMODE=+defaultrender+n for(var i=0iltscenemeshescounti++) var mesh=scenemeshesgetByIndex(i) var meshUTFName = for (var j=0 jltmeshnamelength j++) var theUnicode = meshnamecharCodeAt(j)toString(16) while (theUnicodelengthlt4) theUnicode = 0 + theUnicode meshUTFName += theUnicode var end=meshnamelastIndexOf() if(endgt0) var meshUserName=meshnamesubstr(0end) else var meshUserName=meshname respart= PART=+meshUserName+n respart+= UTF16NAME=+meshUTFName+n defaultvals=true if(meshvisible) respart+= VISIBLE=falsen defaultvals=false if(meshopacitylt10) respart+= OPACITY=+meshopacity+n defaultvals=false currender=defaultrender switch(meshrenderMode) case sceneRENDER_MODE_BOUNDING_BOX currender=BoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX currender=TransparentBoundingBoxbreak case sceneRENDER_MODE_TRANSPARENT_BOUNDING_BOX_OUTLINE currender=TransparentBoundingBoxOutlinebreak case sceneRENDER_MODE_VERTICES currender=Verticesbreak case 
sceneRENDER_MODE_SHADED_VERTICES currender=ShadedVerticesbreak case sceneRENDER_MODE_WIREFRAME currender=Wireframebreak case sceneRENDER_MODE_SHADED_WIREFRAME currender=ShadedWireframebreak case sceneRENDER_MODE_SOLID currender=Solidbreak case sceneRENDER_MODE_TRANSPARENT currender=Transparentbreak case sceneRENDER_MODE_SOLID_WIREFRAME currender=SolidWireframebreak case sceneRENDER_MODE_TRANSPARENT_WIREFRAME currender=TransparentWireframebreak case sceneRENDER_MODE_ILLUSTRATION currender=Illustrationbreak case sceneRENDER_MODE_SOLID_OUTLINE currender=SolidOutlinebreak case sceneRENDER_MODE_SHADED_ILLUSTRATION currender=ShadedIllustrationbreak case sceneRENDER_MODE_HIDDEN_WIREFRAME currender=HiddenWireframebreak case sceneRENDER_MODE_DEFAULT currender=Defaultbreak if(currender=defaultrender) respart+= RENDERMODE=+currender+n defaultvals=false if(meshtransformisEqual(origtrans[meshname])) var lvec=meshtransformtransformDirection(new Vector3(100)) var uvec=meshtransformtransformDirection(new Vector3(010)) var vvec=meshtransformtransformDirection(new Vector3(001)) respart+= TRANSFORM= +lvecx+ +lvecy+ +lvecz+ +uvecx+ +uvecy+ +uvecz+ +vvecx+ +vvecy+ +vvecz+ +meshtransformtranslationx+ +meshtransformtranslationy+ +meshtransformtranslationz+n defaultvals=false respart+= ENDn if(defaultvals) res+=respart detect existing Clipping Plane (3D Cross Section) var clip=null for(i=0 iltscenenodescount i++) if( scenenodesgetByIndex(i)name == $$$$$$ || scenenodesgetByIndex(i)name == Clipping Plane ) clip=scenenodesgetByIndex(i) if(clip) var centre=cliptransformtranslation var normal=cliptransformtransformDirection(new Vector3(001)) res+= CROSSSECTn if((centrex==0 ampamp centrey==0 ampamp centrez==0)) res+=hostutilprintf( CENTER=s s sn centrex centrey centrez) if((normalx==1 ampamp normaly==0 ampamp normalz==0)) res+=hostutilprintf( NORMAL=s s sn normalx normaly normalz) res+= ENDn res+=ENDn hostconsoleshow() hostconsoleclear() hostconsoleprintln(n Add the following VIEW section to a file ofn+ predefined views (See option 3Dviews)nn + The view may be given a name after VIEW=n + (Remove in front of =)n) hostconsoleprintln(res + n)add items to 3D context menuruntimeaddCustomMenuItem(dfltview Generate Default View default 0)runtimeaddCustomMenuItem(currview Get Current View default 0)runtimeaddCustomMenuItem(csection Cross Section checked 0)menu event handlersmenuEventHandler = new MenuEventHandler()menuEventHandleronEvent = function(e) switch(emenuItemName) case dfltview calc3Dopts() break case currview get3Dview() break case csection addremoveClipPlane(emenuItemChecked) break runtimeaddEventHandler(menuEventHandler)global variable taking reference to currently selected mesh nodevar mshSelected=nullselectionEventHandler=new SelectionEventHandler()selectionEventHandleronEvent=function(e) if(eselected ampamp enodeconstructorname==Mesh) mshSelected=enode else mshSelected=null runtimeaddEventHandler(selectionEventHandler)cameraEventHandler=new CameraEventHandler()cameraEventHandleronEvent=function(e) runtimeremoveCustomMenuItem(csection) runtimeaddCustomMenuItem(csection Cross Section checked 0) for(i=0 iltscenenodescount i++) if( scenenodesgetByIndex(i)name == $$$$$$ || scenenodesgetByIndex(i)name == Clipping Plane ) runtimeremoveCustomMenuItem(csection) runtimeaddCustomMenuItem(csection Cross Section checked 1) runtimeaddEventHandler(cameraEventHandler)key event handler for moving spinning and tilting objectskeyEventHandler=new KeyEventHandler()keyEventHandleronEvent=function(e) var target=null var backtrans=new 
Matrix4x4() if(mshSelected) target=mshSelected var trans=targettransform var parent=targetparent while(parenttransform) build local to world transformation matrix transmultiplyInPlace(parenttransform) also build world to local back-transformation matrix backtransmultiplyInPlace(parenttransforminversetranspose) parent=parentparent backtranstransposeInPlace() else try target=scenenodesgetByName(Clipping Plane) catch(e) var ndcnt=scenenodescount target=scenecreateClippingPlane() if(ndcnt=scenenodescount) targetremove() target=null if(target) return switch(echaracterCode) case 30tilt up tiltTarget(target -MathPI900) break case 31tilt down tiltTarget(target MathPI900) break case 28spin right spinTarget(target -MathPI900) break case 29spin left spinTarget(target MathPI900) break case 120 x translateTarget(target new Vector3(100) e) break case 121 y translateTarget(target new Vector3(010) e) break case 122 z translateTarget(target new Vector3(001) e) break case 88 shift + x translateTarget(target new Vector3(-100) e) break case 89 shift + y translateTarget(target new Vector3(0-10) e) break case 90 shift + z translateTarget(target new Vector3(00-1) e) break case 115 s scaleTarget(target 1 e) break case 83 shift + s scaleTarget(target -1 e) break if(mshSelected) targettransformmultiplyInPlace(backtrans)runtimeaddEventHandler(keyEventHandler)function tiltTarget(ta) var centre=new Vector3() if(mshSelected) centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) else centreset(ttransformtranslation) var rotVec=ttransformtransformDirection(new Vector3(010)) rotVecnormalize() ttransformtranslateInPlace(centrescale(-1)) ttransformrotateAboutVectorInPlace(a rotVec) ttransformtranslateInPlace(centre)function spinTarget(ta) var centre=new Vector3() var rotVec=new Vector3(001) if(mshSelected) centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) rotVecset(ttransformtransformDirection(rotVec)) rotVecnormalize() else centreset(ttransformtranslation) ttransformtranslateInPlace(centrescale(-1)) ttransformrotateAboutVectorInPlace(a rotVec) ttransformtranslateInPlace(centre)translates object by amount calculated based on Canvas sizefunction translateTarget(t d e) var cam=scenecamerasgetByIndex(0) if(camprojectionType==camTYPE_PERSPECTIVE) var scale=Mathtan(camfov2) camtargetPositionsubtract(camposition)length Mathmin(ecanvasPixelWidthecanvasPixelHeight) else var scale=camviewPlaneSize2 Mathmin(ecanvasPixelWidthecanvasPixelHeight) ttransformtranslateInPlace(dscale(scale))scales object by amount calculated based on Canvas sizefunction scaleTarget(t d e) if(mshSelected) var bbox=tcomputeBoundingBox() var diag=new Vector3(bboxmaxx bboxmaxy bboxmaxz) diagsubtractInPlace(bboxmin) var dlen=diaglength var cam=scenecamerasgetByIndex(0) if(camprojectionType==camTYPE_PERSPECTIVE) var scale=Mathtan(camfov2) camtargetPositionsubtract(camposition)length dlen Mathmin(ecanvasPixelWidthecanvasPixelHeight) else var scale=camviewPlaneSize2 dlen Mathmin(ecanvasPixelWidthecanvasPixelHeight) var centre=new Vector3() centreset(ttransformtransformPosition(tcomputeBoundingBox()center)) ttransformtranslateInPlace(centrescale(-1)) ttransformscaleInPlace(1+dscale) ttransformtranslateInPlace(centre) function addremoveClipPlane(chk) var clip=scenecreateClippingPlane() if(chk) add Clipping Plane and place its center either into the camera target position or into the centre of the currently selected mesh node var centre=new Vector3() if(mshSelected) local to parent transformation matrix var trans=mshSelectedtransform 
build local to world transformation matrix by recursively multiplying the parents transf matrix on the right var parent=mshSelectedparent while(parenttransform) trans=transmultiply(parenttransform) parent=parentparent get the centre of the mesh (local coordinates) centreset(mshSelectedcomputeBoundingBox()center) transform the local coordinates to world coords centreset(transtransformPosition(centre)) mshSelected=null else centreset(scenecamerasgetByIndex(0)targetPosition) cliptransformsetView( new Vector3(000) new Vector3(100) new Vector3(010)) cliptransformtranslateInPlace(centre) else clipremove() function to store current transformation matrix of all mesh nodes in the scenefunction getCurTrans() var nc=scenemeshescount var tA=new Array(nc) for(var i=0 iltnc i++) var cm=scenemeshesgetByIndex(i) tA[cmname]=new Matrix4x4(cmtransform) return tAfunction to restore transformation matrices given as argfunction restoreTrans(tA) for(var i=0 ilttAlength i++) var msh=scenemeshesgetByIndex(i) mshtransformset(tA[mshname]) store original transformation matrix of all mesh nodes in the scenevar origtrans=getCurTrans()set initial state of Cross Section menu entrycameraEventHandleronEvent(1)hostconsoleclear()
Chapter 2
Literature study
This chapter presents a selective analysis of the state-of-the-art in the field of surface reconstruction, placing special emphasis on structured lighting techniques. A brief overview of the three main underlying technologies used for depth estimation is presented first. This is followed by an example of stereo analysis, which serves as the basis for the more specific structured lighting techniques. Moreover, this example helps to illustrate why stereo analysis is considered less preferable for 3D face reconstruction applications when compared with structured lighting techniques. Special emphasis is placed on the scientific principles underlying structured lighting techniques. Furthermore, a classification of the different types of pattern coding strategies available in the literature is given, along with an analysis of their suitability for our application. Finally, the chapter concludes with a brief discussion of camera calibration and its most representative techniques.
2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (as found in areas like architecture, mechanical engineering, or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].

Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e., they
are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in depth measurement. On the other hand, both reflecto-metric and stereo-metric techniques are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they require an estimation of the model of the environment. In the remainder of this section, we will limit the discussion to the stereo-metric category and focus on the structured lighting techniques.
2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis is presented in this section. This example, presented in [4], intends to show why the use of structured lighting becomes essential for our application.

Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point in the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies from a certain reference. Note, however, that in order to calculate the triangulation, more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (extrinsic parameter) or to the focal length of each of the cameras (intrinsic parameter).

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and
oriented towards the scene objects. Also note that, because the x-axes of both images are identically oriented, rows with the same row number in the two different images lie on the same straight line.

Figure 2.1: Standard stereo geometry
In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

$p_{left} = (x_{left}, y_{left})$ and $p_{right} = (x_{right}, y_{right})$

in the left and right images, respectively, assuming that the scene point is visible from both camera viewpoints. The disparity with respect to $p_{left}$ is a vector given by

$\Delta(x_{left}, y_{left}) = (x_{left} - x_{right},\; y_{left} - y_{right})^T \quad (2.1)$

between two corresponding image points.

In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

$p = (x, y) = \left(\frac{f \cdot X}{Z},\ \frac{f \cdot Y}{Z}\right) \quad (2.2)$

assuming that $Z > f$.
According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that $y = y_{left} = y_{right}$. Therefore, for the left camera the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X-axis, the
central projection equation is given by

$(x_{right}, y) = \left(\frac{f \cdot (X - b)}{Z},\ \frac{f \cdot Y}{Z}\right)$
Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points in each one of the images with respect to $p_{left}$ is given by

$\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}$

However, because rows with the same row number in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

$\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \quad (2.3)$

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map $\Delta(x, y)$ is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value “undefined” is recorded.
Finally, in order to come up with the equations that determine the 3D location of each point in the scene, note that from the two central projection equations of the two cameras it follows that

$Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}$

and therefore

$X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}$

Using the previous equation, it follows that

$Z = \frac{b \cdot f}{x_{left} - x_{right}}$

By substituting this result into the projection equation for y, it follows that

$Y = \frac{b \cdot y}{x_{left} - x_{right}}$

The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and
b are known and that the disparity map $\Delta(x, y)$ was measured for each pair of corresponding points in the two images. Note that a variety of methods exists to calibrate different types of camera configuration systems, i.e., to determine their intrinsic and extrinsic parameters. More on these calibration procedures is discussed in Section 2.2.
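To make the reconstruction step concrete, the following Python sketch applies the last three equations to a dense disparity map (a minimal illustration with assumed names; NumPy is used for the array arithmetic, and undefined disparities are mapped to NaN):

import numpy as np

def reconstruct_from_disparity(disparity, b, f, cx, cy):
    # Image coordinates are expressed relative to the principal point (cx, cy),
    # since the model places the origin on the optical axis of the left camera.
    h, w = disparity.shape
    x_left, y = np.meshgrid(np.arange(w) - cx, np.arange(h) - cy)
    d = np.where(disparity > 0, disparity.astype(float), np.nan)  # undefined -> NaN
    Z = b * f / d       # Z = b*f / (x_left - x_right)
    X = b * x_left / d  # X = b*x_left / (x_left - x_right)
    Y = b * y / d       # Y = b*y / (x_left - x_right)
    return X, Y, Z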
The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques is used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images. These features are typically corners or edges contained within the images. Although these techniques are found to be appropriate for a certain number of applications, they present a number of drawbacks that make their applicability infeasible for many others. The main drawbacks are (i) feature extraction and matching is generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. Such problems in solving the correspondence problem can generally be overcome by resorting to a different but similar type of technique, known by the name of structured lighting. While structured lighting techniques involve a completely different methodology for solving the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.
2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source that projects a light pattern actively into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to simplify the complexity of the correspondence analysis by actively manipulating the scene.

It is important to note that stereoscopic-based systems do not assume complex requirements for image acquisition, since they mostly rely on theoretical, mathematical, and algorithmic analyses to solve the reconstruction problem. The idea behind structured lighting methods, on the other hand, is to shift this complexity to another level, such as the engineering prerequisites of the overall system [4].

A wide variety of light patterns has been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured
when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2, a classification of the available encoded patterns is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.
2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982; Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is given.

Figure 2.2 shows the simplified model that Klette assumes in his analysis. Note that the
Figure 2.2: Assumed model for triangulation, as proposed in [4]
system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a consequence, the object, light source, and camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known, and the origin of the coordinate system O coincides with the projection center of the camera.
The goal is to calculate the distance d between the origin O and the object point P = (X_0, Z_0). This can be done using the law of sines as follows:

$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}$

From $\gamma = \pi - (\alpha + \beta)$ and $\sin(\pi - \gamma) = \sin(\gamma)$, it holds that

$\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}$

Therefore, distance d is given by

$d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}$

which holds for any point P lying on the surface of the object.
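As an illustration, this expression translates directly into code. The sketch below uses example values for the angles and base distance (these are illustrative, not calibration data from this project):

import math

def triangulate_distance(alpha, beta, b):
    # d = b * sin(alpha) / sin(alpha + beta), with alpha and beta in radians
    return b * math.sin(alpha) / math.sin(alpha + beta)

# Example: alpha = 60 degrees, beta = 45 degrees, base distance of 0.2 m
d = triangulate_distance(math.radians(60), math.radians(45), 0.2)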
2.1.2.2 Pattern coding strategies

As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill the requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors, or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19]:
• Time-multiplexing. This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or objects in the scene during the time period in which the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a.
• Spatial neighborhood. In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12], or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow for all coding information to be condensed into a single projection pattern, making them highly
suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, as proposed by Griffin et al. [14].
• Direct coding. In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16] or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not only depend on the projected colors, but also on the intrinsic colors of the measuring surface and light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].
Figure 2.3: Examples of pattern coding strategies: (a) time-multiplexing, (b) spatial neighborhood, (c) direct coding
2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields, such as security, forensics, or even entertainment, it is no surprise that special focus has been devoted to this area by the research community over the last decades. A comparative study of three different 3D face reconstruction approaches is presented in [20]. Here, the most representative techniques of three different domains are tested. These domains are binocular stereo, structured lighting, and photometric stereo. The experimental results show that active reconstruction techniques perform better than purely passive ones for this application.

The majority of analysis on vision-based reconstruction has focused on general performance for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless, some effort has been made on evaluating structured lighting techniques with special focus on human face reconstruction. In [21], a comparison is presented between three
structured lighting techniques (Gray Code, Gray Code Shift, and Stripe Boundary) to assess 3D reconstruction for human faces by using mono and stereo systems. The results show that the Gray Code Shift coding performs best, given the high number of emitted patterns it uses. A further study on this topic was performed by the same author in [22]. Again, it was found that time-multiplexing techniques such as binary encoding using Gray Code provide the highest accuracy. With a rather different objective than that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] also focus their effort on presenting a framework that captures 3D models of faces in high resolution with low computational load. Here, the system uses a single colored stripe pattern for the reconstruction purpose, plus a picture of the face illuminated with regular white light that is used as texture.
Particular aspects of 3D human face reconstruction, such as the proximity, size, and texture involved, make structured lighting a suitable approach. On the contrary, other reconstruction techniques might be less suitable when dealing with these particular aspects. For example, stereoscopic approaches fail to provide positive results when the textures involved do not contain features that can be easily extracted and matched by means of algorithms, as in the case of the human face. On the other hand, the concepts behind structured lighting make it very convenient to reconstruct this kind of surface, given the proximity involved and the size limits of the object in question (appropriate for projecting encoded patterns).

With regard to the suitability of the different pattern coding strategies for our application (3D human face reconstruction by means of a hand-held scanner), there are several factors to consider. Spatial neighborhood strategies do not offer the high spatial resolution that is needed by the algorithms that assess the fit quality of the various mask models. Direct coding strategies suffer from practical problems that affect their robustness in different scenarios. This centers the attention on the time-multiplexing techniques, which are known to provide high spatial resolution. The problem with such techniques is that they are highly sensitive to movement, which is likely to be present on a hand-held device. Fortunately, there are several approaches as to how such a problem can be solved. Consequently, it is a time-multiplexing technique that is being employed in our application.
2.2 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement. This section presents a review of some of the most popular techniques, with special focus on those that are regarded as adequate for our application.
2.2.1 Definition

Camera calibration is the process of determining a mathematical approximation of the physical and optical behavior of an imaging system by using a set of parameters. These parameters can be estimated by means of direct or iterative methods, and they are divided into two groups. On the one hand, intrinsic parameters determine how light is projected through the lens onto the image plane of the sensor. The focal length, projection center, and lens distortion are all examples of intrinsic parameters. On the other hand, extrinsic parameters measure the position and orientation of the camera with respect to a world coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for the structured pattern projection and triangulation considered in [25]. The focal length $f_c$ and the projection center $O_c$ are examples of intrinsic parameters of the camera, while the distance D between the camera and the projector is an extrinsic parameter.
Figure 2.4: A reference framework assumed in [25]
2.2.2 Popular techniques

In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration that uses a 3×4 transformation matrix which maps 3D object points to their respective 2D image projections. Here, the model of the camera does not consider any lens distortion. For a detailed description of this method, refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique that was based on extracting the physical parameters of the camera from the transformation technique proposed in [18]. The description of this technique is given in [26] and [27]. A non-linear explicit camera calibration that included radial lens distortion was proposed by Salvi in his PhD
thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular, and that is still widely used, was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.
The calibration mechanism that is currently being used in our application is based on the work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although this mechanism focuses on stereo camera calibration, it was generalized for a system with one camera and one projector. It involves imaging a controlled scene from different positions and orientations. The controlled scene consists of a rigid calibration chart with several markers. The geometric and photometric properties of such markers are known precisely, so that they can be detected. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart. This calibration mechanism is discussed further in Section 3.7.
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed by the Smart Sensing & Analysis research group and provided as a starting point for the current project. Figure 3.1 presents the main steps involved in the 3D reconstruction process.
Figure 3.1: General flow diagram of the 3D face scanner application
The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene. For each projection, the scene is captured by means of the embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study of why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is described.
Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.
3.1 Read binary file

The first step of the application is to read the binary file that contains the required information for the 3D reconstruction. The binary file is composed of two parts: the header and the actual data. The header contains metadata of the acquired frames, such as the number of frames and the resolution of each one. The second part contains the actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence, which from now on will be referred to as camera frames.
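The exact byte layout of the header is defined by the scanner software and is not reproduced here. Purely as an illustration, a reader for a hypothetical layout (three little-endian 32-bit integers holding the frame count, width, and height, followed by the raw 8-bit frames) could look as follows:

import numpy as np

def read_scan(path):
    # Hypothetical header layout: frame count, width, height as uint32,
    # followed by n_frames * height * width bytes of 8-bit grayscale data.
    with open(path, 'rb') as fh:
        n_frames, width, height = (int(v) for v in np.fromfile(fh, dtype='<u4', count=3))
        data = np.fromfile(fh, dtype=np.uint8, count=n_frames * width * height)
    return data.reshape(n_frames, height, width)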
3.2 Preprocessing

The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps is described in the following subsections.
Figure 3.3: Flow diagram of the preprocessing stage
3.2.1 Parse XML file

In this stage, the application first reads an XML file that is included with every scan. This file contains relevant information for the structured light reconstruction. This
information includes (i) the type of structured light patterns that were projected when acquiring the data, (ii) the number of frames captured while structured light patterns were being projected, (iii) the image resolution of each frame to be considered, and (iv) the calibration data.
3.2.2 Discard frames

Based on the number-of-frames value read from the XML file, the application discards extra frames that do not contain relevant information for the structured light approach but that are provided as part of the input.
3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this particular application, because pixels near the frame borders do not contain facial information and can therefore be safely removed.
3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage, the data type is transformed from unsigned integer to floating point, while dividing each pixel value by 255. The new set of values ranges between 0 and 1.
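In NumPy terms, this conversion is a one-liner (the frame stack shape below is only a placeholder; the row/column order depends on how the frames are stored):

import numpy as np

frames = np.zeros((16, 754, 480), dtype=np.uint8)  # placeholder for the cropped frames
frames = frames.astype(np.float32) / 255.0         # 8-bit [0, 255] -> float [0, 1]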
3.3 Normalization

Even though this section is entitled Normalization, a few more tasks are performed in this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide arrows represent the flow of data, whereas dashed lines represent the order of execution. The numbers inside the small data arrows pointing towards the different tasks represent the number of frames used as input by each task. The dashed-line rectangle that encloses the normalization and texture 2 tasks indicates that there is no strict sequential execution between these two, but rather that they are executed in an alternating fashion. This type of diagram will prove particularly useful in Chapter 5 in order to explain the
Figure 3.4: Flow diagram of the normalization stage
modifications that were made to the application to improve its performance. An example of the different frames that are produced in this stage is visualized in Figure 3.5. A brief description of each of the tasks involved in this stage follows.
3.3.1 Normalization

The purpose of this stage is to extract the reflectivity component (texture information) from the camera frames, while aiming at enhancing the deformed illumination patterns in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The deformed patterns are essential for the 3D reconstruction process.

In order to understand how this process takes place, we need to look back at Figure 3.2. There, it is possible to observe that the projected patterns in the top row frames are equal to their corresponding frames in the bottom row, with the only difference being that the values of the projected pattern are inverted. For each corresponding pair, a new image frame is generated according to the following equation:

$F_{norm}(x, y) = \frac{F_{camera}(x, y, a) - F_{camera}(x, y, b)}{F_{camera}(x, y, a) + F_{camera}(x, y, b)}$

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively. An example of the resulting frame sequence is shown in Figure 3.5a.
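A compact NumPy sketch of this computation is given below. The assumption that the first eight frames and the last eight frames form the aligned inverted pairs is made for illustration only; the actual pairing follows Figure 3.2. Note that the sum in the denominator is exactly the texture 2 frame sequence of the next subsection, which is why the two tasks are computed in an alternating fashion:

import numpy as np

def normalize(frames):
    # frames: float array of shape (16, h, w); a = pattern frames, b = inverted ones
    a, b = frames[:8], frames[8:]
    texture2 = a + b  # intermediate result: the texture 2 frame sequence
    with np.errstate(divide='ignore', invalid='ignore'):
        norm = np.where(texture2 > 0, (a - b) / texture2, 0.0)
    return norm, texture2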
Figure 3.5: Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence, (b) texture 2 frame sequence, (c) modulation frame, (d) texture 1 frame
3.3.2 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one used to calculate the normalized frame sequence. In fact, the output of this process is an intermediate step in the calculation of the normalized frames, which is the reason why the two processes are said to be performed in an alternating fashion. The mathematical equation that describes the calculation of the texture 2 frame sequence is

$F_{texture2}(x, y) = F_{camera}(x, y, a) + F_{camera}(x, y, b)$

The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation stage.
3.3.3 Modulation

The purpose of this stage is to find the range of measured values for each (x, y) pixel of the camera frame sequence along the time dimension. This is done in two steps. First, two frames are generated by finding the maximum and minimum values along the time (t) dimension (Figure 3.6) for every (x, y) position in a frame.

Figure 3.6: Camera frame sequence in a coordinate system

Second, a modulation frame is produced by finding the difference between the previously generated frames, i.e.,

$F_{mod}(x, y) = F_{max}(x, y) - F_{min}(x, y)$

Such a modulation frame (Figure 3.5c) is required later during the decoding stage.
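In array form, the two steps collapse to a per-pixel range along the time axis (a sketch; frames stands for the scaled camera frame stack of shape (16, h, w)):

import numpy as np

frames = np.random.rand(16, 754, 480)  # placeholder for the scaled camera frames
f_max = frames.max(axis=0)             # maximum along the time (t) dimension
f_min = frames.min(axis=0)             # minimum along the time (t) dimension
f_mod = f_max - f_min                  # equivalent to np.ptp(frames, axis=0)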
3.3.4 Texture 1

Finally, the last task in the normalization stage corresponds to the generation of the texture image that will be mapped onto the final 3D model. In contrast to the previous three tasks, this subprocess does not take the complete set of 16 camera frames as input, but only the two with the finest projection patterns. Figure 3.7 shows the four processing steps that are applied to the input in order to generate a texture image such as the one presented in Figure 3.5d.
Figure 3.7: Flow diagram for the calculation of the texture 1 image
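The four steps of Figure 3.7 can be sketched as follows (the gamma value and the min/max form of the histogram stretch are assumptions made for illustration; the application's actual settings are not documented here):

import numpy as np
from scipy.ndimage import uniform_filter

def texture1(frame_a, frame_b, gamma=0.5):
    tex = 0.5 * (frame_a + frame_b)    # average the two finest-pattern frames
    tex = tex ** gamma                 # gamma correction
    tex = uniform_filter(tex, size=5)  # 5x5 mean filter
    lo, hi = tex.min(), tex.max()      # histogram stretch to the full [0, 1] range
    return (tex - lo) / (hi - lo)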
3.4 Global motion compensation

The major drawback of time-multiplexing strategies is their high sensitivity to movement. In fact, if no measures are taken to correct the slight amount of movement of the scanner or of the objects in the scene during the acquisition process, the complete reconstruction process fails. Although the global motion compensation stage is only a minor part of the mechanism that makes the entire application robust to motion, it is not negligible in the final result.

Global motion compensation is an extensive field of research for which many different approaches and methods have been contributed. The approach used in this application is amongst the simplest in level of complexity. Nevertheless, it suffices for the needs of the current application.
Figure 3.8 presents an overview of the algorithm used to achieve the global motion compensation. This process takes as input the normalized frame sequence introduced in the previous section. As noted at the bottom of the figure, these steps are repeated for every pair of consecutive frames. As a first step, the pixels in each column are added for both frames. This results in two vectors that hold the cumulative sums of each frame. The second step is to determine by how many pixels the second image is displaced with respect to the first one. In order to achieve this, the sum of absolute differences (SAD) between elements of the two column-sum vectors is calculated while slowly displacing the two vectors with respect to each other. The result is a new vector containing the SAD value for each displacement. Subsequently, the index of the smallest element in the SAD values vector is searched in order to determine the number of pixels that the second image needs to be shifted. The process concludes by performing the actual shift of the second frame.
Figure 3.8: Flow diagram for the global motion compensation process
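A minimal Python sketch of this procedure is shown below (the search range max_shift is an assumed parameter; the range used by the original implementation is not specified here):

import numpy as np

def estimate_shift(frame_a, frame_b, max_shift=20):
    # Column sums act as one-dimensional profiles of the two frames
    col_a = frame_a.sum(axis=0)
    col_b = frame_b.sum(axis=0)
    best_shift, best_sad = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        # SAD over the overlapping part of the profiles at displacement s
        if s >= 0:
            sad = np.abs(col_a[s:] - col_b[:col_b.size - s]).sum()
        else:
            sad = np.abs(col_a[:s] - col_b[-s:]).sum()
        if sad < best_sad:
            best_sad, best_shift = sad, s
    return best_shift

The second frame can then be compensated, for example, with np.roll(frame_b, shift, axis=1).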
3.5 Decoding

In Section 2.1.1 of the literature study, the correspondence problem was defined as the process of determining corresponding point pairs between the captured images and the projected patterns. This is exactly what is accomplished during the decoding stage.

A novel approach has been implemented, in which the identification of the projector stripes is based not on the values of the pixels themselves (as is typically done), but rather on the edges formed by the transitions of the projected patterns. Figure 3.9 illustrates the different sets of decoded values that result from each of these methods. Here, it is possible to observe that the pixel-based method produces a stair-casing effect due to the decoding of neighboring pixels that lie on the same stripe of the projected pattern. On the other hand, the edge-based method removes this undesirable effect by decoding values only for the parts of the image in which a transition occurs. Furthermore, this approach enables sub-pixel accuracy in determining the positions where the transitions occur, meaning that the overall resolution of the 3D reconstruction increases considerably.
Figure 3.9: The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used.
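The principle behind the sub-pixel localization can be illustrated as follows: in a normalized frame, a pattern transition corresponds to a zero crossing, and its position between two neighboring pixels can be estimated by linear interpolation. The snippet below shows this for a single image row (a simplified sketch of the idea, not the application's actual decoder):

import numpy as np

def subpixel_edges(norm_row):
    v = np.asarray(norm_row, dtype=float)
    # Indices i where the normalized signal changes sign between i and i+1
    idx = np.where(np.sign(v[:-1]) != np.sign(v[1:]))[0]
    # Linear interpolation of the zero-crossing position within each pixel pair
    frac = v[idx] / (v[idx] - v[idx + 1])
    return idx + frac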
The decoding process results in a set of vertices, each one associated with a depth code. Note, however, that the unit of measurement used to describe the position and depth of each vertex is based on camera pixels and code values, respectively, meaning that these vertices still do not represent the actual geometry of the face. The calibration process, explained in a later section, is the part of the application that translates the pixel and code values into standard units (such as millimeters), thus recreating the actual shape of the human face.
3.6 Tessellation

Tessellation refers to the process of covering a plane using different geometric shapes in such a manner that no overlaps occur. In computer graphics these geometric shapes are generally chosen to be triangles, also called "faces". The reason for using triangles is that, by definition, their vertices lie on the same plane. This, in turn, avoids the generation of non-simple or non-convex polygons that are not guaranteed to be rendered correctly. A complete example illustrating this point can be found in [32].
The set of 3D vertices calculated in the decoding stage is the input to the tessellation process. Here, however, the third dimension does not play a role, and hence the z coordinate of each vertex can be thought of as being equal to 0. This implies that the new set of vertices consists only of (x, y) coordinates that lie on the same plane, as shown in Figure 3.10a. This graph corresponds to a very close view of the nose area in the reconstructed face example.
[Figure: two close-up plots of the vertices in the nose area — (a) the zoomed-in model before tessellation, i.e. the vertices before applying the Delaunay triangulation; (b) the zoomed-in model after tessellation, i.e. the result after applying the Delaunay triangulation]
Figure 3.10: Close view of the vertices in the nose area before and after the tessellation process.
The question that arises here is how to connect the vertices in such a way that the complete surface is covered with triangles. The answer is to use the Delaunay triangulation, which is probably the most common triangulation used in computer vision. The main advantage that it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the Delaunay triangulation is independent of the order in which the vertices are processed.
Figure 3.10b shows the result of applying the Delaunay triangulation to the vertices shown in Figure 3.10a.

Although there exist a number of different algorithms for computing the Delaunay triangulation, the final outcome of each conforms to the following definition: a Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [33]. This definition can be understood by examining Figure 3.11.
Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33].
3.7 Calibration

The set of (x, y) vertices with their corresponding depth code values that result from the decoding process do not represent standard units of measure, i.e. they still have to be translated into standard units such as millimeters. This is precisely the objective of the calibration process.
The calibration mechanism used in the application is based on the work of Peter-Andre Redert, as part of his PhD thesis [31]. The entire process is divided into two parts: an offline and an online process. Moreover, the offline process consists of two stages: the camera calibration and the system calibration. It is important to clarify that while the offline process is performed only once (camera properties and distances within the system do not change with every scan), the online process is carried out for every scan instance. The calibration stage referred to in Figure 3.1 is the latter.
3.7.1 Offline process

As already mentioned, the offline process comprises the two stages described below.

Camera calibration. This part of the process is concerned with the calculation of the intrinsic parameters of the camera, as explained in Section 2.2 of the literature study. In short, the objective is to precisely quantify the optical properties of the camera. The current approach accomplishes this by imaging the special calibration chart shown in Figure 3.12 from different orientations and distances. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart.
Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.
System calibration. The second part of the calibration process refers to the camera-projector system calibration, i.e. the determination of the extrinsic parameters of the system. Again, this part of the process images the calibration chart from different distances; however, this time structured light patterns are emitted by the projector while the acquisition process takes place. The result is that each projector code is associated with a known depth and camera position.
3.7.2 Online process

The result of the offline calibration is a set of parameters that model the optical properties of the scanner system. These are passed to the application inside the XML file for every scan. Such parameters represent the coefficients of a fifth-order polynomial used for translating the set of (x, y) vertices with their corresponding depth code values into standard units of measure. In other words, the online process consists of evaluating a polynomial with all the x, y and depth code values calculated in the decoding stage, in order to reconstruct the geometry of the face. Figure 3.13 shows the state of the 3D model before and after the reconstruction process.
(a) Before reconstruction. (b) After reconstruction.
Figure 3.13: The 3D model before and after the calibration process.
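As an illustration, a fifth-order polynomial of this kind can be evaluated cheaply with Horner's scheme. The single-variable form below is a simplifying assumption made for the sketch; the actual mapping and its coefficients are defined by the calibration data in the XML file.

    /* Evaluate c[0] + c[1]*v + ... + c[5]*v^5 with Horner's scheme.
       The coefficient array would be filled from the calibration XML;
       the one-variable form is an illustrative simplification. */
    static float eval_poly5(const float c[6], float v)
    {
        float r = c[5];
        for (int i = 4; i >= 0; i--)
            r = r * v + c[i];
        return r;
    }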
3.8 Vertex filtering

As can be seen from Figure 3.13b, there are a number of extra vertices (and faces) that have not been correctly reconstructed and therefore should be removed from the model. Vertex filtering is applied to remove all these noisy vertices and faces based on different criteria. The process is divided into the following three steps.
3.8.1 Filter vertices based on decoding constraints

First, if the distance between consecutive decoded points is larger than a maximum threshold in the x or z dimension, then these are removed. Second, in order to avoid falsely decoded vertices due to camera noise (especially in the parts of the images where light does not hit directly), a minimal modulation threshold needs to be exceeded, or else the associated decoded point is discarded. Finally, if the decoded vertices lie outside a margin defined in accordance with the image dimensions, then these are removed as well.
3.8.2 Filter vertices outside the measurement range

The measurement range, defined during the offline calibration, refers to the minimum and maximum values that each decoded point can have in the z dimension. These values are read from the XML file. The long triangles shown in Figure 3.13b that either extend far into the picture or, on the other hand, come close to the camera are all removed in this stage. The resulting 3D model after being filtered with the two previously described criteria is shown in Figure 3.14a.
3.8.3 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length criterion. Initially, the length of every edge contained in the model is calculated. This is followed by determining a new set of edges L that contains the longest edge in each face. After this operation, the mean length value of the longest-edge set is calculated. Finally, only faces whose longest edge is shorter than seven times the mean value, i.e. L < 7 × mean(L), are kept (a code sketch of this criterion is given below). Figure 3.14b shows the result after this operation.
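A minimal sketch of this criterion is shown below. The flat array layout (3 floats per vertex, 3 indices per face) and the function names are assumptions made for the illustration, not the application's actual data structures.

    #include <math.h>

    /* Length of the longest edge of triangle (i0, i1, i2). */
    static float longest_edge(const float *v, int i0, int i1, int i2)
    {
        const int e[3][2] = { { i0, i1 }, { i1, i2 }, { i2, i0 } };
        float longest = 0.0f;
        for (int k = 0; k < 3; k++) {
            const float *p = &v[3 * e[k][0]];
            const float *q = &v[3 * e[k][1]];
            float dx = p[0] - q[0], dy = p[1] - q[1], dz = p[2] - q[2];
            float len = sqrtf(dx * dx + dy * dy + dz * dz);
            if (len > longest)
                longest = len;
        }
        return longest;
    }

    /* Keep only faces whose longest edge is below 7 x mean(L);
       returns the new number of faces. */
    static int filter_long_faces(const float *v, int *faces, int n_faces)
    {
        float mean = 0.0f;
        for (int f = 0; f < n_faces; f++)
            mean += longest_edge(v, faces[3*f], faces[3*f+1], faces[3*f+2]);
        mean /= (float)n_faces;

        int kept = 0;
        for (int f = 0; f < n_faces; f++) {
            if (longest_edge(v, faces[3*f], faces[3*f+1], faces[3*f+2])
                    < 7.0f * mean) {
                faces[3*kept]     = faces[3*f];
                faces[3*kept + 1] = faces[3*f + 1];
                faces[3*kept + 2] = faces[3*f + 2];
                kept++;
            }
        }
        return kept;
    }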
(a) The 3D model after the filtering steps described in Subsections 3.8.1 and 3.8.2. (b) The 3D model after the filtering step described in Subsection 3.8.3. (c) The 3D model after the filtering step described in Section 3.9.
Figure 3.14: Resulting 3D models after various filtering steps.
3.9 Hole filling

In the last processing step of the 3D face scanner application, two actions are performed. The first one is concerned with an algorithm that takes care of filling undesirable holes that appear due to the removal of vertices and faces that were part of the face surface. This is accomplished by adding a vertex in the middle of the hole and then connecting every surrounding edge with this point. The second action refers to another filtering step of vertices and faces: in this last part of the application, the program removes all but the largest group of connected faces. The final 3D model is shown in Figure 3.14c.
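As an illustration of the fill-by-center idea, the sketch below assumes the hole boundary is already available as an ordered loop of vertex indices and that the vertex and face arrays were allocated with room for the new elements; both are hypothetical simplifications.

    /* Fill a hole bounded by the ordered loop loop[0..n-1]: append the
       centroid of the boundary as a new vertex and create one triangle
       per boundary edge, all sharing the new vertex. */
    static void fill_hole(float *v, int *n_vertices,
                          int *faces, int *n_faces,
                          const int *loop, int n)
    {
        float cx = 0.0f, cy = 0.0f, cz = 0.0f;
        for (int i = 0; i < n; i++) {
            cx += v[3 * loop[i]];
            cy += v[3 * loop[i] + 1];
            cz += v[3 * loop[i] + 2];
        }
        int c = (*n_vertices)++;        /* index of the new center vertex */
        v[3*c]     = cx / n;
        v[3*c + 1] = cy / n;
        v[3*c + 2] = cz / n;

        for (int i = 0; i < n; i++) {   /* connect every boundary edge */
            int f = (*n_faces)++;
            faces[3*f]     = loop[i];
            faces[3*f + 1] = loop[(i + 1) % n];
            faces[3*f + 2] = c;
        }
    }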
3.10 Smoothing

Taking into account that the smoothing process is beneficial for visualization purposes but not for the overall goal of the 3D mask sizing project, this process was not included as part of the 3D face scanner application. This is also the reason why it is not included in Figure 3.1. Nevertheless, this section provides a brief explanation of the smoothing process that is currently used, along with an example.

A complete explanation of the algorithm that is used to achieve the smoothing effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian operator that diffuses the vertices along the surface. An example of the resulting model before and after applying the smoothing process is shown in Figure 3.15.
(a) The 3D model before smoothing. (b) The 3D model after smoothing.
Figure 3.15: Forehead of the 3D model before and after applying the smoothing process.
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as two different domains, but rather as two complementary parts of a whole. There are two important trends that have made such a unified view possible. First, integrated circuit (IC) technology has evolved to the point where multiple processors of different types coexist in a single IC. Second, the increasing complexity and average size of programs, added to the evolution of compiler technologies, raised C compilers (and even C++ or Java in some cases) to become commonplace in the development of embedded systems [35].
This chapter discusses the embedded hardware and software implementation of the 3D face scanner. A brief account of the hardware and software tools that were used during the development of the application is presented first. Subsequently, the first stage of the development process is described, which consists mainly of translating the algorithms and methods described in Chapter 3 into a different programming language, more suitable for embedded systems. Finally, a preview of the developed visualization module that displays the reconstructed 3D face is presented, along with a brief description of its functionality.
4.1 Development tools

This section describes the set of tools used in the development of the embedded application. First, an overview of the hardware is presented, highlighting the most important aspects that are of interest to the 3D face scanner application. This is then followed by a list of the software tools, along with a short motivation for their selection. A so-called remote development methodology was used for the compilation process: the idea is to run an integrated development environment (IDE) on a client system for the creation of the project, editing of the files and usage of code assistance features in the same manner as done with local projects; however, when the project is built, run or debugged, the process runs on a remote server, with output and input transferred to the client system.
4.1.1 Hardware

A current trend in the embedded world is the use of single-board computers (SBCs) as development platforms. SBCs combine most features of a conventional desktop computer into a single board, which can be as small as a credit card. One or more processors of different types, memory, on-board peripherals for multiple USB devices, single or dual gigabit Ethernet connections, and integrated graphics and audio capabilities, amongst others, are common features included in these devices. But perhaps what is most interesting for embedded developers is the availability of several SBCs that fall under the open source hardware category [36]. Such SBCs are suitable for the implementation of a wide range of applications on the basis of open operating systems.

Two different hardware environments were used in the development of the current embedded application: a conventional desktop personal computer (PC) with an Intel x86 architecture, and an SBC that was selected according to the following survey.
4.1.1.1 Single-board computer survey

A survey of popular SBCs available on the market was conducted with the intention of finding the most suitable model for our application. Table 4.1 presents a subset of the considered models, highlighting the most relevant characteristics for the 3D face scanner application. Refer to [37] for the complete survey.

The chosen model had to comply with several requirements imposed by the 3D face scanner application. First, support for both a camera and a projector had to be offered. While all of the considered models showed special support for video output, not all of them provided suitable characteristics for camera signal acquisition; in fact, most of them rely on USB or Ethernet connections for this purpose. The problem with using USB technology for camera acquisition is that it is highly resource demanding. Ethernet connections, on the other hand, imply streaming video in formats such as MPEG, which require additional computational resources and buffering for decoding the video stream. Explicit peripheral support for camera acquisition was only offered by two of the considered models: the BeagleBoard-xM and the PandaBoard.
Table 4.1: Single-board computer survey

BeagleBoard-xM
CPU: ARM Cortex-A8, 1000 MHz
RAM: 512 MB
Video output: DVI-D, HDMI, S-Video
GPU: PowerVR SGX, OpenGL ES 2.0
Camera port: Yes

Raspberry Pi Model B
CPU: ARM1176, 700 MHz
RAM: 256 MB
Video output: Composite RCA, HDMI, DSI
GPU: Broadcom VideoCore IV, OpenGL ES 2.0
Camera port: No

Cotton Candy
CPU: dual-core ARM Cortex-A9, 1200 MHz
RAM: 1 GB
Video output: HDMI
GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
Camera port: No

PandaBoard
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, DVI-D, LCD
GPU: PowerVR SGX540, OpenGL ES 2.0
Camera port: Yes

Via APC
CPU: ARM11, 800 MHz
RAM: 512 MB
Video output: HDMI, VGA
GPU: built-in 2D/3D graphics, OpenGL ES 2.0
Camera port: No

MK802
CPU: ARM Cortex-A8, 1000 MHz
RAM: 1 GB
Video output: HDMI
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No

Snowball
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, CVBS
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No
A second issue in the selection of the SBC was concerned with the project objective of developing a module capable of visualizing the reconstructed 3D model by means of the embedded projector. It was considered that the achievement of this objective could be greatly simplified by selecting an SBC model that offered support for rendering 3D computer graphics by means of an API, preferably OpenGL ES. Nevertheless, all of the SBC models considered in the survey featured a graphics processing unit (GPU) with such support.

Finally, one last important motivation for the selection came from the experience gathered through related projects. The BeagleBoard-xM had been used as the embedded computing unit in other projects [6] at Philips Research Eindhoven, and therefore valuable implementation effort could be saved if this option were adopted. Consequently, the BeagleBoard-xM was selected as the SBC model for the development of the current project.
4.1.1.2 BeagleBoard-xM features

The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power, open-source hardware system that was designed specifically to address the open source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz, accompanied by 512 MB of LPDDR RAM. Several open operating systems have been made compatible with this processor, including Linux, FreeBSD, RISC OS, Symbian and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU to provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].
In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].

Figure 4.1: The BeagleBoard-xM offered by Texas Instruments.
4.1.2 Software

The main factors involved in the selection of software tools were (i) the available support by a large development community and (ii) acquisition costs and licensing charges. Open source software was adopted where possible. Moreover, prior experience with the tools was also taken into account. The software can be divided into two categories: (i) software libraries that are used within the application, and therefore are necessary for its execution, and (ii) software tools used specifically for the development of the application, and hence not required for its execution. In what follows, each of these is briefly described.
4.1.2.1 Software libraries

The following software libraries are used throughout the implementation of the embedded application.

libxml2. A software library used for parsing XML documents, which was originally developed for the Gnome project and was later made available for outside projects as well. The current application makes use of this tool for extracting the required information from the XML file that is included with each scan.

OpenCV. An open source computer vision and machine learning software library initiated by Intel. It provides the necessary functionality to construct the Delaunay triangulation described in Chapter 3. Though it was used in the initial versions of the application, later optimizations replaced the OpenCV implementations.

CGAL. A software library that aims to provide access to algorithms in computational geometry. It is used in the current application as a means to simplify the resulting mesh surface, i.e. to reduce the number of faces used to represent the surface while keeping the overall shape of the reconstructed model.

OpenGL ES. A subset of the more general OpenGL, designed specifically for embedded systems. It consists of a cross-language, multi-platform Application Programming Interface (API) for rendering 2D and 3D computer graphics. It is used in the current application as the means to visualize the reconstructed 3D model.

GLUT. The OpenGL Utility Toolkit consists of a system-independent API for OpenGL used to create windows and/or frame buffers. It is used in the visualization module of the application as well.
4.1.2.2 Software development tools

The following list presents a description of the most important software tools used for the development of the embedded application.

GNU toolchain. Refers to a collection of programming tools produced by the GNU Project that provide development facilities for applications and operating systems. Among the several projects that comprise the GNU toolchain, the following were used:

GNU Make. A utility that automates the building process of executable programs by reading so-called makefiles, which specify how to create the target program.

GCC. The official compiler of the GNU operating system, which has been adopted as standard by most modern Unix-like computer operating systems.

GNU Binutils. A set of programming tools that are used in the development process of creating and managing programs, object files, libraries, profile data and assembly source code. The commands as (assembler), ld (linker) and gprof (profiler) were used among the complete set of binutils commands.

GNU Project debugger. The standard debugger for the GNU operating system, which was made available for the development of applications outside this project as well.

Valgrind. A programming tool that can automatically detect memory management errors. It also provides the functionality of a profiler.

Ubuntu. A Linux-based operating system that is distributed as free and open source software. It was installed on both the desktop PC and the SBC.
4.2 MATLAB to C code translation

This section describes the first stage of the embedded application development, which involves the translation of a series of algorithms originally written in MATLAB code into C.

Despite the fact that there are a number of available tools that automatically translate MATLAB code into C language, such as MATLAB Coder by MathWorks, MATLAB-to-C Synthesis (MCS) by Catalytic Inc. and AccelDSP by Xilinx, these have a number of pitfalls that compromise their applicability, especially when the performance aspect is of ultimate importance. Perhaps most concerning is that each one of these tools only supports a subset of the MATLAB language and functions, meaning that the complete functionality of MATLAB is immediately constrained by this requirement. In many cases this would imply a modification of the MATLAB code prior to the translation process in order to filter out any feature or function not included in the subset, which adds overhead to the development process. Examples of features not supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization and try/catch statements. The use of an automatic translation tool was discarded for this project, taking into account that several of these unsupported features are present in the MATLAB code.
4.2.1 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular programming languages used for the development of embedded systems. The first is that C lies at an intermediate point between higher and lower level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher level languages lies in the fact that they do not provide suitable characteristics for optimizing the performance of applications, such as low-level memory manipulation. Furthermore, unlike many of these higher level programming languages, C provides deterministic resource use, which is an important feature when the target devices contain limited resources. On the other hand, C outperforms lower level languages in a number of aspects, such as scalability and maintainability. Two final motivations for using C are that (i) C compilers are available for almost all embedded devices and are supported by a large pool of experienced C programmers, and (ii) the vast majority of hardware APIs/drivers are written in C.
4.2.2 Translation approach

As mentioned earlier, a manual translation approach was chosen over the use of automatic translation tools. A key part of the process of manually translating MATLAB to C code is the verification process. There are two major techniques used to achieve such verification. The first one consists of a systematic method of converting the translated C code into a compiled MEX-file that can be merged into the original MATLAB project; then, by comparing the results generated by the MATLAB project containing the C implementation wrapped in a MEX-file with those generated by the original MATLAB project, one should be able to verify the correctness of the translation. The second approach consists of writing corresponding intermediate results of both the MATLAB and C implementations to external files and then using a file comparison tool, such as diff for Linux environments, in order to validate the equality of both results. It was the latter approach that was chosen for the development of the current application, for the following reason: the former approach requires the C implementation to be wrapped in a so-called MEX wrapper, which takes care of the communication between MATLAB and C. This task is considered to be error prone, since crashes, segmentation violations or incorrect results can easily occur if the MEX wrapper does not allocate and access the data properly, as reported by Marc Barberis in [40] from Catalytic Inc.

A number of pitfalls that add complexity to the manual translation process were identified throughout the development of this stage. The most important are:
• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e. the order in which data is processed should correspond to the order in which it is laid out in memory; not complying with this idea could induce a serious loss in the performance of the resulting code (see the sketch after this list).

• MATLAB is an interpreted language, i.e. data types and variable dimensions are only known at run-time, and thus cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc or free.
• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions and sizes. On the contrary, C requires all variables to be cast to a specific data type (or declared, as it is known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C, and hence requires the programmer to implement them while relying on structure constructs of primitive types.
4.3 Visualization

This section describes the different steps involved in the visualization module developed to display the reconstructed 3D models by means of the embedded projector contained in the hand-held device. Figure 4.2 extends the general overview of the application presented in Figure 3.1 by incorporating the visualization module. This figure shows that a resulting 3D model of the face reconstruction process consists of four different elements: a set of vertices, a set of faces, a set of UV coordinates, and a texture image.
[Figure: block diagram — the camera frame sequence and XML file enter the 3D face reconstruction block, which outputs faces, vertices, UV coordinates and the texture 1 image to the visualization block]
Figure 4.2: Simplified diagram of the 3D face scanner application.
Vertices and faces describe the geometry of the reconstructed model: each face consists of three index values that determine the vertices that form a triangle. UV coordinates, on the other hand, together with the texture image, describe the texture of the model. Figure 4.3 shows how UV coordinates are used to map portions of the texture image onto individual parts of the model. Each vertex is associated with a UV coordinate; when a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image and place it on top of the triangle.
[Figure: the UV coordinate system, with u and v axes spanning the corners (0,0), (1,0), (0,1) and (1,1) of the texture image]
Figure 4.3: UV coordinate system.
Figure 4.4 presents an overview of the visualization module. The first step of the process is to simplify the 3D model, i.e. to reduce the number of triangles (and vertices) used to represent the surface. Note that while a high resolution is needed for the algorithms that determine the fit quality of the different mask models, a much lower resolution can be used for visualization purposes. In fact, due to the limited resources available in embedded systems, such simplification becomes necessary to avoid lag when zooming, rotating or panning the model. Edge collapse is a common term used for the simplification process, which is shown in Figure 4.4: the input vertices and faces of this block are converted into a smaller set, denoted as New vertices and New faces in the diagram. However, since the new set of vertices and faces does not have a one-to-one correspondence to the original set of UV coordinates, such coordinates have to be updated as well. This is accomplished by using the Nearest Neighbor algorithm: every new vertex is assigned the UV coordinate of its closest original vertex.

The next stage of the process is to format the new set of vertices, faces and UV coordinates, together with the texture 1 image, such that OpenGL can render the model.
Subsequently, normal vectors are calculated for every triangle; these are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex, based on the normal vectors of the triangles connected to it; a cross-product multiplication is used to calculate the normal vector of each triangle. Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state in which the model is redrawn every time a timer expires or when an interactive operation is sent to the program.
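A compact sketch of this computation is given below; the flat array layout and the function name are illustrative assumptions. Each face normal (a cross product of two triangle edges) is accumulated into its three vertices, and the accumulated vectors are normalized at the end.

    #include <math.h>
    #include <string.h>

    static void vertex_normals(const float *v, const int *faces, int n_faces,
                               float *nrm, int n_vertices)
    {
        memset(nrm, 0, 3 * (size_t)n_vertices * sizeof(float));
        for (int f = 0; f < n_faces; f++) {
            const float *a = &v[3 * faces[3*f]];
            const float *b = &v[3 * faces[3*f + 1]];
            const float *c = &v[3 * faces[3*f + 2]];
            float e1[3] = { b[0]-a[0], b[1]-a[1], b[2]-a[2] };
            float e2[3] = { c[0]-a[0], c[1]-a[1], c[2]-a[2] };
            float n[3]  = { e1[1]*e2[2] - e1[2]*e2[1],   /* cross product */
                            e1[2]*e2[0] - e1[0]*e2[2],
                            e1[0]*e2[1] - e1[1]*e2[0] };
            for (int k = 0; k < 3; k++) {   /* accumulate into each vertex */
                float *d = &nrm[3 * faces[3*f + k]];
                d[0] += n[0]; d[1] += n[1]; d[2] += n[2];
            }
        }
        for (int i = 0; i < n_vertices; i++) {  /* normalize the averages */
            float *n = &nrm[3 * i];
            float len = sqrtf(n[0]*n[0] + n[1]*n[1] + n[2]*n[2]);
            if (len > 0.0f) { n[0] /= len; n[1] /= len; n[2] /= len; }
        }
    }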
[Figure: block diagram of the visualization module — faces, vertices and UV coordinates go through an edge-collapse mesh simplification; the new vertices are matched to UV coordinates with the Nearest Neighbor algorithm; the new faces, vertices and UV coordinates are converted to OpenGL format, normals are calculated, and the GL vertices, faces, UV coordinates, normals and texture 1 are handed to OpenGL]
Figure 4.4: Diagram of the visualization module.
Chapter 5
Performance optimizations
This chapter presents various performance optimizations made to the 3D face scanner application, ranging from high-level optimizations, such as modification of the algorithms, to low-level optimizations, such as the implementation of time-consuming parts in assembly language.

In order to verify that the achieved optimizations were valid in general, and not only for specific cases, 10 scans of different persons were used for profiling the performance of the application. Every profile consisted of running the application 10 times for each scan and then averaging the results, in order to reduce the influence that external factors might have on the measured times. Figure 5.1 presents an example of the graphs that will be used throughout this and the following chapters to represent the changes in performance. Here, each bar is divided into different colors that represent the distribution of the total execution time among the various stages of the application described in Chapter 3 and summarized in Figure 3.1.
The translation from MATLAB to C code corresponds to the first optimization performed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms: the much more limited resources available on the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's O2 optimization level.

The bottom bar in Figure 5.1 represents the starting point for the set of optimization procedures that are described in the following sections. The order in which these are presented corresponds to the order in which they were applied to the application.
[Figure: horizontal bar chart of total execution time, broken down per stage into read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering, hole filling and other]
Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
5.1 Double to single-precision floating-point numbers

The same representation format of floating-point numbers was needed in the MATLAB and C implementations in order to compare both results at each step of the translation process. The original C implementation therefore used the double-precision format, because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double to single-precision format was made. Figure 5.2 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec.
[Figure: stage-by-stage execution time bar chart]
Figure 5.2: Difference in execution time when double-precision format is changed to single-precision.
5.2 Tuned compiler flags

While the previous versions of the C code were compiled with the O2 performance level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:
-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
[Figure: stage-by-stage execution time bar chart]
Figure 5.3: Execution time before and after tuning GCC's compiler options.
5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality in the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e. first and second, third and fourth, fifth and sixth, and so on. Data of the camera frames were therefore placed in memory in such a manner that corresponding pixels of a frame pair lay next to each other. The procedure is shown in Figure 5.4.

However, this modification yielded no improvement in the execution time of the application, as can be seen from Figure 5.5.
5.4 Reimplementation of C's standard power function

The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the power 0.85. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process.
Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green and purple circles represent pixels of the first, second, third and fourth frames, respectively.
[Figure: stage-by-stage execution time bar chart]
Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.
Taking into account that the high accuracy offered by such a function was not required, and that the overhead involved in validating the input could be removed, a different implementation of the power function was adopted.

A novel approach proposed by Ian Stephenson in [42] was used, explained as follows. The power function is usually implemented using logarithms as

    pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with only a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a:
    a = M * 2^E

where M is the mantissa and E is the exponent. Taking log2 of both sides gives

    log2(a) = log2(M) + E

and since M is normalized, log2(M) is always small; therefore

    log2(a) ≈ E
This new implementation of the power function provides the improvement in execution time shown in Figure 5.6.
[Figure: stage-by-stage execution time bar chart]
Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.
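A minimal single-precision sketch of this idea is shown below. It illustrates the approximation rather than reproducing the application's exact code, and it assumes strictly positive inputs and exponents that keep the result in the normalized float range.

    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    /* log2(a) ~ E + M for a > 0: exponent from the IEEE 754 bit pattern,
       mantissa bits reused as a linear interpolation term. */
    static float fast_log2(float a)
    {
        uint32_t bits;
        memcpy(&bits, &a, sizeof bits);
        int   e = (int)((bits >> 23) & 0xFF) - 127;
        float m = (float)(bits & 0x7FFFFF) / 8388608.0f;  /* [0, 1) */
        return (float)e + m;
    }

    /* 2^x built directly as an IEEE 754 bit pattern (x in (-126, 128)). */
    static float fast_pow2(float x)
    {
        float fl = floorf(x);
        float fr = x - fl;                                /* [0, 1) */
        uint32_t bits = ((uint32_t)((int)fl + 127) << 23)
                      | (uint32_t)(fr * 8388608.0f);
        float r;
        memcpy(&r, &bits, sizeof r);
        return r;
    }

    /* pow(a, b) = 2^(log2(a) * b); accurate enough for gamma correction. */
    static float fast_pow(float a, float b)
    {
        return fast_pow2(fast_log2(a) * b);
    }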
5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering made it possible to replace floating-point calculations with integer calculations in the modulation stage, which are known to typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modular frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping and scaling frames are now performed in an alternating fashion, together with the calculation of the modular frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.
[Figure: two execution flow diagrams — (a) original order of execution: preprocessing (parse XML file, discard frames, crop frames, scale) followed by normalization (texture 1, modulation, texture 2, normalize), then the rest of the program; (b) modified order of execution: the modulation step is moved into the preprocessing stage, so normalization only computes texture 1, texture 2 and normalize]
Figure 5.7: Order of execution before and after the optimization.
[Figure: stage-by-stage execution time bar chart]
Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3, and Figure 3.8 shows the different stages of this process. However, this figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research had found that optimal results are achieved when GMC is applied in the y direction only, and this was implemented by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted in removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement in execution time shown in Figure 5.10.
[Figure: flow diagram — for every pair of consecutive frames in the normalized frame sequence, the rows and columns of Frame A and Frame B are summed, the SAD is minimized in both x and y, and Frame B is shifted in the y dimension only]
Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code.
[Figure: stage-by-stage execution time bar chart]
Figure 5.10: Difference in execution time before and after modifying the GMC stage.
5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation, and a series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation; it is the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. An error was detected at this point of the optimization process in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.
[Figure: stage-by-stage execution time bar chart]
Figure 5.11: The execution time of the application increased after fixing an error in the tessellation stage.
5.8 Modified line shifting in GMC stage

A series of optimizations performed on the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (the last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later, it was optimized even further by replacing the for loop with the more efficient memcpy function available in the standard C library, which in turn led to a faster execution time.
A further optimization was obtained in the GMC stage which yielded better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process: a conditional statement determines whether the shift has to be performed in the positive or negative direction. If the shift is negative, i.e. upwards, the shifting operation traverses the image from top to bottom, copying each line a certain number of rows above it; if the shift is positive, i.e. downwards, the shifting operation traverses the image from bottom to top, copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
[Figure: stage-by-stage execution time bar chart]
Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic is the fact that the set of vertices resulting from the decoding stage is sorted in an increasing manner, which removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered in increasing order from left to right and from bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies even further the algorithm needed to connect the vertices into triangles.
The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles. The procedure is presented in Algorithm 1. Note that for each pair of rows, the algorithm describes the connection of vertices only up to the moment in which the last vertex of either row is reached; the unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).
Algorithm 1: New tessellation algorithm

 1: for all pairs of consecutive rows do
 2:   find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
 3:   while the last vertex in either row has not been reached do
 4:     if vertex_row_A is more to the left than vertex_row_B then
 5:       connect vertex_row_A with the next vertex on the same row and with vertex_row_B
 6:       set vertex_row_A to the next vertex on the same row
 7:     else
 8:       connect vertex_row_B with the next vertex on the same row and with vertex_row_A
 9:       set vertex_row_B to the next vertex on the same row
10:     end if
11:   end while
12: end for
Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 12.5 times over OpenCV's Delaunay triangulation implementation.
[Figure: two close-up plots of the same set of vertices triangulated with (a) the Delaunay triangulation and (b) the optimized triangulation]
Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that the vertices are sorted.
5.10 Modified decoding stage

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard math C library, namely ceil() and floor(), were replaced with faster implementations.
[Figure: stage-by-stage execution time bar chart]
Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.
The new implementations use preprocessor directives to avoid the function call overhead, and the time spent validating the input was also saved, since such validation was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions could be simplified even further.
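Sketches of such simplified replacements are shown below. The macro names are illustrative, and the usual macro caveat applies: the argument is evaluated more than once, so it must be free of side effects. Both are valid only for non-negative inputs, which is exactly the property the index values guarantee.

    /* floor() for x >= 0: truncation already rounds toward zero. */
    #define FLOOR_IDX(x) ((int)(x))

    /* ceil() for x >= 0: add one when a fractional part remains. */
    #define CEIL_IDX(x)  ((int)(x) + ((x) > (float)(int)(x)))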
A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while ensuring that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since stack memory can be addressed more quickly.

The last optimization consisted of the detection and removal of several tasks that did not contribute to the final result. The reason why such tasks were present in the application is that several alternatives were implemented for achieving a common goal during the algorithmic design stage; after assessing the alternatives and choosing the best option, however, the other ones were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive
[Figure: stage-by-stage execution time bar chart]
Figure 5.15: Execution time of the application before and after optimizing the decoding stage.
frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, this recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization and global motion compensation stages are data independent, and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrating NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions, (ii) using existing C/C++ libraries based on NEON technology, (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions with the compiler doing some of the work associated with writing assembly instructions, and (iv) directly writing NEON assembly instructions linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and it was therefore this approach that was adopted.
[Figure: flow diagram — the column sums of the first pair of frames are computed once and Frame 2 is shifted after minimizing the SAD; for every remaining pair of consecutive frames (n = 3 to N), the column-sum vector of frame n−1 is reused, so only the column sums of frame n are computed before minimizing the SAD and shifting frame n]
Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums.
[Figure: stage-by-stage execution time bar chart]
Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.
Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8 or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers, where each of the 8 elements would have 16 bits, or as an operation on 2 D registers, where each of the 8 elements would be 8 bits wide.
[Figure: schematic of a NEON SIMD operation — the lanes of two source registers are combined element-wise into a destination register]
Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology.
An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
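For illustration, the 8-pixels-at-a-time idea can be expressed with NEON C intrinsics, as sketched below; the thesis work used hand-written NEON assembly instead, and the arithmetic shown here (producing (a − b)/(a + b) with a reciprocal estimate) is a simplified assumption of the normalization step.

    #include <arm_neon.h>

    /* Process 8 pixel pairs per iteration: widen the 8-bit inputs, form
       their sum and difference, and store (a - b)/(a + b) as floats.
       A zero sum would need separate handling in real code. */
    void normalize_row(const uint8_t *row_a, const uint8_t *row_b,
                       float *out, int n)
    {
        for (int i = 0; i + 8 <= n; i += 8) {
            int16x8_t a = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(row_a + i)));
            int16x8_t b = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(row_b + i)));
            int16x8_t sum  = vaddq_s16(a, b);
            int16x8_t diff = vsubq_s16(a, b);

            /* widen to 32 bits and convert both halves to float */
            float32x4_t s_lo = vcvtq_f32_s32(vmovl_s16(vget_low_s16(sum)));
            float32x4_t s_hi = vcvtq_f32_s32(vmovl_s16(vget_high_s16(sum)));
            float32x4_t d_lo = vcvtq_f32_s32(vmovl_s16(vget_low_s16(diff)));
            float32x4_t d_hi = vcvtq_f32_s32(vmovl_s16(vget_high_s16(diff)));

            /* reciprocal estimate refined with one Newton-Raphson step */
            float32x4_t r_lo = vrecpeq_f32(s_lo);
            r_lo = vmulq_f32(r_lo, vrecpsq_f32(s_lo, r_lo));
            float32x4_t r_hi = vrecpeq_f32(s_hi);
            r_hi = vmulq_f32(r_hi, vrecpsq_f32(s_hi, r_hi));

            vst1q_f32(out + i,     vmulq_f32(d_lo, r_lo));
            vst1q_f32(out + i + 4, vmulq_f32(d_hi, r_hi));
        }
    }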
Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain such an increment. First, note that the stage of the application that most contributed to the increase in time was the reading of the binary file. The execution time of that process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased, which suggests that indeed another process was probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage: the first step of that process is to find the smallest and largest values of every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C language, the processor makes use of a branch prediction mechanism in order to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, ignoring the existence of the branch prediction mechanism.
513 NEON assembly optimization 2
After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2^(n+1) - 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
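A minimal sketch of such a LUT for the real 8-bit case could look as follows (the names and the explicit initialization function are assumptions; only the indexing by the pixel sum is taken from the text):

```c
#include <math.h>
#include <stdint.h>

#define PIXEL_BITS 8
/* possible sums of two n-bit pixels: 0 .. 2^(n+1) - 2, i.e. 2^(n+1) - 1 entries */
#define LUT_SIZE ((1 << (PIXEL_BITS + 1)) - 1)

static float gamma_lut[LUT_SIZE];

/* Entry s holds (s / 2)^0.85, i.e. the gamma-corrected average of two
   pixels whose sum is s; computed once at startup. */
static void build_gamma_lut(void)
{
    for (int s = 0; s < LUT_SIZE; s++)
        gamma_lut[s] = powf(s / 2.0f, 0.85f);
}

/* The sum a + b, already available from the texture 2 stage, indexes
   the table directly, avoiding a powf call per pixel. */
static inline float gamma_corrected_average(uint8_t a, uint8_t b)
{
    return gamma_lut[a + b];
}
```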
As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2 and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. Merging these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already
Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.
Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels (pixel values 0-3, sums pixel A + pixel B of 0-6, averages 0 to 3 in steps of 0.5, and the corresponding average^0.85 entries 0, 0.555, 1, 1.411, 1.803, 2.179 and 2.544).
being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22.
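In outline, the reordered flow can be sketched as follows (the function and flag names are hypothetical; the real implementation additionally iterates over rows and vectors, as shown in Figure 5.23):

```c
/* Sketch of the final two-stage execution flow for the 16 camera frames */
for (int f = 0; f < 14; f++) {
    /* frames 1..14: modulation (step 1), texture 2 and normalization */
    process_frame(f, MODULATION | TEXTURE2 | NORMALIZE);
}
for (int f = 14; f < 16; f++) {
    /* frames 15..16: the same processes, plus averaging and gamma
       correction, which reuse the pixel sums computed for texture 2 */
    process_frame(f, MODULATION | TEXTURE2 | NORMALIZE | AVG_GAMMA);
}
```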
This final optimization concludes the embedded system development of the 3D face reconstruction application.
Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.
Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.
6.1 MATLAB to C code translation
In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models, with a precision of 4 decimal places.
In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application, described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when run on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).
On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when run on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's -O2 optimization level.
Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
6.2 Visualization
A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model contains only approximately 2% of the faces of the high resolution model, the quality degradation is hardly visible when comparing the two textured models.
6.3 Performance optimizations
Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation, using a BeagleBoard-xM as the processing platform. The wide range of optimizations described in Chapter 5 was used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times.
Figure 6.2: Example of the visualization module developed: (a) high-resolution 3D model with texture (63743 faces); (b) high-resolution 3D model wireframe (63743 faces); (c) low-resolution 3D model with texture (1229 faces); (d) low-resolution 3D model wireframe (1229 faces).
Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which gives an idea of the speedup achieved for each individual stage.
Figure 6.3: Performance evolution of the 3D face scanner's C implementation. The bars correspond, from top to bottom, to: no optimizations, doubles to floats, tuned compiler flags, modified memory layout, pow function reimplemented, reduced memory accesses, GMC in Y direction only, Delaunay bug, line shifting in GMC, new tessellation algorithm, modified decoding stage, no recalculations in GMC, ASM + NEON implementation 1, and ASM + NEON implementation 2.
Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e. by approximately 14 seconds.
A wide range of optimizations was performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.
Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first
step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment. Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.
7.1 Future work
Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5×5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions.
Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage can be optimized significantly. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e. changing the data type from double to float and tuning
the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.
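As a simple illustration of the kind of issue such an analysis might uncover, consider the traversal order of a row-major image (a generic example, not taken from the calibration code):

```c
/* Row-major image: consecutive x values are adjacent in memory.
   Iterating over x in the inner loop walks through memory sequentially
   and exploits spatial locality; swapping the loops strides through
   memory by one row per access and defeats the cache. */
float sum_image(const float *img, int width, int height)
{
    float acc = 0.0f;
    for (int y = 0; y < height; y++)        /* cache-friendly order */
        for (int x = 0; x < width; x++)
            acc += img[y * width + x];
    return acc;
}
```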
Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.
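By way of illustration, a typical GCC invocation for this platform might combine the flags discussed above as follows (a hypothetical example; the exact flag set used in the project is not reproduced here):

```
gcc -O2 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp -o scanner main.c -lm
```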
Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1)

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1)

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1)

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10)

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11)

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34)

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984. (cit. on pp. 9, 11)

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981. (cit. on pp. 9, 11)

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, ISSN: 0162-8828. DOI: 10.1109/34.216735. (cit. on pp. 9, 11)

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. DOI: 10.1109/CCECE.1998.685637. (cit. on pp. 9, 11)

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, ISSN: 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11)

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, ISSN: 0162-8828. DOI: 10.1109/TPAMI.1987.4767869. (cit. on pp. 9, 11)

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997. (cit. on pp. 9, 11)

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, ISSN: 0031-3203. DOI: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12)

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, ISSN: 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12)

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. DOI: 10.1109/ICPR.1990.118121. (cit. on pp. 9, 12)

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, ISSN: 0932-8092. DOI: 10.1007/BF01230201. (cit. on pp. 9, 12)

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, ISSN: 0018-9162. DOI: 10.1109/MC.1982.1653915. (cit. on pp. 10, 14)

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004. (cit. on pp. 11, 12)

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. DOI: 10.1109/ICME.2006.262619. (cit. on p. 12)

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. DOI: 10.1109/ENC.2006.8. (cit. on pp. 12, 13)

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13)

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. DOI: 10.1109/ICIP.2007.4379251. (cit. on p. 13)

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, ISSN: 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14)

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. DOI: 10.1364/OE.15.012318. (cit. on p. 14)

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20. (cit. on p. 14)

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, ISBN: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14)

[28] J. Mas and Universitat de Girona, Departament d'Electrònica, Informàtica i Automàtica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, ISBN: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15)

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, ISSN: 0882-4967. DOI: 10.1109/JRA.1987.1087109. (cit. on p. 15)

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, ISSN: 0162-8828. DOI: 10.1109/34.159901. (cit. on p. 15)

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000. (cit. on pp. 15, 26)

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN: 0201604582. (cit. on p. 25)

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26)

[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, ISBN: 0-201-48560-5. DOI: 10.1145/311535.311576. (cit. on p. 30)

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, ISBN: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31)

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32)

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013. (cit. on p. 32)

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, Dec. 2009, p. 81. (cit. on p. 34)

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008. (cit. on p. 34)

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008. (cit. on p. 38)

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006. (cit. on p. 45)

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005. (cit. on p. 46)

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008. (cit. on p. 50)

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51)

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012. (cit. on p. 54)

[46] N. Pipenbrinck, "ARM NEON optimization: an example," Tech. Rep., 2009. (cit. on p. 54)
List of Figures

1.1 A subset of the CPAP masks offered by Philips
1.2 A 3D hand-held scanner developed in Philips Research
2.1 Standard stereo geometry
2.2 Assumed model for triangulation as proposed in [4]
2.3 Examples of pattern coding strategies
2.4 A reference framework assumed in [25]
3.1 General flow diagram of the 3D face scanner application
3.2 Example of the 16 frames that are captured by the hand-held scanner
3.3 Flow diagram of the preprocessing stage
3.4 Flow diagram of the normalization stage
3.5 Example of the 18 frames produced in the normalization stage
3.6 Camera frame sequence in a coordinate system
3.7 Flow diagram for the calculation of the texture 1 image
3.8 Flow diagram for the global motion compensation process
3.9 Difference between pixel-based and edge-based decoding
3.10 Vertices before and after the tessellation process
3.11 The Delaunay tessellation with all the circumcircles and their centers [33]
3.12 The calibration chart
3.13 The 3D model before and after the calibration process
3.14 3D resulting models after various filtering steps
3.15 Forehead of the 3D model before and after applying the smoothing process
4.1 The BeagleBoard-xM offered by Texas Instruments
4.2 Simplified diagram of the 3D face scanner application
4.3 UV coordinate system
4.4 Diagram of the visualization module
5.1 Execution times of the MATLAB and C implementations after run on different platforms
5.3 Execution time before and after tuning GCC's compiler options
5.4 Modification of the memory layout of the camera frames
5.5 Execution time with a different memory layout
5.6 Execution time before and after reimplementing C's standard power function
5.7 Order of execution before and after the optimization
5.8 Difference in execution time before and after reordering the preprocessing stage
5.9 Flow diagram for the GMC process as implemented in the MATLAB code
5.10 Difference in execution time before and after modifying the GMC stage
5.11 Execution time of the application after fixing an error in the tessellation stage
5.12 Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage
5.13 The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted
5.14 Execution times of the application before and after replacing the Delaunay triangulation with the new approach
5.15 Execution time of the application before and after optimizing the decoding stage
5.16 Flow diagram for the optimized GMC process that avoids the recalculation of the image's columns sum
5.17 Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage
5.18 NEON SIMD architecture extension featured by Cortex-A series processors along with the related terminology
5.19 Execution flow after first NEON assembly optimization
5.20 Execution times of the application before and after applying the first NEON assembly optimization
5.21 Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels
5.22 Execution times of the application before and after applying the second NEON assembly optimization
5.23 Final execution flow after second NEON assembly optimization
6.1 Execution times of the MATLAB and C implementations after run on different platforms
6.2 Example of the visualization module developed
6.3 Performance evolution of the 3D face scanner's C implementation
6.4 Execution times for each stage of the application
Dedicated to my grandmother
Chapter 1
Introduction
The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.
Within the wide range of products offered by Philips, there is a specific group, categorized under the name of sleep solutions, that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]. It reported that sleep apnea was prevalent in more than 18 million Americans, i.e. 6.62% of the country's population.
While aiming to attend to the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors, which include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, and diagnosis of diseases such as sinusitis or dermatitis or disorders such as claustrophobia, amongst others.
Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic, (c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel, (h) Simplicity, (i) ComfortGel.
A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.
Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable
of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.
1.1 3D Mask Sizing project
The 3D Mask Sizing project is based on the initiative of Philips to develop a technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:
1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.
2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.
3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.
4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.
The focus of this thesis project is on the first step.
As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique was already developed, and it is the basis for the present project. Figure 1.2a shows the hardware setup of such a device. In short, this scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently performed offline and are mostly implemented in MATLAB.
1.2 Objectives
The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a way that extra-functional requirements are taken into account.
Figure 1.2: A 3D hand-held scanner developed in Philips Research: (a) hardware, (b) 3D model example.
These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application. Specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to an ARM Cortex-A8, the new implementation should be targeted to this processor in particular, by making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.
1.3 Report organization
This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.
Chapter 2
Literature study
This chapter presents a selective analysis of the state of the art in the field of surface reconstruction, placing special emphasis on structured lighting techniques. A brief overview of the three main underlying technologies used for depth estimation is presented first. This is followed by an example of stereo analysis, which serves as the basis for the more specific structured lighting techniques. Moreover, this example helps to illustrate why stereo analysis is considered less preferable for 3D face reconstruction applications when compared with structured lighting techniques. Special emphasis is placed on the scientific principles underlying structured lighting techniques. Furthermore, a classification of the different types of pattern coding strategies available in the literature is given, along with an analysis of their suitability for our application. Finally, the chapter concludes with a brief discussion of camera calibration and its most representative techniques.
2.1 Surface reconstruction
Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (such as those found in areas like architecture, mechanical engineering or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].
Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e. they are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in the depth measurement. On the other hand, both reflecto-metric and stereo-metric techniques are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty, i.e. they require an estimation of the model of the environment. In the remainder of this section we will limit the discussion to the stereo-metric category and focus on structured lighting techniques.
2.1.1 Stereo analysis
Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis is presented in this section. This example intends to show why the use of structured lighting becomes essential for our application. The example is taken from [4].
Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point in the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies from a certain reference. Note, however, that in order to calculate the triangulation more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (extrinsic parameter) or to the focal length of each of the cameras (intrinsic parameter).
Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e. at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and oriented towards the scene objects. Also note that because the x-axes of both images are identically oriented, rows with the same row number in the two different images lie on the same straight line.
Figure 2.1: Standard stereo geometry
In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

p_left = (x_left, y_left) and p_right = (x_right, y_right)

in the left and right images respectively, assuming that the scene point is visible from both camera viewpoints. The disparity between two corresponding image points, with respect to p_left, is the vector

∆(x_left, y_left) = (x_left − x_right, y_left − y_right)^T    (2.1)
In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

p = (x, y) = (f · X / Z, f · Y / Z)    (2.2)

assuming that Z > f.
According to the ideal assumptions considered in the standard stereo geometry of the two cameras, it holds that y = y_left = y_right. Therefore, for the left camera the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes that the Z-axis is identified to be the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X-axis, its central projection equation is given by

(x_right, y) = (f · (X − b) / Z, f · Y / Z)
Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the different images, the scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points, one in each of the images, with respect to p_left is given by

∆_ssg(x_left, y_left) = √((x_left − x_right)² + (y_left − y_right)²)

However, because rows with the same row numbers in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

∆_ssg(x_left, y_left) = |x_left − x_right| = x_left − x_right    (2.3)

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map ∆(x, y) is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.
Finally, in order to arrive at the equations that determine the 3D location of each point in the scene, note that from the central projection equations of the two cameras it follows that

Z = f · X / x_left = f · (X − b) / x_right

and therefore

X = b · x_left / (x_left − x_right)

Using the previous equation, it follows that

Z = b · f / (x_left − x_right)

By substituting this result into the projection equation for y, it follows that

Y = b · y / (x_left − x_right)

The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and b are known and that the disparity map ∆(x, y) was measured for each pair of corresponding points in the two images. Note that a variety of methods exist to calibrate different types of camera configuration systems, i.e. to determine their intrinsic and extrinsic parameters. These calibration procedures are discussed further in Section 2.2.
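To make the reconstruction step concrete, the following minimal sketch shows how the last three equations translate into code. It is written in C (the language used later for the embedded implementation); the function and type names are illustrative assumptions, not part of the original scanner code.

    /* Reconstruct a 3D point from a pair of corresponding image points
     * in the standard stereo geometry. b is the base distance between
     * the cameras and f the (shared) focal length, in consistent units.
     * Returns 0 on success, -1 when the disparity is undefined. */
    typedef struct { double X, Y, Z; } Point3D;

    int reconstruct_point(double x_left, double x_right, double y,
                          double b, double f, Point3D *out)
    {
        double disparity = x_left - x_right;   /* Equation (2.3) */
        if (disparity <= 0.0)                  /* no valid correspondence */
            return -1;
        out->X = b * x_left / disparity;
        out->Y = b * y / disparity;
        out->Z = b * f / disparity;
        return 0;
    }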
The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques is used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images. These features are typically corners or edges contained within the images. Although these techniques are found to be appropriate for a certain number of applications, they present a number of drawbacks that make them infeasible for many others. The main drawbacks are: (i) feature extraction and matching is generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. These problems in solving the correspondence problem can generally be overcome by resorting to a different but similar family of techniques, known as structured lighting techniques. While structured lighting techniques involve a completely different methodology for solving the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.
2.1.2 Structured lighting
Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source that actively projects a light pattern into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to simplify the complexity of the correspondence analysis by actively manipulating the scene.
It is important to note that stereoscopy-based systems do not impose complex requirements for image acquisition, since they mostly rely on theoretical, mathematical and algorithmic analyses to solve the reconstruction problem. The idea behind structured lighting methods, on the other hand, is to shift this complexity to another level, such as the engineering prerequisites of the overall system [4].
A wide variety of light patterns has been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured when using the most basic of all approaches, i.e. a light spot. In Section 2.1.2.2 a classification of the available encoded patterns is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.
2.1.2.1 Triangulation technique
Triangulation refers to the process of determining the location of a point by measuring the angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is given.
Figure 2.2 shows the simplified model that Klette assumes in his analysis. Note that the system can be thought of as a 2D object scene, i.e. it has no vertical dimension. As a consequence, the object, light source and camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known, and the origin of the coordinate system O coincides with the projection center of the camera.

Figure 2.2: Assumed model for triangulation as proposed in [4]
The goal is to calculate the distance d between the origin O and the object point P = (X_0, Z_0). This can be done using the law of sines as follows:

d / sin(α) = b / sin(γ)

From γ = π − (α + β) and sin(π − γ) = sin(γ) it holds that

d / sin(α) = b / sin(π − γ) = b / sin(α + β)

Therefore, the distance d is given by

d = b · sin(α) / sin(α + β)

which holds for any point P lying on the surface of the object.
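As an illustration, the distance computation reduces to a few lines of code. The sketch below is a hypothetical C helper, not taken from the scanner implementation; the angles are assumed to be in radians and to come from the calibration.

    #include <math.h>

    /* Distance d from the camera origin O to object point P, given the
     * base distance b and the calibrated angles alpha and beta (radians).
     * Derived from the law of sines with gamma = pi - (alpha + beta). */
    double triangulate_distance(double b, double alpha, double beta)
    {
        return b * sin(alpha) / sin(alpha + beta);
    }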
2.1.2.2 Pattern coding strategies
As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill the requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e. every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19]:
• Time-multiplexing. This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or objects in the scene while the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a.
• Spatial neighborhood. In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12] or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow all coding information to be condensed into a single projection pattern, making them highly suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].
• Direct coding. In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16] or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system depend not only on the projected colors but also on the intrinsic colors of the measured surface and the light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].
Figure 2.3: Examples of pattern coding strategies: (a) time-multiplexing, (b) spatial neighborhood, (c) direct coding
2.1.2.3 3D human face reconstruction
Given the importance of face reconstruction in a wide range of fields, such as security, forensics or even entertainment, it is no surprise that special focus has been devoted to this area by the research community over the last decades. A comparative study of three different 3D face reconstruction approaches is presented in [20]. Here, the most representative techniques of three different domains are tested: binocular stereo, structured lighting and photometric stereo. The experimental results show that active reconstruction techniques perform better than purely passive ones for this application.

The majority of analyses of vision-based reconstruction have focused on general performance for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless, some effort has been made on evaluating structured lighting techniques with special focus on human face reconstruction. In [21] a comparison is presented between three structured lighting techniques (Gray Code, Gray Code Shift, and Stripe Boundary) to assess 3D reconstruction for human faces using mono and stereo systems. The results show that Gray Code shift coding performs best, given the high number of emitted patterns it uses. A further study on this topic was performed by the same author in [22]. Again, it was found that time-multiplexing techniques, such as binary encoding using Gray Code, provide the highest accuracy. With a rather different objective than that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] focus their effort on presenting a framework that captures 3D models of faces at high resolution with low computational load. Here, the system uses a single colored stripe pattern for the reconstruction, plus a picture of the face illuminated with regular white light that is used as texture.
Particular aspects of 3D human face reconstruction, such as the proximity, size and texture involved, make structured lighting a suitable approach. In contrast, other reconstruction techniques may be less suitable when dealing with these particular aspects. For example, stereoscopic approaches fail to provide positive results when the textures involved do not contain features that can be easily extracted and matched algorithmically, as is the case for the human face. On the other hand, the concepts behind structured lighting make it very convenient for reconstructing this kind of surface, given the proximity involved and the size limits of the object in question (appropriate for projecting encoded patterns).
With regard to the suitability of the different pattern coding strategies for our application (3D human face reconstruction by means of a hand-held scanner), there are several factors to consider. Spatial neighborhood strategies do not offer the high spatial resolution that is needed by the algorithms that assess the fit quality of the various mask models. Direct coding strategies suffer from practical problems that affect their robustness in different scenarios. This centers the attention on time-multiplexing techniques, which are known to provide high spatial resolution. The problem with such techniques is that they are highly sensitive to movement, which is likely to be present on a hand-held device. Fortunately, there are several approaches by which this problem can be solved. Consequently, it is a time-multiplexing technique that is employed in our application.
2.2 Camera calibration
Camera calibration is a crucial ingredient in the process of metric scene measurement. This section presents a review of some of the most popular techniques, with special focus on those that are regarded as adequate for our application.
2.2.1 Definition
Camera calibration is the process of determining a mathematical approximation of the physical and optical behavior of an imaging system by using a set of parameters. These parameters can be estimated by means of direct or iterative methods, and they are divided into two groups. On the one hand, intrinsic parameters determine how light is projected through the lens onto the image plane of the sensor. The focal length, projection center and lens distortion are all examples of intrinsic parameters. On the other hand, extrinsic parameters measure the position and orientation of the camera with respect to a world coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for structured pattern projection and triangulation considered in [25]. The focal length fc and the projection center Oc are examples of intrinsic parameters of the camera, while the distance D between the camera and the projector corresponds to an extrinsic parameter.
Figure 2.4: A reference framework assumed in [25]
2.2.2 Popular techniques
In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration that uses a 3×4 transformation matrix to map 3D object points to their respective 2D image projections. Here, the model of the camera does not consider any lens distortion. For a detailed description of this method, refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique based on extracting the physical parameters of the camera from the transformation technique proposed in [18]. A description of this technique is given in [26] and [27]. A non-linear explicit camera calibration that included radial lens distortion was proposed by Salvi in his PhD thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular, and that is still widely used, was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.
The calibration mechanism that is currently used in our application is based on the work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although this mechanism focuses on stereo camera calibration, it was generalized for a system with one camera and one projector. It involves imaging a controlled scene from different positions and orientations. The controlled scene consists of a rigid calibration chart with several markers. The geometric and photometric properties of such markers are known precisely, so that they can be detected. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart. This calibration mechanism is discussed further in Section 3.7.
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed by the Smart Sensing & Analysis research group and provided as a starting point for the current project. Figure 3.1 presents the main steps involved in the 3D reconstruction process.
Figure 3.1: General flow diagram of the 3D face scanner application, covering the steps described in Sections 3.1 to 3.9 (read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering and hole filling); the input is a binary file plus an XML file, and the output is a 3D model
The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene. For each projection, the scene is captured by means of the embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study on why time-multiplexing strategies prove more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is described.
Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.
3.1 Read binary file
The first step of the application is to read the binary file that contains the required information for the 3D reconstruction. The binary file is composed of two parts: the header and the actual data. The header contains metadata of the acquired frames, such as the number of frames and the resolution of each one. The second part contains the actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence, which from now on will be referred to as camera frames.
3.2 Preprocessing
The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps is described in the following subsections.
Figure 3.3: Flow diagram of the preprocessing stage: parse XML file, discard frames, crop frames, and scale (convert to float, range from 0 to 1)
3.2.1 Parse XML file
In this stage, the application first reads an XML file that is included for every scan. This file contains relevant information for the structured light reconstruction. This information includes: (i) the type of structured light patterns that were projected when acquiring the data, (ii) the number of frames captured while structured light patterns were being projected, (iii) the image resolution of each frame to be considered, and (iv) the calibration data.
3.2.2 Discard frames
Based on the number-of-frames value read from the XML file, the application discards the extra frames that do not contain relevant information for the structured light approach but that are provided as part of the input.
3.2.3 Crop frames
The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this particular application, because pixels near the frame borders do not contain facial information and can therefore be safely removed.
3.2.4 Scale
Each pixel of the camera frame sequence (as provided by the embedded camera) is represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage, the data type is transformed from unsigned integer to floating point while dividing each pixel value by 255. The new set of values ranges between 0 and 1.
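As a minimal sketch (with an assumed flat buffer layout, not the scanner's actual data structures), this conversion is a single pass:

    #include <stdint.h>

    /* Convert 8-bit camera pixels to floats in [0, 1]. */
    void scale_frame(const uint8_t *in, float *out, int num_pixels)
    {
        for (int i = 0; i < num_pixels; i++)
            out[i] = in[i] / 255.0f;
    }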
3.3 Normalization
Even though this section is entitled Normalization, a few more tasks are performed in this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide arrows represent the flow of data, whereas dashed lines represent the order of execution. The numbers inside the small data arrows pointing towards the different tasks represent the number of frames used as input by each task. The dashed-line rectangle that encloses the normalization and texture 2 tasks indicates that there is no strict sequential execution between these two, but rather that they are executed in an alternating fashion. This type of diagram will prove particularly useful in Chapter 5 in order to explain the modifications that were made to the application to improve its performance. An example of the different frames that are produced in this stage is visualized in Figure 3.5. A brief description of each of the tasks involved in this stage follows.

Figure 3.4: Flow diagram of the normalization stage, showing the normalization, texture 2, modulation and texture 1 tasks (16 camera frames in; 8, 8, 1 and 1 frames out, respectively)
3.3.1 Normalization
The purpose of this stage is to extract the reflectivity component (texture information) from the camera frames, while aiming at enhancing the deformed illumination patterns in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The deformed patterns are essential for the 3D reconstruction process.
In order to understand how this process takes place, we need to look back at Figure 3.2. There, it is possible to observe that the projected patterns in the top-row frames are equal to their corresponding frames in the bottom row, with the only difference being that the values of the projected pattern are inverted. For each corresponding pair, a new image frame is generated according to the following equation:

F_norm(x, y) = (F_camera(x, y, a) − F_camera(x, y, b)) / (F_camera(x, y, a) + F_camera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively. An example of the resulting frame sequence is shown in Figure 3.5a.
Figure 3.5: Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence, (b) texture 2 frame sequence, (c) modulation frame, (d) texture 1 frame
3.3.2 Texture 2
The calculation of the texture 2 frame sequence follows the same procedure as the one used to calculate the normalized frame sequence. In fact, the output of this process is an intermediate step in the calculation of the normalized frames, which is the reason why the two processes are said to be performed in an alternating fashion. The mathematical equation that describes the calculation of the texture 2 frame sequence is

F_texture2(x, y) = F_camera(x, y, a) + F_camera(x, y, b)

The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation stage.
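Because the texture 2 sum is exactly the denominator of the normalized frame, the two tasks can share their intermediate results, which is what makes the alternating execution natural. The following C sketch illustrates the two equations above; the flat frame layout and function name are assumptions for illustration, not the scanner's actual code.

    /* Compute one texture-2 frame and one normalized frame from an
     * aligned pair of camera frames a and b (pattern and inverted
     * pattern). Frames are float arrays of num_pixels values in [0,1]. */
    void normalize_pair(const float *frame_a, const float *frame_b,
                        float *texture2, float *norm, int num_pixels)
    {
        for (int i = 0; i < num_pixels; i++) {
            float sum  = frame_a[i] + frame_b[i];   /* texture 2 value */
            float diff = frame_a[i] - frame_b[i];
            texture2[i] = sum;
            /* Guard against division by zero in completely dark regions. */
            norm[i] = (sum > 0.0f) ? (diff / sum) : 0.0f;
        }
    }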
3.3.3 Modulation
The purpose of this stage is to find the range of measured values for each (x, y) pixel of the camera frame sequence along the time dimension. This is done in two steps. First, two frames are generated by finding the maximum and minimum values along the time (t) dimension (Figure 3.6) for every (x, y) position in a frame.

Figure 3.6: Camera frame sequence in a coordinate system

Second, a modulation frame is produced by finding the difference between the previously generated frames, i.e.

F_mod(x, y) = F_max(x, y) − F_min(x, y)

This modulation frame (Figure 3.5c) is required later during the decoding stage.
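As a minimal sketch, with the same assumed frame layout as before, the two steps collapse into a single pass over the sequence:

    /* Modulation: per-pixel range of the frame sequence along time.
     * frames[t] points to frame t; each frame holds num_pixels floats. */
    void compute_modulation(const float *frames[], int num_frames,
                            float *modulation, int num_pixels)
    {
        for (int i = 0; i < num_pixels; i++) {
            float min = frames[0][i], max = frames[0][i];
            for (int t = 1; t < num_frames; t++) {
                if (frames[t][i] < min) min = frames[t][i];
                if (frames[t][i] > max) max = frames[t][i];
            }
            modulation[i] = max - min;
        }
    }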
3.3.4 Texture 1
Finally, the last task in the normalization stage corresponds to the generation of the texture image that will be mapped onto the final 3D model. In contrast to the previous three tasks, this subprocess does not take the complete set of 16 camera frames as input, but only the 2 with the finest projection patterns. Figure 3.7 shows the four processing steps that are applied to the input in order to generate a texture image such as the one presented in Figure 3.5d.
Figure 3.7: Flow diagram for the calculation of the texture 1 image: average frames, gamma correction, 5×5 mean filter, histogram stretch
3.4 Global motion compensation
The major drawback of time-multiplexing strategies is their high sensitivity to movement. In fact, if no measures are taken to correct the slight amount of movement of the scanner or of the objects in the scene during the acquisition process, the complete reconstruction process fails. Although the global motion compensation stage is only a minor part of the mechanism that makes the entire application robust to motion, its contribution to the final result is not negligible.

Global motion compensation is an extensive field of research, to which many different approaches and methods have been contributed. The approach used in this application is amongst the simplest in level of complexity. Nevertheless, it suffices for the needs of the current application.
Figure 3.8 presents an overview of the algorithm used to achieve the global motion compensation. This process takes as input the normalized frame sequence introduced in the previous section. As noted at the bottom of the figure, these steps are repeated for every pair of consecutive frames. As a first step, the pixels in each column are added for both frames. This results in two vectors that hold the cumulative sums of each frame. The second step is to determine by how many pixels the second image is displaced with respect to the first one. In order to achieve this, the sum of absolute differences (SAD) between elements of the two column-sum vectors is calculated while slowly displacing the two vectors with respect to each other. The result is a new vector containing the SAD value for each displacement. Subsequently, the index of the smallest element in the SAD values vector is searched for in order to determine the number of pixels that the second image needs to be shifted. The process concludes by performing the actual shift of the second frame.
Figure 3.8: Flow diagram for the global motion compensation process. For every pair of consecutive frames A and B in the normalized sequence, the columns of both frames are summed, the SAD between the column-sum vectors is minimized, and frame B is shifted accordingly.
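The displacement search at the core of this stage can be sketched in a few lines of C. This is an illustrative implementation of the column-sum SAD minimization described above, not the scanner's actual code; the search range is an assumed parameter.

    #include <math.h>

    /* Estimate the horizontal displacement of frame B with respect to
     * frame A by minimizing the sum of absolute differences (SAD)
     * between their column-sum vectors. Shifts in [-max_shift, max_shift]
     * are evaluated; the returned shift is then applied to frame B. */
    int estimate_shift(const float *col_sum_a, const float *col_sum_b,
                       int width, int max_shift)
    {
        int best_shift = 0;
        float best_sad = INFINITY;
        for (int s = -max_shift; s <= max_shift; s++) {
            float sad = 0.0f;
            /* Compare only the columns where the two vectors overlap.
             * (A practical version would normalize by the overlap size.) */
            for (int x = 0; x < width; x++) {
                int xb = x + s;
                if (xb < 0 || xb >= width)
                    continue;
                sad += fabsf(col_sum_a[x] - col_sum_b[xb]);
            }
            if (sad < best_sad) {
                best_sad = sad;
                best_shift = s;
            }
        }
        return best_shift;
    }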
3.5 Decoding
In Section 2.1.1 of the literature study, the correspondence problem was defined as the process of determining corresponding point pairs between the captured images and the projected patterns. This is exactly what is accomplished during the decoding stage.

A novel approach has been implemented, in which the identification of the projector stripes is based not on the values of the pixels themselves (as is typically done) but rather on the edges formed by the transitions of the projected patterns. Figure 3.9 illustrates the different sets of decoded values that result from each of these methods. Here, it is possible to observe that the pixel-based method produces a stair-casing effect due to the decoding of neighboring pixels that lie on the same stripe of the projected pattern. The edge-based method, on the other hand, removes this undesirable effect by decoding values only for the parts of the image in which a transition occurs. Furthermore, this approach enables sub-pixel accuracy in the determination of the positions where the transitions occur, meaning that the overall resolution of the 3D reconstruction increases considerably.
Figure 3.9: Edge-based vs. pixel-based decoding, plotting decoded values along the y dimension of the image. The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used.
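The sub-pixel determination of a transition position can be illustrated with a linear interpolation between the two samples that straddle the transition. The thesis does not specify the exact interpolation used, so the following C sketch is an assumption for illustration only:

    /* Sub-pixel position of a zero crossing of the normalized signal
     * between samples y and y+1; assumes signal[y] and signal[y+1]
     * have opposite signs. */
    double subpixel_crossing(const float *signal, int y)
    {
        double a = signal[y];
        double b = signal[y + 1];
        return y + a / (a - b);   /* linear interpolation between samples */
    }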
The decoding process results in a set of vertices, each one associated with a depth code. Note, however, that the unit of measurement used to describe the position and depth of each vertex is based on camera pixels and code values respectively, meaning that these vertices still do not represent the actual geometry of the face. The calibration process, explained in a later section, is the part of the application that translates the pixel and code values into standard units (such as millimeters), thus recreating the actual shape of the human face.
3.6 Tessellation
Tessellation refers to the process of covering a plane using different geometric shapes in such a manner that no overlaps occur. In computer graphics, these geometric shapes are generally chosen to be triangles, also called "faces". The reason for using triangles is that they have, by definition, their vertices on the same plane. This, in turn, avoids the generation of non-simple convex polygons that are not guaranteed to be rendered correctly. A complete example illustrating this point can be found in [32].
The set of 3D vertices calculated in the decoding stage is the input to the tessellation process. Here, however, the third dimension does not play a role, and hence the z coordinate of each vertex can be thought of as being equal to 0. This implies that the new set of vertices consists only of (x, y) coordinates that lie on the same plane, as shown in Figure 3.10a. This graph corresponds to a very close view of the nose area in the reconstructed face example.
Figure 3.10: Close view of the vertices in the nose area (a) before and (b) after applying the Delaunay triangulation
The question that arises here is how to connect the vertices in such a way that the complete surface is covered with triangles. The answer is to use the Delaunay triangulation, which is probably the most common triangulation used in computer vision. The main advantage that it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the Delaunay triangulation is independent of the order in which the vertices are processed.
Figure 3.10b shows the result of applying the Delaunay triangulation to the vertices shown in Figure 3.10a.

Although there exist a number of different algorithms used to achieve the Delaunay triangulation, the final outcome of each conforms to the following definition: a Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [33]. Such a definition can be understood by examining Figure 3.11.
Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33]
3.7 Calibration
The set of (x, y) vertices with their corresponding depth code values that result from the decoding process do not represent standard units of measure, i.e. these still have to be translated into standard units such as millimeters. This is precisely the objective of the calibration process.

The calibration mechanism that is used in the application is based on the work of Peter-Andre Redert as part of his PhD thesis [31]. The entire process is divided into two parts: an offline and an online process. Moreover, the offline process consists of two stages: the camera calibration and the system calibration. It is important to clarify that while the offline process is performed only once (camera properties and distances within the system do not change with every scan), the online process is carried out for every scan instance. The calibration stage referred to in Figure 3.1 is the latter.
3.7.1 Offline process
As already mentioned, the offline process comprises the two stages described below.

Camera calibration. This part of the process is concerned with the calculation of the intrinsic parameters of the camera, as explained in Section 2.2 of the literature study. In short, the objective is to precisely quantify the optical properties of the camera. The current approach accomplishes this by imaging the special calibration chart shown in Figure 3.12 from different orientations and distances. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart.
Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.
System calibration. The second part of the calibration process refers to the camera-projector system calibration, i.e. the determination of the extrinsic parameters of the system. Again, this part of the process images the calibration chart from different distances. However, this time structured light patterns are emitted by the projector while the acquisition process takes place. The result is that each projector code is associated with a known depth and camera position.
3.7.2 Online process
The result of the offline calibration is a set of parameters that model the optical properties of the scanner system. These are passed to the application inside the XML file for every scan. Such parameters represent the coefficients of a fifth-order polynomial used for translating the set of (x, y) vertices with their corresponding depth code values into standard units of measure. In other words, the online process consists of evaluating a polynomial with all the x, y and depth code values calculated in the decoding stage in order to reconstruct the geometry of the face. Figure 3.13 shows the state of the 3D model before and after the reconstruction process.

Figure 3.13: The 3D model (a) before and (b) after the calibration process
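To give an idea of the online step, the sketch below shows a fifth-order polynomial evaluation in Horner form. It is a hypothetical fragment: the actual structure of the calibration polynomial (how the x, y and depth code values enter it) is defined by the calibration data and is not reproduced here.

    /* Evaluate a fifth-order polynomial in Horner form. c[0..5] are
     * calibration coefficients read from the XML file, c[5] being the
     * highest-order term; v is an x, y or depth code value. */
    static double eval_poly5(const double c[6], double v)
    {
        return ((((c[5] * v + c[4]) * v + c[3]) * v
                 + c[2]) * v + c[1]) * v + c[0];
    }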
3.8 Vertex filtering
As can be seen in Figure 3.13b, there are a number of extra vertices (and faces) that have not been correctly reconstructed and that should therefore be removed from the model. Vertex filtering is applied to remove all these noisy vertices and faces, based on different criteria. The process is divided into the following three steps.
3.8.1 Filter vertices based on decoding constraints
First, if the distance between consecutive decoded points is larger than a maximum threshold in the x or z dimension, then these are removed. Second, in order to avoid falsely decoded vertices due to camera noise (especially in the parts of the images where light does not hit directly), a minimal modulation threshold needs to be exceeded, or else the associated decoded point is discarded. Finally, if the decoded vertices lie outside a margin defined in accordance with the image dimensions, then these are removed as well.
3.8.2 Filter vertices outside the measurement range
The measurement range, defined during the offline calibration, refers to the minimum and maximum values that each decoded point can have in the z dimension. These values are read from the XML file. The long triangles shown in Figure 3.13b, which either extend far into the picture or, on the other hand, come close to the camera, are all removed in this stage. The resulting 3D model, after being filtered with the two previously described criteria, is shown in Figure 3.14a.
3.8.3 Filter vertices based on a maximum edge length
Several steps are involved in the removal of vertices based on the maximum edge length criterion. Initially, the length of every edge contained in the model is calculated. This is followed by determining a new set of edges L that contains the longest edge in each face. After this operation, the mean length value of the longest-edge set is calculated. Finally, only faces whose longest edge is less than seven times the mean value, i.e. L < 7 × mean(L), are kept. Figure 3.14b shows the result after this operation.
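As an illustration, the following C sketch implements this filter over an indexed triangle mesh. The Vertex and Face structures are assumptions made for the example; the scanner's internal data structures may differ.

    #include <math.h>
    #include <stdlib.h>

    typedef struct { double x, y, z; } Vertex;
    typedef struct { int v[3]; int keep; } Face;

    static double edge_len(const Vertex *a, const Vertex *b)
    {
        double dx = a->x - b->x, dy = a->y - b->y, dz = a->z - b->z;
        return sqrt(dx * dx + dy * dy + dz * dz);
    }

    /* Keep only faces whose longest edge is below 7 times the mean
     * longest-edge length, as described in Section 3.8.3. */
    void filter_long_edges(const Vertex *verts, Face *faces, int num_faces)
    {
        double *longest = malloc(num_faces * sizeof *longest);
        double mean = 0.0;
        for (int i = 0; i < num_faces; i++) {
            const Vertex *a = &verts[faces[i].v[0]];
            const Vertex *b = &verts[faces[i].v[1]];
            const Vertex *c = &verts[faces[i].v[2]];
            longest[i] = fmax(edge_len(a, b),
                              fmax(edge_len(b, c), edge_len(c, a)));
            mean += longest[i];
        }
        mean /= num_faces;
        for (int i = 0; i < num_faces; i++)
            faces[i].keep = (longest[i] < 7.0 * mean);
        free(longest);
    }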
Figure 3.14: Resulting 3D models after the various filtering steps: (a) after the steps described in Subsections 3.8.1 and 3.8.2, (b) after the step described in Subsection 3.8.3, (c) after the step described in Section 3.9
3.9 Hole filling
In the last processing step of the 3D face scanner application, two actions are performed. The first one concerns an algorithm that takes care of filling undesirable holes that appear due to the removal of vertices and faces that were part of the face surface. This is accomplished by adding a vertex in the middle of the hole and then connecting every surrounding edge with this point. The second action refers to another filtering step of vertices and faces: in this last part of the application, the program removes all but the largest group of connected faces. The final 3D model is shown in Figure 3.14c.
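Assuming the boundary of a hole has been identified as an ordered loop of vertex indices, the filling step itself reduces to inserting a centroid vertex and building a triangle fan, as in the following hypothetical sketch (reusing the Vertex and Face types from the previous example; the arrays are assumed to have spare capacity):

    /* Fill a hole whose boundary is the ordered vertex loop loop[0..n-1].
     * A new vertex is placed at the centroid of the loop and one triangle
     * is created per boundary edge. Returns the number of faces added. */
    int fill_hole(Vertex *verts, int *num_verts,
                  Face *faces, int *num_faces,
                  const int *loop, int n)
    {
        Vertex c = {0.0, 0.0, 0.0};
        for (int i = 0; i < n; i++) {
            c.x += verts[loop[i]].x;
            c.y += verts[loop[i]].y;
            c.z += verts[loop[i]].z;
        }
        c.x /= n; c.y /= n; c.z /= n;
        int ci = (*num_verts)++;        /* index of the new centre vertex */
        verts[ci] = c;
        for (int i = 0; i < n; i++) {   /* triangle fan around the centre */
            Face *f = &faces[(*num_faces)++];
            f->v[0] = loop[i];
            f->v[1] = loop[(i + 1) % n];
            f->v[2] = ci;
            f->keep = 1;
        }
        return n;
    }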
3.10 Smoothing
Taking into account that the smoothing process is beneficial for visualization purposes but not for the overall goal of the 3D mask sizing project, this process was not included as part of the 3D face scanner application. This is also the reason why it does not appear in Figure 3.1. Nevertheless, this section provides a brief explanation of the smoothing process that is currently used, along with an example.
A complete explanation of the algorithm that is used to achieve the smoothing effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian operator that diffuses the vertices along the surface. An example of the resulting model before and after applying the smoothing process is shown in Figure 3.15.
Figure 3.15: Forehead of the 3D model (a) before and (b) after applying the smoothing process
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as two different domains, but rather as two complementary parts of a whole. There are two important trends that have made such a unified view possible. First, integrated circuit (IC) technology has evolved to the point where multiple processors of different types coexist in a single IC. Second, the increasing complexity and average size of programs, added to the evolution of compiler technologies, has made C compilers (and even C++ or Java in some cases) commonplace in the development of embedded systems [35].
This chapter discusses the embedded hardware and software implementation of the 3D face scanner. A brief account of the hardware and software tools that were used during the development of the application is presented first. Subsequently, the first stage of the development process is described, which consists mainly of translating the algorithms and methods described in Chapter 3 into a different programming language, more suitable for embedded systems. Finally, a preview of the developed visualization module that displays the reconstructed 3D face is presented, along with a brief description of its functionality.
4.1 Development tools
This section describes the set of tools used in the development of the embedded application. First, an overview of the hardware is presented, highlighting the most important aspects that are of interest to the 3D face scanner application. This is then followed by a list of the software tools, along with a short motivation for their selection. A so-called remote development methodology was used for the compilation process. The idea is to run an integrated development environment (IDE) on a client system for the creation of the project, editing of the files and usage of code assistance features, in the same manner as done with local projects. However, when the project is built, run or debugged, the process runs on a remote server, with output and input transferred to the client system.
4.1.1 Hardware
A current trend in the embedded world is the use of single-board computers (SBCs) as development platforms. SBCs combine most features of a conventional desktop computer into a single board, which can be as small as a credit card. One or more processors of different types, memory, on-board peripherals for multiple USB devices, single or dual gigabit Ethernet connections, and integrated graphics and audio capabilities, amongst others, are common features included in these devices. But perhaps what is most interesting for embedded developers is the availability of several SBCs that fall under the open source hardware category [36]. Such SBCs are suitable for the implementation of a wide range of applications on the basis of open operating systems.

Two different hardware environments were used in the development of the current embedded application: a conventional desktop personal computer (PC) with an Intel x86 architecture, and an SBC that was selected according to the following survey.
4.1.1.1 Single-board computer survey
A prior survey of popular SBCs available on the market was conducted with the intention of finding the most suitable model for our application. Table 4.1 presents a subset of the considered models, highlighting the most relevant characteristics for the 3D face scanner application. Refer to [37] for the complete survey.

The model to be chosen had to comply with several requirements imposed by the 3D face scanner application. First, support for both a camera and a projector had to be offered. While all of the considered models showed special support for video output, not all of them provided suitable characteristics for camera signal acquisition. In fact, most of them rely on USB or Ethernet connections for this purpose. The problem of using USB technology for camera acquisition is that it is highly resource demanding. Ethernet connections, on the other hand, imply streaming video in formats such as MPEG, which require additional computational resources and buffering for decoding the video stream. Explicit peripheral support for camera acquisition was only offered by two of the considered models: the BeagleBoard-xM and the PandaBoard.
Table 4.1: Single-board computer survey
BeagleBoard-xM
CPU: ARM Cortex-A8, 1000 MHz
RAM: 512 MB
Video output: DVI-D, HDMI, S-Video
GPU: PowerVR SGX, OpenGL ES 2.0
Camera port: Yes

Raspberry Pi Model B
CPU: ARM1176, 700 MHz
RAM: 256 MB
Video output: Composite RCA, HDMI, DSI
GPU: Broadcom VideoCore IV, OpenGL ES 2.0
Camera port: No

Cotton Candy
CPU: dual-core ARM Cortex-A9, 1200 MHz
RAM: 1 GB
Video output: HDMI
GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
Camera port: No

PandaBoard
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, DVI-D, LCD
GPU: PowerVR SGX540, OpenGL ES 2.0
Camera port: Yes

VIA APC
CPU: ARM11, 800 MHz
RAM: 512 MB
Video output: HDMI, VGA
GPU: built-in 2D/3D graphics, OpenGL ES 2.0
Camera port: No

MK802
CPU: ARM Cortex-A8, 1000 MHz
RAM: 1 GB
Video output: HDMI
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No

Snowball
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, CVBS
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No
A second consideration in the selection of the SBC concerned the project objective of developing a module capable of visualizing the reconstructed 3D model by means of the embedded projector. It was considered that the achievement of this objective could be greatly simplified by selecting an SBC model that offered support for rendering 3D computer graphics by means of an API, preferably OpenGL ES. Nevertheless, all of the SBC models considered in the survey featured a graphical processing unit (GPU) with such support.

Finally, one last important motivation for the selection came from the experience gathered through related projects. The BeagleBoard-xM had been used as the embedded computing unit in other projects [6] at Philips Research Eindhoven, and therefore valuable implementation effort could be saved if this option were adopted. Consequently, the BeagleBoard-xM was selected as the SBC model for the development of the current project.
4.1.1.2 BeagleBoard-xM features
The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power, open-source hardware system that was designed specifically to address the open source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz, accompanied by 512 MB of LPDDR RAM. Several open operating systems have been made compatible with this processor, including Linux, FreeBSD, RISC OS, Symbian and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU to provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].

In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].
4.1.2 Software
The main factors involved in the selection of software tools were (i) available support by a large development community and (ii) acquisition costs and licensing charges. Open source software was adopted where possible. Moreover, prior experience with the tools was also taken into account. The software can be divided into two categories: (i) software libraries that are used within the application and therefore are necessary for its execution, and (ii) software tools used specifically for the development of the application, which hence are not required for its execution. In what follows, each of these is briefly described.

Figure 4.1: The BeagleBoard-xM offered by Texas Instruments
4.1.2.1 Software libraries

The following software libraries are used throughout the implementation of the embedded application.

libxml2: A software library for parsing XML documents, originally developed for the Gnome project and later made available to outside projects as well. The current application uses this tool to extract the required information from the XML file that is included with each scan.
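As an illustration of this use, below is a minimal sketch of parsing such a file with libxml2. The file name, element layout, and the frames attribute are hypothetical placeholders; the actual schema of the scan files is not reproduced here.

    #include <stdio.h>
    #include <libxml/parser.h>
    #include <libxml/tree.h>

    int main(void)
    {
        /* Parse the per-scan XML file into a DOM tree. */
        xmlDocPtr doc = xmlReadFile("scan.xml", NULL, 0);
        if (doc == NULL)
            return 1;

        /* Walk the children of the root element and read one
           (hypothetical) attribute from each element node. */
        xmlNodePtr root = xmlDocGetRootElement(doc);
        for (xmlNodePtr cur = root->children; cur != NULL; cur = cur->next) {
            if (cur->type != XML_ELEMENT_NODE)
                continue;
            xmlChar *val = xmlGetProp(cur, (const xmlChar *)"frames");
            if (val != NULL) {
                printf("%s: frames=%s\n", cur->name, val);
                xmlFree(val);
            }
        }

        xmlFreeDoc(doc);
        xmlCleanupParser();
        return 0;
    }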
OpenCV: An open source computer vision and machine learning software library initiated by Intel. It provides the necessary functionality to construct the Delaunay triangulation described in Chapter 3. Though it was used in the initial versions of the application, later optimizations replaced the OpenCV implementation.

CGAL: A software library that aims to provide access to algorithms in computational geometry. It is used in the current application as a means to simplify the resulting mesh surface, i.e., to reduce the number of faces used to represent the surface while keeping the overall shape of the reconstructed model.

OpenGL ES: A subset of the more general OpenGL designed specifically for embedded systems. It consists of a cross-language, multi-platform Application Programming Interface (API) for rendering 2D and 3D computer graphics. It is used in the current application as the means to visualize the reconstructed 3D model.

GLUT: The OpenGL Utility Toolkit, a system-independent API for OpenGL used to create windows and/or frame buffers. It is used in the visualization module of the application as well.
4.1.2.2 Software development tools

The following list describes the most important software tools used for the development of the embedded application.

GNU toolchain: A collection of programming tools produced by the GNU Project that provides development facilities for applications and operating systems. Among the several projects that comprise the GNU toolchain, the following were used:

GNU Make: A utility that automates the building process of executable programs by reading so-called makefiles, which specify how to create the target program.

GCC: The official compiler of the GNU operating system, which has been adopted as standard by most modern Unix-like operating systems.

GNU Binutils: A set of programming tools used for creating and managing programs, object files, libraries, profile data, and assembly source code. The commands as (assembler), ld (linker), and gprof (profiler) were used among the complete set of binutils commands.

GNU Project debugger: The standard debugger for the GNU operating system, which is also available for the development of applications outside that project.

Valgrind: A programming tool that can automatically detect memory management errors. It also provides the functionality of a profiler.

Ubuntu: A Linux-based operating system distributed as free and open source software. It was installed on both the desktop PC and the SBC.
4.2 MATLAB to C code translation

This section describes the first stage of the embedded application development, which involves the translation of a series of algorithms originally written in MATLAB code to C.

Despite the fact that there are a number of available tools that automatically translate MATLAB code to C language, such as MATLAB Coder by MathWorks, MATLAB-to-C Synthesis (MCS) by Catalytic Inc., and AccelDSP by Xilinx, these have a number of pitfalls that compromise their applicability, especially when the performance aspect is of ultimate importance. Perhaps most concerning is that each of these tools only supports a subset of the MATLAB language and functions, meaning that the complete functionality of MATLAB is immediately constrained by this requirement. In many cases this would imply a modification of the MATLAB code prior to the translation process in order to filter out any feature or function not included in the subset, which adds overhead to the development process. Examples of features not supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization, and try/catch statements. The use of an automatic translation tool was discarded for this project, taking into account that several of these unsupported features are present in the MATLAB code.
4.2.1 Motivation for developing in C language

There are a number of reasons that explain why C is among the most popular programming languages used for the development of embedded systems. The first is that the C language lies at an intermediate point between higher- and lower-level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher-level languages lies in the fact that they do not provide suitable characteristics for optimizing the performance of applications, such as low-level memory manipulation. Furthermore, unlike many higher-level programming languages, C provides deterministic resource use, which is an important feature when the target devices contain limited resources. On the other hand, C outperforms lower-level languages in a number of aspects, such as scalability and maintainability. Two final motivations for using C are that (i) C compilers are available for almost all embedded devices and are supported by a large pool of experienced C programmers, and (ii) the vast majority of hardware APIs and drivers are written in C.
4.2.2 Translation approach

As mentioned earlier, a manual translation approach was chosen over the use of automatic translation tools. A key part in the process of manually translating MATLAB to C code is the verification process. There are two major techniques used to achieve such verification. The first consists of a systematic method of converting the translated C code into a compiled MEX-file that can be merged into the original MATLAB project. Then, by comparing the results generated by the MATLAB project containing the C implementation wrapped in a MEX-file with those generated by the original MATLAB project, one should be able to verify the correctness of the translation. The second approach consists of writing corresponding intermediate results of both the MATLAB and C implementations to external files and then using a file comparison tool, such as diff for Linux environments, to validate the equality of both results. The latter approach was chosen for the development of the current application for the following reason: the former approach requires the C implementation to be wrapped in a so-called MEX wrapper, which takes care of the communication between MATLAB and C. This task is considered to be error prone, since crashes, segmentation violations, or incorrect results can easily occur if the MEX wrapper does not allocate and access the data properly, as reported by Marc Barberis in [40] from Catalytic Inc.

A number of pitfalls that add complexity to the manual translation process were identified throughout the development of this stage. The most important are:
• Array elements in MATLAB code are indexed starting from 1, whereas C indexing starts from 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors (both conventions are contrasted in the sketch after this list).

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e., the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in the performance of the resulting code.

• MATLAB is an interpreted language, i.e., data types and variable dimensions are only known at run-time, and thus they cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation, reallocation, and deallocation of memory using constructs such as malloc, realloc, or free.
• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions, and sizes. On the contrary, C requires all variables to be cast to a specific data type (or declared, as this is known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C, which requires the programmer to implement them using structure constructs of primitive types.
4.3 Visualization

This section describes the different steps involved in the visualization module developed to display the reconstructed 3D models by means of the embedded projector contained in the hand-held device. Figure 4.2 extends the general overview of the application presented in Figure 3.1 by incorporating the visualization module. This figure shows that a resulting 3D model of the face reconstruction process consists of 4 different elements: a set of vertices, a set of faces, a set of UV coordinates, and a texture image.
Figure 4.2: Simplified diagram of the 3D face scanner application.
Vertices and faces describe the geometry of the reconstructed model. Each face consists of three index values that determine the vertices that form a triangle. On the other hand, the UV coordinates together with the texture image describe the texture of the model. Figure 4.3 shows how UV coordinates are used to map portions of the texture image to individual parts of the model. Each vertex is associated with a UV coordinate. When a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image and place it on top of the triangle.
Figure 4.3: UV coordinate system.
Figure 4.4 presents an overview of the visualization module. The first step of the process is to simplify the 3D model, i.e., to reduce the number of triangles (and vertices) used to represent the surface. Note that while a high resolution is needed for the algorithms that determine the fit quality of the different mask models, a much lower resolution can be used for visualization purposes. In fact, due to the limited resources available in embedded systems, such simplification becomes necessary to avoid lag when zooming, rotating, or panning the model. Edge collapse is a common term used for the simplification process, which is shown in Figure 4.4. The input vertices and faces of this block are converted into a smaller set, denoted as New vertices and New faces in the diagram. However, since the new set of vertices and faces does not have a one-to-one correspondence to the original set of UV coordinates, such coordinates have to be updated as well. This is accomplished by using the nearest neighbor algorithm: every new vertex is assigned the UV coordinate of its closest original vertex.
The next stage of the process is to format the new set of vertices, faces, and UV coordinates, together with the texture 1 image, such that OpenGL can render the model. Subsequently, normal vectors are calculated for every triangle; these are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex based on the normal vectors of the triangles that are connected to it, where a cross product is used to calculate the normal vector of each triangle. Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state in which the model is redrawn every time a timer expires or an interactive operation is sent to the program.
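A minimal sketch of this per-vertex normal averaging is given below. The flat-array data layout and function name are assumptions made for the illustration, not the application's actual structures.

    #include <math.h>

    /* Accumulate the (unnormalized) face normal of each triangle into its
       three vertices, then normalize the per-vertex sums. Summing the raw
       cross products implicitly weights each face by its area. */
    static void compute_vertex_normals(const float *verts, int n_verts,
                                       const int *faces, int n_faces,
                                       float *normals /* 3 * n_verts floats */)
    {
        for (int i = 0; i < 3 * n_verts; i++)
            normals[i] = 0.0f;

        for (int f = 0; f < n_faces; f++) {
            const float *a = &verts[3 * faces[3 * f + 0]];
            const float *b = &verts[3 * faces[3 * f + 1]];
            const float *c = &verts[3 * faces[3 * f + 2]];

            /* Edge vectors and their cross product (the face normal). */
            float u[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
            float v[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
            float n[3] = { u[1] * v[2] - u[2] * v[1],
                           u[2] * v[0] - u[0] * v[2],
                           u[0] * v[1] - u[1] * v[0] };

            for (int k = 0; k < 3; k++) {
                normals[3 * faces[3 * f + k] + 0] += n[0];
                normals[3 * faces[3 * f + k] + 1] += n[1];
                normals[3 * faces[3 * f + k] + 2] += n[2];
            }
        }

        for (int i = 0; i < n_verts; i++) {
            float *n = &normals[3 * i];
            float len = sqrtf(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
            if (len > 0.0f) {
                n[0] /= len; n[1] /= len; n[2] /= len;
            }
        }
    }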
Figure 4.4: Diagram of the visualization module.
Chapter 5
Performance optimizations
This chapter presents the various performance optimizations made to the 3D face scanner application, ranging from high-level optimizations, such as modification of the algorithms, to low-level optimizations, such as the implementation of time-consuming parts in assembly language.

In order to verify that the achieved optimizations were valid in general and not only for specific cases, 10 scans of different persons were used for profiling the performance of the application. Every profile consisted of running the application 10 times for each scan and then averaging the results, in order to reduce the influence that external factors might have on the measured times. Figure 5.1 presents an example of the graphs that will be used throughout this and the following chapters to represent the changes in performance. Here, each bar is divided into different colors that represent the distribution of the total execution time among the various stages of the application, described in Chapter 3 and summarized in Figure 3.1.
The translation from MATLAB to C code corresponds to the first optimization performed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available on the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's -O2 optimization level.

The bottom bar in Figure 5.1 represents the starting point for a set of optimization procedures that will be described in the following sections. The order in which these are presented corresponds to the order in which they were applied to the application.

Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
5.1 Double- to single-precision floating-point numbers

The same representation format of floating-point numbers was required for the MATLAB and C implementations in order to compare their results at each step of the translation process. The original C implementation therefore used the double-precision format, because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double- to single-precision format was made. Figure 5.2 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec.
Figure 5.2: Difference in execution time when the double-precision format is changed to single precision.
5.2 Tuned compiler flags

While the previous versions of the C code were compiled with the -O2 optimization level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
Figure 5.3: Execution time before and after tuning GCC's compiler options.
5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality in the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e., first and second, third and fourth, fifth and sixth, and so on. The data of the camera frames were placed in memory in such a manner that corresponding pixels of a frame pair lie next to each other. The procedure is shown in Figure 5.4.
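As an illustration, a sketch of this interleaving step is shown below, assuming 8-bit grayscale frames stored as flat arrays; the names are illustrative only.

    /* Interleave two consecutive camera frames pixel by pixel, so that
       pixel i of frame A and pixel i of frame B end up adjacent in
       memory. Operations on a frame pair then walk memory sequentially. */
    static void interleave_frame_pair(const unsigned char *frame_a,
                                      const unsigned char *frame_b,
                                      unsigned char *interleaved,
                                      int n_pixels)
    {
        for (int i = 0; i < n_pixels; i++) {
            interleaved[2 * i]     = frame_a[i];
            interleaved[2 * i + 1] = frame_b[i];
        }
    }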
However, this modification yielded no improvement in the execution time of the application, as can be seen from Figure 5.5.

Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green, and purple circles represent pixels of the first, second, third, and fourth frames, respectively.

Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.

5.4 Reimplementation of C's standard power function

The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. Gamma correction in this application consists of raising each pixel to the 0.85 power. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process. Taking into account that the high accuracy offered by this function was not required, and that the overhead involved in validating the input could be removed, a different implementation was adopted.
A novel approach proposed by Ian Stephenson in [42] was used, explained as follows. The power function is usually implemented using logarithms as

$\mathrm{pow}(a, b) = x^{\log_x(a) \cdot b}$,

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a,

$a = M \cdot 2^E$,

where M is the mantissa and E is the exponent. Taking the logarithm of both sides gives

$\log_2(a) = \log_2(M) + E$,

and since M is normalized, $\log_2(M)$ is always small; therefore

$\log_2(a) \approx E$.

This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.
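A hedged sketch of this idea in C is given below. It follows the bit-level trick described above, but the exact constants and accuracy tuning used in the application are not reproduced here.

    #include <stdint.h>
    #include <string.h>

    /* Approximate log2(a) for a > 0 by reading the IEEE 754 bit pattern:
       interpreting the raw bits as a fixed-point number yields E plus a
       piecewise-linear approximation of log2(M). */
    static float fast_log2(float a)
    {
        uint32_t bits;
        memcpy(&bits, &a, sizeof bits);      /* type-pun safely */
        return (float)bits / (float)(1 << 23) - 127.0f;
    }

    /* Approximate 2^x by building the IEEE 754 bit pattern directly. */
    static float fast_pow2(float x)
    {
        uint32_t bits = (uint32_t)((x + 127.0f) * (float)(1 << 23));
        float result;
        memcpy(&result, &bits, sizeof result);
        return result;
    }

    /* pow(a, b) = 2^(log2(a) * b); valid for a > 0 (a zero-valued pixel
       can be special-cased by the caller). */
    static float fast_pow(float a, float b)
    {
        return fast_pow2(b * fast_log2(a));
    }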
Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.
5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is the principle that recently referenced memory locations tend to be referenced again soon. Moreover, the reordering allowed floating-point calculations to be replaced with integer calculations in the modulation stage, and integer calculations are known to typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modular frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping, and scaling frames are now performed in an alternating fashion together with the calculation of the modular frame. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.
Figure 5.7: Order of execution (a) before and (b) after the optimization.

Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, that figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results were achieved when GMC is applied in the y direction only. This was implemented by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted of removing all unnecessary calculations related to the estimation of GMC in the x direction. It provides the improvement of the execution time shown in Figure 5.10.
Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code.

Figure 5.10: Difference in execution time before and after modifying the GMC stage.
5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation. A series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation; it is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. An error was detected at this point of the optimization process in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.
Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.
5.8 Modified line shifting in GMC stage

A series of optimizations performed on the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later it was optimized even further by replacing the for loop with the more optimized memcpy function available in the standard C library, which in turn led to a faster execution time.

A further optimization was obtained in the GMC stage, which yielded better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included to determine whether the shift has to be performed in the positive or the negative direction. In case the shift is negative, i.e., upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. In case the shift is positive, i.e., downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. The result of this set of optimizations is presented in Figure 5.12.
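A minimal sketch of this in-place, direction-aware row shifting is shown below, assuming a row-major float frame; the function name and the handling of the vacated border rows are illustrative assumptions.

    #include <string.h>

    /* Shift the rows of a frame by `shift` positions in place (negative =
       upwards, positive = downwards). The traversal direction is chosen so
       that no source row is overwritten before it has been copied, which
       removes the need for a second frame-sized buffer. */
    static void shift_rows_inplace(float *frame, int rows, int cols, int shift)
    {
        size_t row_bytes = (size_t)cols * sizeof(float);

        if (shift < 0) {                    /* upwards: walk top to bottom */
            for (int r = 0; r < rows + shift; r++)
                memcpy(&frame[r * cols], &frame[(r - shift) * cols], row_bytes);
        } else if (shift > 0) {             /* downwards: walk bottom to top */
            for (int r = rows - 1; r >= shift; r--)
                memcpy(&frame[r * cols], &frame[(r - shift) * cols], row_bytes);
        }
    }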
Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic refers to the fact that the set of vertices resulting from the decoding stage is sorted in an increasing manner, which removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered in increasing order from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies the algorithm needed to connect the vertices into triangles even further.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.
The algorithm is presented in Algorithm 1. Note that for each pair of rows, this algorithm describes the connection of vertices up to the moment in which the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).

Algorithm 1: New tessellation algorithm
 1: for all pairs of rows do
 2:   find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
 3:   while the last vertex in either row has not been reached do
 4:     if vertex_row_A is more to the left than vertex_row_B then
 5:       connect vertex_row_A with the next vertex on the same row and with vertex_row_B
 6:       change vertex_row_A to the next vertex on the same row
 7:     else
 8:       connect vertex_row_B with the next vertex on the same row and with vertex_row_A
 9:       change vertex_row_B to the next vertex on the same row
10:     end if
11:   end while
12: end for
Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 12.5 times over OpenCV's Delaunay triangulation implementation.
Figure 5.13: The Delaunay triangulation (a) was replaced with a different algorithm (b) that takes advantage of the fact that the vertices are sorted.
5.10 Modified decoding stage

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard math C library, namely ceil() and floor(), were replaced with faster implementations that used pre-processor directives to avoid the function call overhead. Moreover, the time spent validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions was further simplified.
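A hedged sketch of such simplified replacements is shown below. It assumes, as stated above, that the argument is always a non-negative index value; the macro names are invented for the example.

    /* Truncation toward zero equals floor() for non-negative inputs, so a
       plain cast suffices; ceil() only needs one extra comparison. Being
       macros, they also avoid the libm call and its argument validation. */
    #define FAST_FLOOR(x) ((int)(x))
    #define FAST_CEIL(x)  ((int)(x) + ((float)(int)(x) < (x) ? 1 : 0))

Because truncation toward zero only coincides with floor() for non-negative arguments, this simplification is only safe in contexts such as the decoding stage, where the operands are indices.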
A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while controlling that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since that memory can be addressed more quickly.
The last optimization consisted of the detection and removal of several tasks that did not contribute to the final result. The reason why such tasks were present in the application is that several alternatives were implemented for achieving a common goal during the algorithmic design stage; after the best option was assessed and chosen, however, the others were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.

Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, such recalculation can be avoided. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
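A hedged sketch of the reuse pattern follows. Frame access and the SAD-minimizing shift search are reduced to stubs, since only the buffer-reuse idea is being illustrated; all names are assumptions.

    /* Process frames 1..N pairwise, computing each column-sum vector only
       once: the "current" sums of iteration n become the "previous" sums
       of iteration n+1 by swapping the two buffer pointers. */
    static void sum_columns(const unsigned char *frame, int rows, int cols,
                            float *sums)
    {
        for (int c = 0; c < cols; c++) {
            float s = 0.0f;
            for (int r = 0; r < rows; r++)
                s += frame[r * cols + c];
            sums[c] = s;
        }
    }

    static void gmc_all_frames(unsigned char **frames, int n_frames,
                               int rows, int cols,
                               float *sums_prev, float *sums_cur)
    {
        sum_columns(frames[0], rows, cols, sums_prev);

        for (int n = 1; n < n_frames; n++) {
            sum_columns(frames[n], rows, cols, sums_cur);

            /* ... minimize the SAD between sums_prev and sums_cur and
               shift frames[n] accordingly (omitted here) ... */

            /* Reuse: swap instead of recomputing frames[n]'s sums later. */
            float *tmp = sums_prev;
            sums_prev = sums_cur;
            sums_cur = tmp;
        }
    }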
5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization, and global motion compensation stages are data independent and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrate NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions, (ii) using existing C/C++ libraries based on NEON technology, (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions but with the compiler doing some of the work associated with writing assembly instructions, and (iv) directly writing NEON assembly instructions linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and therefore this approach was adopted.
Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums.

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.
Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers, where each of the 8 elements would have 16 bits, or as an operation on 2 D registers, where each of the 8 elements would be 8 bits wide.
Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology.
An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
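To give an impression of the approach, the sketch below computes 8 elements of the texture 2 frame (the scaled sum of two pixels) at a time. It uses NEON C intrinsics for readability, whereas the application used hand-written assembly; the function name and scale parameter are assumptions.

    #include <arm_neon.h>

    /* Compute tex2[i] = (a[i] + b[i]) * scale for 8 pixels per iteration.
       The 8-bit sums are widened to 16 and then 32 bits before being
       converted to 32-bit floats, mirroring the widening steps described
       above. n is assumed to be a multiple of 8 for brevity. */
    static void texture2_row(const uint8_t *a, const uint8_t *b,
                             float *tex2, int n, float scale)
    {
        for (int i = 0; i < n; i += 8) {
            uint8x8_t va = vld1_u8(a + i);          /* load 8 pixels */
            uint8x8_t vb = vld1_u8(b + i);
            uint16x8_t sum = vaddl_u8(va, vb);      /* widen to u16 */

            uint32x4_t lo = vmovl_u16(vget_low_u16(sum));   /* widen to u32 */
            uint32x4_t hi = vmovl_u16(vget_high_u16(sum));

            float32x4_t flo = vmulq_n_f32(vcvtq_f32_u32(lo), scale);
            float32x4_t fhi = vmulq_n_f32(vcvtq_f32_u32(hi), scale);

            vst1q_f32(tex2 + i, flo);               /* store 8 floats */
            vst1q_f32(tex2 + i + 4, fhi);
        }
    }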
Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain this increment. First, note that the stage of the application that contributed most to the increase in time was reading the binary file; the execution time of that process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased, which suggests that another process was indeed probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, even accounting for this, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason can be found in the modulation stage. The first step of that process is to find the smallest and largest values of every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C language, the processor makes use of a branch prediction mechanism in order to speed up the instruction pipeline. The use of NEON assembly instructions, however, forces the processor to perform the comparison for every single pack of 8 values, forgoing the benefit of the branch prediction mechanism.
5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels; the number of possible values for the general case is $2^{n+1} - 1$, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table.
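A hedged sketch of this LUT for the real 8-bit case follows. The table is indexed by the pixel sum (0-510) and therefore has 511 entries; filling it with powf() at start-up is an assumption about the initialization, not necessarily how the application builds it.

    #include <math.h>

    #define PIXEL_SUM_MAX 510            /* 255 + 255 for 8-bit pixels */

    static float gamma_lut[PIXEL_SUM_MAX + 1];

    /* Fill the LUT once at start-up: entry s holds (s / 2)^0.85, i.e. the
       gamma-corrected average of two pixels whose sum is s. */
    static void init_gamma_lut(void)
    {
        for (int s = 0; s <= PIXEL_SUM_MAX; s++)
            gamma_lut[s] = powf((float)s / 2.0f, 0.85f);
    }

    /* Per pixel, the expensive powf() collapses into one table lookup,
       reusing the sum already computed for the texture 2 frame. */
    static inline float gamma_corrected_average(unsigned char a, unsigned char b)
    {
        return gamma_lut[a + b];
    }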
As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two stages. The first involves the calculation of the modulation, texture 2, and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. Merging these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22.

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.

    pixel A + pixel B:    0      1      2      3      4      5      6
    average:              0      0.5    1      1.5    2      2.5    3
    average^0.85 (LUT):   0      0.555  1      1.411  1.803  2.179  2.544
This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.
Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.
6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application, described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when run on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).
On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when run on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's -O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism, resulting in a much lower resolution model (1229 faces) suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model contains approximately 2% of the faces of the high resolution model, the quality degradation is hardly visible when comparing the two textured models.
6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations described in Chapter 5 were used to reduce the execution time of the application from 14.5 to 5.1 seconds, which translates into a speedup of approximately 2.85 times. Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which gives an idea of the speedup achieved for each individual stage.

Figure 6.2: Example of the visualization module developed: (a) high-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model wireframe (63,743 faces); (c) low-resolution 3D model with texture (1229 faces); (d) low-resolution 3D model wireframe (1229 faces).
Figure 6.3: Performance evolution of the 3D face scanner's C implementation. The bars correspond, from top to bottom, to: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2.
Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e., by approximately 14 seconds.

A wide range of optimizations was performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment.

Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.
7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, these optimizations open new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5x5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration, and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.
Finally, it is worth noting a few more ideas on how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process; it is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application, and therefore a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp, and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.
Bibliography
[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829-1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204-211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1-17, 1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).
71
72 Bibliography
[7] S Inokuchi K Sato and F Matsuda ldquoRange imaging system for 3-D object
recognitionrdquo in International Conference on Pattern Recognition 1984 (cit on
pp 9 11)
[8] M Minou T Kanade and T Sakai ldquoA method of time-coded parallel planes of
light for depth measurementrdquo Trans Institute of Electronics and Communication
Engineers of Japan vol E64 no 8 pp 521ndash528 Aug 1981 (cit on pp 9 11)
[9] M Maruyama and S Abe ldquoRange sensing by projecting multiple slits with random
cutsrdquo Pattern Analysis and Machine Intelligence IEEE Transactions on vol 15
no 6 pp 647 ndash651 Jun 1993 issn 0162-8828 doi 10110934216735 (cit on
pp 9 11)
[10] N Durdle J Thayyoor and V Raso ldquoAn improved structured light technique
for surface reconstruction of the human trunkrdquo in Electrical and Computer Engi-
neering 1998 IEEE Canadian Conference on vol 2 May 1998 874 ndash877 vol2
doi 101109CCECE1998685637 (cit on pp 9 11)
[11] M Ito and A Ishii ldquoA three-level checkerboard pattern (TCP) projection method
for curved surface measurementrdquo Pattern Recognition vol 28 no 1 pp 27 ndash40
1995 issn 0031-3203 doi 1010160031-3203(94)E0047-O [Online] Available
httpwwwsciencedirectcomsciencearticlepii0031320394E0047O
(cit on pp 9 11)
[12] K L Boyer and A C Kak ldquoColor-encoded structured light for rapid active
rangingrdquo Pattern Analysis and Machine Intelligence IEEE Transactions on vol
PAMI-9 no 1 pp 14 ndash28 Jan 1987 issn 0162-8828 doi 101109TPAMI1987
4767869 (cit on pp 9 11)
[13] C-S Chen Y-P Hung C-C Chiang and J-L Wu ldquoRange data acquisition using
color structured lighting and stereo visionrdquo Image Vision Comput pp 445ndash456
1997 (cit on pp 9 11)
[14] P M Griffin L S Narasimhan and S R Yee ldquoGeneration of uniquely encoded
light patterns for range data acquisitionrdquo Pattern Recognition vol 25 no 6
pp 609 ndash616 1992 issn 0031-3203 doi 1010160031- 3203(92)90078- W
[Online] Available httpwwwsciencedirectcomsciencearticlepii
003132039290078W (cit on pp 9 12)
[15] B Carrihill and R Hummel ldquoExperiments with the intensity ratio depth sensorrdquo
Computer Vision Graphics and Image Processing vol 32 no 3 pp 337 ndash358
1985 issn 0734-189X doi 1010160734-189X(85)90056-8 [Online] Available
httpwwwsciencedirectcomsciencearticlepii0734189X85900568
(cit on pp 9 12)
Bibliography 73
[16] J Tajima and M Iwakawa ldquo3-D data acquisition by rainbow range finderrdquo in
Pattern Recognition 1990 Proceedings 10th International Conference on vol i
Jun 1990 309 ndash313 vol1 doi 101109ICPR1990118121 (cit on pp 9 12)
[17] C Wust and D Capson ldquoSurface profile measurement using color fringe projec-
tionrdquo English Machine Vision and Applications vol 4 pp 193ndash203 3 1991 issn
0932-8092 doi 101007BF01230201 [Online] Available httpdxdoiorg
101007BF01230201 (cit on pp 9 12)
[18] E Hall J Tio C McPherson and F Sadjadi ldquoMeasuring curved surfaces for
robot visionrdquo Computer vol 15 no 12 pp 42 ndash54 Dec 1982 issn 0018-9162
doi 101109MC19821653915 (cit on pp 10 14)
[19] J Salvi J Pags and J Batlle ldquoPattern codification strategies in structured light
systemsrdquo Pattern Recognition vol 37 pp 827ndash849 2004 (cit on pp 11 12)
[20] A Woodward D An G Gimelrsquofarb and P Delmas ldquoA comparison of three 3-D
facial reconstruction approachesrdquo in Multimedia and Expo 2006 IEEE Interna-
tional Conference on Jul 2006 pp 2057 ndash2060 doi 101109ICME2006262619
(cit on p 12)
[21] D An A Woodward P Delmas G Gimelfarb and J Morris ldquoComparison of
active structure lighting mono and stereo camera systems application to 3D face
acquisitionrdquo in Computer Science 2006 ENC rsquo06 Seventh Mexican International
Conference on Sep 2006 pp 135 ndash141 doi 101109ENC20068 (cit on pp 12
13)
[22] A Woodward D An P Delmas and C-Y Chen ldquoComparison of structured
lightning techniques with a view for facial reconstructionrdquo in Proc Image and
Vision Computing New Zealand Conf Dunedin New Zealand 2005 pp 195ndash200
[Online] Available httppixelotagoacnzipapers35pdf (cit on p 13)
[23] P Fechteler P Eisert and J Rurainsky ldquoFast and high resolution 3D face scan-
ningrdquo in Image Processing 2007 ICIP 2007 IEEE International Conference on
vol 3 Oct 2007 pp III ndash81 III ndash84ndash doi 101109ICIP20074379251 (cit on
p 13)
[24] J Salvi X Armangu and J Batlle ldquoA comparative review of camera calibrating
methods with accuracy evaluationrdquo Pattern Recognition vol 35 no 7 pp 1617
ndash1635 2002 issn 0031-3203 doi 101016S0031- 3203(01)00126- 1 [On-
line] Available http www sciencedirect com science article pii
S0031320301001261 (cit on p 14)
[25] H J Chen J Zhang D J Lv and J Fang ldquo3-D shape measurement by composite
pattern projection and hybrid processingrdquo Optics Express vol 15 p 12 318 2007
doi 101364OE15012318 (cit on p 14)
74 Bibliography
[26] O D Faugeras and G Toscani ldquoThe calibration problem for stereordquo in Proceed-
ings CVPR rsquo86 (IEEE Computer Society Conference on Computer Vision and
Pattern Recognition Miami Beach FL June 22ndash26 1986) ser IEEE Publ86CH2290-
5 IEEE 1986 pp 15ndash20 (cit on p 14)
[27] G Toscani Systemes de calibration et perception du mouvement en vision ar-
tificielle Institut de recherche ne informatique et en automatique 1987 isbn
9782726105726 [Online] Available http books google nl books id =
Rrz5OwAACAAJ (cit on p 14)
[28] J Mas and I i A Universitat de Girona Departament drsquoElectronica An Approach
to Coded Structured Light to Obtain Three Dimensional Information[ ser Tesis
doctorals Universitat de Girona Universitat de Girona 1998 isbn 9788495138118
[Online] Available httpbooksgooglenlbooksid=mmM5twAACAAJ (cit on
p 15)
[29] R Tsai ldquoA versatile camera calibration technique for high-accuracy 3D machine
vision metrology using off-the-shelf tv cameras and lensesrdquo Robotics and Automa-
tion IEEE Journal of vol 3 no 4 pp 323ndash344 Aug 1987 issn 0882-4967 doi
101109JRA19871087109 [Online] Available httpdxdoiorg101109
JRA19871087109 (cit on p 15)
[30] J Weng P Cohen and M Herniou ldquoCamera calibration with distortion mod-
els and accuracy evaluationrdquo Pattern Analysis and Machine Intelligence IEEE
Transactions on vol 14 no 10 pp 965 ndash980 Oct 1992 issn 0162-8828 doi
10110934159901 (cit on p 15)
[31] P Redert ldquoMulti-viewpoint systems for 3-D visual communicationrdquo Masterrsquos the-
sis Delft University of Technology Stevinweg 1 - 2628 CN Delft - The Netherlands
2000 (cit on pp 15 26)
[32] M Woo J Neider T Davis and D Shreiner OpenGL Programming Guide The
Official Guide to Learning OpenGL Version 12 3rd Boston MA USA Addison-
Wesley Longman Publishing Co Inc 1999 isbn 0201604582 (cit on p 25)
[33] L P Chew ldquoConstrained Delaunay triangulationsrdquo Algorithmica vol 4 no 1-4
pp 97ndash108 1989 [Online] Available httplinkspringercomarticle10
1007BF01553881 (cit on pp 25 26)
[34] M Desbrun M Meyer P Schroder and A H Barr ldquoImplicit fairing of irregu-
lar meshes using diffusion and curvature flowrdquo in Proceedings of the 26th annual
conference on Computer graphics and interactive techniques ser SIGGRAPH rsquo99
New York NY USA ACM PressAddison-Wesley Publishing Co 1999 pp 317ndash
324 isbn 0-201-48560-5 doi 10 1145 311535 311576 [Online] Available
httpdxdoiorg101145311535311576 (cit on p 30)
Bibliography 75
[35] F Vahid Embedded System Design A Unified HardwareSoftware Introduction
Wiley India Pvt Limited 2006 isbn 9788126508372 [Online] Available http
booksgooglenlbooksid=HloqCOqcHvoC (cit on p 31)
[36] S Dhadiwal Baid ldquoSingle-board computers for embedded applicationsrdquo Electron-
ics For You Tech Rep 2010 [Online] Available httpwwwefymagonline
compdfsingle-board-computers_aug10pdf (cit on p 32)
[37] M Roa Villescas ldquoThesis preparationrdquo Eindhoven University of Technology Tech
Rep Jan 2013 (cit on p 32)
[38] G Coley ldquoBeagleboard system reference manualrdquo BeagleBoard org December
p 81 2009 (cit on p 34)
[39] V G Reddy ldquoNEON technology introductionrdquo ARM Corporation 2008 (cit on
p 34)
[40] M Barberis and L Semeria ldquoHow-to MATLAB-to-C translationrdquo Catalytic Tech
Rep 2008 (cit on p 38)
[41] W Von Hagen The definitive guide to GCC Apress 2006 (cit on p 45)
[42] I Stephenson Production rendering design and implementation Springer 2005
(cit on p 46)
[43] G Bradski and A Kaehler Learning OpenCV Computer vision with the OpenCV
library Orsquoreilly 2008 (cit on p 50)
[44] S Rippa ldquoMinimal roughness property of the Delaunay triangulationrdquo Computer
Aided Geometric Design vol 7 no 6 pp 489ndash497 1990 [Online] Available
httpwwwsciencedirectcomsciencearticlepii016783969090011F
(cit on p 51)
[45] ARM ldquoCortex-a series version 30 programmerrsquos guiderdquo Tech Rep 2012 (cit on
p 54)
[46] N Pipenbrinck ldquoARM NEON optimization an examplerdquo Tech Rep 2009 (cit
on p 54)
Dedicated to my grandmother
Chapter 1
Introduction
The potential of science and technology to improve every aspect of life seems to be
boundless, or at least this is what the innovations of the previous centuries suggest.
Among the many different interests that advocate the development of science and
technology, human healthcare has always been an important stimulant. New technologies
are constantly being developed by leading companies all around the world to improve the
quality of people's lives. A clear example is the case of the Dutch multinational Royal
Philips Electronics, which devotes special interest to the development and introduction
of meaningful innovations that improve people's lives.
Within the wide range of products offered by Philips there is a specific group, categorized
under the name of sleep solutions, that aims at improving the sleep quality of
people. A well-known family of products contained within this category are the so-called
CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily
in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing
or instances of very low breathing during sleep [1]. According to a recent study conducted
by Philips in collaboration with the University of Twente, 6.4% of the surveyed
population was found to suffer from this disorder [2]. A total number of 4206 people,
comprising women and men of different ages and levels of education, took part in the
2-year study. A similar survey was undertaken by the National Institutes of Health in
the United States of America [3]. It reported that sleep apnea was prevalent in more
than 18 million Americans, i.e., 6.62% of the country's population.

While aiming to attend the large demand for CPAP masks, Philips has designed and
introduced a wide variety of mask models that seek to fulfill the different needs and
constraints that arise due to several factors, which include the large diversity of size
and shape of human faces, inclination towards breathing through the mouth or nose,
diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia,
Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic,
(c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel,
(h) Simplicity, (i) ComfortGel.
amongst others. A subset of these models is shown in Figure 1.1. It is important to
mention that a poor selection of a CPAP mask might cause undesirable side effects to the
patient, such as marks or even pressure ulcers. Consequently, the physical dimensions
of each patient's face play a crucial role in the selection of the most appropriate CPAP
mask.

Unfortunately, the current practices used to assess the adequacy of CPAP masks based
on facial dimensions are quite error-prone. They rely on trial-and-error procedures in
which the patient tries on different mask models and selects the one he thinks is the
most comfortable. In order to alleviate this problem, Philips Research launched the
3D Mask Sizing project, which aims to develop an automated embedded system capable
of assisting sleep technicians in prescribing the most appropriate CPAP mask for each
patient.
1.1 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop technological
means that can assist sleep technicians in the selection of a proper CPAP mask
model for each patient. A series of algorithms, methods, and hardware prototypes are the
result of several years of research carried out by the Smart Sensing & Analysis research
group in Philips Research Eindhoven. The resulting automated mask advising system
comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.

2. The extraction of facial landmarks from the reconstructed model by means of
computer vision algorithms.

3. The actual fit quality assessment, by virtually fitting a series of 3D mask models
to the reconstructed face.

4. The creation of a custom cushion that optimizes for uniform pressure along the
cushion contour.

The focus of this thesis project is based on the first step.
As part of the progress made in the 3D Mask Sizing project at Philips Research
Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting
technique was already developed and is the basis for the present project. Figure 1.2a shows the
hardware setup of such a device. In short, this scanner is capable of capturing a picture
sequence of a patient's face while illuminating it with specific structured light patterns.
Such a picture sequence is processed by means of a series of algorithms in order to
reconstruct a 3D model of the face. An example of a resulting 3D model is presented in
Figure 1.2b. The reconstruction process and all other calculations are currently being
performed offline and are mostly implemented in MATLAB.
1.2 Objectives

Figure 1.2: A 3D hand-held scanner developed in Philips Research: (a) hardware, (b) 3D
model example.

The main objective of this thesis project is to extend the functionality of the mentioned
scanner such that the 3D reconstruction is computed locally on the embedded platform.
This implies transforming the already developed methods and algorithms in such a
way that extra-functional requirements are taken into account. These extra-functional
requirements involve an optimal use of the available computational resources. Highest
priority should be given to the execution time of the application. Specifically, the 3D
reconstruction should run on the embedded device in less than 5 seconds on
average. Because the embedded processor contained in the final product will be similar
to an ARM Cortex-A8, the new implementation should be targeted to this processor
in particular, by making proper use of the specific features it provides. Moreover, the
visualization of the reconstructed face model should be made possible by means of the
embedded projector contained in the device.
1.3 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie
different technologies for surface reconstruction, placing special emphasis on structured
lighting techniques. In Chapter 3, an overview of the 3D face scanner application is
provided, which functions as the starting point for the current project. Chapter 4
details the most relevant aspects that pertain to the implementation of the 3D face
scanner application on an embedded device. In Chapter 5, a series of optimizations
used to reduce the execution time of the application are described. Chapter 6 highlights
the most important results of the development process, namely the MATLAB to C
translation, the visualization module, and the set of optimizations. Finally, Chapter 7
concludes the thesis while delineating paths for further improvements of the presented
work.
Chapter 2
Literature study
This chapter presents a selective analysis of the state of the art in the field of surface
reconstruction, placing special emphasis on structured lighting techniques. A brief
overview of the three main underlying technologies used for depth estimation is presented
first. This is followed by an example of stereo analysis, which serves as the basis
for the more specific structured lighting techniques. Moreover, this example helps to
illustrate why stereo analysis is considered less preferable for 3D face reconstruction
applications when compared with the structured lighting techniques. Special emphasis
is placed on the scientific principles underlying structured lighting techniques. Furthermore,
a classification of the different types of pattern coding strategies available in the
literature is given, along with an analysis of their suitability for our application. Finally,
the chapter concludes with a brief discussion of camera calibration and its most
representative techniques.
2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications, such as computer
modeling of 3D objects (such as those found in areas like architecture, mechanical
engineering, or surgery), distance measurements for vehicle control, surface inspections for
quality control, approximate or exact estimates of the location of 3D objects for
automated assembly, and fast location of obstacles for efficient navigation [4].

Technologies for surface reconstruction include contact and non-contact techniques, the
latter being our principal interest. Non-contact techniques may be further categorized
as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric
techniques use time-of-flight measurements to determine the distance to an object, i.e., they
are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect
from an object's surface through a given medium. Reflecto-metric techniques process
one or more images of the object to determine its surface orientation and, consequently,
its shape. Finally, stereo-metric techniques determine the location of the object's surface
by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such
techniques are heavily affected by environmental parameters such as temperature and
humidity [6]. These parameters affect the velocity at which waves travel through a
given medium, thus introducing errors in depth measurement. On the other hand,
both reflecto-metric and stereo-metric techniques are less affected by environmental
parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they
require an estimation of the model of the environment. In the remainder of this section
we will limit the discussion to the stereo-metric category and focus on the structured
lighting techniques.
2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded
as an extension of the more general stereo-vision technique, an introductory example of
stereo analysis is presented in this section. This example intends to show why the use
of structured lighting becomes essential for our application. This example is presented
in [4].

Surface reconstruction can be achieved by means of the visual disparity that results
when an object is observed from different camera viewpoints. In its simplest form, two
cameras can be used for this purpose. Triangulation between a point in the object and
its respective projection in each of the camera projection planes can be used to calculate
the depth at which this point lies from a certain reference. Note, however, that in order
to calculate the triangulation, more parameters are required. These parameters refer, for
example, to the distance at which the cameras are located from one another (extrinsic
parameter) or to the focal length of each of the cameras (intrinsic parameter).

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this
model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal
point of the left camera. The focal point of the right camera lies at a distance b along
the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed
to have the same focal length f. As a consequence, the images of both cameras are
located in the same image plane. The Z-axis coincides with the optical axis of the
left camera. Moreover, the optical axes of both cameras are parallel to each other and
oriented towards the scene objects. Also note that, because the x-axes of both images
are identically oriented, rows with the same row number in the two different images lie on
the same straight line.

Figure 2.1: Standard stereo geometry.
In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image
points

    p_left = (x_left, y_left)  and  p_right = (x_right, y_right)

in the left and right images, respectively, assuming that the scene point is visible from
both camera viewpoints. The disparity with respect to p_left is a vector given by

    Δ(x_left, y_left) = (x_left − x_right, y_left − y_right)^T    (2.1)

between two corresponding image points.

In the standard stereo geometry, pinhole camera models are used to represent the
considered cameras. The basic idea of a pinhole camera is that it projects scene points P
onto image points p according to a central projection given by

    p = (x, y) = (f·X/Z, f·Y/Z),    (2.2)

assuming that Z > f.
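As a small illustration of Equation 2.2, the following C sketch projects a scene point
through an ideal pinhole camera; the function name and the sample values are
hypothetical and not part of the scanner code.

#include <stdio.h>

/* Central projection of a scene point (X, Y, Z) onto the image plane
 * of an ideal pinhole camera with focal length f (Equation 2.2).
 * Valid only for points in front of the camera, i.e. Z > f. */
static void project_pinhole(double f, double X, double Y, double Z,
                            double *x, double *y)
{
    *x = f * X / Z;
    *y = f * Y / Z;
}

int main(void)
{
    double x, y;
    project_pinhole(0.05, 0.2, 0.1, 1.0, &x, &y);  /* f = 50 mm, P at 1 m */
    printf("p = (%f, %f)\n", x, y);
    return 0;
}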
According to the ideal assumptions considered in the standard stereo geometry of the
two cameras, it holds that y = y_left = y_right. Therefore, for the left camera the
central projection equation is given directly by Equation 2.2, considering that the pinhole
camera model assumes that the Z-axis is identified to be the optical axis of the camera.
Furthermore, given the displacement of the right camera by b along the X-axis, the
central projection equation is given by

    (x_right, y) = (f·(X − b)/Z, f·Y/Z).

Rather than calculating a disparity vector given by Equation 2.1 for all corresponding
pairs of points in the different images, the scalar disparity proves to be sufficient under
the assumptions made in the standard stereo geometry. The scalar disparity of two
corresponding points in each one of the images with respect to p_left is given by

    Δ_ssg(x_left, y_left) = √((x_left − x_right)² + (y_left − y_right)²).

However, because rows with the same row numbers in the two images have the same y value,
the scalar disparity of a pair of corresponding points reduces to

    Δ_ssg(x_left, y_left) = |x_left − x_right| = x_left − x_right.    (2.3)

Note that it is valid to remove the absolute value operator because of the chosen
arrangement of the cameras. A disparity map Δ(x, y) is defined by applying Equation 2.3 to all
corresponding points in the two images. For those points that could not be associated
with a corresponding point in the other image (for example, because of occlusion), the
value "undefined" is recorded.
Finally, in order to come up with the equations that determine the 3D location of each
point in the scene, note that from the two central projection equations of the two cameras
it follows that

    Z = f·X/x_left = f·(X − b)/x_right

and therefore

    X = b·x_left / (x_left − x_right).

Using the previous equation, it follows that

    Z = b·f / (x_left − x_right).

By substituting this result into the projection equation for y, it follows that

    Y = b·y / (x_left − x_right).

The last three equations allow the reconstruction of the coordinates of the projected
points P within the three-dimensional XYZ-space, assuming that the parameters f and
b are known and that the disparity map Δ(x, y) was measured for each pair of
corresponding points in the two images. Note that a variety of methods exists to calibrate
different types of camera configuration systems, i.e., to determine their intrinsic and
extrinsic parameters. More on these calibration procedures is further discussed in Section
2.2.
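Putting the three reconstruction equations together, a minimal C sketch could look as
follows; the function name and the zero-disparity guard are assumptions for illustration,
not taken from the scanner implementation.

#include <math.h>
#include <stdbool.h>

/* Reconstruct the 3D coordinates (X, Y, Z) of a scene point from its
 * left-image coordinates (x_left, y) and the scalar disparity
 * d = x_left - x_right, given the base distance b and focal length f.
 * Returns false when the disparity is (close to) zero, i.e. the point
 * lies at near-infinite depth and cannot be reconstructed. */
static bool reconstruct_point(double b, double f,
                              double x_left, double y, double disparity,
                              double *X, double *Y, double *Z)
{
    if (fabs(disparity) < 1e-9)
        return false;
    *X = b * x_left / disparity;
    *Y = b * y      / disparity;
    *Z = b * f      / disparity;
    return true;
}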
The process of determining corresponding point pairs is known as the correspondence
problem. A wide variety of techniques is used to solve the correspondence problem in
stereo image analysis. Such techniques generally involve the extraction and matching
of features between two or more images. These features are typically corners or edges
contained within the images. Although these techniques are found to be appropriate for
a certain number of applications, it turns out that they present a number of drawbacks
that make their applicability unfeasible for many others. The main drawbacks are that (i)
feature extraction and matching are generally computationally expensive, (ii) features
might not be available depending on the nature of the environment or the placement
of the cameras, and (iii) low lighting conditions generally increase the complexity of the
matching procedure, thus making the system more error-prone. Such problems in solving
the correspondence problem can generally be overcome by resorting to a different but
similar type of technique, known by the name of structured lighting. While
structured lighting techniques involve a completely different methodology for solving
the correspondence problem, they share a large part of the theory presented in this section
regarding the depth reconstruction process.
2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously
described stereo analysis approach, where one of the cameras is replaced by a light source
that projects a light pattern actively into the scene. The location of an object in space
can then be determined by analyzing the deformation of the projected light pattern.
The idea behind this modification is to simplify the complexity of the correspondence
analysis by actively manipulating the scene.

It is important to note that stereoscopy-based systems do not impose complex
requirements on image acquisition, since they mostly rely on theoretical, mathematical, and
algorithmic analyses to solve the reconstruction problem. On the other hand, the idea
behind structured lighting methods is to shift this complexity to another level, such as
the engineering prerequisites of the overall system [4].

A wide variety of light patterns has been proposed by the research community [5], [7]–[17].
Their aim is to reduce the large number of images that would have to be captured
when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2, a
classification of the available encoded patterns is presented. Nevertheless, the light spot
projection technique serves as a solid starting point to introduce the main principle
underlying the depth recovery of most other encoded light patterns: the triangulation
technique.
2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring
angles formed from it to points at either end of a fixed baseline. Various approaches
have been proposed for accomplishing this task. An early analysis was described by Hall
et al. [18] in 1982. Klette also presented his own analysis in [4]. In the following, an
overview of Klette's triangulation approach is explained.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation, as proposed in [4].

Note that the system can be thought of as a 2D object scene, i.e., it has no vertical
dimension. As a consequence, the object, light source, and camera all lie in the same
plane. The angles α and β are given by the calibration. As in the previous example, the
base distance b is assumed to be known, and the origin of the coordinate system O
coincides with the projection center of the camera.
The goal is to calculate the distance d between the origin O and the object point
P = (X₀, Z₀). This can be done using the law of sines as follows:

    d / sin(α) = b / sin(γ).

From γ = π − (α + β) and sin(π − γ) = sin(γ), it holds that

    d / sin(α) = b / sin(π − γ) = b / sin(α + β).

Therefore, the distance d is given by

    d = b·sin(α) / sin(α + β),

which holds for any point P lying on the surface of the object.
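As a quick numerical check of this formula, consider the following small C sketch; the
function name and the sample angles are hypothetical, chosen only for illustration.

#include <math.h>
#include <stdio.h>

#define PI 3.14159265358979323846

/* Distance d from the camera origin O to the object point P, given the
 * base distance b between camera and light source and the calibrated
 * angles alpha and beta in radians: d = b * sin(alpha) / sin(alpha + beta). */
static double triangulate_distance(double b, double alpha, double beta)
{
    return b * sin(alpha) / sin(alpha + beta);
}

int main(void)
{
    /* Example: b = 0.1 m, alpha = 60 degrees, beta = 45 degrees. */
    double d = triangulate_distance(0.1, 60.0 * PI / 180.0, 45.0 * PI / 180.0);
    printf("d = %f m\n", d);  /* approx. 0.0897 m */
    return 0;
}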
2.1.2.2 Pattern coding strategies

As stated earlier, there is a wide variety of pattern coding strategies available in the
literature that aim to fulfill the requirements found in different scenarios and applications.
In coded structured light systems, every coded pixel in the pattern has its own codeword
that allows direct mapping, i.e., every codeword is mapped to the corresponding
coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented
using grey levels, colors, or even geometrical characteristics. The following classification
of pattern coding strategies was proposed by Salvi et al. in [19]:

• Time-multiplexing. This is one of the most commonly used strategies. The
idea is to project a set of patterns onto the scene, one after the other. The
sequence of illuminated values determines the codeword for each pixel. The main
advantage of this kind of pattern is that it can achieve high spatial resolution in
the measurements. However, its accuracy is highly sensitive to movement of either
the structured light system or objects in the scene during the time period in which the
acquisition process takes place. Previous research in this area includes the work of
[5], [7], [8]. An example of this coding strategy is the binary coded pattern shown
in Figure 2.3a; a small decoding sketch is given after this list.

• Spatial neighborhood. In this strategy, the codeword that is assigned to a given
pixel depends on its neighborhood. Codification is done on the basis of intensity
[9]–[11], color [12], or a unique structure of the neighborhood [13]. In contrast with
time-multiplexing strategies, spatial neighborhood strategies allow all coding
information to be condensed into a single projection pattern, making them highly
suitable for applications that involve timing constraints, such as autonomous
navigation. The compromise, however, is a deterioration in spatial resolution. Figure
2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding. In direct coding strategies, every pixel in the pattern is labeled
by the information it represents. In other words, the entire codeword for a given
point is contained in a unique pixel, as explained in [19]. Basically, there are two
ways to achieve this: either by using a large range of color values [15], [16] or
by introducing periodicity [17]. Although in theory this group of strategies can
be used to reconstruct objects with high resolution, a major problem occurs in
practice: the colors imaged by the camera(s) of the system do not only depend on the
projected colors, but also on the intrinsic colors of the measuring surface and light
source. The consequence is that reference images become necessary. Figure 2.3c
shows an example of a direct coding strategy proposed in [16].
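To make the time-multiplexing idea concrete, the following hedged C sketch assembles a
per-pixel codeword from a sequence of binarized captures. The names, the plain binary
code, and the fixed threshold are assumptions for illustration; an actual system might
use Gray codes and more robust per-pixel thresholding.

#include <stdint.h>
#include <stddef.h>

/* Decode a time-multiplexed binary pattern sequence. images[k] is the
 * k-th captured frame (row-major, width*height 8-bit pixels), with
 * patterns projected from the most significant bit down to the least
 * significant one (n_images <= 16). Each pixel's codeword is built by
 * thresholding its intensity in every frame and concatenating the
 * resulting bits; the codeword identifies the projector stripe that
 * illuminated the pixel. */
static void decode_binary_pattern(const uint8_t *const *images,
                                  size_t n_images,
                                  size_t width, size_t height,
                                  uint8_t threshold, uint16_t *codewords)
{
    for (size_t p = 0; p < width * height; p++) {
        uint16_t code = 0;
        for (size_t k = 0; k < n_images; k++)
            code = (uint16_t)((code << 1) | (images[k][p] > threshold));
        codewords[p] = code;
    }
}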
(a) Time-multiplexing (b) Spatial Neighborhood (c) Direct coding

Figure 23 Examples of pattern coding strategies
2123 3D human face reconstruction
Given the importance of face reconstruction in a wide range of fields such as security
forensics or even entertainment it is no surprise that special focus has been devoted
to this area by the research community over the last decades A comparative study
of three different 3D face reconstruction approaches is presented in [20] Here the
most representative techniques of three different domains are tested These domains are
binocular stereo structured lighting and photometric stereo The experimental results
show that active reconstruction techniques perform better than purely passive ones for
this application
The majority of analysis on vision based reconstruction has focused on general perfor-
mance for arbitrary scenes rather than on specific objects as reported in [20] Neverthe-
less some effort has been made on evaluating structured lighting techniques with special
focus on human face reconstruction In [21] a comparison is presented between three
structured lighting techniques (Gray Code Gray Code Shift and Stripe Boundary) to
assess 3D reconstruction for human faces by using mono and stereo systems The results
show that the Gray Code shift coding performs best given the high number of emitted
patterns it uses A further study on this topic was performed by the same author in
[22] Again it was found that time-multiplexing techniques such as binary encoding
using Gray Code provide the highest accuracy With a rather different objective than
that sought by Woodward et al in [21] and [22] Fechteler et al [23] also focus their
effort on presenting a framework that captures 3D models of faces in high resolutions
with low computational load Here the system uses a single colored stripe pattern for
the reconstruction purpose plus a picture of the face illuminated with regular white light
that is used as texture
Particular aspects of 3D human face reconstruction such as proximity size and texture
involved make structured lighting a suitable approach On the contrary other recon-
struction techniques might be less suitable when dealing with these particular aspects
For example stereoscopic approaches fail to provide positive results when the textures
involved lack features that can be reliably extracted and matched by correspondence
algorithms as is the case for the human face On the other hand the concepts behind
structured lighting make it very convenient to reconstruct these kind of surfaces given
the proximity involved and the size limits of the object in question (appropriate for
projecting encoded patterns)
With regard to the suitability of the different pattern coding strategies for our application
(3D human face reconstruction by means of a hand-held scanner) there are several
factors to consider Spatial neighborhood strategies do not offer high spatial resolution
which is needed by the algorithms that assess the fit quality of the various mask models
Direct coding strategies suffer from practical problems that affect their robustness to
different scenarios This focuses attention on time-multiplexing techniques which
are known to provide high spatial resolution The drawback of such techniques is
that they are highly sensitive to movement which is likely to be present on a hand-
held device Fortunately there are several approaches by which this problem can be
solved Consequently it is a time-multiplexing technique that is employed in
our application
22 Camera calibration
Camera calibration is a crucial ingredient in the process of metric scene measurement
This section presents a review of some of the most popular techniques with special focus
on those that are regarded as adequate for our application
221 Definition
Camera calibration is the process of determining a mathematical approximation of the
physical and optical behavior of an imaging system by using a set of parameters These
parameters can be estimated by means of direct or iterative methods and they are divided
into two groups On the one hand intrinsic parameters determine how light is projected
through the lens onto the image plane of the sensor The focal length projection center
and lens distortion are all examples of intrinsic parameters On the other hand extrinsic
parameters measure the position and orientation of the camera with respect to a world
coordinate system as defined in [24] To better illustrate these ideas consider Figure
24 which corresponds to the optical system for the structured pattern projection and
triangulation considered in [25] The focal length fc and the projection center Oc are
examples of intrinsic parameters of the camera while the distance D between the camera
and the projector corresponds to an extrinsic parameter
[Diagram: camera and projector image planes with focal lengths f_c and f_p and projection centers O_c and O_p, a reference plane, and an object of height h; D is the distance between camera and projector]

Figure 24 A reference framework assumed in [25]
222 Popular techniques
In 1982 Hall et al [18] proposed a technique consisting of an implicit camera calibration
that uses a 3 × 4 transformation matrix which maps 3D object points to their respective
2D image projections Here the model of the camera does not consider any lens distor-
tion For a detailed description of this method refer to [18] Some years later in 1986
Faugeras improved Hall's work by proposing a technique that was based on extracting
the physical parameters of the camera from the transformation technique proposed in
[18] The description of this technique is given in [26] and [27] A non-linear explicit
camera calibration that included radial lens distortion was proposed by Salvi in his PhD
thesis [28] which as he mentions can be regarded as a simple adaptation of Faugeras' lin-
ear method However a method that would become much more popular and that is still
widely used was proposed by Tsai in 1987 [29] Here the author proposes a two-step
technique that models only radial lens distortion Also worth mentioning is the model
proposed by Weng [30] in 1992 which includes three different types of lens distortion
The calibration mechanism that is currently being used in our application is based on
the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although
this mechanism focuses on stereo camera calibration it was generalized for a system
with one camera and one projector It involves imaging a controlled scene from different
positions and orientations The controlled scene consists of a rigid calibration chart with
several markers The geometric and photometric properties of such markers are known
precisely so that they can be detected After corresponding markers in the different
images are found an algorithm searches the optimal set of camera parameters for which
triangulation of all corresponding marker-point pairs gives an accurate reconstruction of
the calibration chart This calibration mechanism is discussed further in Section 37
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed
by the Smart Sensing amp Analysis research group and provided as a starting point for the
current project Figure 31 presents the main steps involved in the 3D reconstruction
process
[Flow diagram: Read binary file (31) → Preprocessing (32) → Normalization (33) → Global motion compensation (34) → Decoding (35) → Tessellation (36) → Calibration (37) → Vertex filtering (38) → Hole filling (39); the input is a binary file plus an XML file, the output is a 3D model]

Figure 31 General flow diagram of the 3D face scanner application
The current scanner uses a total of 16 binary coded patterns that are sequentially pro-
jected onto the scene For each projection the scene is captured by means of the
embedded camera hence producing 16 different grayscale frames (Figure 32) that are
fed to the application in the form of a binary file This falls in line with the discussion
presented in Section 2123 of the literature study of why time-multiplexing strategies
result more suitable than spatial neighborhood or direct coding strategies for face recon-
struction applications In Sections 31 to 39 each of the steps shown in Figure 31 is
described
Figure 32 Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns This frame
sequence is the input for the 3D face scanner application
31 Read binary file
The first step of the application is to read the binary file that contains the required
information for the 3D reconstruction The binary file is composed of two parts the
header and the actual data The header contains metadata of the acquired frames such
as the number of frames and the resolution of each one The second part contains the
actual data of the captured frames Figure 32 shows an example of such frame sequence
which from now on will be referred to as camera frames
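As an illustration only, reading such a file in C might look as follows; the actual header layout (field order and sizes) is defined by the scanner and is assumed here, as are the type and function names:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical header layout: only frame count and resolution are assumed */
    typedef struct {
        uint32_t num_frames;
        uint32_t width;
        uint32_t height;
    } scan_header_t;

    static uint8_t *read_scan(const char *path, scan_header_t *hdr)
    {
        FILE *f = fopen(path, "rb");
        if (!f) return NULL;
        if (fread(hdr, sizeof(*hdr), 1, f) != 1) { fclose(f); return NULL; }

        size_t n = (size_t)hdr->num_frames * hdr->width * hdr->height;
        uint8_t *frames = malloc(n);                 /* 8-bit grayscale pixels */
        if (frames && fread(frames, 1, n, f) != n) { free(frames); frames = NULL; }
        fclose(f);
        return frames;
    }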
32 Preprocessing
The preprocessing stage comprises the four steps shown in figure 33 Each of these steps
is described in the following subsections
[Flow diagram: Parse XML file → Discard frames → Crop frames → Scale (convert to float, range 0-1)]

Figure 33 Flow diagram of the preprocessing stage
321 Parse XML file
In this stage the application first reads an XML file that is included for every scan
This file contains relevant information for the structured light reconstruction This
information includes (i) the type of structured light patterns that were projected when
acquiring the data (ii) the number of frames captured while structured light patterns
were being projected (iii) the image resolution of each frame to be considered and (iv)
the calibration data
322 Discard frames
Based on the number of frames value read from the XML file the application discards
extra frames that do not contain relevant information for the structured light approach
but that are provided as part of the input
323 Crop frames
The original resolution of each camera frame (480 × 768) is modified in order to obtain
a new more suitable resolution for the subsequent algorithms of the program (480 × 754) This is accomplished by cropping the pixels that are close to the top border
of the images Note that this operation does not imply a loss of information in this
application in particular This is because pixels near the frame borders do not contain
facial information and therefore can be safely removed
324 Scale
Each pixel of the camera frame sequence (as provided by the embedded camera) is
represented by an 8-bit unsigned integer value that ranges from 0 to 255 In this stage
the data type is transformed from unsigned integer to floating point while dividing each
pixel value by 255 The new set of values range between 0 and 1
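A sketch of this conversion in C, assuming the frames are handed over as one contiguous buffer (the function name is illustrative):

    #include <stddef.h>
    #include <stdint.h>

    /* Convert 8-bit pixels to single-precision floats in the range [0, 1] */
    static void scale_frames(const uint8_t *src, float *dst, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * (1.0f / 255.0f);  /* multiply instead of divide */
    }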
33 Normalization
Even though this section is entitled Normalization a few more tasks are being performed
in this stage of the application as shown by the blue rectangles in Figure 34 Here wide
arrows represent flow of data whereas dashed lines represent the order of execution The
numbers inside the small data arrows pointing towards the different tasks represent the
number of frames used as input by each task The dashed line rectangle that encloses
the normalization and texture 2 tasks represents that there is not a clear sequential
execution between these two but rather that these are executed in an alternating fashion
This type of diagram will prove particularly useful in Chapter 5 to explain the
[Flow diagram: the 16 camera frames feed four tasks: normalization (8 frames out), texture 2 (8 frames out), modulation (1 frame out) and texture 1 (1 frame out); dashed lines indicate the order of execution]

Figure 34 Flow diagram of the normalization stage
modifications that were made to the application to improve its performance An example
of the different frames that are produced in this stage are visualized in Figure 35 A
brief description of each of the tasks involved in this stage follows
331 Normalization
The purpose of this stage is to extract the reflectivity component (texture information)
from the camera frames while aiming at enhancing the deformed illumination patterns
in the resulting frame sequence Figure 35a illustrates the result of this process The
deformed patterns are essential for the 3D reconstruction process
In order to understand how this process takes place we need to look back at Figure
32 Here it is possible to observe that the projected patterns in the top row frames are
equal to their corresponding frame in the bottom row with the only difference being
that the values of the projected pattern are inverted For each corresponding pair a
new image frame is generated according to the following equation
$$F_{norm}(x, y) = \frac{F_{camera}(x, y, a) - F_{camera}(x, y, b)}{F_{camera}(x, y, a) + F_{camera}(x, y, b)}$$
where a and b correspond to aligned top and bottom frames in Figure 32 respectively
An example of the resulting frame sequence is shown in Figure 35a
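A sketch of this computation for one frame pair in C; the small eps guard against division by zero in dark regions is an assumption, the original code may handle that case differently:

    #include <stddef.h>

    /* Normalize one pair of frames: a holds a pattern, b its inverse */
    static void normalize_pair(const float *a, const float *b,
                               float *out, size_t n)
    {
        const float eps = 1e-6f;    /* assumed guard against div-by-zero */
        for (size_t i = 0; i < n; i++)
            out[i] = (a[i] - b[i]) / (a[i] + b[i] + eps);
    }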
(a) Normalized frame sequence
(b) Texture 2 frame sequence
(c) Modulation frame (d) Texture 1 frame
Figure 35 Example of the 18 frames produced in the normalization stage
332 Texture 2
The calculation of the texture 2 frame sequence follows the same procedure as the one
used to calculate the normalized frame sequence In fact the output of this process is an
intermediate step in the calculation of the normalized frames being this the reason why
the two processes are said to be performed in an alternating fashion The mathematical
equation that describes the calculation of the texture 2 frame sequence is
$$F_{texture2}(x, y) = F_{camera}(x, y, a) + F_{camera}(x, y, b)$$
The resulting frame sequence (Figure 35b) is used later in the global motion compen-
sation stage
333 Modulation
The purpose of this stage is to find the range of measured values for each (x y) pixel of
the camera frame sequence along the time dimension This is done in two steps First
two frames are generated by finding the maximum and minimum values along the time
(t) dimension (Figure 36) for every (x y) value in a frame
[Diagram: the camera frame sequence laid out along x, y and t axes]
Figure 36 Camera frame sequence in a coordinate system
Second a modulation frame is produced by finding the difference between the previously
generated frames ie
$$F_{mod}(x, y) = F_{max}(x, y) - F_{min}(x, y)$$
Such modulation frame (Figure 35c) is required later during the decoding stage
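A sketch of both steps fused into a single pass over the sequence (frame-major storage is assumed, px being the number of pixels per frame):

    #include <stddef.h>

    /* Modulation frame: per-pixel range of values across the camera frames */
    static void modulation(const float *frames, int num_frames,
                           size_t px, float *mod)
    {
        for (size_t i = 0; i < px; i++) {
            float mn = frames[i], mx = frames[i];
            for (int t = 1; t < num_frames; t++) {
                float v = frames[(size_t)t * px + i];
                if (v < mn) mn = v;
                if (v > mx) mx = v;
            }
            mod[i] = mx - mn;   /* F_mod = F_max - F_min */
        }
    }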
334 Texture 1
Finally the last task in the Normalization stage corresponds to the generation of the
texture image that will be mapped onto the final 3D model In contrast to the previous
three tasks this subprocess does not take the complete set of 16 camera frames as input
but only the 2 with finest projection patterns Figure 37 shows the four processing
steps that are applied to the input in order to generate a texture image such as the one
presented in Figure 35d
[Flow diagram: average frames → gamma correction → 5 × 5 mean filter → histogram stretch]

Figure 37 Flow diagram for the calculation of the texture 1 image
34 Global motion compensation
The major drawback of time-multiplexing strategies is its high sensitivity to movement
In fact if no measures are taken to correct the slight amount of movement of the scanner
or of the objects in the scene during the acquisition process the complete reconstruction
process fails Although the global motion compensation stage is only a minor part of
the mechanism that makes the entire application robust to motion it is not negligible
in the final result
Global motion compensation is an extensive field of research for which many different
approaches and methods have been proposed The approach used in this application
is amongst the simplest in complexity Nevertheless it suffices for the needs of the
current application
Figure 38 presents an overview of the algorithm used to achieve the global motion
compensation This process takes as input the normalized frame sequence introduced in
the previous section As noted at the bottom of the figure these steps are repeated for
every pair of consecutive frames As a first step the pixels in each column are added for
both frames This results in two vectors that hold the cumulative sums of each frame
The second step is to determine by how many pixels the second image is displaced with
respect to the first one In order to achieve this the sum of absolute differences between
elements of the two column-sum vectors is calculated while slowly displacing the two
vectors with respect to each other The result is a new vector containing the SAD value
for each displacement Subsequently the index of the smallest element in the SAD
values vector is searched in order to determine the number of pixels that the second
image needs to be shifted The process concludes by performing the actual shift of the
second frame
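The displacement search can be sketched in C as follows; the column-sum vectors are assumed to be computed beforehand, and max_shift bounding the search range is an assumed parameter:

    #include <math.h>

    /* Find the horizontal shift (in pixels) that minimizes the SAD between
     * the column-sum vectors of two consecutive frames */
    static int find_shift(const float *col_sum_a, const float *col_sum_b,
                          int width, int max_shift)
    {
        int best_shift = 0;
        float best_sad = INFINITY;
        for (int s = -max_shift; s <= max_shift; s++) {
            float sad = 0.0f;
            for (int x = 0; x < width; x++) {
                int xb = x + s;
                if (xb < 0 || xb >= width) continue;  /* skip non-overlap */
                sad += fabsf(col_sum_a[x] - col_sum_b[xb]);
            }
            if (sad < best_sad) { best_sad = sad; best_shift = s; }
        }
        return best_shift;
    }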
[Flow diagram: for every pair of consecutive normalized frames, the columns of frames A and B are summed, the SAD between the column-sum vectors is minimized, and frame B is shifted accordingly]

Figure 38 Flow diagram for the global motion compensation process
35 Decoding
In Section 211 of the literature study the correspondence problem was defined as the
process of determining corresponding point pairs between the captured images and the
projected patterns This is exactly what is being accomplished during the decoding
stage
A novel approach has been implemented in which the identification of the projector
stripes is based not on the values of the pixels themselves (as it is typically done) but
rather on the edges formed by the transitions of the projected patterns Figure 39
illustrates the different sets of decoded values that result with each of these methods
Here it is possible to observe that the pixel-based method produces a stair-casing effect
due to the decoding of neighboring pixels that lie on the same stripe of the projected
pattern On the other hand the edge-based method removes this undesirable effect by
decoding values for only parts of the image in which a transition occurs Furthermore
this approach enables sub-pixel accuracy for the determination of the positions where the
transitions occur meaning that the overall resolution of the 3D reconstruction increases
considerably
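For example, with normalized values the transitions are zero crossings, and their sub-pixel position can be obtained by linear interpolation; a sketch, assuming y[i] and y[i+1] have opposite signs:

    /* Sub-pixel location of a transition between samples y[i] and y[i+1]
     * that cross zero in a normalized frame */
    static float subpixel_edge(const float *y, int i)
    {
        /* fraction of a pixel past index i where the zero crossing occurs */
        return (float)i + y[i] / (y[i] - y[i + 1]);
    }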
[Plot: decoded values versus pixel position along the y dimension of the image, comparing edge-based and pixel-based decoding]

Figure 39 The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used
The decoding process results in a set of vertices each one associated with a depth code
Note however that the unit of measurement used to describe the position and depth of
each vertex is based on camera pixels and code values respectively meaning that these
vertices still do not represent the actual geometry of the face The calibration process
explained in a later section is the part of the application that translates the pixel and
code values to standard units (such as millimeters) thus recreating the actual shape of
the human face
36 Tessellation
Tessellation refers to the process of covering a plane using different geometric shapes in
a manner such that no overlaps occur In computer graphics these geometric shapes
are generally chosen to be triangles also called "faces" The reason for using triangles
is that a triangle's vertices are by definition coplanar This in turn avoids
the generation of non-simple convex polygons that are not guaranteed to be rendered
correctly A complete example illustrating this point can be found in [32]
A set of 3D vertices calculated in the decoding stage is the input to the tessellation
process Here however the third dimension does not play a role and hence the z
coordinate for each of the vertices can be thought of as being equal to 0 This implies
that the new set of vertices consist only of (x y) coordinates that lie on the same plane
as shown in Figure 310a This graph corresponds to a very close view of the nose area
in the reconstructed face example
(a) Vertices before applying the Delaunay triangulation (b) Result after applying the Delaunay triangulation

Figure 310 Close view of the vertices in the nose area before and after the tessellation process
The question that arises here is how to connect the vertices in such a way that the com-
plete surface is covered with triangles The answer is to use the Delaunay triangulation
which is probably the most common triangulation used in computer vision The main
advantages that it has over other methods is that the Delaunay triangulation avoids
"skinny" triangles reducing potential numerical precision problems [33] Moreover the
Delaunay triangulation is independent of the order in which the vertices are processed
Figure 310b shows the result of applying the Delaunay triangulation to the vertices
shown in Figure 310a
Although there exists a number of different algorithms used to achieve the Delaunay
triangulation the final outcome of each conforms to the following definition a Delaunay
triangulation for a set P of points in a plane is a triangulation DT(P) such that no
point in P is inside the circumcircle of any triangle in DT(P) [33] Such definition can
be understood by examining Figure 311
Figure 311 The Delaunay tessellation with all the circumcircles and their centers [33]
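The defining property can be checked with the classical in-circumcircle predicate; a sketch in C, where the sign convention assumes the triangle (a, b, c) is given in counter-clockwise order:

    /* Returns > 0 when point d lies inside the circumcircle of the
     * counter-clockwise triangle (a, b, c), < 0 when outside */
    static double in_circumcircle(const double a[2], const double b[2],
                                  const double c[2], const double d[2])
    {
        double ax = a[0] - d[0], ay = a[1] - d[1];
        double bx = b[0] - d[0], by = b[1] - d[1];
        double cx = c[0] - d[0], cy = c[1] - d[1];
        return (ax * ax + ay * ay) * (bx * cy - by * cx)
             - (bx * bx + by * by) * (ax * cy - ay * cx)
             + (cx * cx + cy * cy) * (ax * by - ay * bx);
    }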
37 Calibration
The set of (x y) vertices with their corresponding depth code values that result from
the decoding process do not represent standard units of measure ie these still have to
be translated into standard units such as millimeters This is precisely the objective of
the calibration process
The calibration mechanism that is used in the application is based on the work of Peter-
Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts
an offline and an online process Moreover the offline process consists of two stages
the camera calibration and the system calibration It is important to clarify that while
the offline process is performed only once (camera properties and distances within the
system do not change with every scan) the online process is carried out for every scan
instance The calibration stage referred to in Figure 31 is the latter
371 Offline process
As already mentioned the offline process comprises the two stages described below
Camera calibration This part of the process is concerned with the calculation of the
intrinsic parameters of the camera as explained in Section 22 of the literature
study In short the objective is to precisely quantify the optical properties of the
camera The manner in which the current approach accomplishes this is by imag-
ing the special calibration chart shown in Figure 312 from different orientations
and distances After corresponding markers in the different images are found an
algorithm searches the optimal set of camera parameters for which triangulation
of all corresponding marker-point pairs gives an accurate reconstruction of the
calibration chart
Figure 312 The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system All absolute dimensions
and photometric properties of the round markers are known precisely
System calibration The second part of the calibration process refers to the camera-
projector system calibration ie the determination of the extrinsic parameters
of the system Again this part of the process images the calibration chart from
different distances However this time structured light patterns are emitted by
the projector while the acquisition process takes place The result is that each
projector code is associated with a known depth and camera position
372 Online process
The result of the offline calibration is a set of parameters that model the optical proper-
ties of the scanner system These are passed to the application inside the XML file for
every scan Such parameters represent the coefficients of a fifth-order polynomial used
for translating the set of (x y) vertices with their corresponding depth code values into
standard units of measure In other words the online process consists of evaluating a
polynomial with all the x y and depth code values calculated in the decoding stage in
order to reconstruct the geometry of the face Figure 313 shows the state of the 3D
model before and after the reconstruction process
(a) Before reconstruction (b) After reconstruction
Figure 313 The 3D model before and after the calibration process
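Only illustratively, the per-vertex evaluation can use Horner's scheme; the actual calibration polynomial combines x, y and the depth code with coefficients read from the XML file, so the univariate form below is an assumption:

    /* Horner evaluation of a fifth-order polynomial with coefficients c[0..5];
     * the real polynomial is multivariate, this is a simplified sketch */
    static float poly5(const float c[6], float v)
    {
        float r = c[5];
        for (int i = 4; i >= 0; i--)
            r = r * v + c[i];
        return r;
    }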
38 Vertex filtering
As it can be seen from Figure 313b there are a number of extra vertices (and faces)
that have not been correctly reconstructed and therefore should be removed from the
model Vertex filtering is applied to remove all these noisy vertices and faces based on
different criteria The process is divided in the following three steps
381 Filter vertices based on decoding constraints
First if the distance between consecutive decoded points is larger than a maximum
threshold in the (x) or (z) dimensions then these are removed Second in order to
avoid false decoded vertices due to camera noise (specially in the parts of the images
where light does not hit directly) a minimal modulation threshold needs to be exceeded
or else the associated decoded point is discarded Finally if the decoded vertices lie
outside a margin defined in accordance to the image dimensions then these are removed
as well
382 Filter vertices outside the measurement range
The measurement range defined during the offline calibration refers to the minimum
and maximum values that each decoded point can have in the z dimension These values
are read from the XML file The long triangles shown in Figure 313b that either extend
far into the picture or on the other hand come close to the camera are all removed in
this stage The resulting 3D model after being filtered with the two previously described
criteria is shown in Figure 314a
383 Filter vertices based on a maximum edge length
Several steps are involved in the removal of vertices based on the maximum edge length
criterion Initially the length of every edge contained in the model is calculated This
is followed by determining a new set of edges L that contains the longest edge in each
face After this operation the mean length value for the longest edge set is calculated
Finally only faces whose longest edge value is less than seven times the mean value
ie L < 7 × mean(L) are kept Figure 314b shows the result after this operation
(a) The 3D model after the filtering steps described in Subsections 381 and 382 (b) The 3D model after the filtering step described in Subsection 383 (c) The 3D model after the filtering step described in Section 39

Figure 314 3D resulting models after various filtering steps
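A sketch of this last criterion in C; the flat vertex/face arrays and the helper edge_len() are assumptions, not the application's actual data structures:

    #include <math.h>
    #include <stdlib.h>

    static float edge_len(const float *v0, const float *v1)
    {
        float dx = v1[0] - v0[0], dy = v1[1] - v0[1], dz = v1[2] - v0[2];
        return sqrtf(dx * dx + dy * dy + dz * dz);
    }

    /* Mark faces whose longest edge is below 7x the mean longest edge;
     * verts: xyz triplets, faces: vertex-index triplets, keep: output flags */
    static void filter_by_edge_length(const float *verts, const int *faces,
                                      int num_faces, int *keep)
    {
        float *longest = malloc(num_faces * sizeof(float));
        float mean = 0.0f;
        for (int f = 0; f < num_faces; f++) {
            const float *a = &verts[3 * faces[3 * f]];
            const float *b = &verts[3 * faces[3 * f + 1]];
            const float *c = &verts[3 * faces[3 * f + 2]];
            float l = fmaxf(edge_len(a, b),
                            fmaxf(edge_len(b, c), edge_len(c, a)));
            longest[f] = l;
            mean += l / num_faces;
        }
        for (int f = 0; f < num_faces; f++)
            keep[f] = longest[f] < 7.0f * mean;
        free(longest);
    }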
39 Hole filling
In the last processing step of the 3D face scanner application two actions are performed
The first one is concerned with an algorithm that takes care of filling undesirable holes
that appear due to the removal of vertices and faces that were part of face surface This
is accomplished by adding a vertex in the middle of the hole and then connecting every
surrounding edge with this point The second action refers to another filtering step of
vertices and faces In this last part of the application the program removes all but the
largest group of connected faces The final 3D model is shown in Figure 314c
310 Smoothing
Taking into account that the smoothing process is beneficial for visualization purposes
but not for the overall goal of the 3D mask sizing project this process was not taken
into account as part of the 3D face scanner application This is also the reason why it
is not included in Figure 31 Nevertheless this section provides a brief explanation of
the smoothing process that is currently used along with an example
A complete explanation of the algorithm that is being used to achieve the smoothing
effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian
operator that diffuses the vertices along the surface An example of the resulting model
before and after applying the smoothing process is shown in Figure 315
(a) The 3D model before smoothing (b) The 3D model after smoothing
Figure 315 Forehead of the 3D model before and after applying the smoothing process
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as
two different domains but rather as two complementary parts of a whole There are two
important trends that have made such unified view possible First integrated circuit
(IC) technology has evolved to the point where multiple processors of different types
coexist in a single IC Second the increasing complexity and average size of programs
added to the evolution of compiler technologies raised C compilers (and even C++ or
Java in some cases) to become commonplace in the development of embedded systems
[35]
This chapter discusses the embedded hardware and software implementation of the 3D
face scanner A brief account of the hardware and software tools that were used during
the development of the application is presented first Subsequently the first stage of the
development process is described which consists mainly of translating the algorithms
and methods described in Chapter 3 into a different programming language more suitable
for embedded systems Finally a preview of the developed visualization module that
displays the 3D reconstructed face is presented along with a brief description of its
functionality
41 Development tools
This section describes the set of tools used in the development of the embedded applica-
tion First an overview of the hardware is presented highlighting the most important
aspects that are of interest to the 3D face scanner application This is then followed by
a list of the software tools along with a short motivation for their selection A so called
remote development methodology was used for the compilation process The idea is to
run an integrated development environment (IDE) on a client system for the creation of
the project editing of the files and usage of code assistance features in the same manner
as done with local projects However when the project is built run or debugged the
process runs on a remote server with output and input transferred to the client system
411 Hardware
A current trend in the embedded world is the use of single-board computers (SBCs) as
development platforms SBCs combine most features of a conventional desktop computer
into a single board which can be as small as a credit card One or more processors of
different types memory on-board peripherals for multiple USB devices single or dual
gigabit Ethernet connections integrated graphics and audio capabilities amongst others
are common features included in these devices But perhaps what is most interesting
for embedded developers is the availability of several SBCs that come under open source
hardware category [36] Such SBCs are suitable for the implementation of a wide range
of applications on the basis of open operating systems
Two different hardware environments were used in the development of the current em-
bedded application a conventional desktop personal computer (PC) with an Intel x86
architecture and a SBC that was selected according to the following survey
4111 Single-board computer survey
A prior survey of popular SBCs available in the market was conducted with the intention
of finding the most suitable model for our application Table 41 presents a subset of the
considered models highlighting the most relevant characteristics for the 3D face scanner
application Refer to [37] for the complete survey
The model to be chosen has to comply with several requirements imposed by the 3D
face scanner application First support for both a camera and a projector had to be
offered While all of the considered models showed special support for video output
not all of them provided suitable characteristics for camera signal acquisition In fact
most of them rely on USB or Ethernet connections for this purpose The problem of
using USB technology for camera acquisition is that it is highly resource demanding On
the other hand Ethernet connections imply streaming video in formats such as MPEG
which require additional computational resources and buffering for decoding the video
stream Explicit periphery support for camera acquisition was only offered by two of
the considered models the BeagleBoard-xM and the PandaBoard
Table 41 Single-board computer survey

BeagleBoard-xM
CPU: ARM Cortex-A8, 1000 MHz
RAM: 512 MB
Video output: DVI-D, HDMI, S-Video
GPU: PowerVR SGX, OpenGL ES 2.0
Camera port: Yes

Raspberry Pi Model B
CPU: ARM1176, 700 MHz
RAM: 256 MB
Video output: Composite RCA, HDMI, DSI
GPU: Broadcom VideoCore IV, OpenGL ES 2.0
Camera port: No

Cotton Candy
CPU: dual-core ARM Cortex-A9, 1200 MHz
RAM: 1 GB
Video output: HDMI
GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
Camera port: No

PandaBoard
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, DVI-D, LCD
GPU: PowerVR SGX540, OpenGL ES 2.0
Camera port: Yes

Via APC
CPU: ARM11, 800 MHz
RAM: 512 MB
Video output: HDMI, VGA
GPU: built-in 2D/3D graphics, OpenGL ES 2.0
Camera port: No

MK802
CPU: ARM Cortex-A8, 1000 MHz
RAM: 1 GB
Video output: HDMI
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No

Snowball
CPU: dual-core ARM Cortex-A9, 1000 MHz
RAM: 1 GB
Video output: HDMI, CVBS
GPU: Mali-400 MP, OpenGL ES 2.0
Camera port: No
A second issue in the selection of the SBC was concerned with the project objective of
developing a module capable of visualizing the 3D reconstructed model by means of the
embedded projector It was considered that the achievement of this objective could be
greatly simplified by selecting an SBC model that offered support for rendering of 3D
computer graphics by means of an API preferably OpenGL ES Nevertheless all of the
SBC models considered in the survey featured a graphical processor unit (GPU) with
such support
Finally one last important motivation for the selection came from the experience gath-
ered through related projects The BeagleBoard-xM had been used as the embedded
computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-
able implementation effort could be saved if this option were adopted Consequently it
was the BeagleBoard-xM that was selected as the SBC model for the development of
the current project
4112 BeagleBoard-xM features
The BeagleBoard-xM (Figure 41) is an SBC produced by Texas Instruments It is
a low-power open-source hardware system that was designed specifically to address
the Open Source Community It measures 82.55 by 82.55 mm and offers most of the
functionality of a desktop computer It is based on Texas Instruments' DM3730 system
on chip (SoC) At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1
GHz and 512 MB of LPDDR RAM Several open operating systems have been made
compatible with such processor including Linux FreeBSD RISC OS Symbian and
Android Moreover the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated
video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to
provide accelerated 2D and 3D rendering that supports OpenGL ES 20 [38]
In addition to the previously mentioned characteristics the ARM Cortex-A8 processor
comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as
NEON This technology is based on a 128-bit SIMD architecture extension that provides
flexible and powerful acceleration for consumer multimedia products as described in [39]
412 Software
The main factors involved in the selection of software tools were (i) available support by
a large development community and (ii) acquisition costs and licensing charges Open
source software was adopted where possible Moreover prior experience with the tools
was also taken into account The software can be divided in two categories (i) software
Figure 41 The BeagleBoard-xM offered by Texas Instruments
libraries that are used within the application and therefore are necessary for its execution
and (ii) software tools used specifically for the development of the application and hence
are not required for its execution In what follows each of these is briefly described
4121 Software libraries
The following software libraries are being used throughout the implementation of the
embedded application
libxml2 It is a software library used for parsing XML documents which was originally
developed for the Gnome project and was later made available for outside projects
as well The current application makes use of such tool for extracting the required
information from the XML file that is included for each scan
OpenCV Is an open source computer vision and machine learning software library
initiated by Intel It provides the necessary functionality to construct the Delaunay
triangulation described in Chapter 3 Though it was used in the initial versions of
the application later optimizations replaced OpenCV implementations
CGAL Consists of a software library that aims to provide access to algorithms in
computational geometry It is being used in the current application as a means
to simplify the resulting mesh surface ie to reduce the number of faces used to
represent the surface while keeping the overall shape of the reconstructed model
OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-
cally for embedded systems It consists of a cross-language multi-platform Appli-
cation Programming Interface (API) for rendering 2D and 3D computer graphics
It is used in the current application as the means to visualize the 3D reconstructed
model
GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL
used to create windows andor frame buffers It is being used in the visualization
module of the application as well
4122 Software development tools
The following list presents a description of the most important software tools used for
the development of the embedded application
GNU toolchain It refers to a collection of programming tools produced by the GNU
Project that provide developing facilities for applications and operating systems
Among the several projects that comprise the GNU toolchain the following were
used
GNU Make It is a utility that automates the building process of executable
programs by reading the so-called makefiles which specify how to create the
target program
GCC It is the official compiler of the GNU operating system and has been
adopted as standard by most modern Unix-like computer operating systems
GNU Binutils Involves a set of programming tools that are used in the develop-
ment process of creating and managing programs object files libraries profile
data and assembly source code The commands as (assembler) ld (linker)
and gprof (profiler) were used among the complete set of binutil commands
GNU Project debugger It is the standard debugger for the GNU operating
system which was made available for the development of applications outside
this project as well
Valgrind It is a programming tool that can automatically detect memory management
errors It also provides the functionality of a profiler
Ubuntu A Linux based operating system that is distributed as free and open source
software It was installed in both the desktop PC and the SBC
42 MATLAB to C code translation
This section describes the first stage of the embedded application development that
involves the translation of a series of algorithms originally written in MATLAB code to
C
Despite the fact that there are a number of available tools that automatically translate
MATLAB code to C language such as MATLAB Coder by MathWorks MATLAB-to-
C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx these have a number
of pitfalls that compromise their applicability specially when the performance aspect
is of ultimate importance Perhaps what is most concerning is that each one of these
tools only supports a subset of the MATLAB language and functions meaning that
the complete functionality of MATLAB is immediately constrained by this requirement
In many cases this would imply a modification to the MATLAB code prior to the
translation process in order to filter out any feature or function not included in the
subset which adds overhead to the development process Examples of features not
supported by automatic translation tools are amongst others objects cell arrays nested
functions visualization or trycatch statements The use of an automatic translation
tool was discarded for this project taking into account that several of these unsupported
features are present in the MATLAB code
421 Motivation for developing in C language
There are a number of reasons that explain why C is among the most popular pro-
gramming languages used for the development of embedded systems The first is that
C language lies in an intermediate point between higher and lower level languages pro-
viding suitable characteristics for embedded system development from both sides The
problem with higher level languages relies on the fact that they do not provide suitable
characteristics for optimizing performance of the applications such as low-level memory
manipulation Furthermore unlike many of these higher level programming languages
C provides deterministic resource use which is an important feature when the target de-
vices contain limited resources On the other hand C outperforms lower level languages
in a number of aspects such as scalability and maintainability Two final motivations
for using C are (i) C compilers are available for almost all embedded devices which are
supported by a large pool of experienced C programmers and (ii) the vast majority of
hardware APIdrivers are written in C
422 Translation approach
As mentioned earlier a manual translation approach of the code was chosen over the
use of automatic translation tools A key part in the process of manually translating
MATLAB to C code is the verification process There are two major techniques used
to achieve such verification The first one consists of a systematic method of converting
the translated C code into a compiled MEX-file that can be merged into the original
MATLAB project Then by comparing the results generated by the MATLAB project
containing the C implementation wrapped in a MEX-file with those generated by the
original MATLAB project one should be able to verify the correctness of the translation
The second approach consists of writing corresponding intermediate results of both the
MATLAB and C implementations to external files and then using a file comparison tool
such as diff for Linux environments in order to validate equality of both results It was
the latter approach that was chosen for the development of the current application for
the following reason The former approach requires the C implementation to be wrapped
in a so-called MEX wrapper which takes care of the communication between MATLAB
and C This task is considered to be error prone since crashes segmentation violations
or incorrect results can easily occur if the MEX wrapper does not allocate and access
the data properly as reported by Marc Barberis in [40] from Catalytic Inc
A number of pitfalls that add complexity to the manual translation process were iden-
tified throughout the development of this stage The most important are
• Array elements in MATLAB code are indexed starting with 1 whereas C indexing
starts with 0 Although this does not seem like a major difference it was found
that such a simple change could easily introduce errors
• MATLAB uses column-major ordering whereas C uses a row-major approach
Special care must be taken to guarantee that spatial locality is maintained after
the translation process takes place ie the order in which data is processed should
correspond to the order in which it is laid out in memory Not complying with
this idea could induce a serious loss in performance of the resulting code (a short indexing example follows this list)
• MATLAB is an interpreted language ie data types and variable dimensions are
only known at run-time thus these cannot be easily deduced from analyzing the
source code
• MATLAB supports dynamic sizing of arrays whereas such operations in C require
explicit allocationreallocationdeallocation of memory using constructs such as
malloc realloc or free
• MATLAB features a rich set of libraries that are not available in C This can imply
a large overhead in the development process if many of these functions have to be
implemented
• Many of the vector-based operations available in MATLAB translate into nontriv-
ial loop constructs in C language For example mapping MATLAB's easy-to-use
concatenation operation to C involves considerable effort
• Last but not least MATLAB supports reusing the same variable for storing data
of different types dimensions and sizes On the contrary C language requires all
variables to be cast to a specific data type (or declared as known in the program-
ming field) before they can be used Furthermore MATLAB uses a wide variety
of generic types that are not available in C and hence requires the programmer
to implement them while relying on structure constructs of primitive types
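A short example contrasting the two storage orders mentioned above (function names are illustrative):

    /* Row-major access, the natural C layout for an m x n array */
    float get_rowmajor(const float *A, int n, int i, int j)
    {
        return A[i * n + j];    /* elements of row i are contiguous */
    }

    /* Column-major access, MATLAB's layout, for the same m x n array */
    float get_colmajor(const float *A, int m, int i, int j)
    {
        return A[j * m + i];    /* elements of column j are contiguous */
    }

    /* Note: MATLAB's A(i, j) uses 1-based indices; both functions are 0-based */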
43 Visualization
This section describes the different steps involved in the visualization module developed
to display the reconstructed 3D models by means of the embedded projector contained
in the hand-held device Figure 42 extends the general overview of the application
presented in Figure 31 by incorporating the visualization module This figure shows that a
resulting 3D model of the face reconstruction process consists of 4 different elements a
set of vertices a set of faces a set of UV coordinates and a texture image
[Diagram: the 3D face reconstruction block takes the camera frame sequence and the XML file as input and produces faces, vertices, UV coordinates and the texture 1 image, which feed the visualization module]

Figure 42 Simplified diagram of the 3D face scanner application
Vertices and faces describe the geometry of the reconstructed model Each face consists
of three index values that determine the vertices that conform a triangle On the other
hand UV coordinates together with the texture image describe the texture of the model
Figure 43 shows how UV coordinates are used to map portions of the texture image
to individual parts of the model Each vertex is associated with an UV coordinate
When a triangle is rendered the corresponding UV coordinates of each vertex are used
to extract a portion of the texture image to place it on top of the triangle
[Diagram: the UV coordinate system, with axes u and v spanning from (0,0) to (1,1)]

Figure 43 UV coordinate system
Figure 44 presents an overview of the visualization module The first step of the process
is to simplify the 3D model ie to reduce the number of triangles (and vertices) used
to represent the surface Note that while a high resolution is needed for the algorithms
that determine the fit quality of the different mask models a much lower resolution can
be used for visualization purposes In fact due to the limited available resources in
embedded systems such simplification becomes necessary to avoid lag when zooming
rotating or panning the model Edge collapse is a common term used for the simpli-
fication process which is shown in Figure 44 Input vertices and faces of this block
are converted into a smaller set denoted as New vertices and New faces on the diagram
However since the new set of vertices and faces do not have a one-to-one correspondence
to the original set of UV coordinates such coordinates have to be updated as well The
manner in which this is accomplished is by using the Nearest Neighbor algorithm Every
new vertex is assigned the UV coordinate of its closest original vertex
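A brute-force sketch of this reassignment in C; the real module may well use a faster search structure, and the flat arrays are assumptions:

    #include <float.h>

    /* Assign each simplified vertex the UV coordinate of its nearest
     * original vertex (brute-force nearest neighbor) */
    static void reassign_uv(const float *new_v, int n_new,
                            const float *old_v, const float *old_uv, int n_old,
                            float *new_uv)
    {
        for (int i = 0; i < n_new; i++) {
            float best = FLT_MAX;
            int best_j = 0;
            for (int j = 0; j < n_old; j++) {
                float dx = new_v[3 * i]     - old_v[3 * j];
                float dy = new_v[3 * i + 1] - old_v[3 * j + 1];
                float dz = new_v[3 * i + 2] - old_v[3 * j + 2];
                float d2 = dx * dx + dy * dy + dz * dz;
                if (d2 < best) { best = d2; best_j = j; }
            }
            new_uv[2 * i]     = old_uv[2 * best_j];
            new_uv[2 * i + 1] = old_uv[2 * best_j + 1];
        }
    }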
The next stage of the process is to format the new set of vertices faces and UV co-
ordinates together with the texture 1 image such that OpenGL can render the model
Subsequently normal vectors are calculated for every triangle which are mainly used
by OpenGL for lighting calculations Every vertex of the model has to be associated
with one normal vector To do this an average normal vector is calculated for each
vertex based on the normal vectors of the triangles that are connected to it Moreover
a cross-product multiplication is used to calculate the normal vector of each triangle
Once these four elements that characterize the 3D model are provided to OpenGL the
program enters in an infinite running state where the model is redrawn every time a
timer expires or when an interactive operation is sent to the program
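The cross-product step can be sketched as follows (normalization and the per-vertex averaging are omitted for brevity):

    /* Per-triangle normal via the cross product of two edge vectors;
     * per-vertex normals are then averaged over the adjacent triangles */
    static void triangle_normal(const float *a, const float *b,
                                const float *c, float n[3])
    {
        float u[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
        float v[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
        n[0] = u[1] * v[2] - u[2] * v[1];
        n[1] = u[2] * v[0] - u[0] * v[2];
        n[2] = u[0] * v[1] - u[1] * v[0];
    }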
[Diagram: edge collapse reduces the input vertices and faces to new vertices and new faces; a nearest neighbor search produces new UV coordinates; the results are converted to OpenGL format, normals are calculated, and OpenGL renders the model from the GL vertices, faces, UV coordinates, normals and the texture 1 image]

Figure 44 Diagram of the visualization module
Chapter 5
Performance optimizations
This chapter presents various performance optimizations made to the 3D face scanner
application ranging from high-level optimizations such as modification of the algo-
rithms to low-level optimizations such as the implementation of time-consuming parts
in assembly language
In order to verify that the achieved optimizations were valid in general and not for
specific cases 10 scans of different persons were used for profiling the performance of the
application Every profile consisted of running the application 10 times for each scan and
then averaging the results in order to reduce the influence that external factors might
have in the measured times Figure 51 presents an example of the graphs that will be
used throughout this and the following chapters to represent the changes in performance
Here each bar is divided into different colors that represent the distribution of the total
execution time among the various stages of the application described in Chapter 3 and
summarized in Figure 31
The translation from MATLAB to C code corresponds to the first optimization per-
formed The top two bars in Figure 51 show that the C implementation resulted in
a speedup of approximately 15 times over the MATLAB implementation running on
a desktop computer On the other hand the bottom two bars reflect the difference
in execution time after running the C implementation in two different platforms The
much more limited resources available in the BeagleBoard-xM have a clear impact on
the execution time The C code was compiled with GCC's -O2 optimization level
The bottom bar in Figure 51 represents the starting point for a set of optimization
procedures that will be described in the following sections The order in which these are
presented corresponds to the same order in which they were applied to the application
[Bar chart: total execution time of each implementation, broken down into the stages of Figure 31 (read binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering, hole filling, other)]

Figure 51 Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, (bottom) the C implementation on the BeagleBoard-xM
51 Double to single-precision floating-point numbers
The same representation format of floating-point numbers for the MATLAB and C
implementations was necessary to compare both results in each step of the translation
process The original C implementation was implemented using double-precision format
because this is the format used in the MATLAB code Taking into account that the
additional precision offered by double-precision format over single-precision was not
essential and that the ARM Cortex-A8 processor features a 32-bit architecture the
conversion from double to single-precision format was made Figure 52 shows that with
this modification the total execution time decreased from 14.53 to 12.52 sec
[Bar chart: per-stage execution times for the double-precision and single-precision versions]

Figure 52 Difference in execution time when double-precision format is changed to single-precision
52 Tuned compiler flags
While the previous versions of the C code were compiled with the -O2 performance level, the goal of this step was to determine a combination of compiler options that would translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time, 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
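For illustration, the corresponding compiler invocation would look as follows; the source and output file names are hypothetical:

gcc -funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution \
    -mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp \
    -o scanner main.c -lm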
Figure 5.3: Execution time before and after tuning GCC's compiler options.
5.3 Modified memory layout
A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality of the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e., first and second, third and fourth, fifth and sixth, and so on. Data of the camera frames were therefore placed in memory in such a manner that corresponding pixels between frame pairs lay next to each other in memory. The procedure is shown in Figure 5.4.
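A minimal sketch of the interleaving, assuming 8-bit pixels; the function and variable names are illustrative, not taken from the original code:

#include <stdint.h>

/* Interleave two consecutive frames so that corresponding pixels lie
   next to each other in memory: A0 B0 A1 B1 ... The actual frame
   dimensions and buffer management in the application differ. */
void interleave_pair(const uint8_t *frameA, const uint8_t *frameB,
                     uint8_t *pair, int npixels)
{
    for (int i = 0; i < npixels; i++) {
        pair[2 * i]     = frameA[i];   /* pixel i of the first frame  */
        pair[2 * i + 1] = frameB[i];   /* pixel i of the second frame */
    }
}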
However, this modification yielded no improvement in the execution time of the application, as can be seen in Figure 5.5.

Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green, and purple circles represent pixels of the first, second, third, and fourth frames, respectively.

Figure 5.5: The execution time of the program did not change with a different memory layout for the camera frames.
5.4 Reimplementation of C's standard power function
The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the 0.85 power. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process. Taking into account that the high accuracy offered by such a function was not required, and that the overhead involved in validating the input could be removed, a different implementation of this function was adopted.
A novel approach was proposed by Ian Stephenson in [42], explained as follows. The power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b),

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions. Such functions can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a,

a = M * 2^E,

where M is the mantissa and E is the exponent. Taking log2 of both sides gives

log2(a) = log2(M) + E,

and since M is normalized, log2(M) is always small; therefore

log2(a) ≈ E.
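A minimal sketch of this idea in C, assuming IEEE-754 single-precision floats; the function names and the linear mantissa refinement are illustrative rather than the exact code used in the application:

#include <stdint.h>

/* Approximate log2(a): reinterpret the float bits, subtract the biased
   exponent offset, and let the mantissa bits refine the result linearly. */
static inline float fast_log2(float a)
{
    union { float f; uint32_t i; } u = { a };
    return ((int32_t)u.i - 0x3f800000) * (1.0f / 0x00800000);
}

/* Approximate 2^p: the inverse trick, scaling and re-biasing the bits. */
static inline float fast_pow2(float p)
{
    union { float f; uint32_t i; } u;
    u.i = (uint32_t)(p * 0x00800000 + 0x3f800000);
    return u.f;
}

/* pow(a, b) = 2^(b * log2(a)); valid for a > 0, no input validation. */
static inline float fast_pow(float a, float b)
{
    return fast_pow2(b * fast_log2(a));
}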
This new implementation of the power function provides the improvement in execution time shown in Figure 5.6.
Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.
5.5 Reduced memory accesses
The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering made it possible to replace floating-point calculations with integer calculations in the modulation stage, which are known to typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modular frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping, and scaling frames are now performed in an alternating fashion, together with the calculation of the modular frame, as sketched below. This loop merging improves the locality of data and reduces loop overhead. Figure 5.8 shows the change in execution time of the application for this optimization step.
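A minimal sketch of this loop merging, assuming 8-bit pixels and illustrative helper names; the modular frame is represented here by per-pixel minimum and maximum values over time:

#include <stdint.h>

#define NUM_FRAMES 16   /* the scanner captures 16 camera frames */

/* Hypothetical helper: returns pixel i of raw frame n after the
   discard/crop/scale steps of the preprocessing stage. */
uint8_t crop_and_scale(const uint8_t *raw, int i);

/* While each preprocessed pixel is still at hand, the per-pixel minimum
   and maximum over time are updated, so the frames need not be re-read
   in a later modulation pass. */
void preprocess_fused(const uint8_t *raw[], uint8_t *frame[],
                      uint8_t *mod_min, uint8_t *mod_max, int npixels)
{
    for (int n = 0; n < NUM_FRAMES; n++)
        for (int i = 0; i < npixels; i++) {
            uint8_t p = crop_and_scale(raw[n], i);
            frame[n][i] = p;
            if (n == 0 || p < mod_min[i]) mod_min[i] = p;
            if (n == 0 || p > mod_max[i]) mod_max[i] = p;
        }
}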
Figure 5.7: Order of execution before and after the optimization. (a) Original order: the preprocessing stage (parse XML file, discard frames, crop frames, scale) is followed by a normalization stage comprising texture 1, modulation, texture 2, and normalize. (b) Modified order: the modulation step is moved to the end of the preprocessing stage, so that normalization only comprises texture 1, texture 2, and normalize.
Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
5.6 GMC in the y dimension only
A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, this figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research found that optimal results are achieved when GMC is applied in the y direction only. This was implemented by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted of removing all unnecessary calculations related to the estimation of GMC in the x direction. This optimization provides the improvement in execution time shown in Figure 5.10.
Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code. For every pair of consecutive frames A and B of the normalized frame sequence, the rows and columns of both frames are summed, the SAD is minimized in x and y, and frame B is shifted in the y dimension only.
Figure 5.10: Difference in execution time before and after modifying the GMC stage.
5.7 Error in Delaunay triangulation
OpenCV was used to compute the Delaunay triangulation, and a series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not so straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation; it is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. An error was detected at this point of the optimization process in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.
Figure 5.11: Execution time of the application increased after fixing an error in the tessellation stage.
5.8 Modified line shifting in the GMC stage
A series of optimizations performed on the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (the last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later, this was optimized even further by replacing the for loop with the more optimized memcpy function available in the standard C library, which in turn led to a faster execution time.
A further optimization was obtained in the GMC stage, which yielded better memory usage and faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame that was being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process. A conditional statement was included in order to determine whether the shift has to be performed in the positive or negative direction. In case the shift is negative, i.e., upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. In case the shift is positive, i.e., downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. A sketch of this in-place shift is given below. The result of this set of optimizations is presented in Figure 5.12.
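A minimal sketch of the in-place line shift, assuming an 8-bit image stored row by row; the names are illustrative:

#include <string.h>
#include <stdint.h>

/* A negative shift moves the image up, a positive shift moves it down;
   the rows that fall off the border are simply discarded rather than
   wrapped around, and no second buffer is needed. */
void shift_frame_y(uint8_t *img, int width, int height, int shift)
{
    if (shift < 0) {                      /* upwards: top-to-bottom copy */
        for (int y = 0; y < height + shift; y++)
            memcpy(&img[y * width], &img[(y - shift) * width], width);
    } else if (shift > 0) {               /* downwards: bottom-to-top copy */
        for (int y = height - 1; y >= shift; y--)
            memcpy(&img[y * width], &img[(y - shift) * width], width);
    }
}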
Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
5.9 New tessellation algorithm
A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing smoothness only to a very small degree. This characteristic refers to the fact that the set of vertices resulting from the decoding stage is already sorted, which removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered from left to right and bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies the algorithm needed to connect the vertices into triangles even further.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Moreover, each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.
The algorithm is presented in Algorithm 1. Note that for each pair of rows, the algorithm describes the connection of vertices only until the last vertex of either row is reached; the unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).
Algorithm 1 New tessellation algorithm

1:  for all pairs of rows do
2:      find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
3:      while the last vertex in either row has not been reached do
4:          if vertex_row_A is more to the left than vertex_row_B then
5:              connect vertex_row_A with the next vertex on the same row and with vertex_row_B
6:              change vertex_row_A to the next vertex on the same row
7:          else
8:              connect vertex_row_B with the next vertex on the same row and with vertex_row_A
9:              change vertex_row_B to the next vertex on the same row
10:         end if
11:     end while
12: end for
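A compact C sketch of this scheme, assuming each row's vertices are stored left to right with their x coordinates available; the types, index bases, and handling of the leftover vertices are illustrative:

typedef struct { int a, b, c; } Triangle;

/* Connect two consecutive rows of sorted vertices into triangles.
   xA/xB hold the x coordinates, nA/nB the vertex counts, and
   baseA/baseB the global indices of the first vertex of each row. */
int tessellate_row_pair(const float *xA, int nA, int baseA,
                        const float *xB, int nB, int baseB,
                        Triangle *out)
{
    int iA = 0, iB = 0, n = 0;
    while (iA < nA - 1 && iB < nB - 1) {
        if (xA[iA] < xB[iB]) {
            /* Left-most vertex is in row A: connect it with its right
               neighbor and with the current vertex of row B. */
            out[n++] = (Triangle){ baseA + iA, baseA + iA + 1, baseB + iB };
            iA++;
        } else {
            out[n++] = (Triangle){ baseB + iB, baseB + iB + 1, baseA + iA };
            iB++;
        }
    }
    /* Remaining vertices of the longer row are connected with the last
       vertex of the shorter row in a later step (not shown). */
    return n;
}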
Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's Delaunay triangulation implementation.
Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that the vertices are sorted: (a) Delaunay triangulation; (b) optimized triangulation.

Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.
5.10 Modified decoding stage
A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard math C library, namely ceil() and floor(), were replaced with faster implementations that use preprocessor directives to avoid the function call overhead. Moreover, the time spent in validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions could be simplified even further, as sketched below.
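A minimal sketch of such simplified replacements, assuming non-negative arguments; the macro names are illustrative:

/* For x >= 0, truncation already yields the floor, and the ceiling only
   needs one extra comparison; being macros, they also avoid the call
   overhead of the library functions. No input validation is performed. */
#define FAST_FLOOR(x) ((int)(x))
#define FAST_CEIL(x)  ((int)(x) + ((x) > (float)(int)(x)))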
A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while controlling that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since such memory can be addressed more quickly.
The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. Such tasks were present in the application because several alternatives had been implemented for achieving a common goal during the algorithmic design stage; after assessing and choosing the best option, however, the other alternatives were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the non-functional code.

Figure 5.15: Execution time of the application before and after optimizing the decoding stage.
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage
This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, this recalculation can be avoided, as sketched below. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
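A sketch of the reuse, with hypothetical helper routines standing in for the application's own:

#include <stdint.h>

#define MAX_WIDTH 640   /* assumed upper bound on the frame width */

/* Hypothetical helpers standing in for the application's routines. */
void sum_columns(const uint8_t *img, int w, int h, uint32_t *sums);
int  minimize_sad(const uint32_t *a, const uint32_t *b, int w);
void shift_frame_y(uint8_t *img, int w, int h, int shift);

/* The sums of frame n-1 are kept from the previous iteration by simply
   swapping two buffers, so each frame's columns are summed only once. */
void gmc(uint8_t *frames[], int nframes, int w, int h)
{
    uint32_t bufA[MAX_WIDTH], bufB[MAX_WIDTH];
    uint32_t *prev = bufA, *cur = bufB;

    sum_columns(frames[0], w, h, prev);            /* computed once */
    for (int n = 1; n < nframes; n++) {
        sum_columns(frames[n], w, h, cur);
        int shift = minimize_sad(prev, cur, w);    /* estimate misalignment */
        shift_frame_y(frames[n], w, h, shift);
        uint32_t *tmp = prev; prev = cur; cur = tmp;
    }
}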
5.12 NEON assembly optimization 1
The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization, and global motion compensation stages are data independent and therefore suitable for being computed in parallel by the ARM NEON architecture extension.

There are four major approaches to integrate NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology; (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions but with the compiler doing some of the work associated with writing assembly instructions; and (iv) directly writing NEON assembly instructions linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and therefore this is the approach that was adopted.
Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums. The columns of frames 1 and 2 of the normalized frame sequence are summed for the first pair of consecutive frames; for every remaining pair (from n = 3 to n = N), the column-sum vector of frame n-1 is reused, so that only the columns of frame n have to be summed before minimizing the SAD and shifting frame n.
Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.
Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may be interpreted either as an operation on 2 Q registers, where each of the 8 elements is 16 bits wide, or as an operation on 2 D registers, where each of the 8 elements is 8 bits wide.
Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology: the elements in the lanes of two source registers are combined by an operation into the corresponding lanes of a destination register.
An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1.
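For illustration, the following sketch expresses one such 8-pixel step with NEON C intrinsics instead of the hand-written assembly that was actually used; the variable names and the scaling constants are assumptions:

#include <arm_neon.h>

/* Process 8 pixels of two consecutive frames: texture 2 is (v1 + v2)
   scaled to [0, 1], and the normalized value is (v1 - v2)/(v1 + v2).
   Only the low 4 lanes are converted and stored here; the high 4 lanes
   are handled analogously. */
void texture2_normalize_8px(const uint8_t *v1, const uint8_t *v2,
                            float *texture2, float *normalized)
{
    uint8x8_t  a    = vld1_u8(v1);                /* 8 pixels, frame 1 */
    uint8x8_t  b    = vld1_u8(v2);                /* 8 pixels, frame 2 */
    uint16x8_t sum  = vaddl_u8(a, b);             /* widening v1 + v2  */
    int16x8_t  diff = vsubq_s16(vreinterpretq_s16_u16(vmovl_u8(a)),
                                vreinterpretq_s16_u16(vmovl_u8(b)));

    float32x4_t sum_f  = vcvtq_f32_u32(vmovl_u16(vget_low_u16(sum)));
    float32x4_t diff_f = vcvtq_f32_s32(vmovl_s16(vget_low_s16(diff)));

    /* vrecpeq_f32 is only a reciprocal estimate; a Newton-Raphson step
       (vrecpsq_f32) could refine it if more accuracy were needed. */
    vst1q_f32(texture2,   vmulq_n_f32(sum_f, 1.0f / 510.0f));
    vst1q_f32(normalized, vmulq_f32(diff_f, vrecpeq_f32(sum_f)));
}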
Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that may explain this increment. First, note that the stage of the application that contributed most to the increase in time was the reading of the binary file. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was indeed probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage. The first step of that process is to find the smallest and largest values of every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C language, the processor makes use of a branch prediction mechanism in order to speed up the instruction pipeline. The use of NEON assembly instructions, however, forces the processor to perform the comparison for every single pack of 8 values, ignoring the existence of the branch prediction mechanism.
5.13 NEON assembly optimization 2
After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example with camera frames of 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2^(n+1) - 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the 0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table, as the following sketch illustrates.
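A minimal sketch for the real 8-bit case; the names and the [0, 1] fixed-point convention are assumptions:

#include <math.h>
#include <stdint.h>

#define PIXEL_BITS 8
#define LUT_SIZE   (2 * ((1 << PIXEL_BITS) - 1) + 1)   /* 511 entries */

static float gamma_lut[LUT_SIZE];

/* Entry s holds ((s/2) / 255)^0.85, the gamma-corrected average of two
   pixels whose sum is s; built once at start-up. */
void build_gamma_lut(void)
{
    for (int s = 0; s < LUT_SIZE; s++)
        gamma_lut[s] = powf((s / 2.0f) / 255.0f, 0.85f);
}

/* The sum a + b is already produced by the texture 2 stage, so the
   gamma-corrected average becomes a single lookup. */
static inline float texture1_pixel(uint8_t a, uint8_t b)
{
    return gamma_lut[a + b];
}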
As a final step in the optimization process, a further improvement to the execution flow presented in Figure 5.19 was made. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages. The first one involves the calculation of the modulation, texture 2, and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. The merging of these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22.

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code. After parsing the XML file, camera frames 1 to 16 are processed row by row and vector by vector (crop row, modulation step 1, scale, texture 2 (v1 + v2), scale, normalize ((v1 - v2)/(v1 + v2))); the last two camera frames are then read again to compute texture 1 and the second modulation step.

Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.

pixel A:         0      1      2      3
pixel B:         0      1      2      3
average:         0    0.5      1    1.5      2    2.5      3
average^0.85:    0  0.555      1  1.411  1.803  2.179  2.544
This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.
Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code. After parsing the XML file, camera frames 1 to 14 are processed row by row and vector by vector (crop row, modulation step 1, scale, texture 2, scale, normalize); camera frames 15 and 16 go through the same steps plus the averaging and gamma correction needed for texture 1, followed by the 5x5 mean filter and the second modulation step.
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C language. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.
6.1 MATLAB to C code translation
In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models, with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application, described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, after being processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).

On the other hand, the last bar in Figure 6.1 corresponds to the average execution time of the initial C implementation after being processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's -O2 optimization level.
Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
6.2 Visualization
A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model has approximately 2% of the faces contained in the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.

Figure 6.2: Example of the visualization module developed: (a) high-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model wireframe (63,743 faces); (c) low-resolution 3D model with texture (1229 faces); (d) low-resolution 3D model wireframe (1229 faces).
6.3 Performance optimizations
Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations, described in Chapter 5, were used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times. Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which provides an idea of the speedup achieved for each individual stage.
Figure 6.3: Performance evolution of the 3D face scanner's C implementation. The bars show, from top to bottom, the execution time with no optimizations, after converting doubles to floats, tuning the compiler flags, modifying the memory layout, reimplementing the pow function, reducing memory accesses, restricting GMC to the y direction, fixing the Delaunay bug, optimizing the line shifting in GMC, introducing the new tessellation algorithm, modifying the decoding stage, removing recalculations in GMC, and applying the two NEON assembly optimizations.
Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file, (b) preprocessing, (c) normalization, (d) GMC, (e) decoding, (f) tessellation, (g) calibration, (h) vertex filtering, (i) hole filling.
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e., by approximately 14 seconds.
A wide range of optimizations was performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.

Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system.
Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment. Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.
7.1 Future work
Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, these optimizations open new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5x5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates to implement using NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements to the overall application's performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration, and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage can be optimized considerably. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease in performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.
Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process; it is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp, and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Although a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast running code.
Bibliography
[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, isbn: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, issn: 0146-664X. doi: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).
[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, issn: 0162-8828. doi: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. doi: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, issn: 0031-3203. doi: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, issn: 0162-8828. doi: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, issn: 0031-3203. doi: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, issn: 0734-189X. doi: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).
[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. doi: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, issn: 0932-8092. doi: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, issn: 0018-9162. doi: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. doi: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. doi: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lighting techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. doi: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, issn: 0031-3203. doi: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. doi: 10.1364/OE.15.012318 (cit. on p. 14).
[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14).

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, isbn: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and Universitat de Girona. Departament d'Electrònica, Informàtica i Automàtica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, isbn: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, issn: 0882-4967. doi: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, issn: 0162-8828. doi: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, isbn: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1–4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, isbn: 0-201-48560-5. doi: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).
[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, isbn: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, December, p. 81, 2009 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization. An example," Tech. Rep., 2009 (cit. on p. 54).
Dedicated to my grandmother
Chapter 1
Introduction
The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.
Within the wide range of products offered by Philips, there is a specific group categorized under the name of sleep solutions that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]; it reported that sleep apnea was prevalent in more than 18 million Americans, i.e., 6.62% of the country's population.
While aiming to meet the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors. These include the large diversity of sizes and shapes of human faces, an inclination towards breathing through the mouth or nose, and the diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia, amongst others.
Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic, (c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel, (h) Simplicity, (i) ComfortGel.
A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects for the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.

Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.
1.1 3D Mask Sizing project
The 3D Mask Sizing project is based on the initiative of Philips to develop a technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods, and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group at Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.
2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.
3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.
4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.

The focus of this thesis project is on the first step.
As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique has already been developed and is the basis for the present project. Figure 1.2a shows the hardware setup of this device. In short, the scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently performed offline and are mostly implemented in MATLAB.

Figure 1.2: A 3D hand-held scanner developed at Philips Research: (a) hardware; (b) 3D model example.
1.2 Objectives
The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. The highest priority is given to the execution time of the application; specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to an ARM Cortex-A8, the new implementation should be targeted to this processor in particular, making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.
1.3 Report organization
This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module, and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvement of the presented work.
Chapter 2
Literature study
This chapter presents a selective analysis of the state of the art in the field of surface reconstruction, placing special emphasis on structured lighting techniques. A brief overview of the three main underlying technologies used for depth estimation is presented first. This is followed by an example of stereo analysis, which serves as the basis for the more specific structured lighting techniques. Moreover, this example helps to illustrate why stereo analysis is considered less preferable for 3D face reconstruction applications when compared with structured lighting techniques. Special emphasis is placed on the scientific principles underlying structured lighting techniques. Furthermore, a classification of the different types of pattern coding strategies available in the literature is given, along with an analysis of their suitability for our application. Finally, the chapter concludes with a brief discussion of camera calibration and its most representative techniques.
2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (as found in areas like architecture, mechanical engineering, or surgery), distance measurements for vehicle control, surface inspection for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].

Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e., they are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from the object's surface through a given medium.
Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in the depth measurement. On the other hand, both reflecto-metric and stereo-metric techniques are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty: they require an estimation of a model of the environment. In the remainder of this section, we will limit the discussion to the stereo-metric category and focus on structured lighting techniques.
2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis is presented in this section. This example, which is presented in [4], intends to show why the use of structured lighting becomes essential for our application.
Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point on the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies from a certain reference. Note, however, that in order to calculate the triangulation, more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (extrinsic parameter) or to the focal length of each of the cameras (intrinsic parameter).

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and oriented towards the scene objects.
Also note that because the x-axes of both images are identically oriented, rows with the same row number in the two images lie on the same straight line.
[Illustration: left and right image planes at base distance b, with parallel optical axes and a scene point (X, Y, Z) projected onto row y of both images]
Figure 2.1: Standard stereo geometry.
In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

p_left = (x_left, y_left) and p_right = (x_right, y_right)

in the left and right images, respectively, assuming that the scene point is visible from both camera viewpoints. The disparity between two corresponding image points, with respect to p_left, is the vector

∆(x_left, y_left) = (x_left − x_right, y_left − y_right)^T        (2.1)
In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

p = (x, y) = (f·X/Z, f·Y/Z)        (2.2)

assuming that Z > f.

According to the ideal assumptions of the standard stereo geometry, it holds that y = y_left = y_right. Therefore, for the left camera the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X-axis,
the central projection equation for the right camera is given by

(x_right, y) = (f·(X − b)/Z, f·Y/Z)
Rather than calculating the disparity vector given by Equation 2.1 for all corresponding pairs of points in the two images, a scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points with respect to p_left is given by

∆_ssg(x_left, y_left) = √((x_left − x_right)² + (y_left − y_right)²)

However, because rows with the same row number in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

∆_ssg(x_left, y_left) = |x_left − x_right| = x_left − x_right        (2.3)

Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map ∆(x, y) is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.
Finally, in order to derive the equations that determine the 3D location of each point in the scene, note that from the central projection equations of the two cameras it follows that

Z = f·X/x_left = f·(X − b)/x_right

and therefore

X = b·x_left / (x_left − x_right)

Using the previous equation, it follows that

Z = b·f / (x_left − x_right)

By substituting this result into the projection equation for y, it follows that

Y = b·y / (x_left − x_right)

The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and b are known and that the disparity map ∆(x, y) was measured for each pair of corresponding points in the two images.
Note that a variety of methods exist to calibrate different types of camera configuration systems, i.e., to determine their intrinsic and extrinsic parameters. These calibration procedures are discussed further in Section 2.2.
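To make these reconstruction equations concrete, the following C sketch computes the 3D coordinates of a scene point from a measured disparity in the standard stereo geometry. The function and type names are hypothetical and do not come from the scanner's code base.

typedef struct { double X, Y, Z; } Point3D;

/* Reconstructs a scene point from a corresponding point pair in the
   standard stereo geometry. f is the focal length, b the base distance,
   (x_left, y) the left image point, and disparity = x_left - x_right.
   Returns -1 when the disparity is undefined (e.g., due to occlusion). */
static int reconstruct_point(double f, double b, double x_left, double y,
                             double disparity, Point3D *p)
{
    if (disparity <= 0.0)
        return -1;
    p->X = (b * x_left) / disparity;
    p->Y = (b * y) / disparity;
    p->Z = (b * f) / disparity;
    return 0;
}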
The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques is used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images; these features are typically corners or edges contained within the images. Although these techniques are appropriate for a certain number of applications, they present a number of drawbacks that make them unfeasible for many others. The main drawbacks are (i) feature extraction and matching is generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. These problems in solving the correspondence problem can generally be overcome by resorting to a different but related family of techniques known as structured lighting techniques. While structured lighting techniques involve a completely different methodology for solving the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.
2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source that actively projects a light pattern into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to reduce the complexity of the correspondence analysis by actively manipulating the scene.

It is important to note that stereoscopy-based systems do not impose complex requirements on image acquisition, since they mostly rely on theoretical, mathematical, and algorithmic analyses to solve the reconstruction problem. The idea behind structured lighting methods, on the other hand, is to shift this complexity to another level, namely the engineering prerequisites of the overall system [4].

A wide variety of light patterns has been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured when using the most basic of all approaches, i.e., a light spot.
In Section 2.1.2.2, a classification of the available encoded patterns is presented. Nevertheless, the light spot projection technique serves as a solid starting point for introducing the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.
2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring the angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982; Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is given.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.
[Illustration: camera at origin O and light source at base distance b, with angles α and β towards the object point P at distance d]
Figure 2.2: Assumed model for triangulation, as proposed in [4].
Note that the system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a consequence, the object, the light source, and the camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known, and the origin of the coordinate system O coincides with the projection center of the camera.
The goal is to calculate the distance d between the origin O and the object point P = (X_0, Z_0). This can be done using the law of sines as follows:

d / sin(α) = b / sin(γ)

From γ = π − (α + β) and sin(π − γ) = sin(γ), it holds that

d / sin(α) = b / sin(π − γ) = b / sin(α + β)

Therefore, the distance d is given by

d = b·sin(α) / sin(α + β)

which holds for any point P lying on the surface of the object.
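As an illustration, this computation reduces to a few lines of C; the function name is hypothetical and the angles are assumed to be given in radians.

#include <math.h>

/* Distance d from the camera origin O to the object point P, following
   the law-of-sines derivation above. b is the base distance between the
   camera and the light source; alpha and beta come from calibration. */
static double triangulate_distance(double b, double alpha, double beta)
{
    return b * sin(alpha) / sin(alpha + beta);
}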
2.1.2.2 Pattern coding strategies

As stated earlier, a wide variety of pattern coding strategies is available in the literature, aiming to fulfill the requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors, or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19]:

• Time-multiplexing. This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or the objects in the scene during the acquisition process. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a; a minimal sketch of how such a pattern set can be generated is given after the figure.

• Spatial neighborhood. In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12], or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow all coding information to be condensed into a single projection pattern, making them highly suitable for applications that involve timing constraints, such as autonomous navigation.
The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding. In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16], or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not depend only on the projected colors, but also on the intrinsic colors of the measured surface and the light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy, proposed in [16].
(a) Time-multiplexing. (b) Spatial neighborhood. (c) Direct coding.
Figure 2.3: Examples of pattern coding strategies.
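As announced in the time-multiplexing item above, the following C sketch illustrates how such a binary pattern set could be generated: every projector column is assigned a codeword (here, its Gray code) and the k-th pattern projects the k-th bit of that codeword as a black or white stripe. The use of Gray codes, the pattern width, and the bit depth are illustrative assumptions, not the parameters of the actual scanner.

#define PROJ_COLS 1024   /* assumed projector width */
#define NUM_BITS  10     /* 2^10 = 1024 distinguishable columns */

/* patterns[k][x] is 255 (white) where bit k of column x's codeword is set. */
static void generate_gray_patterns(unsigned char patterns[NUM_BITS][PROJ_COLS])
{
    for (int x = 0; x < PROJ_COLS; x++) {
        unsigned gray = (unsigned)x ^ ((unsigned)x >> 1);  /* binary -> Gray code */
        for (int k = 0; k < NUM_BITS; k++)
            patterns[k][x] = ((gray >> (NUM_BITS - 1 - k)) & 1) ? 255 : 0;
    }
}

Gray codes are a common choice for binary time-multiplexed patterns because consecutive columns differ in exactly one bit, which makes decoding more robust to errors at stripe boundaries.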
2.1.2.3 3D human face reconstruction

Given the importance of face reconstruction in a wide range of fields, such as security, forensics, or even entertainment, it is no surprise that this area has received special focus from the research community over the last decades. A comparative study of three different 3D face reconstruction approaches is presented in [20]. Here, the most representative techniques of three different domains are tested: binocular stereo, structured lighting, and photometric stereo. The experimental results show that active reconstruction techniques perform better than purely passive ones for this application.

The majority of analyses on vision-based reconstruction has focused on general performance for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless, some effort has been made to evaluate structured lighting techniques with a special focus on human face reconstruction. In [21], a comparison is presented between three
structured lighting techniques (Gray code, Gray code shift, and stripe boundary) to assess 3D reconstruction for human faces, using mono and stereo systems. The results show that the Gray code shift coding performs best, given the high number of emitted patterns it uses. A further study on this topic was performed by the same author in [22]. Again, it was found that time-multiplexing techniques such as binary encoding using Gray code provide the highest accuracy. With a rather different objective than that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] focus their effort on presenting a framework that captures 3D models of faces in high resolution with low computational load. Here, the system uses a single colored stripe pattern for the reconstruction, plus a picture of the face illuminated with regular white light that is used as texture.
Particular aspects of 3D human face reconstruction, such as the proximity, size, and texture involved, make structured lighting a suitable approach. Other reconstruction techniques, on the contrary, may be less suitable when dealing with these particular aspects. For example, stereoscopic approaches fail to provide positive results when the textures involved do not contain features that can be easily extracted and matched by algorithms, as is the case for the human face. On the other hand, the concepts behind structured lighting make it very convenient for reconstructing this kind of surface, given the proximity involved and the size limits of the object in question (appropriate for projecting encoded patterns).

With regard to the suitability of the different pattern coding strategies for our application (3D human face reconstruction by means of a hand-held scanner), there are several factors to consider. Spatial neighborhood strategies do not offer the high spatial resolution that is needed by the algorithms that assess the fit quality of the various mask models. Direct coding strategies suffer from practical problems that affect their robustness in different scenarios. This centers the attention on time-multiplexing techniques, which are known to provide high spatial resolution. The problem with such techniques is that they are highly sensitive to movement, which is likely to be present on a hand-held device. Fortunately, there are several approaches by which this problem can be solved. Consequently, a time-multiplexing technique is employed in our application.
2.2 Camera calibration

Camera calibration is a crucial ingredient in the process of metric scene measurement. This section presents a review of some of the most popular techniques, with special focus on those that are regarded as adequate for our application.
2.2.1 Definition

Camera calibration is the process of determining a mathematical approximation of the physical and optical behavior of an imaging system by using a set of parameters. These parameters can be estimated by means of direct or iterative methods, and they are divided into two groups. On the one hand, intrinsic parameters determine how light is projected through the lens onto the image plane of the sensor; the focal length, projection center, and lens distortion are all examples of intrinsic parameters. On the other hand, extrinsic parameters measure the position and orientation of the camera with respect to a world coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for structured pattern projection and triangulation considered in [25]. The focal length fc and the projection center Oc are examples of intrinsic parameters of the camera, while the distance D between the camera and the projector corresponds to an extrinsic parameter.
[Illustration: optical system with a camera (projection center Oc, focal length fc) and a projector (projection center Op, focal length fp) at distance D, imaging an object against a reference plane]
Figure 2.4: Reference framework assumed in [25].
2.2.2 Popular techniques

In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration that uses a 3×4 transformation matrix to map 3D object points to their respective 2D image projections. Here, the model of the camera does not consider any lens distortion. For a detailed description of this method, refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique based on extracting the physical parameters of the camera from the transformation technique proposed in [18]; descriptions of this technique are given in [26] and [27]. A non-linear explicit camera calibration that included radial lens distortion was proposed by Salvi in his PhD
thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular, and that is still widely used, was proposed by Tsai in 1987 [29]. Here, the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.

The calibration mechanism that is currently used in our application is based on the work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although this mechanism focuses on stereo camera calibration, it was generalized for a system with one camera and one projector. It involves imaging a controlled scene from different positions and orientations. The controlled scene consists of a rigid calibration chart with several markers. The geometric and photometric properties of these markers are known precisely, so that they can be detected. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart. This calibration mechanism is discussed further in Section 3.7.
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed by the Smart Sensing & Analysis research group, which was provided as a starting point for the current project. Figure 3.1 presents the main steps involved in the 3D reconstruction process.
[Flow diagram: the stages Read binary file (3.1), Preprocessing (3.2), Normalization (3.3), Tessellation (3.4), Decoding (3.5), Global motion compensation (3.6), Calibration (3.7), Vertex filtering (3.8), and Hole filling (3.9), from the binary/XML input to the final 3D model]
Figure 3.1: General flow diagram of the 3D face scanner application.
The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene. For each projection, the scene is captured by means of the embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study on why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is described.
Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input to the 3D face scanner application.
3.1 Read binary file

The first step of the application is to read the binary file that contains the required information for the 3D reconstruction. The binary file is composed of two parts: the header and the actual data. The header contains metadata of the acquired frames, such as the number of frames and the resolution of each one. The second part contains the actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence, which from now on will be referred to as the camera frames.
3.2 Preprocessing

The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps is described in the following subsections.
[Flow diagram: Parse XML file → Discard frames → Crop frames → Scale (convert to float, range 0–1)]
Figure 3.3: Flow diagram of the preprocessing stage.
3.2.1 Parse XML file

In this stage, the application first reads an XML file that is included with every scan. This file contains relevant information for the structured light reconstruction, including (i) the type of structured light patterns that were projected when acquiring the data, (ii) the number of frames captured while the structured light patterns were being projected, (iii) the image resolution of each frame to be considered, and (iv) the calibration data.
3.2.2 Discard frames

Based on the number-of-frames value read from the XML file, the application discards the extra frames that do not contain relevant information for the structured light approach but are provided as part of the input.
3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new resolution (480 × 754) that is more suitable for the subsequent algorithms of the program. This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this particular application, because pixels near the frame borders do not contain facial information and can therefore be safely removed.
3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage, the data type is transformed from unsigned integer to floating point, while dividing each pixel value by 255. The new set of values ranges between 0 and 1.
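A minimal C sketch of the crop and scale steps is given below. The 14 discarded lines follow from the resolutions stated above (768 − 754); the row-major memory layout is an assumption made for illustration.

#include <stddef.h>
#include <stdint.h>

#define CROP_LINES 14   /* 768 - 754 */

/* Drops the CROP_LINES pixel lines nearest the top border and converts
   the remaining 8-bit values to floats in the range [0, 1]. */
static void crop_and_scale(const uint8_t *in, int in_lines,
                           int pixels_per_line, float *out)
{
    const uint8_t *src = in + (size_t)CROP_LINES * pixels_per_line;
    size_t n = (size_t)(in_lines - CROP_LINES) * pixels_per_line;
    for (size_t i = 0; i < n; i++)
        out[i] = (float)src[i] / 255.0f;
}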
3.3 Normalization

Even though this section is entitled Normalization, a few more tasks are performed in this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide arrows represent the flow of data, whereas dashed lines represent the order of execution. The numbers inside the small data arrows pointing towards the different tasks represent the number of frames used as input by each task. The dashed rectangle that encloses the normalization and texture 2 tasks indicates that there is no strict sequential execution between the two, but rather that they are executed in an alternating fashion. This type of diagram will prove particularly useful in Chapter 5 to explain the
[Flow diagram: 16 camera frames feed the Normalization (8 frames out), Texture 2 (8 frames out), Modulation (1 frame out), and Texture 1 (1 frame out) tasks]
Figure 3.4: Flow diagram of the normalization stage.
modifications that were made to the application to improve its performance. An example of the different frames that are produced in this stage is visualized in Figure 3.5. A brief description of each of the tasks involved in this stage follows.
3.3.1 Normalization

The purpose of this stage is to extract the reflectivity component (texture information) from the camera frames, while enhancing the deformed illumination patterns in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The deformed patterns are essential for the 3D reconstruction process.

In order to understand how this process takes place, we need to look back at Figure 3.2. Here, it is possible to observe that the projected pattern in each top-row frame is equal to that of the corresponding bottom-row frame, with the only difference being that the values of the projected pattern are inverted. For each corresponding pair, a new image frame is generated according to the following equation:

F_norm(x, y) = (F_camera(x, y, a) − F_camera(x, y, b)) / (F_camera(x, y, a) + F_camera(x, y, b))

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively. An example of the resulting frame sequence is shown in Figure 3.5a.
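A C sketch of this computation follows. The guard against a zero denominator is an added assumption; the equation above does not specify how such pixels are handled in the actual application.

#include <stddef.h>

/* Computes one normalized frame from an aligned frame pair: frame a was
   captured with a pattern and frame b with its inverted counterpart. */
static void normalize_pair(const float *a, const float *b, float *out,
                           size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float sum = a[i] + b[i];   /* this sum is the texture 2 value */
        out[i] = (sum > 1e-6f) ? (a[i] - b[i]) / sum : 0.0f;
    }
}

Note that the denominator a[i] + b[i] is exactly the texture 2 frame described in the next subsection, which is why the two tasks are computed in an alternating fashion.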
(a) Normalized frame sequence. (b) Texture 2 frame sequence. (c) Modulation frame. (d) Texture 1 frame.
Figure 3.5: Example of the 18 frames produced in the normalization stage.
3.3.2 Texture 2

The calculation of the texture 2 frame sequence follows the same procedure as the one used to calculate the normalized frame sequence. In fact, the output of this process is an intermediate step in the calculation of the normalized frames, which is the reason why the two processes are said to be performed in an alternating fashion. The equation that describes the calculation of the texture 2 frame sequence is

F_texture2(x, y) = F_camera(x, y, a) + F_camera(x, y, b)

The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation stage.
3.3.3 Modulation

The purpose of this stage is to find the range of measured values for each (x, y) pixel of the camera frame sequence along the time dimension. This is done in two steps. First, two frames are generated by finding the maximum and minimum values along the time (t) dimension (Figure 3.6) for every (x, y) position.
[Illustration: the camera frame sequence as a volume with spatial axes x, y and time axis t]
Figure 3.6: Camera frame sequence in a coordinate system.
Second, a modulation frame is produced by taking the difference between the two previously generated frames, i.e.,

F_mod(x, y) = F_max(x, y) − F_min(x, y)

This modulation frame (Figure 3.5c) is required later during the decoding stage.
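In code, the two steps can be fused into a single pass over the frame sequence, as the following C sketch shows; the pointer-per-frame layout is an assumption made for illustration.

#include <stddef.h>

#define NUM_FRAMES 16

/* Computes the modulation frame F_mod = F_max - F_min along the time
   dimension. frames[t] points to the pixel data of frame t. */
static void modulation(const float *frames[NUM_FRAMES], size_t n, float *mod)
{
    for (size_t i = 0; i < n; i++) {
        float mn = frames[0][i], mx = frames[0][i];
        for (int t = 1; t < NUM_FRAMES; t++) {
            float v = frames[t][i];
            if (v < mn) mn = v;
            if (v > mx) mx = v;
        }
        mod[i] = mx - mn;
    }
}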
3.3.4 Texture 1

Finally, the last task in the normalization stage corresponds to the generation of the texture image that will be mapped onto the final 3D model. In contrast to the previous three tasks, this subprocess does not take the complete set of 16 camera frames as input, but only the 2 with the finest projection patterns. Figure 3.7 shows the four processing steps that are applied to the input in order to generate a texture image such as the one presented in Figure 3.5d.
[Flow diagram: Average frames → Gamma correction → 5×5 mean filter → Histogram stretch]
Figure 3.7: Flow diagram for the calculation of the texture 1 image.
34 Global motion compensation
The major drawback of time-multiplexing strategies is their high sensitivity to movement.
In fact if no measures are taken to correct the slight amount of movement of the scanner
or of the objects in the scene during the acquisition process the complete reconstruction
process fails Although the global motion compensation stage is only a minor part of
the mechanism that makes the entire application robust to motion it is not negligible
in the final result
Global motion compensation is an extensive field of research to which many different approaches and methods have been contributed. The approach used in this application is amongst the simplest in terms of complexity. Nevertheless, it suffices for the needs of the current application.
Figure 38 presents an overview of the algorithm used to achieve the global motion
compensation This process takes as input the normalized frame sequence introduced in
the previous section As noted at the bottom of the figure these steps are repeated for
every pair of consecutive frames As a first step the pixels in each column are added for
both frames This results in two vectors that hold the cumulative sums of each frame
The second step is to determine by how many pixels the second image is displaced with
respect to the first one. In order to achieve this, the sum of absolute differences (SAD) between
elements of the two column-sum vectors is calculated while slowly displacing the two
vectors with respect to each other The result is a new vector containing the SAD value
for each displacement Subsequently the index of the smallest element in the SAD
values vector is searched in order to determine the number of pixels that the second
image needs to be shifted The process concludes by performing the actual shift of the
second frame
Figure 38: Flow diagram for the global motion compensation process. For every pair of consecutive frames (A, B) of the normalized frame sequence, the columns of both frames are summed, the SAD between the two column-sum vectors is minimized, and Frame B is shifted accordingly.
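A simplified C sketch of the displacement search in Figure 38 could look as follows (the function name, the integer vector type, and the max_shift search range are assumptions for the example):

```c
#include <limits.h>
#include <stdlib.h>

/* Sketch: find the displacement that minimizes the sum of absolute
 * differences (SAD) between the column-sum vectors of two frames.
 * Only the overlapping part of the two vectors is compared. */
int find_shift(const long *sums_a, const long *sums_b,
               int length, int max_shift)
{
    long best_sad = LONG_MAX;
    int  best_shift = 0;
    for (int s = -max_shift; s <= max_shift; s++) {
        long sad = 0;
        for (int i = 0; i < length; i++) {
            int j = i + s;
            if (j < 0 || j >= length)
                continue;                      /* outside the overlap */
            sad += labs(sums_a[i] - sums_b[j]);
        }
        if (sad < best_sad) {
            best_sad = sad;
            best_shift = s;
        }
    }
    return best_shift;                 /* pixels by which to shift frame B */
}
```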
35 Decoding
In Section 211 of the literature study the correspondence problem was defined as the
process of determining corresponding point pairs between the captured images and the
projected patterns This is exactly what is being accomplished during the decoding
stage
A novel approach has been implemented in which the identification of the projector
stripes is based not on the values of the pixels themselves (as it is typically done) but
rather on the edges formed by the transitions of the projected patterns Figure 39
illustrates the different sets of decoded values that result with each of these methods
Here it is possible to observe that the pixel-based method produces a stair-casing effect
due to the decoding of neighboring pixels that lie on the same stripe of the projected
pattern On the other hand the edge-based method removes this undesirable effect by
decoding values for only parts of the image in which a transition occurs Furthermore
this approach enables sub-pixel accuracy for the determination of the positions where the
transitions occur meaning that the overall resolution of the 3D reconstruction increases
considerably
Figure 39: The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used. The plot compares the decoded values of both methods along the y dimension of the image.
The decoding process results in a set of vertices each one associated with a depth code
Note however that the unit of measurement used to describe the position and depth of
each vertex is based on camera pixels and code values respectively meaning that these
vertices still do not represent the actual geometry of the face The calibration process
explained in a later section is the part of the application that translates the pixel and
code values to standard units (such as millimeters) thus recreating the actual shape of
the human face
36 Tessellation
Tessellation refers to the process of covering a plane using different geometric shapes in
a manner such that no overlaps occur In computer graphics these geometric shapes
are generally chosen to be triangles, also called "faces". The reason for using triangles is that their vertices, by definition, lie on the same plane. This in turn avoids the generation of non-simple convex polygons, which are not guaranteed to be rendered
correctly A complete example illustrating this point can be found in [32]
A set of 3D vertices calculated in the decoding stage is the input to the tessellation
process Here however the third dimension does not play a role and hence the z
coordinate for each of the vertices can be thought of as being equal to 0 This implies
that the new set of vertices consists only of (x, y) coordinates that lie on the same plane
as shown in Figure 310a This graph corresponds to a very close view of the nose area
in the reconstructed face example
Figure 310: Close view of the vertices in the nose area before and after the tessellation process. (a) Vertices before applying the Delaunay triangulation. (b) Result after applying the Delaunay triangulation.
The question that arises here is how to connect the vertices in such a way that the com-
plete surface is covered with triangles The answer is to use the Delaunay triangulation
which is probably the most common triangulation used in computer vision. The main advantage that it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the
Delaunay triangulation is independent of the order in which the vertices are processed
Figure 310b shows the result of applying the Delaunay triangulation to the vertices
shown in Figure 310a
Although there exist a number of different algorithms used to achieve the Delaunay triangulation, the final outcome of each conforms to the following definition: a Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [33]. Such definition can
be understood by examining Figure 311
Figure 311 The Delaunay tessellation with all the circumcircles and their centers [33]
37 Calibration
The set of (x y) vertices with their corresponding depth code values that result from
the decoding process do not represent standard units of measure ie these still have to
be translated into standard units such as millimeters This is precisely the objective of
the calibration process
The calibration mechanism that is used in the application is based on the work of Peter-
Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts
an offline and an online process Moreover the offline process consists of two stages
the camera calibration and the system calibration It is important to clarify that while
the offline process is performed only once (camera properties and distances within the
system do not change with every scan) the online process is carried out for every scan
instance The calibration stage referred to in Figure 31 is the latter
371 Offline process
As already mentioned the offline process comprises the two stages described below
Camera calibration This part of the process is concerned with the calculation of the
intrinsic parameters of the camera as explained in Section 22 of the literature
study In short the objective is to precisely quantify the optical properties of the
camera The manner in which the current approach accomplishes this is by imag-
ing the special calibration chart shown in Figure 312 from different orientations
and distances After corresponding markers in the different images are found an
algorithm searches the optimal set of camera parameters for which triangulation
of all corresponding marker-point pairs gives an accurate reconstruction of the
calibration chart
Figure 312: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.
System calibration The second part of the calibration process refers to the camera-
projector system calibration ie the determination of the extrinsic parameters
of the system Again this part of the process images the calibration chart from
different distances However this time structured light patterns are emitted by
the projector while the acquisition process takes place The result is that each
projector code is associated with a known depth and camera position
372 Online process
The result of the offline calibration is a set of parameters that model the optical proper-
ties of the scanner system These are passed to the application inside the XML file for
every scan Such parameters represent the coefficients of a fifth-order polynomial used
for translating the set of (x y) vertices with their corresponding depth code values into
standard units of measure In other words the online process consists of evaluating a
polynomial with all the x y and depth code values calculated in the decoding stage in
order to reconstruct the geometry of the face Figure 313 shows the state of the 3D
model before and after the reconstruction process
Figure 313: The 3D model (a) before and (b) after the calibration process.
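The exact form of the polynomial is defined by the calibration model in [31] and is not reproduced here. Purely as an illustration of the evaluation step, a one-dimensional fifth-order polynomial can be computed efficiently with Horner's scheme:

```c
/* Illustrative only: Horner evaluation of a fifth-order polynomial
 * p(t) = c[0] + c[1]*t + ... + c[5]*t^5. The actual calibration
 * polynomial takes x, y, and the depth code as inputs. */
static float poly5(const float c[6], float t)
{
    float r = c[5];
    for (int i = 4; i >= 0; i--)
        r = r * t + c[i];   /* fold in one coefficient per step */
    return r;
}
```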
38 Vertex filtering
As it can be seen from Figure 313b there are a number of extra vertices (and faces)
that have not been correctly reconstructed and therefore should be removed from the
model Vertex filtering is applied to remove all these noisy vertices and faces based on
different criteria. The process is divided into the following three steps.
381 Filter vertices based on decoding constraints
First if the distance between consecutive decoded points is larger than a maximum
threshold in the (x) or (z) dimensions then these are removed Second in order to
avoid false decoded vertices due to camera noise (especially in the parts of the images
where light does not hit directly) a minimal modulation threshold needs to be exceeded
or else the associated decoded point is discarded Finally if the decoded vertices lie
outside a margin defined in accordance to the image dimensions then these are removed
as well
382 Filter vertices outside the measurement range
The measurement range defined during the offline calibration refers to the minimum
and maximum values that each decoded point can have in the z dimension These values
are read from the XML file The long triangles shown in Figure 313b that either extend
far into the picture or on the other hand come close to the camera are all removed in
this stage The resulting 3D model after being filtered with the two previously described
criteria is shown in Figure 314a
383 Filter vertices based on a maximum edge length
Several steps are involved in the removal of vertices based on the maximum edge length
criterion Initially the length of every edge contained in the model is calculated This
is followed by determining a new set of edges L that contains the longest edge in each
face After this operation the mean length value for the longest edge set is calculated
Finally, only faces whose longest edge is less than seven times the mean value, i.e., L < 7 × mean(L), are kept. Figure 314b shows the result after this operation.
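A C sketch of this criterion might look as follows (the vertex and face types, the scratch array, and the in-place compaction are assumptions for the example):

```c
#include <math.h>

/* Sketch of the maximum-edge-length filter: keep a face only if its
 * longest edge is shorter than 7 times the mean of all longest edges. */
typedef struct { float x, y, z; } vec3_t;
typedef struct { int v[3]; } face_t;

static float edge_len(vec3_t a, vec3_t b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return sqrtf(dx * dx + dy * dy + dz * dz);
}

/* longest: scratch array with room for num_faces entries;
 * returns the number of faces kept after compacting in place. */
int filter_long_edges(const vec3_t *verts, face_t *faces,
                      int num_faces, float *longest)
{
    float mean = 0.0f;
    for (int i = 0; i < num_faces; i++) {
        float a = edge_len(verts[faces[i].v[0]], verts[faces[i].v[1]]);
        float b = edge_len(verts[faces[i].v[1]], verts[faces[i].v[2]]);
        float c = edge_len(verts[faces[i].v[2]], verts[faces[i].v[0]]);
        longest[i] = fmaxf(a, fmaxf(b, c));
        mean += longest[i];
    }
    mean /= (float)num_faces;

    int kept = 0;
    for (int i = 0; i < num_faces; i++)
        if (longest[i] < 7.0f * mean)
            faces[kept++] = faces[i];
    return kept;
}
```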
Figure 314: Resulting 3D models after various filtering steps. (a) The 3D model after the filtering steps described in Subsections 381 and 382. (b) The 3D model after the filtering step described in Subsection 383. (c) The 3D model after the filtering step described in Section 39.
39 Hole filling
In the last processing step of the 3D face scanner application two actions are performed
The first one is concerned with an algorithm that takes care of filling undesirable holes
that appear due to the removal of vertices and faces that were part of the face surface. This
is accomplished by adding a vertex in the middle of the hole and then connecting every
surrounding edge with this point The second action refers to another filtering step of
vertices and faces In this last part of the application the program removes all but the
largest group of connected faces The final 3D model is shown in Figure 314c
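A rough sketch of the centroid-based fill for a single hole (the types, the ordering of the boundary loop, and the capacity management are assumptions for the example):

```c
/* Sketch: fill one hole by appending its boundary centroid as a new
 * vertex and creating one triangle per boundary edge (a triangle fan).
 * The caller must guarantee room for 1 extra vertex and loop_len faces. */
typedef struct { float x, y, z; } vec3_t;

int fill_hole(vec3_t *verts, int num_verts,
              const int *boundary, int loop_len,   /* boundary vertex ids */
              int (*faces)[3], int num_faces)
{
    vec3_t c = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < loop_len; i++) {
        c.x += verts[boundary[i]].x;
        c.y += verts[boundary[i]].y;
        c.z += verts[boundary[i]].z;
    }
    c.x /= (float)loop_len;
    c.y /= (float)loop_len;
    c.z /= (float)loop_len;
    verts[num_verts] = c;                     /* append the centroid */

    for (int i = 0; i < loop_len; i++) {      /* one face per edge   */
        faces[num_faces][0] = boundary[i];
        faces[num_faces][1] = boundary[(i + 1) % loop_len];
        faces[num_faces][2] = num_verts;
        num_faces++;
    }
    return num_faces;                         /* updated face count  */
}
```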
310 Smoothing
Taking into account that the smoothing process is beneficial for visualization purposes but not for the overall goal of the 3D mask sizing project, this process was not included as part of the 3D face scanner application. This is also the reason why it
is not included in Figure 31 Nevertheless this section provides a brief explanation of
the smoothing process that is currently used along with an example
A complete explanation of the algorithm that is being used to achieve the smoothing
effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian
operator that diffuses the vertices along the surface An example of the resulting model
before and after applying the smoothing process is shown in Figure 315
Figure 315: Forehead of the 3D model (a) before and (b) after applying the smoothing process.
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as
two different domains but rather as two complementary parts of a whole There are two
important trends that have made such unified view possible First integrated circuit
(IC) technology has evolved to the point where multiple processors of different types
coexist in a single IC. Second, the increasing complexity and average size of programs, together with the evolution of compiler technologies, have made C compilers (and even C++ or Java compilers in some cases) commonplace in the development of embedded systems [35].
This chapter discusses the embedded hardware and software implementation of the 3D
face scanner A brief account of the hardware and software tools that were used during
the development of the application is presented first Subsequently the first stage of the
development process is described which consists mainly of translating the algorithms
and methods described in Chapter 3 into a different programming language more suitable
for embedded systems Finally a preview of the developed visualization module that
displays the 3D reconstructed face is presented along with a brief description of its
functionality
41 Development tools
This section describes the set of tools used in the development of the embedded applica-
tion First an overview of the hardware is presented highlighting the most important
aspects that are of interest to the 3D face scanner application This is then followed by
a list of the software tools along with a short motivation for their selection A so called
remote development methodology was used for the compilation process The idea is to
run an integrated development environment (IDE) on a client system for the creation of
the project editing of the files and usage of code assistance features in the same manner
as done with local projects However when the project is built run or debugged the
process runs on a remote server with output and input transferred to the client system
411 Hardware
A current trend in the embedded world is the use of single-board computers (SBCs) as
development platforms SBCs combine most features of a conventional desktop computer
into a single board which can be as small as a credit card One or more processors of
different types memory on-board peripherals for multiple USB devices single or dual
gigabit Ethernet connections integrated graphics and audio capabilities amongst others
are common features included in these devices But perhaps what is most interesting
for embedded developers is the availability of several SBCs that come under open source
hardware category [36] Such SBCs are suitable for the implementation of a wide range
of applications on the basis of open operating systems
Two different hardware environments were used in the development of the current em-
bedded application a conventional desktop personal computer (PC) with an Intel x86
architecture and a SBC that was selected according to the following survey
4111 Single-board computer survey
A prior survey of popular SBCs available in the market was conducted with the intention
of finding the most suitable model for our application Table 41 presents a subset of the
considered models highlighting the most relevant characteristics for the 3D face scanner
application Refer to [37] for the complete survey
The model to be chosen has to comply with several requirements imposed by the 3D
face scanner application First support for both a camera and a projector had to be
offered While all of the considered models showed special support for video output
not all of them provided suitable characteristics for camera signal acquisition In fact
most of them rely on USB or Ethernet connections for this purpose The problem of
using USB technology for camera acquisition is that it is highly resource demanding On
the other hand Ethernet connections imply streaming video in formats such as MPEG
which require additional computational resources and buffering for decoding the video
stream Explicit periphery support for camera acquisition was only offered by two of
the considered models the BeagleBoard-xM and the PandaBoard
Table 41: Single-board computer survey

Model                  CPU                                 RAM     Video output               GPU                                            Camera port
BeagleBoard-xM         ARM Cortex-A8, 1000 MHz             512 MB  DVI-D, HDMI, S-Video       PowerVR SGX, OpenGL ES 2.0                     Yes
Raspberry Pi Model B   ARM1176, 700 MHz                    256 MB  Composite RCA, HDMI, DSI   Broadcom VideoCore IV, OpenGL ES 2.0           No
Cotton Candy           Dual-core ARM Cortex-A9, 1200 MHz   1 GB    HDMI                       Quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0   No
PandaBoard             Dual-core ARM Cortex-A9, 1000 MHz   1 GB    HDMI, DVI-D, LCD           PowerVR SGX540, OpenGL ES 2.0                  Yes
Via APC                ARM11, 800 MHz                      512 MB  HDMI, VGA                  Built-in 2D/3D graphics, OpenGL ES 2.0         No
MK802                  ARM Cortex-A8, 1000 MHz             1 GB    HDMI                       Mali-400 MP, OpenGL ES 2.0                     No
Snowball               Dual-core ARM Cortex-A9, 1000 MHz   1 GB    HDMI, CVBS                 Mali-400 MP, OpenGL ES 2.0                     No
A second issue in the selection of the SBC was concerned with the project objective of
developing a module capable of visualizing the 3D reconstructed model by means of the
embedded projector It was considered that the achievement of this objective could be
greatly simplified by selecting an SBC model that offered support for rendering of 3D
computer graphics by means of an API preferably OpenGL ES Nevertheless all of the
SBC models considered in the survey featured a graphics processing unit (GPU) with
such support
Finally one last important motivation for the selection came from the experience gath-
ered through related projects The BeagleBoard-xM had been used as the embedded
computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-
able implementation effort could be saved if this option were adopted Consequently it
was the BeagleBoard-xM that was selected as the SBC model for the development of
the current project
4112 BeagleBoard-xM features
The BeagleBoard-xM (Figure 41) is an SBC produced by Texas Instruments. It is a low-power open-source hardware system that was designed specifically to address the Open Source Community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1
GHz and 512 MB of LPDDR RAM Several open operating systems have been made
compatible with such processor including Linux FreeBSD RISC OS Symbian and
Android Moreover the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated
video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to
provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].
In addition to the previously mentioned characteristics the ARM Cortex-A8 processor
comes with a general-purpose SIMD (Single instruction Multiple data) engine known as
NEON This technology is based on a 128-bit SIMD architecture extension that provides
flexible and powerful acceleration for consumer multimedia products, as described in [39].
412 Software
The main factors involved in the selection of software tools were (i) available support by
a large development community and (ii) acquisition costs and licensing charges Open
source software was adopted where possible Moreover prior experience with the tools
was also taken into account. The software can be divided in two categories: (i) software libraries that are used within the application and therefore are necessary for its execution, and (ii) software tools used specifically for the development of the application and hence are not required for its execution. In what follows, each of these is briefly described.

Figure 41: The BeagleBoard-xM offered by Texas Instruments
4121 Software libraries
The following software libraries are being used throughout the implementation of the
embedded application
libxml2 It is a software library used for parsing XML documents, which was originally developed for the GNOME project and was later made available for outside projects
as well The current application makes use of such tool for extracting the required
information from the XML file that is included for each scan
OpenCV It is an open source computer vision and machine learning software library
initiated by Intel It provides the necessary functionality to construct the Delaunay
triangulation described in Chapter 3 Though it was used in the initial versions of
the application later optimizations replaced OpenCV implementations
CGAL Consists of a software library that aims to provide access to algorithms in
computational geometry It is being used in the current application as a means
to simplify the resulting mesh surface ie to reduce the number of faces used to
represent the surface while keeping the overall shape of the reconstructed model
OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-
cally for embedded systems It consists of a cross-language multi-platform Appli-
cation Programming Interface (API) for rendering 2D and 3D computer graphics
It is used in the current application as the means to visualize the 3D reconstructed
model
GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL
used to create windows andor frame buffers It is being used in the visualization
module of the application as well
4122 Software development tools
The following list presents a description of the most important software tools used for
the development of the embedded application
GNU toolchain It refers to a collection of programming tools produced by the GNU Project that provide development facilities for applications and operating systems
Among the several projects that comprise the GNU toolchain the following were
used
GNU Make It is a utility that automates the building process of executable
programs by reading the so-called makefiles which specify how to create the
target program
GCC It is the official compiler of the GNU operating system and has been
adopted as standard by most modern Unix-like computer operating systems
GNU Binutils Involves a set of programming tools that are used in the develop-
ment process of creating and managing programs object files libraries profile
data and assembly source code The commands as (assembler) ld (linker)
and gprof (profiler) were used among the complete set of binutil commands
GNU Project debugger It is the standard debugger for the GNU operating
system which was made available for the development of applications outside
this project as well
Valgrind It is a programming tool that can automatically detect memory management
errors It also provides the functionality of a profiler
Ubuntu A Linux based operating system that is distributed as free and open source
software It was installed in both the desktop PC and the SBC
42 MATLAB to C code translation
This section describes the first stage of the embedded application development that
involves the translation of a series of algorithms originally written in MATLAB code to
C
Despite the fact that there are a number of available tools that automatically translate
MATLAB code to C language such as MATLAB Coder by MathWorks MATLAB-to-
C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx these have a number
of pitfalls that compromise their applicability, especially when the performance aspect
is of ultimate importance Perhaps what is most concerning is that each one of these
tools only supports a subset of the MATLAB language and functions meaning that
the complete functionality of MATLAB is immediately constrained by this requirement
In many cases this would imply a modification to the MATLAB code prior to the
translation process in order to filter out any feature or function not included in the
subset which adds overhead to the development process Examples of features not
supported by automatic translation tools are amongst others objects cell arrays nested
functions visualization or trycatch statements The use of an automatic translation
tool was discarded for this project taking into account that several of these unsupported
features are present in the MATLAB code
421 Motivation for developing in C language
There are a number of reasons that explain why C is among the most popular pro-
gramming languages used for the development of embedded systems The first is that
C language lies in an intermediate point between higher and lower level languages pro-
viding suitable characteristics for embedded system development from both sides The
problem with higher level languages relies on the fact that they do not provide suitable
characteristics for optimizing performance of the applications such as low-level memory
manipulation Furthermore unlike many of these higher level programming languages
C provides deterministic resource use which is an important feature when the target de-
vices contain limited resources On the other hand C outperforms lower level languages
in a number of aspects such as scalability and maintainability Two final motivations
for using C are (i) C compilers are available for almost all embedded devices which are
supported by a large pool of experienced C programmers and (ii) the vast majority of
hardware APIdrivers are written in C
422 Translation approach
As mentioned earlier a manual translation approach of the code was chosen over the
use of automatic translation tools A key part in the process of manually translating
MATLAB to C code is the verification process There are two major techniques used
to achieve such verification The first one consists of a systematic method of converting
the translated C code into a compiled MEX-file that can be merged into the original
MATLAB project Then by comparing the results generated by the MATLAB project
containing the C implementation wrapped in a MEX-file with those generated by the
original MATLAB project one should be able to verify the correctness of the translation
The second approach consists of writing corresponding intermediate results of both the
MATLAB and C implementations to external files and then using a file comparison tool
such as diff for Linux environments in order to validate equality of both results It was
the latter approach that was chosen for the development of the current application for
the following reason The former approach requires the C implementation to be wrapped
in a so called MEX wrapper which takes care of the communication between MATLAB
and C This task is considered to be error prone since crashes segmentation violations
or incorrect results can easily occur if the MEX wrapper does not allocate and access
the data properly as reported by Marc Barberis in [40] from Catalytic Inc
A number of pitfalls that add complexity to the manual translation process were iden-
tified throughout the development of this stage The most important are
• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e., the order in which data is processed should correspond to the order in which it is laid out in memory. Not complying with this idea could induce a serious loss in performance of the resulting code (see the sketch after this list).

• MATLAB is an interpreted language, i.e., data types and variable dimensions are only known at run-time, and thus these cannot be easily deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation/reallocation/deallocation of memory using constructs such as malloc, realloc, or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions, and sizes. On the contrary, C language requires all variables to be cast to a specific data type (or declared, as it is known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C, and hence requires the programmer to implement them while relying on structure constructs of primitive types.
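The locality pitfall can be illustrated with a small example (the array dimensions are arbitrary): in C, the inner loop should iterate over the rightmost index so that memory is accessed sequentially.

```c
#define ROWS 480
#define COLS 640

/* Cache-friendly in C: the inner loop walks consecutive addresses. */
void scale_row_major(float img[ROWS][COLS], float s)
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            img[r][c] *= s;
}

/* A literal transcription of MATLAB's column-major order: every access
 * jumps COLS floats ahead, which wastes cache lines in C. */
void scale_column_major(float img[ROWS][COLS], float s)
{
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            img[r][c] *= s;
}
```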
43 Visualization
This section describes the different steps involved in the visualization module developed
to display the reconstructed 3D models by means of the embedded projector contained
in the hand-held device Figure 42 extends the general overview of the application
presented in 31 by incorporating the visualization module This figure shows that a
resulting 3D model of the face reconstruction process consists of 4 different elements a
set of vertices a set of faces a set of UV coordinates and a texture image
Figure 42: Simplified diagram of the 3D face scanner application. The camera frame sequence and the XML file are the inputs of the 3D face reconstruction, which produces the vertices, faces, UV coordinates, and texture 1 image consumed by the visualization module.
Vertices and faces describe the geometry of the reconstructed model Each face consists
of three index values that determine the vertices that form a triangle. On the other
hand UV coordinates together with the texture image describe the texture of the model
Figure 43 shows how UV coordinates are used to map portions of the texture image
to individual parts of the model Each vertex is associated with an UV coordinate
When a triangle is rendered the corresponding UV coordinates of each vertex are used
to extract a portion of the texture image to place it on top of the triangle
Figure 43: UV coordinate system, with the u and v axes spanning the texture image from (0, 0) to (1, 1).
Figure 44 presents an overview of the visualization module The first step of the process
is to simplify the 3D model ie to reduce the number of triangles (and vertices) used
to represent the surface Note that while a high resolution is needed for the algorithms
that determine the fit quality of the different mask models a much lower resolution can
be used for visualization purposes In fact due to the limited available resources in
embedded systems such simplification becomes necessary to avoid lag when zooming
rotating or panning the model Edge collapse is a common term used for the simpli-
fication process which is shown in Figure 44 Input vertices and faces of this block
are converted into a smaller set denoted as New vertices and New faces on the diagram
However since the new set of vertices and faces do not have a one-to-one correspondence
to the original set of UV coordinates such coordinates have to be updated as well The
manner in which this is accomplished is by using the Nearest Neighbor algorithm Every
new vertex is assigned the UV coordinate of its closest original vertex
The next stage of the process is to format the new set of vertices faces and UV co-
ordinates together with the texture 1 image such that OpenGL can render the model
Subsequently normal vectors are calculated for every triangle which are mainly used
by OpenGL for lighting calculations Every vertex of the model has to be associated
with one normal vector To do this an average normal vector is calculated for each
vertex based on the normal vectors of the triangles that are connected to it Moreover
a cross-product multiplication is used to calculate the normal vector of each triangle
Once these four elements that characterize the 3D model are provided to OpenGL the
program enters in an infinite running state where the model is redrawn every time a
timer expires or when an interactive operation is sent to the program
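A minimal sketch of the per-triangle normal calculation (the vector type is an assumption for the example; the result is normalized before being returned):

```c
#include <math.h>

typedef struct { float x, y, z; } vec3_t;

/* Sketch: normal of triangle (a, b, c) via the cross product of two
 * edge vectors; per-vertex normals are obtained later by averaging
 * the normals of all triangles connected to each vertex. */
vec3_t triangle_normal(vec3_t a, vec3_t b, vec3_t c)
{
    vec3_t u = { b.x - a.x, b.y - a.y, b.z - a.z };
    vec3_t v = { c.x - a.x, c.y - a.y, c.z - a.z };
    vec3_t n = { u.y * v.z - u.z * v.y,
                 u.z * v.x - u.x * v.z,
                 u.x * v.y - u.y * v.x };
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
    if (len > 0.0f) { n.x /= len; n.y /= len; n.z /= len; }
    return n;
}
```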
Figure 44: Diagram of the visualization module. The vertices, faces, and UV coordinates are first simplified through edge collapse; the new UV coordinates are found with the nearest neighbor algorithm; the data is then converted to OpenGL format and normals are calculated, after which the GL vertices, faces, UV coordinates, normals, and texture 1 image are handed to OpenGL.
Chapter 5
Performance optimizations
This chapter presents various performance optimizations made to the 3D face scanner
application ranging from high-level optimizations such as modification of the algo-
rithms to low-level optimizations such as the implementation of time-consuming parts
in assembly language
In order to verify that the achieved optimizations were valid in general and not for
specific cases 10 scans of different persons were used for profiling the performance of the
application Every profile consisted of running the application 10 times for each scan and
then averaging the results in order to reduce the influence that external factors might
have in the measured times Figure 51 presents an example of the graphs that will be
used throughout this and the following chapters to represent the changes in performance
Here each bar is divided into different colors that represent the distribution of the total
execution time among the various stages of the application described in Chapter 3 and
summarized in Figure 31
The translation from MATLAB to C code corresponds to the first optimization per-
formed The top two bars in Figure 51 show that the C implementation resulted in
a speedup of approximately 15 times over the MATLAB implementation running on
a desktop computer On the other hand the bottom two bars reflect the difference
in execution time after running the C implementation in two different platforms The
much more limited resources available in the BeagleBoard-xM have a clear impact on
the execution time. The C code was compiled with GCC's -O2 optimization level.
The bottom bar in Figure 51 represents the starting point for a set of optimization
procedures that will be described in the following sections The order in which these are
presented corresponds to the same order in which they were applied to the application
Figure 51: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM. Each bar is broken down into the time spent on reading the binary file, preprocessing, normalization, global motion compensation, decoding, tessellation, calibration, vertex filtering, hole filling, and other tasks.
51 Double to single-precision floating-point numbers
The same representation format of floating-point numbers for the MATLAB and C implementations was necessary to compare both results in each step of the translation process. The original C implementation used double-precision format because this is the format used in the MATLAB code. Taking into account that the additional precision offered by double-precision format over single-precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double to single-precision format was made. Figure 52 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec.
Figure 52: Difference in execution time when double-precision format is changed to single-precision.
52 Tuned compiler flags
While the previous versions of the C code were compiled with the -O2 optimization level, the goal of this step was to determine a combination of compiler options that would
translate into faster running code. A full list of the options supported by GCC can be found in [41]. Figure 53 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:
-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
Figure 53: Execution time before and after tuning GCC's compiler options.
53 Modified memory layout
A different memory layout for processing the camera frames was implemented to further
exploit the concept of spatial locality of the program As noted in Section 33 many of
the operations in the normalization stage involve pixels from pairs of consecutive frames
i.e., first and second, third and fourth, fifth and sixth, and so on. Data of the camera frames was placed in memory in a manner such that corresponding pixels between frame pairs lay next to each other in memory. The procedure is shown in Figure 54.
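A sketch of the interleaving step for one frame pair (the function name and the 8-bit pixel format are assumptions for the example):

```c
#include <stddef.h>

/* Sketch: place corresponding pixels of two consecutive frames next
 * to each other, so that the pairwise operations of the normalization
 * stage read memory sequentially. */
void interleave_pair(const unsigned char *frame_a,
                     const unsigned char *frame_b,
                     unsigned char *out, size_t num_pixels)
{
    for (size_t i = 0; i < num_pixels; i++) {
        out[2 * i]     = frame_a[i];
        out[2 * i + 1] = frame_b[i];
    }
}
```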
However this modification yielded no improvement on the execution time of the appli-
cation as can be seen from Figure 55
54 Reimplementation of C's standard power function
The generation of Texture 1 frame in the normalization stage starts by averaging the last
two camera frames followed by a gamma correction procedure The process of gamma
correction in this application consists of elevating each pixel to the 0.85 power. After
profiling the application it was found that the power function from the standard math
C library was taking most of the time inside this process Taking into account that the
Figure 54: Modification of the memory layout of the camera frames. The blue, red, green, and purple circles represent pixels of the first, second, third, and fourth frames, respectively.
Figure 55: The execution time of the program did not change with a different memory layout for the camera frames.
high accuracy offered by such function was not required and that the overhead involved
in validating the input could be removed a different implementation of such function
was adopted
A novel approach was proposed by Ian Stephenson in [42] explained as follows The
power function is usually implemented using logarithms as
$$\mathrm{pow}(a, b) = x^{\log_x(a) \cdot b}$$
where x can be any convenient value By choosing x = 2 the process of calculating the
power function reduces to finding fast pow2() and log2() functions Such functions can
be approximated with a few instructions For example the implementation of log2(a)
can be approximated based on the IEEE floating point representation of a
$$a = M \cdot 2^E$$
where M is the mantissa and E is the exponent Taking log of both sides gives
$$\log_2(a) = \log_2(M) + E$$
and since M is normalized log2(M) is always small therefore
$$\log_2(a) \approx E$$
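A hedged C sketch of this idea follows; it is not the exact code used in the application, and the constants derive from the IEEE 754 single-precision layout, in which the mantissa occupies 23 bits (2^23 = 8388608):

```c
#include <stdint.h>
#include <string.h>

/* Approximate log2: reinterpreting the float bits as an integer and
 * rescaling yields E plus a linear mantissa term. Valid for a > 0. */
static float fast_log2(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);
    return (float)bits * (1.0f / 8388608.0f) - 127.0f;
}

/* Approximate 2^x: the inverse operation, rebuilding the float bits. */
static float fast_exp2(float x)
{
    uint32_t bits = (uint32_t)((x + 127.0f) * 8388608.0f);
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* a^b = 2^(b * log2(a)); the accuracy is coarse, but that may be
 * acceptable for gamma-correcting 8-bit pixel data. */
static float fast_pow(float a, float b)
{
    return fast_exp2(fast_log2(a) * b);
}
```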
This new implementation of the power function provides the improvement of the execu-
tion time shown in Figure 56
Figure 56: Difference in execution time before and after reimplementing C's standard power function.
55 Reduced memory accesses
The original order of execution was modified to reduce the amount of memory access and
to increase the temporal locality of the program Temporal locality is a principle stating
that referenced memory locations will tend to be referenced again soon Moreover
the reordering allowed to replace floating-point calculations with integer calculations in
the modulation stage which are known to typically execute faster in ARM processors
Figure 57 shows the order in which the algorithms are executed before and after this
optimization By moving the calculation of the modular frame to the preprocessing
stage the values of the camera frames do not have to be re-read Moreover the processes
of discarding cropping and scaling frames are now being performed in an alternating
fashion together with the calculation of the modular frame This loop merging improves
the locality of data and reduces loop overhead Figure 58 shows the change in execution
time of the application for this optimization step
Figure 57: Order of execution before and after the optimization. (a) Original order: the preprocessing stage (parse XML file, discard frames, crop frames, scale) is followed by the normalization stage (texture 1, modulation, texture 2, normalize) and the rest of the program. (b) Modified order: the modulation calculation is moved into the preprocessing stage, where it is performed in an alternating fashion with the discarding, cropping, and scaling of frames.
Figure 58: Difference in execution time before and after reordering the preprocessing stage.
56 GMC in y dimension only
A description of the global motion compensation (GMC) method used in the applica-
tion was presented in Chapter 3 Figure 38 shows the different stages of this process
However this figure does not reflect the manner in which the GMC was initially imple-
mented in the MATLAB code In fact this figure describes the GMC implementation
after being modified with the optimization described in this section A more detailed
picture of the original GMC implementation is given in Figure 59 Previous research
found that optimal results were achieved when GMC is applied in the y direction only
The manner in which this was implemented was by estimating GMC for both directions
but only performing the shift in the y direction The optimization consisted in removing
all unnecessary calculations related to the estimation of GMC in the x direction This
optimization provides the improvement of the execution time shown in Figure 510
Figure 59: Flow diagram for the GMC process as implemented in the MATLAB code. For every pair of consecutive frames (A, B) of the normalized frame sequence, the rows and columns of both frames are summed, the SAD is minimized in both x and y, and Frame B is shifted in the y dimension only.
Figure 510: Difference in execution time before and after modifying the GMC stage.
57 Error in Delaunay triangulation
OpenCV was used to compute the Delaunay triangulation A series of examples available
in [43] were used as references for our implementation Despite the fact that OpenCV
constructs the triangulation while abstracting the complete algorithm from the pro-
grammer a not so straightforward approach is required to extract the triangles from
a so called subdivision OpenCV offers a series of functions that can be used to nav-
igate through the edges that form the triangulation It is therefore the responsibility
of the programmer to extract each of the triangles while stepping through these edges
Moreover care must be taken to avoid repeated triangles in the final set An error was
detected at this point of the optimization process in the mechanism that was being used
to avoid repeated triangles Figure 511 shows the increase in execution time after this
bug was resolved
Figure 511: Execution time of the application increased after fixing an error in the tessellation stage.
58 Modified line shifting in GMC stage
A series of optimizations performed to the original line shifting mechanism in the GMC
stage are explained in this section The MATLAB implementation uses the circular shift
function to perform the alignment of the frames (last step in Figure 38) Given that
there is no justification for applying a circular shift a regular shift was implemented
instead in which the last line of a frame is discarded rather than copied to the opposite
border Initially this was implemented using a for loop Later this was optimized even
further by replacing such for loop with the more optimized memcpy function available
in the standard C library This in turn led to a faster execution time
A further optimization was obtained in the GMC stage which yielded better memory
usage and faster execution time The original shifting approach used two equally sized
portions of memory in order to avoid overwriting the frame that was being shifted The
need for a second portion of memory was removed by adding some extra logic to the
shifting process A conditional statement was included in order to determine if the shift
has to be performed in the positive or negative direction In case the shift is negative ie
upwards the shifting operation traverses the image from top to bottom while copying
each line a certain number of rows above it In case the shift is positive ie downwards
the shifting operation traverses the image from bottom to top while copying each line a
certain number of rows below it The result of this set of optimizations is presented in
Figure 512
Figure 512: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
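A sketch of the resulting in-place shift (the helper name is hypothetical; the frame is assumed to be stored row by row as floats):

```c
#include <string.h>

/* Sketch: shift a frame vertically in place. Rows are copied with
 * memcpy; the traversal direction depends on the sign of the shift so
 * that no source row is overwritten before it is read, which removes
 * the need for a second buffer. Rows shifted out are discarded. */
void shift_frame_y(float *img, int width, int height, int shift)
{
    if (shift < 0) {                  /* shift up: walk top to bottom   */
        for (int y = 0; y < height + shift; y++)
            memcpy(&img[y * width], &img[(y - shift) * width],
                   (size_t)width * sizeof(float));
    } else if (shift > 0) {           /* shift down: walk bottom to top */
        for (int y = height - 1; y >= shift; y--)
            memcpy(&img[y * width], &img[(y - shift) * width],
                   (size_t)width * sizeof(float));
    }
}
```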
59 New tessellation algorithm
A good motivation for using the Delaunay triangulation in a two-dimensional space is
presented by Rippa [44] who proves that such triangulation minimizes the roughness of
the resulting model Nevertheless an important characteristic of the decoding process
used in our application allows the adoption of a different triangulation mechanism that
improved the execution time significantly while sacrificing smoothness in a very small
amount. This characteristic refers to the fact that the resulting set of vertices from the decoding stage is sorted in an increasing manner. This in turn removes the need
to search for the nearest vertices and therefore allows the triangulation to be greatly
simplified More specifically the vertices are ordered in increasing order from left to
right and bottom to top in the plane Moreover they are equally spaced along the y
dimension which simplifies even further the algorithm needed to connect such vertices
into triangles
The developed algorithm traverses the set of vertices row by row from bottom to top
creating triangles between every pair of consecutive rows Moreover each pair of con-
secutive rows is traversed from left to right while connecting the vertices into triangles
The algorithm is presented in Algorithm 1 Note that for each pair of rows this algo-
rithm describes the connection of vertices until the moment in which the last vertex of
either row is reached The unconnected vertices that remain in the other longer row
are connected with the last vertex of the shorter row in a later step (not included in
Algorithm 1)
Algorithm 1 New tessellation algorithm

1:  for all pairs of rows do
2:      find the left-most vertices in both rows and store them in vertex row A and vertex row B
3:      while the last vertex in either row has not been reached do
4:          if vertex row A is more to the left than vertex row B then
5:              connect vertex row A with the next vertex on the same row and with vertex row B
6:              change vertex row A to the next vertex on the same row
7:          else
8:              connect vertex row B with the next vertex on the same row and with vertex row A
9:              change vertex row B to the next vertex on the same row
10:         end if
11:     end while
12: end for
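A C sketch of the merge step for one pair of rows (the types, the index arrays, and the emit_triangle() callback are assumptions for the example; the leftover vertices of the longer row are handled separately, as in Algorithm 1):

```c
typedef struct { float x, y; } vertex2d_t;

void emit_triangle(int a, int b, int c);   /* assumed output callback */

/* Sketch: connect two consecutive, x-sorted rows of vertices into
 * triangles with a single merge pass, similar to merging sorted lists.
 * row_a and row_b hold indices into the vertex array v. */
void triangulate_row_pair(const vertex2d_t *v,
                          const int *row_a, int len_a,
                          const int *row_b, int len_b)
{
    int ia = 0, ib = 0;
    while (ia < len_a - 1 && ib < len_b - 1) {
        if (v[row_a[ia]].x < v[row_b[ib]].x) {
            /* triangle of two row-A vertices and one row-B vertex */
            emit_triangle(row_a[ia], row_a[ia + 1], row_b[ib]);
            ia++;
        } else {
            emit_triangle(row_b[ib], row_b[ib + 1], row_a[ia]);
            ib++;
        }
    }
    /* vertices remaining in the longer row are connected to the last
     * vertex of the shorter row in a later step (omitted here) */
}
```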
Figure 513 shows the result of applying the two described triangulation methods to the
same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 514. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 12.5 times over OpenCV's Delaunay triangulation implementation.
Figure 513: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that vertices are sorted. (a) Delaunay triangulation. (b) Optimized triangulation.
510 Modified decoding stage
A major improvement was achieved in the execution time of the application after op-
timizing several time-consuming parts of the decoding stage As a first step two fre-
quently called functions of the standard math C library namely ceil() and floor()
Figure 514: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.
were replaced with faster implementations that used pre-processor directives to avoid the
function call overhead Moreover the time spent in validating the input was also avoided
since it was not required However the property that allowed the new implementations
of the ceil() and floor() functions to increase the performance to a greater extent
was the fact that such functions only operate on index values Given that index values
only assume non-negative numbers the implementation of each of these functions was
further simplified
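A sketch of such simplified replacements (the macro names are hypothetical; both are valid only for non-negative inputs, which holds for index values, and both evaluate their argument more than once):

```c
/* For x >= 0, truncation to int equals floor(x). */
#define FAST_FLOOR(x) ((int)(x))

/* For x >= 0, round up by adding 1 whenever truncation lost a fraction. */
#define FAST_CEIL(x)  ((int)(x) + ((float)(int)(x) < (x) ? 1 : 0))
```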
A second optimization applied to the decoding stage was to replace dynamically allocated
memory on the heap with statically allocated memory on the stack while controlling that
the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since such memory can be addressed faster.
The last optimization consisted of the detection and removal of several tasks that were not contributing to the final result. The reason why such tasks were present in the application is that several alternatives were implemented for achieving a common goal during the algorithmic design stage. However, after the best option was assessed and chosen, the alternative implementations were never entirely removed.
The overall result of the optimizations described in this section is shown in Figure 515
An important reduction of approximately 1 second was achieved As a rough estimate
half of this speedup can be attributed to the removal of the nonfunctional code
511 Avoiding redundant calculations of column-sum vectors in the GMC stage
This section describes the last optimization performed to the GMC stage The algorithm
presented in Figure 38 has the following shortcoming: for every pair of consecutive
Figure 515: Execution time of the application before and after optimizing the decoding stage.
frames the sum of pixels in each column is calculated for both frames This means that
the column-sum vector is calculated twice for each image except for the first and last
frame (n = 1 and n = N) By reusing the column-sum vector calculated in the previous
iteration such recalculation can be avoided An updated version of the GMC stage that
incorporates this idea is shown in Figure 516. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 517 shows the execution times of the application before and after removing the redundant calculations.
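A sketch of the modified loop (the helper names follow the earlier sketches and are assumptions; the two column-sum buffers are simply swapped between iterations):

```c
#include <stdlib.h>

/* Assumed helpers from the earlier sketches. */
void sum_columns(const float *img, int width, int height, long *out);
int  find_shift(const long *sums_a, const long *sums_b,
                int length, int max_shift);
void shift_frame_y(float *img, int width, int height, int shift);

/* Sketch: compensate global motion over the whole sequence while
 * computing each frame's column-sum vector exactly once. */
void gmc_sequence(float **frames, int num_frames,
                  int width, int height, int max_shift)
{
    long *prev = malloc((size_t)width * sizeof *prev);
    long *cur  = malloc((size_t)width * sizeof *cur);

    sum_columns(frames[0], width, height, prev);   /* computed once */
    for (int n = 1; n < num_frames; n++) {
        sum_columns(frames[n], width, height, cur);
        shift_frame_y(frames[n], width, height,
                      find_shift(prev, cur, width, max_shift));
        long *tmp = prev; prev = cur; cur = tmp;   /* reuse next time */
    }
    free(prev);
    free(cur);
}
```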
512 NEON assembly optimization 1
The ARM NEON general-purpose SIMD engine featured in the Cortex-A series proces-
sors was exploited for the last series of optimizations performed to the 3D face scanner
application The first step was to detect the stages of the application that exhibit rich
amount of exploitable data operations where the NEON technology could be applied
The vast majority of the operations performed in the preprocessing normalization and
global motion compensation stages are data independent and therefore suitable for
being computed in parallel on the ARM NEON architecture extension
There are four major approaches to integrate NEON technology into an existent appli-
cation (i) by using a vectorizing compiler that automatically translates CC++ code
into NEON instructions (ii) by using existent CC++ libraries based on NEON technol-
ogy (iii) by using the NEON CC++ intrinsics which provide low-level access to NEON
instructions but with the compiler doing some of the work associated with writing as-
sembly instructions and (iv) by directly writing NEON assembly instructions linked to
the CC++ project in the compilation process A detailed explanation of each of these
approaches can be found in [45] Based on the results achieved in [46] directly writing
NEON assembly instructions outperforms the other alternatives and therefore it was
this approach that was adopted
Figure 516: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums. The first pair of consecutive frames is processed as before; for every remaining pair (from n = 3 to n = N), the column-sum vector of frame n-1 is reused from the previous iteration.
[Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage (time in seconds, broken down per processing stage).]
Figure 5.18 presents the basic principle behind the SIMD architecture extension along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8 or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may thus be interpreted either as an operation on two Q registers, in which each of the 8 elements is 16 bits wide, or as an operation on two D registers, in which each of the 8 elements is 8 bits wide.
[Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology: the lanes of the source registers are combined element-wise by the operation into the destination register.]
An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1. A sketch of this computation is given below.
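The fragment below sketches this computation with NEON C intrinsics rather than the hand-written assembly actually used in the project. It is a simplified illustration: the scaling constant, the function name and the omission of a zero-denominator check are all assumptions made for the sake of the example.

```c
#include <arm_neon.h>
#include <stdint.h>

/* Sketch: compute texture 2 (v1 + v2) and the normalized frame
 * (v1 - v2)/(v1 + v2) for 8 pixels at a time, as in Figure 5.19. */
void tex2_and_normalize8(const uint8_t *v1, const uint8_t *v2,
                         float *texture2, float *normalized)
{
    uint8x8_t a = vld1_u8(v1);               /* 8 pixels of frame v1 */
    uint8x8_t b = vld1_u8(v2);               /* 8 pixels of frame v2 */

    uint16x8_t sum  = vaddl_u8(a, b);        /* widen to 16 bits: v1 + v2 */
    int16x8_t  diff = vsubq_s16(             /* signed difference v1 - v2 */
        vreinterpretq_s16_u16(vmovl_u8(a)),
        vreinterpretq_s16_u16(vmovl_u8(b)));

    for (int half = 0; half < 2; ++half) {   /* 2 x 4 lanes of 32-bit floats */
        uint16x4_t s16 = (half == 0) ? vget_low_u16(sum)  : vget_high_u16(sum);
        int16x4_t  d16 = (half == 0) ? vget_low_s16(diff) : vget_high_s16(diff);

        float32x4_t s = vcvtq_f32_u32(vmovl_u16(s16));
        float32x4_t d = vcvtq_f32_s32(vmovl_s16(d16));

        /* ARMv7 NEON has no divide instruction: use a reciprocal
         * estimate refined by one Newton-Raphson step. */
        float32x4_t r = vrecpeq_f32(s);
        r = vmulq_f32(r, vrecpsq_f32(s, r));

        vst1q_f32(texture2   + 4 * half, vmulq_n_f32(s, 1.0f / 510.0f));
        vst1q_f32(normalized + 4 * half, vmulq_f32(d, r));
    }
}
```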
Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that may explain this increment. First, note that the stage that contributed most to the increase in time was reading the binary file. The execution time of this process is heavily affected by any other processes that might be running in parallel. Moreover, the execution times of all stages other than those involved in the NEON optimization also increased. This suggests that another process was indeed probably running in parallel,
using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason can be found in the modulation stage. The first step of that process is to find the smallest and largest values of every camera-frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C, the processor makes use of its branch prediction mechanism in order to speed up the instruction pipeline. The use of NEON assembly instructions, however, forces the processor to perform the comparison for every single pack of 8 values, giving up any benefit of the branch prediction mechanism, as the sketch below illustrates.
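The contrast can be seen in the following hypothetical sketch of the running min/max update; the names, and the assumption that the pixel count is a multiple of 8, are illustrative. The scalar version takes one usually well-predicted branch per comparison, whereas the NEON version unconditionally evaluates vmin/vmax for every pack of 8 pixels.

```c
#include <arm_neon.h>
#include <stdint.h>

/* Scalar version: one (usually well-predicted) branch per comparison. */
void minmax_scalar(const uint8_t *frame, uint8_t *mins, uint8_t *maxs, int n)
{
    for (int i = 0; i < n; ++i) {
        if (frame[i] < mins[i]) mins[i] = frame[i];
        if (frame[i] > maxs[i]) maxs[i] = frame[i];
    }
}

/* NEON version: the comparisons are always executed, 8 pixels at a time. */
void minmax_neon(const uint8_t *frame, uint8_t *mins, uint8_t *maxs, int n)
{
    for (int i = 0; i < n; i += 8) {
        uint8x8_t p = vld1_u8(frame + i);
        vst1_u8(mins + i, vmin_u8(vld1_u8(mins + i), p));
        vst1_u8(maxs + i, vmax_u8(vld1_u8(maxs + i), p));
    }
}
```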
5.13 NEON assembly optimization 2
After successfully implementing several stages of the application with NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). To explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels; for the general case the number of possible values is 2^(n+1) − 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which holds the average value raised to the power 0.85. What makes this attractive is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table; a sketch of the construction is given below.
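For the actual 8-bit pixels of the application, the LUT construction might look as follows; this is a sketch, and the names are illustrative rather than taken from the implementation.

```c
#include <math.h>
#include <stdint.h>

#define PIXEL_BITS 8
/* Sums of two n-bit pixels take 2^(n+1) - 1 distinct values (0..510). */
#define LUT_SIZE (2 * ((1 << PIXEL_BITS) - 1) + 1)

static float gamma_lut[LUT_SIZE];

/* Precompute ((A + B)/2)^0.85 for every possible pixel sum A + B. */
void build_gamma_lut(void)
{
    for (int sum = 0; sum < LUT_SIZE; ++sum)
        gamma_lut[sum] = powf(sum / 2.0f, 0.85f);
}

/* The sum A + B is already available from the texture 2 stage, so the
 * averaged, gamma-corrected texture 1 value is a single table lookup. */
static inline float average_gamma(uint8_t a, uint8_t b)
{
    return gamma_lut[(int)a + (int)b];
}
```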
As a final step in the optimization process, a further improvement was made to the execution flow presented in Figure 5.19. From that diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two stages. The first involves the calculation of the modulation, texture 2 and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. Merging these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already
[Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. For camera frames 1 to 16, each row is cropped and processed per vector: modulation (step 1), texture 2 (v1 + v2) with scaling, and normalization ((v1 − v2)/(v1 + v2)) with scaling; the last 2 camera frames are then re-read for texture 1. Green rectangles represent stages implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.]
[Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization (time in seconds, broken down per processing stage).]
Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.

    pixel A:        0      1      2      3
    pixel B:        0      1      2      3
    average:        0    0.5    1.0    1.5    2.0    2.5    3.0
    average^0.85:   0  0.555  1.000  1.411  1.803  2.179  2.544

The sum pixel A + pixel B (0 to 6) indexes the fourth row.
being calculated as part of the other processes. These modifications to the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement in execution time is shown in Figure 5.22.
This final optimization concludes the embedded system development of the 3D face reconstruction application.
[Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization (time in seconds, broken down per processing stage).]
[Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. A first pass over camera frames 1 to 14 computes, per row and per vector, the modulation (step 1), texture 2 and normalization, followed by the 5×5 mean filter; a second pass over camera frames 15 and 16 additionally computes the modulation (step 2) and the averaging and gamma correction needed for texture 1. Green rectangles represent stages implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.]
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.
6.1 MATLAB to C code translation
In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models, with a precision of 4 decimal places.
In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people; moreover, each of the different scans was run 10 times and the results averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application, described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when run on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).
On the other hand, the bottom bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when run on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processing on a PC. The C code was compiled with GCC's -O2 optimization level.
[Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM (time in seconds, broken down per processing stage).]
6.2 Visualization
A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism that results in a much lower-resolution model (1229 faces), suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower-resolution model contains only approximately 2% of the faces of the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.
6.3 Performance optimizations
Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. The wide range of optimizations described in Chapter 5 reduced the execution time of the application from 14.5 to 5.1 seconds, which translates into a speedup of approximately 2.85 times.
[Figure 6.2: Example of the visualization module developed: (a) high-resolution 3D model with texture (63743 faces); (b) high-resolution 3D model wireframe (63743 faces); (c) low-resolution 3D model with texture (1229 faces); (d) low-resolution 3D model wireframe (1229 faces).]
Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which give an idea of the speedup achieved for each individual stage.
[Figure 6.3: Performance evolution of the 3D face scanner's C implementation (time in seconds, broken down per processing stage). Bars, from top to bottom: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in Y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2.]
[Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file; (b) preprocessing; (c) normalization; (d) GMC; (e) decoding; (f) tessellation; (g) calibration; (h) vertex filtering; (i) hole filling.]
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e. an increase of approximately 14 seconds.
A wide range of optimizations was performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.
Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first
step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment. Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point, not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.
7.1 Future work
Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5×5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates for implementation using NEON assembly instructions.
Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements in the overall application performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase in the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase is expected in stages that involve matrix operations (MATLAB usually performs well with this kind of operation), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease in performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e. changing the data type from double to float and tuning
the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.
Finally, it is worth noting a few more ideas on how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process; it is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application. Therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Although a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast-running code.
Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA, the Journal of the American Medical Association, vol. 283, no. 14, pp. 1829-1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1)

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1)

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204-211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1)

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, ISBN: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10)

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1-17, 1982, ISSN: 0146-664X. DOI: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11)

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34)

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11)

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521-528, Aug. 1981 (cit. on pp. 9, 11)

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647-651, Jun. 1993, ISSN: 0162-8828. DOI: 10.1109/34.216735 (cit. on pp. 9, 11)

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874-877. DOI: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11)

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27-40, 1995, ISSN: 0031-3203. DOI: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11)

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14-28, Jan. 1987, ISSN: 0162-8828. DOI: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11)

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445-456, 1997 (cit. on pp. 9, 11)

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609-616, 1992, ISSN: 0031-3203. DOI: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12)

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337-358, 1985, ISSN: 0734-189X. DOI: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12)

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309-313. DOI: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12)

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193-203, 1991, ISSN: 0932-8092. DOI: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12)

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42-54, Dec. 1982, ISSN: 0018-9162. DOI: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14)

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827-849, 2004 (cit. on pp. 11, 12)

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057-2060. DOI: 10.1109/ICME.2006.262619 (cit. on p. 12)

[21] D. An, A. Woodward, P. Delmas, G. Gimelfarb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135-141. DOI: 10.1109/ENC.2006.8 (cit. on pp. 12, 13)

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195-200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13)

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81 - III-84. DOI: 10.1109/ICIP.2007.4379251 (cit. on p. 13)

[24] J. Salvi, X. Armangu, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617-1635, 2002, ISSN: 0031-3203. DOI: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14)

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. DOI: 10.1364/OE.15.012318 (cit. on p. 14)

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22-26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15-20 (cit. on p. 14)

[27] G. Toscani, Systemes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, ISBN: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14)

[28] J. Mas, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, Departament d'Electronica, Informatica i Automatica, 1998, ISBN: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15)

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323-344, Aug. 1987, ISSN: 0882-4967. DOI: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15)

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965-980, Oct. 1992, ISSN: 0162-8828. DOI: 10.1109/34.159901 (cit. on p. 15)

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26)

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, ISBN: 0201604582 (cit. on p. 25)

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97-108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26)

[34] M. Desbrun, M. Meyer, P. Schroder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317-324, ISBN: 0-201-48560-5. DOI: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30)

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, ISBN: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31)

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32)

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32)

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, December, p. 81, 2009 (cit. on p. 34)

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34)

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38)

[41] W. Von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45)

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46)

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50)

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489-497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51)

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54)

[46] N. Pipenbrinck, "ARM NEON optimization: an example," Tech. Rep., 2009 (cit. on p. 54)
Chapter 1
Introduction
The potential of science and technology to improve every aspect of life seems to be boundless, or at least this is what the innovations of the previous centuries suggest. Among the many different interests that advocate the development of science and technology, human healthcare has always been an important stimulant. New technologies are constantly being developed by leading companies all around the world to improve the quality of people's lives. A clear example is the case of the Dutch multinational Royal Philips Electronics, which devotes special interest to the development and introduction of meaningful innovations that improve people's lives.
Within the wide range of products offered by Philips there is a specific group, categorized under the name of sleep solutions, that aims at improving the sleep quality of people. A well-known family of products contained within this category are the so-called CPAP (Continuous Positive Airway Pressure) masks. Such masks are used primarily in the treatment of sleep apnea, a sleep disorder characterized by pauses in breathing or instances of very low breathing during sleep [1]. According to a recent study conducted by Philips in collaboration with the University of Twente, 6.4% of the surveyed population was found to suffer from this disorder [2]. A total number of 4206 people, comprising women and men of different ages and levels of education, took part in the 2-year study. A similar survey was undertaken by the National Institutes of Health in the United States of America [3]. It reported that sleep apnea was prevalent in more than 18 million Americans, i.e. 6.62% of the country's population.
While aiming to attend the large demand for CPAP masks, Philips has designed and introduced a wide variety of mask models that seek to fulfill the different needs and constraints that arise due to several factors, which include the large diversity of size and shape of human faces, inclination towards breathing through the mouth or nose, diagnosis of diseases such as sinusitis or dermatitis, or disorders such as claustrophobia,
[Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara; (b) ComfortClassic; (c) ComfortGel Blue; (d) ComfortLite 2; (e) FitLife; (f) GoLife; (g) ProfileLite Gel; (h) Simplicity; (i) ComfortGel.]
amongst others. A subset of these models is shown in Figure 1.1. It is important to mention that a poor selection of a CPAP mask might cause undesirable side effects to the patient, such as marks or even pressure ulcers. Consequently, the physical dimensions of each patient's face play a crucial role in the selection of the most appropriate CPAP mask.
Unfortunately, the current practices used to assess the adequacy of CPAP masks based on facial dimensions are quite error prone. They rely on trial-and-error procedures in which the patient tries on different mask models and selects the one he thinks is the most comfortable. In order to alleviate this problem, Philips Research launched the 3D Mask Sizing project, which aims to develop an automated embedded system capable
Chapter 1 Introduction 3
of assisting sleep technicians in prescribing the most appropriate CPAP mask for each patient.
1.1 3D Mask Sizing project
The 3D Mask Sizing project is based on the initiative of Philips to develop a technological means that can assist sleep technicians in the selection of a proper CPAP mask model for each patient. A series of algorithms, methods and hardware prototypes are the result of several years of research carried out by the Smart Sensing & Analysis research group in Philips Research Eindhoven. The resulting automated mask advising system comprises four main parts:
1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.
2. The extraction of facial landmarks from the reconstructed model by means of computer vision algorithms.
3. The actual fit quality assessment, by virtually fitting a series of 3D mask models to the reconstructed face.
4. The creation of a custom cushion that optimizes for uniform pressure along the cushion contour.
The focus of this thesis project is on the first step.
As part of the progress made in the 3D Mask Sizing project at Philips Research Eindhoven, a first prototype of a 3D hand-held scanner using the structured lighting technique was already developed and is the basis for the present project. Figure 1.2a shows the hardware setup of this device. In short, the scanner is capable of capturing a picture sequence of a patient's face while illuminating it with specific structured light patterns. Such a picture sequence is processed by means of a series of algorithms in order to reconstruct a 3D model of the face. An example of a resulting 3D model is presented in Figure 1.2b. The reconstruction process and all other calculations are currently performed offline and are mostly implemented in MATLAB.
1.2 Objectives
The main objective of this thesis project is to extend the functionality of the mentioned scanner such that the 3D reconstruction is computed locally on the embedded platform. This implies transforming the already developed methods and algorithms in such a
[Figure 1.2: A 3D hand-held scanner developed in Philips Research: (a) hardware; (b) 3D model example.]
way that extra-functional requirements are taken into account. These extra-functional requirements involve an optimal use of the available computational resources. Highest priority should be given to the execution time of the application; specifically, the 3D reconstruction should run on the embedded device in less than 5 seconds on average. Because the embedded processor contained in the final product will be similar to ARM's Cortex-A8, the new implementation should be targeted to this processor in particular, making proper use of the specific features it provides. Moreover, the visualization of the reconstructed face model should be made possible by means of the embedded projector contained in the device.
1.3 Report organization
This report is organized as follows. Chapter 2 presents the basic principles that underlie different technologies for surface reconstruction, placing special emphasis on structured lighting techniques. In Chapter 3, an overview of the 3D face scanner application is provided, which functions as the starting point for the current project. Chapter 4 details the most relevant aspects that pertain to the implementation of the 3D face scanner application on an embedded device. In Chapter 5, a series of optimizations used to reduce the execution time of the application are described. Chapter 6 highlights the most important results of the development process, namely the MATLAB to C translation, the visualization module and the set of optimizations. Finally, Chapter 7 concludes the thesis while delineating paths for further improvements of the presented work.
Chapter 2
Literature study
This chapter presents a selective analysis of the state of the art in the field of surface reconstruction, placing special emphasis on structured lighting techniques. A brief overview of the three main underlying technologies used for depth estimation is presented first. This is followed by an example of stereo analysis, which serves as the basis for the more specific structured lighting techniques. Moreover, this example helps to illustrate why stereo analysis is considered less preferable for 3D face reconstruction applications when compared with structured lighting techniques. Special emphasis is placed on the scientific principles underlying structured lighting techniques. Furthermore, a classification of the different types of pattern coding strategies available in the literature is given, along with an analysis of their suitability for our application. Finally, the chapter concludes with a brief discussion of camera calibration and its most representative techniques.
2.1 Surface reconstruction
Surface reconstruction has a wide range of practical applications, such as computer modeling of 3D objects (as found in areas like architecture, mechanical engineering, or surgery), distance measurements for vehicle control, surface inspections for quality control, approximate or exact estimates of the location of 3D objects for automated assembly, and fast location of obstacles for efficient navigation [4].
Technologies for surface reconstruction include contact and non-contact techniques, the latter being our principal interest. Non-contact techniques may be further categorized as echo-metric, reflecto-metric, and stereo-metric, as proposed in [5]. Echo-metric techniques use time-of-flight measurements to determine the distance to an object, i.e., they are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect from an object's surface through a given medium. Reflecto-metric techniques process one or more images of the object to determine its surface orientation and, consequently, its shape. Finally, stereo-metric techniques determine the location of the object's surface by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such techniques are heavily affected by environmental parameters such as temperature and humidity [6]. These parameters affect the velocity at which waves travel through a given medium, thus introducing errors in the depth measurement. Both reflecto-metric and stereo-metric techniques, on the other hand, are less affected by environmental parameters. However, reflecto-metric techniques entail a major difficulty: they require an estimation of a model of the environment. In the remainder of this section we limit the discussion to the stereo-metric category and focus on structured lighting techniques.
2.1.1 Stereo analysis
Considering that surface reconstruction by means of structured lighting can be regarded as an extension of the more general stereo-vision technique, an introductory example of stereo analysis is presented in this section. This example, taken from [4], intends to show why the use of structured lighting becomes essential for our application.
Surface reconstruction can be achieved by means of the visual disparity that results when an object is observed from different camera viewpoints. In its simplest form, two cameras can be used for this purpose. Triangulation between a point on the object and its respective projection in each of the camera projection planes can be used to calculate the depth at which this point lies from a certain reference. Note, however, that in order to calculate the triangulation, more parameters are required. These parameters refer, for example, to the distance at which the cameras are located from one another (an extrinsic parameter) or to the focal length of each of the cameras (an intrinsic parameter).
Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal point of the left camera. The focal point of the right camera lies at a distance b along the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed to have the same focal length f. As a consequence, the images of both cameras are located in the same image plane. The Z-axis coincides with the optical axis of the left camera. Moreover, the optical axes of both cameras are parallel to each other and oriented towards the scene objects. Also note that, because the x-axes of both images are identically oriented, rows with the same row number in the two different images lie on the same straight line.
Figure 2.1: Standard stereo geometry.
In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image points

    p_left = (x_left, y_left)   and   p_right = (x_right, y_right)

in the left and right images, respectively, assuming that the scene point is visible from both camera viewpoints. The disparity between two corresponding image points, with respect to p_left, is the vector

    Δ(x_left, y_left) = (x_left − x_right, y_left − y_right)^T.    (2.1)
In the standard stereo geometry, pinhole camera models are used to represent the considered cameras. The basic idea of a pinhole camera is that it projects scene points P onto image points p according to a central projection given by

    p = (x, y) = (f·X/Z, f·Y/Z),    (2.2)

assuming that Z > f.
According to the ideal assumptions made in the standard stereo geometry of the two cameras, it holds that y = y_left = y_right. Therefore, for the left camera the central projection equation is given directly by Equation 2.2, considering that the pinhole camera model assumes the Z-axis to be the optical axis of the camera. Furthermore, given the displacement of the right camera by b along the X-axis, its central projection equation is given by

    (x_right, y) = (f·(X − b)/Z, f·Y/Z).
Rather than calculating a disparity vector given by Equation 2.1 for all corresponding pairs of points in the two images, a scalar disparity proves to be sufficient under the assumptions made in the standard stereo geometry. The scalar disparity of two corresponding points, with respect to p_left, is given by

    Δ_ssg(x_left, y_left) = √((x_left − x_right)² + (y_left − y_right)²).

However, because rows with the same row number in the two images have the same y value, the scalar disparity of a pair of corresponding points reduces to

    Δ_ssg(x_left, y_left) = |x_left − x_right| = x_left − x_right.    (2.3)
Note that it is valid to remove the absolute value operator because of the chosen arrangement of the cameras. A disparity map Δ(x, y) is defined by applying Equation 2.3 to all corresponding points in the two images. For those points that could not be associated with a corresponding point in the other image (for example, because of occlusion), the value "undefined" is recorded.
Finally, in order to arrive at the equations that determine the 3D location of each point in the scene, note that from the central projection equations of the two cameras it follows that

    Z = f·X/x_left = f·(X − b)/x_right,

and therefore

    X = b·x_left / (x_left − x_right).

Using the previous equation, it follows that

    Z = b·f / (x_left − x_right).

By substituting this result into the projection equation for y, it follows that

    Y = b·y / (x_left − x_right).
The last three equations allow the reconstruction of the coordinates of the projected points P within the three-dimensional XYZ-space, assuming that the parameters f and b are known and that the disparity map Δ(x, y) was measured for each pair of corresponding points in the two images. Note that a variety of methods exists to calibrate different types of camera configuration systems, i.e., to determine their intrinsic and extrinsic parameters. These calibration procedures are discussed further in Section 2.2.
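As a simple illustration of the reconstruction equations above, the following C sketch computes the 3D coordinates of a scene point from one corresponding point pair under the standard stereo geometry. The function name and the error convention are illustrative, not part of any particular implementation.

/* Minimal sketch: 3D reconstruction of one point pair under the standard
 * stereo geometry. f is the focal length and b the base distance, both in
 * consistent units. */
typedef struct { double X, Y, Z; } Point3D;

/* Returns 0 on success, -1 when the disparity is undefined. */
static int reconstruct_point(double x_left, double x_right, double y,
                             double f, double b, Point3D *p)
{
    double disparity = x_left - x_right;     /* Equation (2.3) */
    if (disparity <= 0.0)
        return -1;                           /* occluded or invalid pair */
    p->X = b * x_left / disparity;
    p->Y = b * y / disparity;
    p->Z = b * f / disparity;
    return 0;
}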
The process of determining corresponding point pairs is known as the correspondence problem. A wide variety of techniques is used to solve the correspondence problem in stereo image analysis. Such techniques generally involve the extraction and matching of features between two or more images; these features are typically corners or edges contained within the images. Although these techniques are appropriate for a certain number of applications, they present several drawbacks that make them infeasible for many others. The main drawbacks are that (i) feature extraction and matching is generally computationally expensive, (ii) features might not be available depending on the nature of the environment or the placement of the cameras, and (iii) low lighting conditions generally increase the complexity of the matching procedure, thus making the system more error prone. These problems in solving the correspondence problem can generally be overcome by resorting to a different but related family of techniques known as structured lighting techniques. While structured lighting techniques involve a completely different methodology for solving the correspondence problem, they share a large part of the theory presented in this section regarding the depth reconstruction process.
2.1.2 Structured lighting
Structured lighting methods can be thought of as a modification of the previously described stereo analysis approach, where one of the cameras is replaced by a light source that actively projects a light pattern into the scene. The location of an object in space can then be determined by analyzing the deformation of the projected light pattern. The idea behind this modification is to reduce the complexity of the correspondence analysis by actively manipulating the scene.

It is important to note that stereoscopy-based systems do not impose complex requirements on image acquisition, since they mostly rely on theoretical, mathematical, and algorithmic analyses to solve the reconstruction problem. The idea behind structured lighting methods, on the other hand, is to shift this complexity to another level, such as the engineering prerequisites of the overall system [4].
A wide variety of light patterns has been proposed by the research community [5], [7]–[17]. Their aim is to reduce the large number of images that would have to be captured when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2 a classification of the available encoded patterns is presented. Nevertheless, the light spot projection technique serves as a solid starting point to introduce the main principle underlying the depth recovery of most other encoded light patterns: the triangulation technique.
2.1.2.1 Triangulation technique
Triangulation refers to the process of determining the location of a point by measuring the angles formed from it to points at either end of a fixed baseline. Various approaches have been proposed for accomplishing this task. An early analysis was described by Hall et al. [18] in 1982; Klette also presented his own analysis in [4]. In the following, an overview of Klette's triangulation approach is given.
Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation, as proposed in [4].

Note that the system can be thought of as a 2D object scene, i.e., it has no vertical dimension. As a consequence, the object, the light source, and the camera all lie in the same plane. The angles α and β are given by the calibration. As in the previous example, the base distance b is assumed to be known, and the origin of the coordinate system O coincides with the projection center of the camera.
The goal is to calculate the distance d between the origin O and the object point P = (X₀, Z₀). This can be done using the law of sines as follows:

    d / sin(α) = b / sin(γ).

From γ = π − (α + β) and sin(π − γ) = sin(γ), it holds that

    d / sin(α) = b / sin(π − γ) = b / sin(α + β).

Therefore, the distance d is given by

    d = b · sin(α) / sin(α + β),

which holds for any point P lying on the surface of the object.
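A direct transcription of this result into C is shown below; it is a minimal sketch in which the angles are assumed to be known from calibration and expressed in radians.

#include <math.h>

/* Distance d from the camera origin O to the object point P in the model
 * of Figure 2.2; alpha and beta are the calibrated angles (radians) and
 * b is the base distance. */
static double triangulate_distance(double alpha, double beta, double b)
{
    return b * sin(alpha) / sin(alpha + beta);
}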
2.1.2.2 Pattern coding strategies
As stated earlier, there is a wide variety of pattern coding strategies available in the literature that aim to fulfill the requirements found in different scenarios and applications. In coded structured light systems, every coded pixel in the pattern has its own codeword that allows direct mapping, i.e., every codeword is mapped to the corresponding coordinates of a given pixel or group of pixels in the pattern. A codeword can be represented using grey levels, colors, or even geometrical characteristics. The following classification of pattern coding strategies was proposed by Salvi et al. in [19].
• Time-multiplexing. This is one of the most commonly used strategies. The idea is to project a set of patterns onto the scene, one after the other. The sequence of illuminated values determines the codeword for each pixel. The main advantage of this kind of pattern is that it can achieve high spatial resolution in the measurements. However, its accuracy is highly sensitive to movement of either the structured light system or objects in the scene while the acquisition process takes place. Previous research in this area includes the work of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown in Figure 2.3a.

• Spatial neighborhood. In this strategy, the codeword that is assigned to a given pixel depends on its neighborhood. Codification is done on the basis of intensity [9]–[11], color [12], or a unique structure of the neighborhood [13]. In contrast with time-multiplexing strategies, spatial neighborhood strategies allow all coding information to be condensed into a single projection pattern, making them highly suitable for applications that involve timing constraints, such as autonomous navigation. The compromise, however, is a deterioration in spatial resolution. Figure 2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding. In direct coding strategies, every pixel in the pattern is labeled by the information it represents. In other words, the entire codeword for a given point is contained in a unique pixel, as explained in [19]. Basically, there are two ways to achieve this: either by using a large range of color values [15], [16] or by introducing periodicity [17]. Although in theory this group of strategies can be used to reconstruct objects with high resolution, a major problem occurs in practice: the colors imaged by the camera(s) of the system do not only depend on the projected colors, but also on the intrinsic colors of the measured surface and the light source. The consequence is that reference images become necessary. Figure 2.3c shows an example of a direct coding strategy proposed in [16].
Figure 2.3: Examples of pattern coding strategies: (a) time-multiplexing, (b) spatial neighborhood, (c) direct coding.
2.1.2.3 3D human face reconstruction
Given the importance of face reconstruction in a wide range of fields, such as security, forensics, or even entertainment, it is no surprise that special focus has been devoted to this area by the research community over the last decades. A comparative study of three different 3D face reconstruction approaches is presented in [20]. Here, the most representative techniques of three different domains are tested: binocular stereo, structured lighting, and photometric stereo. The experimental results show that active reconstruction techniques perform better than purely passive ones for this application.
The majority of analyses of vision-based reconstruction has focused on general performance for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless, some effort has been made to evaluate structured lighting techniques with special focus on human face reconstruction. In [21], a comparison is presented between three structured lighting techniques (Gray code, Gray code shift, and stripe boundary) to assess 3D reconstruction of human faces using mono and stereo systems. The results show that Gray code shift coding performs best, given the high number of emitted patterns it uses. A further study on this topic was performed by the same author in [22]. Again, it was found that time-multiplexing techniques such as binary encoding using Gray code provide the highest accuracy. With a rather different objective than that sought by Woodward et al. in [21] and [22], Fechteler et al. [23] focus their effort on presenting a framework that captures 3D models of faces at high resolution with low computational load. Here, the system uses a single colored stripe pattern for the reconstruction, plus a picture of the face illuminated with regular white light that is used as texture.
Particular aspects of 3D human face reconstruction, such as the proximity, size, and texture involved, make structured lighting a suitable approach. Other reconstruction techniques, on the contrary, may be less suitable when dealing with these particular aspects. For example, stereoscopic approaches fail to provide positive results when the textures involved do not contain features that can be easily extracted and matched algorithmically, as is the case for the human face. The concepts behind structured lighting, on the other hand, make it very convenient for reconstructing this kind of surface, given the proximity involved and the size limits of the object in question (appropriate for projecting encoded patterns).
With regard to the suitability of the different pattern coding strategies for our application (3D human face reconstruction by means of a hand-held scanner), there are several factors to consider. Spatial neighborhood strategies do not offer the high spatial resolution that is needed by the algorithms that assess the fit quality of the various mask models. Direct coding strategies suffer from practical problems that affect their robustness in different scenarios. This centers the attention on the time-multiplexing techniques, which are known to provide high spatial resolution. The problem with such techniques is that they are highly sensitive to movement, which is likely to be present on a hand-held device. Fortunately, there are several approaches to solving this problem. Consequently, it is a time-multiplexing technique that is employed in our application.
2.2 Camera calibration
Camera calibration is a crucial ingredient in the process of metric scene measurement. This section presents a review of some of the most popular techniques, with special focus on those that are regarded as adequate for our application.
2.2.1 Definition
Camera calibration is the process of determining a mathematical approximation of the physical and optical behavior of an imaging system by means of a set of parameters. These parameters can be estimated by direct or iterative methods, and they are divided in two groups. On the one hand, intrinsic parameters determine how light is projected through the lens onto the image plane of the sensor; the focal length, projection center, and lens distortion are all examples of intrinsic parameters. On the other hand, extrinsic parameters measure the position and orientation of the camera with respect to a world coordinate system, as defined in [24]. To better illustrate these ideas, consider Figure 2.4, which corresponds to the optical system for structured pattern projection and triangulation considered in [25]. The focal length fc and the projection center Oc are examples of intrinsic parameters of the camera, while the distance D between the camera and the projector corresponds to an extrinsic parameter.
Figure 2.4: The reference framework assumed in [25].
2.2.2 Popular techniques
In 1982, Hall et al. [18] proposed a technique consisting of an implicit camera calibration that uses a 3×4 transformation matrix to map 3D object points to their respective 2D image projections. Here, the camera model does not consider any lens distortion; for a detailed description of this method, refer to [18]. Some years later, in 1986, Faugeras improved Hall's work by proposing a technique based on extracting the physical parameters of the camera from the transformation matrix proposed in [18]; descriptions of this technique are given in [26] and [27]. A non-linear explicit camera calibration that includes radial lens distortion was proposed by Salvi in his PhD thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method. However, a method that would become much more popular, and that is still widely used, was proposed by Tsai in 1987 [29]; here, the author proposes a two-step technique that models only radial lens distortion. Also worth mentioning is the model proposed by Weng [30] in 1992, which includes three different types of lens distortion.
The calibration mechanism that is currently used in our application is based on the work performed by Peter-Andre Redert as part of his PhD thesis [31]. Although this mechanism focuses on stereo camera calibration, it was generalized for a system with one camera and one projector. It involves imaging a controlled scene from different positions and orientations. The controlled scene consists of a rigid calibration chart with several markers. The geometric and photometric properties of these markers are known precisely, so that they can be detected reliably. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart. This calibration mechanism is discussed further in Section 3.7.
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed by the Smart Sensors & Analysis research group and provided as a starting point for the current project. Figure 3.1 presents the main steps involved in the 3D reconstruction process.
Figure 3.1: General flow diagram of the 3D face scanner application: read binary file (3.1), preprocessing (3.2), normalization (3.3), tessellation (3.4), decoding (3.5), global motion compensation (3.6), calibration (3.7), vertex filtering (3.8), and hole filling (3.9).
The current scanner uses a total of 16 binary coded patterns that are sequentially projected onto the scene. For each projection, the scene is captured by means of the embedded camera, hence producing 16 different grayscale frames (Figure 3.2) that are fed to the application in the form of a binary file. This falls in line with the discussion presented in Section 2.1.2.3 of the literature study on why time-multiplexing strategies are more suitable than spatial neighborhood or direct coding strategies for face reconstruction applications. In Sections 3.1 to 3.9, each of the steps shown in Figure 3.1 is described.
Figure 3.2: Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns. This frame sequence is the input for the 3D face scanner application.
3.1 Read binary file
The first step of the application is to read the binary file that contains the required information for the 3D reconstruction. The binary file is composed of two parts: the header and the actual data. The header contains metadata of the acquired frames, such as the number of frames and the resolution of each one. The second part contains the actual data of the captured frames. Figure 3.2 shows an example of such a frame sequence, which from now on will be referred to as camera frames.
3.2 Preprocessing

The preprocessing stage comprises the four steps shown in Figure 3.3. Each of these steps is described in the following subsections.
Figure 3.3: Flow diagram of the preprocessing stage: parse XML file, discard frames, crop frames, and scale (convert to float, range from 0 to 1).
3.2.1 Parse XML file
In this stage, the application first reads an XML file that is included with every scan. This file contains relevant information for the structured light reconstruction, including (i) the type of structured light patterns that were projected when acquiring the data, (ii) the number of frames captured while structured light patterns were being projected, (iii) the image resolution of each frame to be considered, and (iv) the calibration data.
3.2.2 Discard frames

Based on the number-of-frames value read from the XML file, the application discards extra frames that do not contain relevant information for the structured light approach but that are provided as part of the input.
3.2.3 Crop frames

The original resolution of each camera frame (480 × 768) is modified in order to obtain a new resolution that is more suitable for the subsequent algorithms of the program (480 × 754). This is accomplished by cropping the pixels that are close to the top border of the images. Note that this operation does not imply a loss of information in this particular application, because pixels near the frame borders do not contain facial information and can therefore be safely removed.
3.2.4 Scale

Each pixel of the camera frame sequence (as provided by the embedded camera) is represented by an 8-bit unsigned integer value that ranges from 0 to 255. In this stage, the data type is transformed from unsigned integer to floating point while dividing each pixel value by 255. The new set of values ranges between 0 and 1.
3.3 Normalization
Even though this section is entitled Normalization, a few more tasks are performed in this stage of the application, as shown by the blue rectangles in Figure 3.4. Here, wide arrows represent the flow of data, whereas dashed lines represent the order of execution. The numbers inside the small data arrows pointing towards the different tasks represent the number of frames used as input by each task. The dashed-line rectangle that encloses the normalization and texture 2 tasks indicates that there is no strict sequential execution between these two, but rather that they are executed in an alternating fashion. This type of diagram will prove particularly useful in Chapter 5 to explain the modifications that were made to the application to improve its performance. An example of the different frames that are produced in this stage is visualized in Figure 3.5. A brief description of each of the tasks involved in this stage follows.

Figure 3.4: Flow diagram of the normalization stage. The 16 camera frames feed four tasks: normalization and texture 2 (8 output frames each), and modulation and texture 1 (1 output frame each).
3.3.1 Normalization
The purpose of this stage is to extract the reflectivity component (texture information) from the camera frames, while enhancing the deformed illumination patterns in the resulting frame sequence. Figure 3.5a illustrates the result of this process. The deformed patterns are essential for the 3D reconstruction process.

In order to understand how this process takes place, we need to look back at Figure 3.2. Here, it is possible to observe that the projected patterns in the top row of frames are equal to their corresponding frames in the bottom row, with the only difference being that the values of the projected pattern are inverted. For each corresponding pair, a new image frame is generated according to the following equation:

    F_norm(x, y) = (F_camera(x, y, a) − F_camera(x, y, b)) / (F_camera(x, y, a) + F_camera(x, y, b)),

where a and b correspond to aligned top and bottom frames in Figure 3.2, respectively. An example of the resulting frame sequence is shown in Figure 3.5a.
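As a minimal sketch of this computation, the following C function normalizes one pair of frames. It assumes the frames are already stored as float arrays in the [0, 1] range (Section 3.2.4); the epsilon guard against division by zero is an addition of this sketch, not part of the original implementation.

/* Normalizes one pair of frames with inverted patterns (a and b). */
static void normalize_pair(const float *a, const float *b,
                           float *out, int npixels)
{
    const float eps = 1e-6f;   /* avoids division by zero in dark areas */
    for (int i = 0; i < npixels; i++)
        out[i] = (a[i] - b[i]) / (a[i] + b[i] + eps);
}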
Figure 3.5: Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence, (b) texture 2 frame sequence, (c) modulation frame, (d) texture 1 frame.
3.3.2 Texture 2
The calculation of the texture 2 frame sequence follows the same procedure as the one used to calculate the normalized frame sequence. In fact, the output of this process is an intermediate step in the calculation of the normalized frames, which is why the two processes are said to be performed in an alternating fashion. The equation that describes the calculation of the texture 2 frame sequence is

    F_texture2(x, y) = F_camera(x, y, a) + F_camera(x, y, b).

The resulting frame sequence (Figure 3.5b) is used later in the global motion compensation stage.
3.3.3 Modulation

The purpose of this stage is to find the range of measured values for each (x, y) pixel of the camera frame sequence along the time dimension. This is done in two steps. First, two frames are generated by finding the maximum and minimum values along the time (t) dimension (Figure 3.6) for every (x, y) position.
Figure 3.6: Camera frame sequence in an (x, y, t) coordinate system.
Second, a modulation frame is produced by taking the difference between the previously generated frames, i.e.,

    F_mod(x, y) = F_max(x, y) − F_min(x, y).

This modulation frame (Figure 3.5c) is required later during the decoding stage.
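A straightforward C sketch of both steps is given below; the function and variable names are illustrative.

/* Computes the modulation frame: per-pixel range of the camera frame
 * sequence along the time dimension. frames[t] points to the t-th frame. */
static void modulation_frame(const float *frames[], int nframes,
                             float *mod, int npixels)
{
    for (int i = 0; i < npixels; i++) {
        float fmin = frames[0][i], fmax = frames[0][i];
        for (int t = 1; t < nframes; t++) {   /* min/max along t */
            if (frames[t][i] < fmin) fmin = frames[t][i];
            if (frames[t][i] > fmax) fmax = frames[t][i];
        }
        mod[i] = fmax - fmin;                 /* F_mod = F_max - F_min */
    }
}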
3.3.4 Texture 1

Finally, the last task in the normalization stage corresponds to the generation of the texture image that will be mapped onto the final 3D model. In contrast to the previous three tasks, this subprocess does not take the complete set of 16 camera frames as input, but only the 2 with the finest projection patterns. Figure 3.7 shows the four processing steps that are applied to the input in order to generate a texture image such as the one presented in Figure 3.5d.
Figure 3.7: Flow diagram for the calculation of the texture 1 image: average frames, gamma correction, 5×5 mean filter, and histogram stretch.
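Two of these steps are sketched below in C for a float image in [0, 1]. The gamma constant is illustrative; the value used in the actual application is not stated in this document.

#include <math.h>

/* Gamma correction: img[i] = img[i]^(1/gamma). */
static void gamma_correct(float *img, int n, float gamma)
{
    for (int i = 0; i < n; i++)
        img[i] = powf(img[i], 1.0f / gamma);
}

/* Histogram stretch: linearly rescales the image to span [0, 1]. */
static void histogram_stretch(float *img, int n)
{
    float lo = img[0], hi = img[0];
    for (int i = 1; i < n; i++) {
        if (img[i] < lo) lo = img[i];
        if (img[i] > hi) hi = img[i];
    }
    if (hi > lo)
        for (int i = 0; i < n; i++)
            img[i] = (img[i] - lo) / (hi - lo);
}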
3.4 Global motion compensation
The major drawback of time-multiplexing strategies is their high sensitivity to movement. In fact, if no measures are taken to correct the slight movements of the scanner or of the objects in the scene during the acquisition process, the complete reconstruction process fails. Although the global motion compensation stage is only a minor part of the mechanism that makes the entire application robust to motion, its contribution to the final result is not negligible.

Global motion compensation is an extensive field of research to which many different approaches and methods have been contributed. The approach used in this application is amongst the simplest in terms of complexity; nevertheless, it satisfies the needs of the current application.
Figure 3.8 presents an overview of the algorithm used to achieve global motion compensation. This process takes as input the normalized frame sequence introduced in the previous section, and the steps are repeated for every pair of consecutive frames. As a first step, the pixels in each column are added up for both frames. This results in two vectors that hold the cumulative sums of each frame. The second step is to determine by how many pixels the second image is displaced with respect to the first one. In order to achieve this, the sum of absolute differences (SAD) between elements of the two column-sum vectors is calculated while slowly displacing the two vectors with respect to each other. The result is a new vector containing the SAD value for each displacement. Subsequently, the index of the smallest element in the SAD vector is searched for in order to determine the number of pixels by which the second image needs to be shifted. The process concludes by performing the actual shift of the second frame.
Figure 3.8: Flow diagram for the global motion compensation process: for every pair of consecutive frames, sum the columns of frames A and B, minimize the SAD, and shift frame B.
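A compact C sketch of the shift estimation for one frame pair follows; buffer handling and the actual shifting of frame B are assumed to be done elsewhere, and all names are illustrative.

#include <math.h>
#include <stdlib.h>

/* Estimates the horizontal displacement between two consecutive w x h
 * frames by minimizing the SAD of their column sums; maxshift bounds the
 * search range. Error handling is omitted for brevity. */
static int estimate_shift(const float *a, const float *b,
                          int w, int h, int maxshift)
{
    float *colA = calloc(w, sizeof *colA);
    float *colB = calloc(w, sizeof *colB);
    for (int x = 0; x < w; x++)                 /* step 1: column sums */
        for (int y = 0; y < h; y++) {
            colA[x] += a[y * w + x];
            colB[x] += b[y * w + x];
        }
    int best = 0;
    float bestsad = -1.0f;
    for (int s = -maxshift; s <= maxshift; s++) {   /* step 2: SAD search */
        float sad = 0.0f;
        for (int x = 0; x < w; x++) {
            int xs = x + s;
            if (xs >= 0 && xs < w)
                sad += fabsf(colA[x] - colB[xs]);
        }
        if (bestsad < 0.0f || sad < bestsad) { bestsad = sad; best = s; }
    }
    free(colA);
    free(colB);
    return best;    /* frame B is subsequently shifted by this amount */
}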
3.5 Decoding
In Section 2.1.1 of the literature study, the correspondence problem was defined as the process of determining corresponding point pairs between the captured images and the projected patterns. This is exactly what is accomplished during the decoding stage.
A novel approach has been implemented in which the identification of the projector stripes is based not on the values of the pixels themselves (as is typically done), but rather on the edges formed by the transitions of the projected patterns. Figure 3.9 illustrates the different sets of decoded values that result from each of these methods. Here, it is possible to observe that the pixel-based method produces a stair-casing effect due to the decoding of neighboring pixels that lie on the same stripe of the projected pattern. The edge-based method, on the other hand, removes this undesirable effect by decoding values only for those parts of the image in which a transition occurs. Furthermore, this approach enables sub-pixel accuracy in the determination of the positions where the transitions occur, meaning that the overall resolution of the 3D reconstruction increases considerably. A sketch of such sub-pixel edge localization is given after Figure 3.9.
Figure 3.9: Edge-based vs. pixel-based decoding (decoded values plotted against the pixels along the y dimension of the image). The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used.
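The following C sketch locates pattern transitions with sub-pixel accuracy in one row of a normalized frame (whose values are centered around zero, Section 3.3.1) by linearly interpolating the zero crossing between neighboring pixels. It illustrates the principle only; the actual decoder also assigns a projector code to each transition.

/* Writes the sub-pixel positions of sign changes in row[0..w-1] to pos
 * and returns how many were found (at most maxpos). */
static int find_transitions(const float *row, int w,
                            float *pos, int maxpos)
{
    int n = 0;
    for (int x = 0; x + 1 < w && n < maxpos; x++)
        if ((row[x] < 0.0f) != (row[x + 1] < 0.0f)) {
            /* fraction of a pixel at which the sign change occurs */
            float frac = row[x] / (row[x] - row[x + 1]);
            pos[n++] = (float)x + frac;
        }
    return n;
}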
The decoding process results in a set of vertices, each one associated with a depth code. Note, however, that the units used to describe the position and depth of each vertex are camera pixels and code values, respectively, meaning that these vertices still do not represent the actual geometry of the face. The calibration process, explained in a later section, is the part of the application that translates the pixel and code values into standard units (such as millimeters), thus recreating the actual shape of the human face.
3.6 Tessellation
Tessellation refers to the process of covering a plane using different geometric shapes in such a manner that no overlaps occur. In computer graphics, these geometric shapes are generally chosen to be triangles, also called "faces". The reason for using triangles is that they have, by definition, their vertices on the same plane. This, in turn, avoids the generation of non-simple convex polygons that are not guaranteed to be rendered correctly. A complete example illustrating this point can be found in [32].
The set of 3D vertices calculated in the decoding stage is the input to the tessellation process. Here, however, the third dimension does not play a role, and hence the z coordinate of each vertex can be thought of as being equal to 0. This implies that the new set of vertices consists only of (x, y) coordinates that lie on the same plane, as shown in Figure 3.10a. This graph corresponds to a very close view of the nose area in the reconstructed face example.
Figure 3.10: Close view of the vertices in the nose area (a) before and (b) after applying the Delaunay triangulation.
The question that arises here is how to connect the vertices in such a way that the complete surface is covered with triangles. The answer is to use the Delaunay triangulation, which is probably the most common triangulation used in computer vision. The main advantage that it has over other methods is that the Delaunay triangulation avoids "skinny" triangles, reducing potential numerical precision problems [33]. Moreover, the Delaunay triangulation is independent of the order in which the vertices are processed. Figure 3.10b shows the result of applying the Delaunay triangulation to the vertices shown in Figure 3.10a.
Although a number of different algorithms can be used to compute the Delaunay triangulation, the final outcome of each conforms to the following definition: a Delaunay triangulation of a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P) [33]. This definition can be understood by examining Figure 3.11.
Figure 3.11: The Delaunay tessellation with all the circumcircles and their centers [33].
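For illustration, the sketch below builds a Delaunay triangulation of the decoded (x, y) vertices with the legacy C interface of OpenCV, the library that was used in early versions of the application (Section 4.1.2.1). Whether the original code used exactly these calls is an assumption of this sketch.

#include <opencv/cv.h>

/* Builds a Delaunay triangulation of n 2D points; bounds must enclose
 * all of them, and storage is an OpenCV memory pool owned by the caller. */
static CvSubdiv2D *build_delaunay(const CvPoint2D32f *pts, int n,
                                  CvRect bounds, CvMemStorage *storage)
{
    CvSubdiv2D *subdiv = cvCreateSubdivDelaunay2D(bounds, storage);
    for (int i = 0; i < n; i++)
        cvSubdivDelaunay2DInsert(subdiv, pts[i]);
    return subdiv;
}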
3.7 Calibration
The set of (x, y) vertices with their corresponding depth code values that results from the decoding process does not represent standard units of measure, i.e., these values still have to be translated into standard units such as millimeters. This is precisely the objective of the calibration process.
The calibration mechanism used in the application is based on the work of Peter-Andre Redert in his PhD thesis [31]. The entire process is divided into two parts: an offline and an online process. Moreover, the offline process consists of two stages: the camera calibration and the system calibration. It is important to clarify that while the offline process is performed only once (camera properties and distances within the system do not change with every scan), the online process is carried out for every scan instance. The calibration stage referred to in Figure 3.1 is the latter.
3.7.1 Offline process

As already mentioned, the offline process comprises the two stages described below.
Camera calibration. This part of the process is concerned with the calculation of the intrinsic parameters of the camera, as explained in Section 2.2 of the literature study. In short, the objective is to precisely quantify the optical properties of the camera. The current approach accomplishes this by imaging the special calibration chart shown in Figure 3.12 from different orientations and distances. After corresponding markers in the different images are found, an algorithm searches for the optimal set of camera parameters for which triangulation of all corresponding marker-point pairs gives an accurate reconstruction of the calibration chart.

Figure 3.12: The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system. All absolute dimensions and photometric properties of the round markers are known precisely.
System calibration. The second part of the calibration process refers to the camera-projector system calibration, i.e., the determination of the extrinsic parameters of the system. Again, this part of the process images the calibration chart from different distances. However, this time structured light patterns are emitted by the projector while the acquisition process takes place. The result is that each projector code is associated with a known depth and camera position.
3.7.2 Online process

The result of the offline calibration is a set of parameters that model the optical properties of the scanner system. These are passed to the application inside the XML file for every scan. The parameters represent the coefficients of a fifth-order polynomial used for translating the set of (x, y) vertices with their corresponding depth code values into standard units of measure. In other words, the online process consists of evaluating a polynomial with all the x, y, and depth code values calculated in the decoding stage in order to reconstruct the geometry of the face. Figure 3.13 shows the state of the 3D model before and after the reconstruction process.
Figure 3.13: The 3D model (a) before and (b) after the calibration process.
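The exact form of the calibration polynomial is not reproduced in this document; as an illustration of the evaluation step only, the following C sketch evaluates a one-dimensional fifth-order polynomial with Horner's rule, with coefficients such as those read from the XML file.

/* Evaluates a fifth-order polynomial; c[0..5] holds the coefficients
 * from lowest to highest order. */
static double poly5(const double c[6], double v)
{
    double r = c[5];
    for (int i = 4; i >= 0; i--)
        r = r * v + c[i];
    return r;
}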
3.8 Vertex filtering
As can be seen in Figure 3.13b, there are a number of extra vertices (and faces) that have not been correctly reconstructed and should therefore be removed from the model. Vertex filtering is applied to remove all these noisy vertices and faces based on different criteria. The process is divided in the following three steps.
3.8.1 Filter vertices based on decoding constraints

First, if the distance between consecutive decoded points is larger than a maximum threshold in the x or z dimension, then these points are removed. Second, in order to avoid falsely decoded vertices due to camera noise (especially in the parts of the images that light does not hit directly), a minimal modulation threshold needs to be exceeded, or else the associated decoded point is discarded. Finally, if the decoded vertices lie outside a margin defined in accordance with the image dimensions, then these are removed as well.
3.8.2 Filter vertices outside the measurement range
The measurement range, defined during the offline calibration, refers to the minimum and maximum values that each decoded point can have in the z dimension. These values are read from the XML file. The long triangles shown in Figure 3.13b that either extend far into the picture or, on the other hand, come close to the camera are all removed in this stage. The resulting 3D model after filtering with the two previously described criteria is shown in Figure 3.14a.
3.8.3 Filter vertices based on a maximum edge length

Several steps are involved in the removal of vertices based on the maximum edge length criterion. Initially, the length of every edge contained in the model is calculated. This is followed by determining a new set of edges L that contains the longest edge in each face. After this operation, the mean length value of this longest-edge set is calculated. Finally, only those faces whose longest edge is less than seven times this mean value, i.e., L < 7 × mean(L), are kept. Figure 3.14b shows the result after this operation; a sketch of the procedure in C follows below.
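This is a minimal sketch of the filter under the assumption that the mesh is stored as a vertex array plus an index-triple array; all names are illustrative.

#include <math.h>
#include <stdlib.h>

typedef struct { double x, y, z; } Vtx;

static double edge_len(const Vtx *a, const Vtx *b)
{
    double dx = a->x - b->x, dy = a->y - b->y, dz = a->z - b->z;
    return sqrt(dx * dx + dy * dy + dz * dz);
}

/* Removes faces whose longest edge is at least 7 times the mean longest
 * edge; faces is compacted in place and the new face count is returned.
 * Error handling is omitted for brevity. */
static int filter_faces(const Vtx *v, int (*faces)[3], int nfaces)
{
    double *longest = malloc(nfaces * sizeof *longest);
    double mean = 0.0;
    for (int i = 0; i < nfaces; i++) {
        double e0 = edge_len(&v[faces[i][0]], &v[faces[i][1]]);
        double e1 = edge_len(&v[faces[i][1]], &v[faces[i][2]]);
        double e2 = edge_len(&v[faces[i][2]], &v[faces[i][0]]);
        longest[i] = fmax(e0, fmax(e1, e2));
        mean += longest[i];
    }
    mean /= nfaces;
    int kept = 0;
    for (int i = 0; i < nfaces; i++)
        if (longest[i] < 7.0 * mean) {
            faces[kept][0] = faces[i][0];
            faces[kept][1] = faces[i][1];
            faces[kept][2] = faces[i][2];
            kept++;
        }
    free(longest);
    return kept;
}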
Figure 3.14: Resulting 3D models after the various filtering steps: (a) after the steps described in Subsections 3.8.1 and 3.8.2, (b) after the step described in Subsection 3.8.3, (c) after the step described in Section 3.9.
3.9 Hole filling

In the last processing step of the 3D face scanner application, two actions are performed. The first one is an algorithm that takes care of filling undesirable holes that appear due to the removal of vertices and faces that were part of the face surface. This is accomplished by adding a vertex in the middle of each hole and then connecting every surrounding edge with this point. The second action is another filtering step over vertices and faces, in which the program removes all but the largest group of connected faces. The final 3D model is shown in Figure 3.14c.
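A sketch of the fan construction used for the first action is shown below, reusing the Vtx structure of the previous sketch; the detection of the boundary loop of a hole is assumed to happen elsewhere.

/* verts must have room for one extra vertex and faces for nloop extra
 * triangles; loop holds the ordered boundary vertex indices of the hole. */
static int fill_hole(Vtx *verts, int nverts,
                     const int *loop, int nloop,
                     int (*faces)[3], int nfaces)
{
    Vtx c = { 0.0, 0.0, 0.0 };
    for (int i = 0; i < nloop; i++) {   /* centroid of the boundary */
        c.x += verts[loop[i]].x;
        c.y += verts[loop[i]].y;
        c.z += verts[loop[i]].z;
    }
    c.x /= nloop; c.y /= nloop; c.z /= nloop;
    verts[nverts] = c;                  /* the new centre vertex */
    for (int i = 0; i < nloop; i++) {   /* one triangle per boundary edge */
        faces[nfaces][0] = loop[i];
        faces[nfaces][1] = loop[(i + 1) % nloop];
        faces[nfaces][2] = nverts;
        nfaces++;
    }
    return nfaces;                      /* updated face count */
}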
3.10 Smoothing
Taking into account that the smoothing process is beneficial for visualization purposes but not for the overall goal of the 3D mask sizing project, this process was not considered part of the 3D face scanner application; this is also the reason why it is not included in Figure 3.1. Nevertheless, this section provides a brief explanation of the smoothing process that is currently used, along with an example.
A complete explanation of the algorithm that is used to achieve the smoothing effect is given in [34]. In short, the algorithm is based on a scale-dependent Laplacian operator that diffuses the vertices along the surface. An example of the resulting model before and after applying the smoothing process is shown in Figure 3.15.
Figure 3.15: Forehead of the 3D model (a) before and (b) after applying the smoothing process.
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as two different domains, but rather as two complementary parts of a whole. Two important trends have made such a unified view possible. First, integrated circuit (IC) technology has evolved to the point where multiple processors of different types coexist in a single IC. Second, the increasing complexity and average size of programs, added to the evolution of compiler technologies, raised C compilers (and even C++ or Java in some cases) to become commonplace in the development of embedded systems [35].
This chapter discusses the embedded hardware and software implementation of the 3D face scanner. A brief account of the hardware and software tools that were used during the development of the application is presented first. Subsequently, the first stage of the development process is described, which consists mainly of translating the algorithms and methods described in Chapter 3 into a different programming language, more suitable for embedded systems. Finally, a preview of the developed visualization module that displays the reconstructed 3D face is presented, along with a brief description of its functionality.
4.1 Development tools
This section describes the set of tools used in the development of the embedded application. First, an overview of the hardware is presented, highlighting the aspects that are most important to the 3D face scanner application. This is then followed by a list of the software tools, along with a short motivation for their selection. A so-called remote development methodology was used for the compilation process. The idea is to run an integrated development environment (IDE) on a client system for the creation of the project, editing of the files, and usage of code assistance features, in the same manner as done with local projects. However, when the project is built, run, or debugged, the process runs on a remote server, with output and input transferred to the client system.
4.1.1 Hardware
A current trend in the embedded world is the use of single-board computers (SBCs) as development platforms. SBCs combine most features of a conventional desktop computer into a single board, which can be as small as a credit card. One or more processors of different types, memory, on-board peripherals for multiple USB devices, single or dual gigabit Ethernet connections, and integrated graphics and audio capabilities, amongst others, are common features included in these devices. But perhaps what is most interesting for embedded developers is the availability of several SBCs that fall under the open-source hardware category [36]. Such SBCs are suitable for the implementation of a wide range of applications on the basis of open operating systems.

Two different hardware environments were used in the development of the current embedded application: a conventional desktop personal computer (PC) with an Intel x86 architecture, and an SBC that was selected according to the following survey.
4.1.1.1 Single-board computer survey
A survey of popular SBCs available on the market was conducted beforehand, with the intention of finding the most suitable model for our application. Table 4.1 presents a subset of the considered models, highlighting the most relevant characteristics for the 3D face scanner application. Refer to [37] for the complete survey.
The chosen model has to comply with several requirements imposed by the 3D face scanner application. First, support for both a camera and a projector had to be offered. While all of the considered models provide special support for video output, not all of them provide suitable means for camera signal acquisition; in fact, most of them rely on USB or Ethernet connections for this purpose. The problem with using USB technology for camera acquisition is that it is highly resource demanding. Ethernet connections, on the other hand, imply streaming video in formats such as MPEG, which require additional computational resources and buffering for decoding the video stream. Explicit peripheral support for camera acquisition was only offered by two of the considered models: the BeagleBoard-xM and the PandaBoard.
Table 4.1: Single-board computer survey

BeagleBoard-xM
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 512 MB
  Video output: DVI-D, HDMI, S-Video
  GPU: PowerVR SGX, OpenGL ES 2.0
  Camera port: Yes

Raspberry Pi Model B
  CPU: ARM1176, 700 MHz
  RAM: 256 MB
  Video output: Composite RCA, HDMI, DSI
  GPU: Broadcom VideoCore IV, OpenGL ES 2.0
  Camera port: No

Cotton Candy
  CPU: dual-core ARM Cortex-A9, 1200 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: quad-core 200 MHz Mali-400 MP, OpenGL ES 2.0
  Camera port: No

PandaBoard
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, DVI-D, LCD
  GPU: PowerVR SGX540, OpenGL ES 2.0
  Camera port: Yes

Via APC
  CPU: ARM11, 800 MHz
  RAM: 512 MB
  Video output: HDMI, VGA
  GPU: built-in 2D/3D graphics, OpenGL ES 2.0
  Camera port: No

MK802
  CPU: ARM Cortex-A8, 1000 MHz
  RAM: 1 GB
  Video output: HDMI
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No

Snowball
  CPU: dual-core ARM Cortex-A9, 1000 MHz
  RAM: 1 GB
  Video output: HDMI, CVBS
  GPU: Mali-400 MP, OpenGL ES 2.0
  Camera port: No
A second issue in the selection of the SBC concerned the project objective of developing a module capable of visualizing the reconstructed 3D model by means of the embedded projector. It was considered that the achievement of this objective could be greatly simplified by selecting an SBC model that offered support for rendering 3D computer graphics by means of an API, preferably OpenGL ES. Nevertheless, all of the SBC models considered in the survey featured a graphics processing unit (GPU) with such support.

Finally, one last important motivation for the selection came from the experience gathered through related projects. The BeagleBoard-xM had been used as the embedded computing unit in other projects [6] at Philips Research Eindhoven, and therefore valuable implementation effort could be saved if this option were adopted. Consequently, the BeagleBoard-xM was selected as the SBC model for the development of the current project.
4.1.1.2 BeagleBoard-xM features
The BeagleBoard-xM (Figure 4.1) is an SBC produced by Texas Instruments. It is a low-power, open-source hardware system that was designed specifically to address the open-source community. It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer. It is based on Texas Instruments' DM3730 system on chip (SoC). At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1 GHz, accompanied by 512 MB of LPDDR RAM. Several open operating systems have been made compatible with this processor, including Linux, FreeBSD, RISC OS, Symbian, and Android. Moreover, the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated video and audio decoding, and an Imagination Technologies PowerVR SGX530 GPU to provide accelerated 2D and 3D rendering that supports OpenGL ES 2.0 [38].
In addition to the previously mentioned characteristics, the ARM Cortex-A8 processor comes with a general-purpose SIMD (Single Instruction, Multiple Data) engine known as NEON. This technology is based on a 128-bit SIMD architecture extension that provides flexible and powerful acceleration for consumer multimedia products, as described in [39].
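As a small illustration of what NEON offers, the following C sketch adds two 8-bit image buffers 16 pixels at a time using NEON intrinsics; the function name is illustrative and n is assumed to be a multiple of 16.

#include <arm_neon.h>
#include <stdint.h>

/* Saturating add of two 8-bit grayscale buffers, 16 pixels per iteration. */
static void add_u8_neon(const uint8_t *a, const uint8_t *b,
                        uint8_t *out, int n)
{
    for (int i = 0; i < n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(out + i, vqaddq_u8(va, vb));
    }
}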
4.1.2 Software
The main factors involved in the selection of software tools were (i) the available support by a large development community and (ii) acquisition costs and licensing charges. Open-source software was adopted where possible. Moreover, prior experience with the tools was also taken into account. The software can be divided in two categories: (i) software libraries that are used within the application and are therefore necessary for its execution, and (ii) software tools used specifically for the development of the application and hence not required for its execution. In what follows, each of these is briefly described.

Figure 4.1: The BeagleBoard-xM offered by Texas Instruments.
4.1.2.1 Software libraries

The following software libraries are used throughout the implementation of the embedded application.

libxml2: A software library for parsing XML documents, originally developed for the Gnome project and later made available to outside projects as well. The current application uses it to extract the required information from the XML file that is included with each scan.
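As an illustration, a minimal sketch of how such an XML file could be read with libxml2 is given below. The file name and the attributes printed are hypothetical, since the actual schema of the scan files is not reproduced here.

    #include <stdio.h>
    #include <libxml/parser.h>
    #include <libxml/tree.h>

    /* Minimal sketch: read scan.xml (hypothetical name) and print every
     * attribute of the root element's children. */
    int main(void)
    {
        xmlDocPtr doc = xmlReadFile("scan.xml", NULL, 0);
        if (doc == NULL)
            return 1;

        xmlNodePtr root = xmlDocGetRootElement(doc);
        for (xmlNodePtr node = root->children; node != NULL; node = node->next) {
            if (node->type != XML_ELEMENT_NODE)
                continue;
            for (xmlAttrPtr attr = node->properties; attr != NULL; attr = attr->next) {
                xmlChar *value = xmlGetProp(node, attr->name);
                printf("%s.%s = %s\n", (const char *)node->name,
                       (const char *)attr->name, (const char *)value);
                xmlFree(value);
            }
        }

        xmlFreeDoc(doc);
        xmlCleanupParser();
        return 0;
    }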
OpenCV: An open source computer vision and machine learning software library initiated by Intel. It provides the functionality needed to construct the Delaunay triangulation described in Chapter 3. Though it was used in the initial versions of the application, later optimizations replaced the OpenCV implementation.

CGAL: A software library that aims to provide access to algorithms in computational geometry. It is used in the current application to simplify the resulting mesh surface, i.e., to reduce the number of faces used to represent the surface while keeping the overall shape of the reconstructed model.

OpenGL ES: A subset of the more general OpenGL designed specifically for embedded systems. It consists of a cross-language, multi-platform Application Programming Interface (API) for rendering 2D and 3D computer graphics. It is used in the current application as the means to visualize the reconstructed 3D model.

GLUT: The OpenGL Utility Toolkit, a system-independent API for OpenGL used to create windows and/or frame buffers. It is used in the visualization module of the application as well.
4.1.2.2 Software development tools

The following list presents a description of the most important software tools used for the development of the embedded application.

GNU toolchain: A collection of programming tools produced by the GNU Project that provides development facilities for applications and operating systems. Among the several projects that comprise the GNU toolchain, the following were used:

GNU Make: A utility that automates the building process of executable programs by reading so-called makefiles, which specify how to create the target program.

GCC: The official compiler of the GNU operating system, which has been adopted as standard by most modern Unix-like operating systems.

GNU Binutils: A set of programming tools used in the development process to create and manage programs, object files, libraries, profile data, and assembly source code. The commands as (assembler), ld (linker), and gprof (profiler) were used among the complete set of Binutils commands.

GNU Project debugger (GDB): The standard debugger for the GNU operating system, which was made available for the development of applications outside the GNU Project as well.

Valgrind: A programming tool that can automatically detect memory management errors. It also provides the functionality of a profiler.

Ubuntu: A Linux-based operating system distributed as free and open source software. It was installed on both the desktop PC and the SBC.
4.2 MATLAB to C code translation

This section describes the first stage of the embedded application development, which involves the translation of a series of algorithms originally written in MATLAB code to C.

Although a number of tools are available that automatically translate MATLAB code to C, such as MATLAB Coder by MathWorks, MATLAB-to-C Synthesis (MCS) by Catalytic Inc., and AccelDSP by Xilinx, these have a number of pitfalls that compromise their applicability, especially when performance is of ultimate importance. Perhaps most concerning is that each of these tools only supports a subset of the MATLAB language and functions, meaning that the complete functionality of MATLAB is immediately constrained by this requirement. In many cases this would imply modifying the MATLAB code prior to the translation process in order to filter out any feature or function not included in the subset, which adds overhead to the development process. Examples of features not supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization, and try/catch statements. The use of an automatic translation tool was discarded for this project, taking into account that several of these unsupported features are present in the MATLAB code.
4.2.1 Motivation for developing in C language

There are a number of reasons why C is among the most popular programming languages used for the development of embedded systems. The first is that C lies at an intermediate point between higher- and lower-level languages, providing suitable characteristics for embedded system development from both sides. The problem with higher-level languages lies in the fact that they do not provide suitable characteristics for optimizing the performance of applications, such as low-level memory manipulation. Furthermore, unlike many higher-level programming languages, C provides deterministic resource use, which is an important feature when the target devices contain limited resources. On the other hand, C outperforms lower-level languages in a number of aspects, such as scalability and maintainability. Two final motivations for using C are that (i) C compilers are available for almost all embedded devices and are supported by a large pool of experienced C programmers, and (ii) the vast majority of hardware APIs/drivers are written in C.
4.2.2 Translation approach

As mentioned earlier, a manual translation approach was chosen over the use of automatic translation tools. A key part in the process of manually translating MATLAB to C code is the verification process. There are two major techniques used to achieve such verification. The first consists of a systematic method of converting the translated C code into a compiled MEX-file that can be merged into the original MATLAB project. Then, by comparing the results generated by the MATLAB project containing the C implementation wrapped in a MEX-file with those generated by the original MATLAB project, one should be able to verify the correctness of the translation. The second approach consists of writing corresponding intermediate results of both the MATLAB and C implementations to external files and then using a file comparison tool, such as diff in Linux environments, to validate the equality of both results. The latter approach was chosen for the development of the current application, for the following reason: the former approach requires the C implementation to be wrapped in a so-called MEX wrapper, which takes care of the communication between MATLAB and C. This task is considered error prone, since crashes, segmentation violations, or incorrect results can easily occur if the MEX wrapper does not allocate and access the data properly, as reported by Marc Barberis in [40] from Catalytic Inc.
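The following sketch illustrates the idea on the C side: intermediate results are written to a text file with a fixed number of decimals (the 4-decimal precision and the file names are illustrative), after which the MATLAB and C dumps can be compared with diff.

    #include <stdio.h>

    /* Dump an intermediate float array to a text file, one value per line,
     * rounded to 4 decimal places so both implementations can be compared. */
    void dump_intermediate(const char *path, const float *data, int n)
    {
        FILE *f = fopen(path, "w");
        if (f == NULL)
            return;
        for (int i = 0; i < n; i++)
            fprintf(f, "%.4f\n", data[i]);
        fclose(f);
    }

    /* The MATLAB side writes, e.g., stage3_matlab.txt with the same format
     * (fprintf(fid, '%.4f\n', data)), after which the results are verified with:
     *
     *     diff stage3_matlab.txt stage3_c.txt
     */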
A number of pitfalls that add complexity to the manual translation process were identified throughout this stage. The most important are:

• Array elements in MATLAB code are indexed starting with 1, whereas C indexing starts with 0. Although this does not seem like a major difference, it was found that such a simple change could easily introduce errors.

• MATLAB uses column-major ordering, whereas C uses a row-major approach. Special care must be taken to guarantee that spatial locality is maintained after the translation process takes place, i.e., the order in which data is processed should correspond to the order in which it is laid out in memory (see the sketch after this list). Not complying with this idea could induce a serious loss in the performance of the resulting code.

• MATLAB is an interpreted language, i.e., data types and variable dimensions are only known at run time; thus, they cannot easily be deduced from analyzing the source code.

• MATLAB supports dynamic sizing of arrays, whereas such operations in C require explicit allocation, reallocation, and deallocation of memory using constructs such as malloc, realloc, or free.

• MATLAB features a rich set of libraries that are not available in C. This can imply a large overhead in the development process if many of these functions have to be implemented.

• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C. For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort.

• Last but not least, MATLAB supports reusing the same variable for storing data of different types, dimensions, and sizes. On the contrary, C requires all variables to be cast to a specific data type (or declared, as it is known in the programming field) before they can be used. Furthermore, MATLAB uses a wide variety of generic types that are not available in C, which requires the programmer to implement them using structure constructs of primitive types.
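To illustrate the second pitfall, the sketch below shows how a MATLAB-style column-major traversal should be reordered in C so that consecutive iterations touch consecutive memory addresses. ROWS, COLS, and process() are placeholders.

    /* A ROWS x COLS image stored in row-major order, as is natural in C. */
    float img[ROWS][COLS];

    /* Cache-unfriendly in C: mimics MATLAB's column-major traversal. */
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            process(img[r][c]);

    /* Cache-friendly in C: the inner loop walks along a row, i.e., along memory. */
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            process(img[r][c]);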
4.3 Visualization

This section describes the different steps involved in the visualization module developed to display the reconstructed 3D models by means of the embedded projector contained in the hand-held device. Figure 4.2 extends the general overview of the application presented in Figure 3.1 by incorporating the visualization module. This figure shows that the resulting 3D model of the face reconstruction process consists of 4 different elements: a set of vertices, a set of faces, a set of UV coordinates, and a texture image.

Figure 4.2: Simplified diagram of the 3D face scanner application.
Vertices and faces describe the geometry of the reconstructed model. Each face consists of three index values that determine the vertices that form a triangle. On the other hand, UV coordinates, together with the texture image, describe the texture of the model. Figure 4.3 shows how UV coordinates are used to map portions of the texture image to individual parts of the model. Each vertex is associated with a UV coordinate. When a triangle is rendered, the corresponding UV coordinates of each vertex are used to extract a portion of the texture image and place it on top of the triangle.
Figure 4.3: The UV coordinate system; the u and v axes span the unit square from (0,0) to (1,1).
Figure 4.4 presents an overview of the visualization module. The first step of the process is to simplify the 3D model, i.e., to reduce the number of triangles (and vertices) used to represent the surface. Note that while a high resolution is needed for the algorithms that determine the fit quality of the different mask models, a much lower resolution can be used for visualization purposes. In fact, due to the limited resources available in embedded systems, such simplification becomes necessary to avoid lag when zooming, rotating, or panning the model. Edge collapse is a common term for the simplification process, which is shown in Figure 4.4. The input vertices and faces of this block are converted into a smaller set, denoted as New vertices and New faces in the diagram. However, since the new set of vertices and faces does not have a one-to-one correspondence to the original set of UV coordinates, these coordinates have to be updated as well. This is accomplished by means of the nearest neighbor algorithm: every new vertex is assigned the UV coordinate of its closest original vertex, as sketched below.
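A minimal sketch of this reassignment follows. The brute-force search is quadratic, which is acceptable for illustration since the simplified vertex set is small; the structure and function names are illustrative.

    #include <float.h>

    typedef struct { float x, y, z; } Vertex;
    typedef struct { float u, v; } UV;

    /* For every new vertex, copy the UV coordinate of the closest original vertex. */
    void reassign_uv(const Vertex *orig, const UV *orig_uv, int n_orig,
                     const Vertex *simp, UV *simp_uv, int n_simp)
    {
        for (int i = 0; i < n_simp; i++) {
            float best = FLT_MAX;
            int best_j = 0;
            for (int j = 0; j < n_orig; j++) {
                float dx = simp[i].x - orig[j].x;
                float dy = simp[i].y - orig[j].y;
                float dz = simp[i].z - orig[j].z;
                float d2 = dx * dx + dy * dy + dz * dz;  /* squared distance suffices */
                if (d2 < best) {
                    best = d2;
                    best_j = j;
                }
            }
            simp_uv[i] = orig_uv[best_j];
        }
    }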
The next stage of the process is to format the new set of vertices, faces, and UV coordinates, together with the texture 1 image, such that OpenGL can render the model. Subsequently, normal vectors are calculated for every triangle; these are mainly used by OpenGL for lighting calculations. Every vertex of the model has to be associated with one normal vector. To do this, an average normal vector is calculated for each vertex based on the normal vectors of the triangles connected to it, where the normal vector of each triangle is obtained with a cross product (see the sketch below). Once these four elements that characterize the 3D model are provided to OpenGL, the program enters an infinite running state in which the model is redrawn every time a timer expires or an interactive operation is sent to the program.
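A sketch of this computation is shown below: the normal of each triangle is obtained from the cross product of two of its edges, accumulated into its three vertices, and the per-vertex sums are normalized at the end. The data structures are illustrative.

    #include <math.h>
    #include <string.h>

    typedef struct { float x, y, z; } Vec3;
    typedef struct { int a, b, c; } Face;   /* indices of the three vertices */

    void compute_vertex_normals(const Vec3 *v, int nv,
                                const Face *f, int nf, Vec3 *normal)
    {
        memset(normal, 0, nv * sizeof(Vec3));
        for (int i = 0; i < nf; i++) {
            Vec3 e1 = { v[f[i].b].x - v[f[i].a].x, v[f[i].b].y - v[f[i].a].y,
                        v[f[i].b].z - v[f[i].a].z };
            Vec3 e2 = { v[f[i].c].x - v[f[i].a].x, v[f[i].c].y - v[f[i].a].y,
                        v[f[i].c].z - v[f[i].a].z };
            /* Face normal from the cross product of two edges. */
            Vec3 n = { e1.y * e2.z - e1.z * e2.y,
                       e1.z * e2.x - e1.x * e2.z,
                       e1.x * e2.y - e1.y * e2.x };
            int idx[3] = { f[i].a, f[i].b, f[i].c };
            for (int k = 0; k < 3; k++) {   /* accumulate into the vertices */
                normal[idx[k]].x += n.x;
                normal[idx[k]].y += n.y;
                normal[idx[k]].z += n.z;
            }
        }
        for (int i = 0; i < nv; i++) {      /* normalize the accumulated sums */
            float len = sqrtf(normal[i].x * normal[i].x +
                              normal[i].y * normal[i].y +
                              normal[i].z * normal[i].z);
            if (len > 0.0f) {
                normal[i].x /= len; normal[i].y /= len; normal[i].z /= len;
            }
        }
    }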
Figure 4.4: Diagram of the visualization module.
Chapter 5

Performance optimizations

This chapter presents various performance optimizations made to the 3D face scanner application, ranging from high-level optimizations, such as modification of the algorithms, to low-level optimizations, such as the implementation of time-consuming parts in assembly language.

In order to verify that the achieved optimizations were valid in general, and not only for specific cases, 10 scans of different persons were used for profiling the performance of the application. Every profile consisted of running the application 10 times for each scan and then averaging the results, in order to reduce the influence that external factors might have on the measured times. Figure 5.1 presents an example of the graphs that will be used throughout this and the following chapters to represent the changes in performance. Here, each bar is divided into different colors that represent the distribution of the total execution time among the various stages of the application, described in Chapter 3 and summarized in Figure 3.1.

The translation from MATLAB to C code corresponds to the first optimization performed. The top two bars in Figure 5.1 show that the C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation running on a desktop computer. On the other hand, the bottom two bars reflect the difference in execution time after running the C implementation on two different platforms. The much more limited resources available on the BeagleBoard-xM have a clear impact on the execution time. The C code was compiled with GCC's O2 optimization level.

The bottom bar in Figure 5.1 represents the starting point for a set of optimization procedures that will be described in the following sections. The order in which they are presented corresponds to the order in which they were applied to the application.
Figure 5.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
5.1 Double- to single-precision floating-point numbers

The same representation format of floating-point numbers was necessary in the MATLAB and C implementations in order to compare the results of both at each step of the translation process. The original C implementation used the double-precision format, because this is the format used in the MATLAB code. Taking into account that the additional precision offered by the double-precision format over single precision was not essential, and that the ARM Cortex-A8 processor features a 32-bit architecture, the conversion from double- to single-precision format was made. Figure 5.2 shows that with this modification the total execution time decreased from 14.53 to 12.52 seconds.
Figure 5.2: Difference in execution time when the double-precision format is changed to single precision.
5.2 Tuned compiler flags

While the previous versions of the C code were compiled with the O2 performance level, the goal of this step was to determine a combination of compiler options that would translate into faster-running code. A full list of the options supported by GCC can be found in [41]. Figure 5.3 shows that the execution time decreased by approximately 3 seconds (24% of the total time of 12.5 sec) after tuning the compiler flags. The list of compiler flags that produced the best performance at this stage of the optimization process was:

-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
Figure 5.3: Execution time before and after tuning GCC's compiler options.
5.3 Modified memory layout

A different memory layout for processing the camera frames was implemented to further exploit the concept of spatial locality in the program. As noted in Section 3.3, many of the operations in the normalization stage involve pixels from pairs of consecutive frames, i.e., first and second, third and fourth, fifth and sixth, and so on. The data of the camera frames were laid out in memory in such a manner that corresponding pixels of a frame pair lie next to each other in memory. The procedure is shown in Figure 5.4, and a sketch of the corresponding index mapping is given below.
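The sketch assumes row-major frames of width W and height H; the function computes where pixel (r, c) of frame n lands in the interleaved buffer.

    #include <stddef.h>

    /* Conventional layout: frame after frame.
     * Pixel (r, c) of frame n lives at: n * W * H + r * W + c.
     *
     * Interleaved layout: the two frames of a pair are woven together so
     * that the two pixels combined by the normalization stage are adjacent:
     *   pixel (r, c) of frame 2k   -> pair_base + 2 * (r * W + c)
     *   pixel (r, c) of frame 2k+1 -> pair_base + 2 * (r * W + c) + 1
     * where pair_base = 2 * k * W * H. */
    size_t interleaved_index(int n, int r, int c, int W, int H)
    {
        size_t pair_base = (size_t)(n / 2) * 2 * W * H;
        return pair_base + 2 * ((size_t)r * W + c) + (n & 1);
    }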
However, this modification yielded no improvement in the execution time of the application, as can be seen from Figure 5.5.
Figure 5.4: Modification of the memory layout of the camera frames. The blue, red, green, and purple circles represent pixels of the first, second, third, and fourth frames, respectively.

Figure 5.5: The execution time of the program did not change with the different memory layout for the camera frames.

5.4 Reimplementation of C's standard power function

The generation of the texture 1 frame in the normalization stage starts by averaging the last two camera frames, followed by a gamma correction procedure. The process of gamma correction in this application consists of raising each pixel to the power 0.85. After profiling the application, it was found that the power function from the standard math C library was taking most of the time inside this process. Taking into account that the high accuracy offered by this function was not required, and that the overhead involved in validating the input could be removed, a different implementation of the function was adopted.
A novel approach proposed by Ian Stephenson in [42] was adopted, explained as follows. The power function is usually implemented using logarithms as

pow(a, b) = x^(log_x(a) * b)

where x can be any convenient value. By choosing x = 2, the process of calculating the power function reduces to finding fast pow2() and log2() functions, which can be approximated with a few instructions. For example, the implementation of log2(a) can be approximated based on the IEEE floating-point representation of a, which stores an exponent and a mantissa:

a = M * 2^E

where M is the mantissa and E is the exponent. Taking the base-2 logarithm of both sides gives

log2(a) = log2(M) + E

and since M is normalized, log2(M) is always small; therefore

log2(a) ≈ E
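A sketch of such an approximation in C is given below. It exploits the bit layout of IEEE 754 single-precision numbers: interpreting the bits of a positive float as an integer yields, after scaling, an approximation of its base-2 logarithm, and the inverse mapping approximates pow2(). The constants follow from the 23-bit mantissa and the exponent bias of 127; the accuracy is limited but sufficient for gamma correction.

    #include <stdint.h>

    /* Approximate a^b for a > 0, based on pow(a, b) = 2^(b * log2(a)). */
    static inline float fast_pow(float a, float b)
    {
        union { float f; int32_t i; } in = { a };

        /* log2(a) ~ bits(a) / 2^23 - 127 (exponent plus a linearized mantissa). */
        float log2a = (float)in.i / (float)(1 << 23) - 127.0f;

        /* pow2(p) ~ reinterpret ((p + 127) * 2^23) as a float. */
        union { float f; int32_t i; } out;
        out.i = (int32_t)((b * log2a + 127.0f) * (float)(1 << 23));
        return out.f;
    }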
This new implementation of the power function provides the improvement of the execution time shown in Figure 5.6.

Figure 5.6: Difference in execution time before and after reimplementing C's standard power function.
5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and to increase the temporal locality of the program. Temporal locality is a principle stating that referenced memory locations will tend to be referenced again soon. Moreover, the reordering made it possible to replace floating-point calculations with integer calculations in the modulation stage, which are known to typically execute faster on ARM processors. Figure 5.7 shows the order in which the algorithms are executed before and after this optimization. By moving the calculation of the modular frame to the preprocessing stage, the values of the camera frames do not have to be re-read. Moreover, the processes of discarding, cropping, and scaling frames are now performed in an alternating fashion, together with the calculation of the modular frame. This loop merging improves the locality of data and reduces loop overhead; a sketch is given below. Figure 5.8 shows the change in execution time of the application for this optimization step.
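A simplified sketch of the fused loop follows: cropping, scaling, and the running per-pixel minimum and maximum needed by the modulation stage are computed in a single pass over each camera frame. All names are illustrative, and the cropping/scaling details are omitted.

    #include <stdint.h>

    /* One pass per frame: crop, scale, and update the per-pixel minimum and
     * maximum (the modular frame is derived from max - min once all frames
     * have been seen). */
    void preprocess_fused(const uint8_t **camera_frame, uint8_t **processed_frame,
                          uint8_t *pix_min, uint8_t *pix_max,
                          int n_frames, int n_pixels)
    {
        for (int n = 0; n < n_frames; n++) {
            const uint8_t *src = camera_frame[n];
            uint8_t *dst = processed_frame[n];
            for (int i = 0; i < n_pixels; i++) {
                uint8_t p = src[i];       /* cropping/scaling omitted for brevity */
                dst[i] = p;
                if (n == 0 || p < pix_min[i]) pix_min[i] = p;
                if (n == 0 || p > pix_max[i]) pix_max[i] = p;
            }
        }
    }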
Figure 5.7: Order of execution before and after the optimization: (a) original order of execution; (b) modified order of execution, in which the modulation step is merged into the preprocessing stage.

Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application was presented in Chapter 3. Figure 3.8 shows the different stages of this process. However, this figure does not reflect the manner in which the GMC was initially implemented in the MATLAB code; in fact, it describes the GMC implementation after being modified with the optimization described in this section. A more detailed picture of the original GMC implementation is given in Figure 5.9. Previous research had found that optimal results were achieved when GMC is applied in the y direction only. This was originally implemented by estimating GMC for both directions but only performing the shift in the y direction. The optimization consisted of removing all unnecessary calculations related to the estimation of GMC in the x direction, as sketched below. This optimization provides the improvement of the execution time shown in Figure 5.10.
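The sketch below illustrates the remaining estimation: each frame is reduced to a single projection vector (here the sum of each row), and the vertical displacement is the shift that minimizes the sum of absolute differences (SAD) between the two projections over the overlapping rows. The search range and function names are illustrative.

    #include <math.h>

    /* Sum each row of an H x W frame into proj[0..H-1]. */
    void row_projection(const float *frame, int W, int H, float *proj)
    {
        for (int r = 0; r < H; r++) {
            float s = 0.0f;
            for (int c = 0; c < W; c++)
                s += frame[r * W + c];
            proj[r] = s;
        }
    }

    /* Return the vertical shift in [-max_shift, max_shift] minimizing the SAD
     * between the two projections (overlapping part only). */
    int best_y_shift(const float *pa, const float *pb, int H, int max_shift)
    {
        int best = 0;
        float best_sad = -1.0f;
        for (int s = -max_shift; s <= max_shift; s++) {
            float sad = 0.0f;
            for (int r = 0; r < H; r++) {
                int rb = r + s;
                if (rb < 0 || rb >= H)
                    continue;
                sad += fabsf(pa[r] - pb[rb]);
            }
            if (best_sad < 0.0f || sad < best_sad) {
                best_sad = sad;
                best = s;
            }
        }
        return best;
    }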
Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code: for every pair of consecutive frames, the row and column sums of both frames are computed, the SAD is minimized in x and y, and frame B is shifted in the y dimension only.

Figure 5.10: Difference in execution time before and after modifying the GMC stage.
5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation, and a series of examples available in [43] were used as references for our implementation. Despite the fact that OpenCV constructs the triangulation while abstracting the complete algorithm from the programmer, a not-so-straightforward approach is required to extract the triangles from a so-called subdivision. OpenCV offers a series of functions that can be used to navigate through the edges that form the triangulation; it is therefore the responsibility of the programmer to extract each of the triangles while stepping through these edges. Moreover, care must be taken to avoid repeated triangles in the final set. At this point of the optimization process, an error was detected in the mechanism that was being used to avoid repeated triangles. Figure 5.11 shows the increase in execution time after this bug was resolved.
Figure 5.11: The execution time of the application increased after fixing an error in the tessellation stage.
5.8 Modified line shifting in GMC stage

A series of optimizations performed on the original line shifting mechanism in the GMC stage are explained in this section. The MATLAB implementation uses the circular shift function to perform the alignment of the frames (the last step in Figure 3.8). Given that there is no justification for applying a circular shift, a regular shift was implemented instead, in which the last line of a frame is discarded rather than copied to the opposite border. Initially this was implemented using a for loop; later, it was optimized even further by replacing the for loop with the more optimized memcpy function available in the standard C library, which in turn led to a faster execution time.

A further optimization was obtained in the GMC stage that yielded better memory usage and a faster execution time. The original shifting approach used two equally sized portions of memory in order to avoid overwriting the frame being shifted. The need for a second portion of memory was removed by adding some extra logic to the shifting process: a conditional statement determines whether the shift has to be performed in the positive or negative direction. In case the shift is negative, i.e., upwards, the shifting operation traverses the image from top to bottom while copying each line a certain number of rows above it. In case the shift is positive, i.e., downwards, the shifting operation traverses the image from bottom to top while copying each line a certain number of rows below it. A sketch of this in-place shift is given below; the result of this set of optimizations is presented in Figure 5.12.
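The following sketch shows the memcpy-based in-place shift; the traversal direction depends on the sign of the shift so that no source row is overwritten before it has been copied. An 8-bit single-channel image is assumed.

    #include <string.h>

    /* Shift an H x W single-channel image by `shift` rows, in place.
     * shift < 0 moves the content up, shift > 0 moves it down; rows that
     * fall outside the image are discarded (no circular wrap-around). */
    void shift_rows(unsigned char *img, int W, int H, int shift)
    {
        if (shift < 0) {                    /* upward: walk top to bottom */
            for (int r = 0; r < H + shift; r++)
                memcpy(img + r * W, img + (r - shift) * W, W);
        } else if (shift > 0) {             /* downward: walk bottom to top */
            for (int r = H - 1; r >= shift; r--)
                memcpy(img + r * W, img + (r - shift) * W, W);
        }
    }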
Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is presented by Rippa [44], who proves that such a triangulation minimizes the roughness of the resulting model. Nevertheless, an important characteristic of the decoding process used in our application allows the adoption of a different triangulation mechanism that improved the execution time significantly while sacrificing only a very small amount of smoothness. This characteristic refers to the fact that the set of vertices resulting from the decoding stage is sorted in an increasing manner. This in turn removes the need to search for the nearest vertices and therefore allows the triangulation to be greatly simplified. More specifically, the vertices are ordered in increasing order from left to right and from bottom to top in the plane. Moreover, they are equally spaced along the y dimension, which simplifies the algorithm needed to connect the vertices into triangles even further.

The developed algorithm traverses the set of vertices row by row, from bottom to top, creating triangles between every pair of consecutive rows. Moreover, each pair of consecutive rows is traversed from left to right while connecting the vertices into triangles.
The algorithm is presented in Algorithm 1. Note that for each pair of rows, this algorithm describes the connection of vertices only until the moment in which the last vertex of either row is reached. The unconnected vertices that remain in the other, longer row are connected with the last vertex of the shorter row in a later step (not included in Algorithm 1).

Algorithm 1 New tessellation algorithm
1:  for all pairs of rows do
2:    find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
3:    while the last vertex in either row has not been reached do
4:      if vertex_row_A is more to the left than vertex_row_B then
5:        connect vertex_row_A with the next vertex on the same row and with vertex_row_B
6:        change vertex_row_A to the next vertex on the same row
7:      else
8:        connect vertex_row_B with the next vertex on the same row and with vertex_row_A
9:        change vertex_row_B to the next vertex on the same row
10:     end if
11:   end while
12: end for
Figure 5.13 shows the result of applying the two described triangulation methods to the same set of vertices. The execution time of the application was reduced by approximately 1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new triangulation algorithm resulted in a speedup of approximately 12.5 times over OpenCV's Delaunay triangulation implementation.
Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that the vertices are sorted: (a) Delaunay triangulation; (b) optimized triangulation.
Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.

5.10 Modified decoding stage

A major improvement was achieved in the execution time of the application after optimizing several time-consuming parts of the decoding stage. As a first step, two frequently called functions of the standard math C library, namely ceil() and floor(), were replaced with faster implementations that use preprocessor directives to avoid the function call overhead. Moreover, the time spent in validating the input was also avoided, since it was not required. However, the property that allowed the new implementations of the ceil() and floor() functions to increase the performance to a greater extent was the fact that these functions only operate on index values. Given that index values only assume non-negative numbers, the implementation of each of these functions was further simplified, as sketched below.
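A sketch of such simplified replacements follows; the macro names are illustrative, and both are valid only for the non-negative values that occur as indices in this stage, which is precisely what allows them to be so short.

    /* Valid for x >= 0 only: truncation towards zero equals floor there. */
    #define FAST_FLOOR(x) ((int)(x))

    /* Valid for x >= 0 only: add one when truncation discarded a fraction. */
    #define FAST_CEIL(x)  ((int)(x) + ((float)(int)(x) < (x)))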
A second optimization applied to the decoding stage was to replace dynamically allocated memory on the heap with statically allocated memory on the stack, while ensuring that the amount of memory to be stored would not cause a stack overflow. Stack allocation is usually faster, since such memory is more quickly addressable.

The last optimization consisted of detecting and removing several tasks that were not contributing to the final result. The reason why such tasks were present in the application is that several alternatives were implemented to achieve a common goal during the algorithmic design stage; after assessing and choosing the best option, however, the other ones were never entirely removed.

The overall result of the optimizations described in this section is shown in Figure 5.15. An important reduction of approximately 1 second was achieved. As a rough estimate, half of this speedup can be attributed to the removal of the nonfunctional code.
Figure 5.15: Execution time of the application before and after optimizing the decoding stage.

5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage. The algorithm presented in Figure 3.8 has the following shortcoming: for every pair of consecutive frames, the sum of pixels in each column is calculated for both frames. This means that the column-sum vector is calculated twice for each image, except for the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated in the previous iteration, this recalculation can be avoided; a sketch is given below. An updated version of the GMC stage that incorporates this idea is shown in Figure 5.16. The speedup achieved for the GMC stage after performing this optimization was approximately 1.8 times. Figure 5.17 shows the execution times of the application before and after removing the redundant calculations.
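The sketch below illustrates the reuse: the vector computed for frame n serves as the reference of the next iteration, so each frame is summed exactly once (cf. Figure 5.16). The helper functions are assumed, with illustrative signatures analogous to the earlier sketches.

    #include <stdlib.h>

    void sum_columns(const float *img, int W, int H, float *sums);
    int  best_shift(const float *a, const float *b, int W);
    void shift_frame(float *img, int W, int H, int shift);

    /* GMC over the whole sequence, summing each frame's columns only once. */
    void gmc_sequence(float **frame, int N, int W, int H)
    {
        float *prev = malloc(W * sizeof(float));
        float *curr = malloc(W * sizeof(float));

        sum_columns(frame[0], W, H, prev);        /* frame 1: summed once    */
        for (int n = 1; n < N; n++) {
            sum_columns(frame[n], W, H, curr);    /* every frame: summed once */
            shift_frame(frame[n], W, H, best_shift(prev, curr, W));
            float *tmp = prev; prev = curr; curr = tmp;  /* reuse next time  */
        }
        free(prev);
        free(curr);
    }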
5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in Cortex-A series processors was exploited for the last series of optimizations performed on the 3D face scanner application. The first step was to detect the stages of the application that exhibit a rich amount of exploitable data operations, where the NEON technology could be applied. The vast majority of the operations performed in the preprocessing, normalization, and global motion compensation stages are data independent, and therefore suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrating NEON technology into an existing application: (i) using a vectorizing compiler that automatically translates C/C++ code into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology; (iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON instructions while the compiler does some of the work associated with writing assembly instructions; and (iv) directly writing NEON assembly instructions linked into the C/C++ project in the compilation process. A detailed explanation of each of these approaches can be found in [45]. Based on the results achieved in [46], directly writing NEON assembly instructions outperforms the other alternatives, and it was therefore this approach that was adopted.
Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums.

Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.
Figure 5.18 presents the basic principle behind the SIMD architecture extension, along with the related terminology. Depending on the data type of the elements involved in the operation, either 2, 4, 8, or 16 elements can be operated on with a single instruction. The NEON register bank may be viewed either as sixteen 128-bit registers (Q0-Q15) or as thirty-two 64-bit registers (D0-D31), where each of the Q0-Q15 registers maps to a pair of D registers. Figure 5.18 may therefore be interpreted either as an operation on 2 Q registers, where each of the 8 elements is 16 bits wide, or as an operation on 2 D registers, where each of the 8 elements is 8 bits wide.

Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology: an operation is applied lane-wise to the elements of two source registers and written to a destination register.
An overview of the resulting execution flow of the preprocessing and normalization stages after applying the first NEON assembly optimization is presented in Figure 5.19. Here, green rectangles represent stages of the application that are now calculated with NEON technology, whereas blue rectangles represent stages implemented in regular C code. In Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame sequence is represented with an 8-bit unsigned integer value. With the NEON optimization, groups of 8 pixels are packed into D registers in order to process 8 elements at a time. Note that each resulting element of the texture 2 frame is immediately reused in the normalization process. Moreover, each of the 8 resulting values in both the texture 2 generation and the normalization stage is converted to a 32-bit floating-point value that ranges from 0 to 1. An illustrative version of this computation is sketched below.
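The thesis implementation was written directly in NEON assembly; purely as an illustration, an equivalent formulated with NEON C intrinsics is sketched below for one pack of 8 pixels. It computes the texture 2 sum and the normalized ratio (v1 − v2)/(v1 + v2), widening the 8-bit pixels to 32-bit floats and using NEON's reciprocal estimate with one Newton-Raphson refinement step instead of a division; the 1/510 scaling constant (the maximum possible sum of two 8-bit pixels) is an assumption.

    #include <arm_neon.h>

    /* Process 8 pixels of a frame pair: tex2 = v1 + v2 (scaled to [0,1]),
     * norm = (v1 - v2) / (v1 + v2). Assumes v1 + v2 != 0 for all 8 pixels. */
    void process8(const uint8_t *v1, const uint8_t *v2, float *tex2, float *norm)
    {
        uint8x8_t a8 = vld1_u8(v1);
        uint8x8_t b8 = vld1_u8(v2);

        uint16x8_t sum16 = vaddl_u8(a8, b8);                  /* widening add */
        int16x8_t a16 = vreinterpretq_s16_u16(vmovl_u8(a8));
        int16x8_t b16 = vreinterpretq_s16_u16(vmovl_u8(b8));
        int16x8_t diff16 = vsubq_s16(a16, b16);               /* may be negative */

        for (int half = 0; half < 2; half++) {
            uint16x4_t s = half ? vget_high_u16(sum16) : vget_low_u16(sum16);
            int16x4_t  d = half ? vget_high_s16(diff16) : vget_low_s16(diff16);

            float32x4_t sf = vcvtq_f32_u32(vmovl_u16(s));
            float32x4_t df = vcvtq_f32_s32(vmovl_s16(d));

            /* Reciprocal of the sum: estimate plus one Newton-Raphson step. */
            float32x4_t r = vrecpeq_f32(sf);
            r = vmulq_f32(r, vrecpsq_f32(sf, r));

            vst1q_f32(tex2 + 4 * half, vmulq_n_f32(sf, 1.0f / 510.0f));
            vst1q_f32(norm + 4 * half, vmulq_f32(df, r));
        }
    }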
Figure 5.20 shows that the total execution time of the application actually increased after this modification. There are two reasons that might explain this increment. First, note that the stage of the application that contributed most to the increase in time was the reading of the binary file. The execution time of that process is heavily affected by any other processes that might be running in parallel. Moreover, the execution time of all stages other than those involved in the NEON optimization also increased. This suggests that another process was indeed probably running in parallel, using resources of the board and hence affecting the performance of the application. Nevertheless, the overall time reduction for the preprocessing and normalization stages after the optimization was small. One very probable reason for this can be found in the modulation stage. The first step of that process is to find the smallest and largest values of every camera frame pixel in the time dimension by means of if statements. When such a task is implemented in conventional C, the processor makes use of a branch prediction mechanism to speed up the instruction pipeline. However, the use of NEON assembly instructions forces the processor to perform the comparison for every single pack of 8 values, forgoing the benefit of the branch prediction mechanism.
5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with the use of NEON assembly instructions, the possibility of applying a similar approach to other parts of the application was analyzed. The averaging and gamma correction processes involved in the calculation of texture 1 were found to be good targets for this purpose. The absence of a NEON instruction to calculate the power of a number can be overcome by using a lookup table (LUT). In order to explain how the LUT was implemented, a hypothetical example of camera frames with 2-bit pixels is presented in Figure 5.21. Here, the first two rows represent the values that corresponding pixels in the two frames can assume. The third row of the table contains the 7 possible values that can result from averaging two pixels. The number of possible values for the general case is 2^(n+1) − 1, where n is the number of bits used to represent a pixel. Finally, the fourth row corresponds to the actual LUT, which is the average value raised to the power 0.85. What is interesting is that the sum of the two pixels, pixel A + pixel B, which in our application is already determined during the texture 2 stage, can be used to index the table; a sketch of this construction is given below.
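For the actual 8-bit camera pixels this yields a 511-entry table; a sketch of its construction and use follows (names illustrative).

    #include <math.h>

    #define PIXEL_MAX 255                    /* 8-bit camera pixels       */
    #define LUT_SIZE  (2 * PIXEL_MAX + 1)    /* possible sums: 0 .. 510   */

    static float gamma_lut[LUT_SIZE];

    /* Fill the table once: entry s holds (s / 2)^0.85, i.e., the gamma-
     * corrected average of two pixels whose sum is s. */
    void init_gamma_lut(void)
    {
        for (int s = 0; s < LUT_SIZE; s++)
            gamma_lut[s] = powf((float)s / 2.0f, 0.85f);
    }

    /* In the inner loop, the sum computed for texture 2 indexes the table:
     *     tex1[i] = gamma_lut[v1[i] + v2[i]];                             */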
As a final step in the optimization process, a further improvement was made to the execution flow presented in Figure 5.19. From this diagram it is possible to observe that the application has to re-read the last 2 camera frames to calculate the texture 1 frame. In order to avoid this overhead, the processing of the camera frames was divided into two different stages. The first involves the calculation of the modulation, texture 2, and normalization processes for the first 14 frames, whereas the second stage additionally calculates the averaging and gamma correction processes for the last two frames. Merging these 5 processes for the last two frames is convenient, since the addition of corresponding pixels needed in the averaging and gamma correction stage is already being calculated as part of the other processes. These modifications of the order in which the different processes are executed are illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for the preprocessing and normalization stages. The resulting improvement of the execution time is shown in Figure 5.22.

Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.

Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.

Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels.
This final optimization concludes the embedded system development of the 3D face reconstruction application.

Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.
Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation of the 3D face scanner application capable of running on an embedded device. The first section focuses on the results obtained after translating the MATLAB implementation to C. This is followed by a brief account of the visualization module developed to display the reconstructed model by means of the embedded device. Finally, the last section provides a summary of the performance improvements made to the C implementation by means of different optimization techniques.
6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different face scans were processed with both the MATLAB and C implementations. A qualitative comparison of the corresponding reconstructed models yielded no difference in results. Linux's diff tool was used to perform the comparison between corresponding models, with a precision of 4 decimal places.

In what follows, a series of graphs show the execution times for various versions of the application. Each bar corresponds to the average execution time required to process 10 scans of different people. Moreover, each of the different scans was run 10 times and the results were averaged. The bars are divided into different colors that represent the distribution of the total execution time among the various stages of the application, described in Chapter 3 and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the average execution times of the original MATLAB and C implementations, respectively, when processed on a desktop computer. The C implementation resulted in a speedup of approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).
On the other hand, the bottom bar in Figure 6.1 corresponds to the average execution time of the initial C implementation when processed on the embedded device, a BeagleBoard-xM. The execution time increased by approximately 14 seconds with respect to the time spent when processed on a PC. The C code was compiled with GCC's O2 optimization level.

Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the projector contained in the embedded device. Figure 6.2 presents an example. The two images in the top row show a high-resolution 3D model composed of 64k faces, rendered in two different modes. The bottom two images show the same 3D model after being processed with a mesh simplification mechanism, resulting in a much lower resolution model (1,229 faces) suitable for being rendered by means of an embedded device. It is interesting to note that even though the lower resolution model contains approximately 2% of the faces of the high-resolution model, the quality degradation is hardly visible when comparing the two textured models.
6.3 Performance optimizations

Figure 6.2: Example of the visualization module developed: (a) high-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model wireframe (63,743 faces); (c) low-resolution 3D model with texture (1,229 faces); (d) low-resolution 3D model wireframe (1,229 faces).

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation using a BeagleBoard-xM as the processing platform. A wide range of optimizations, described in Chapter 5, were used to reduce the execution time of the application from 14.5 to 5.1 seconds. This translates into a speedup of approximately 2.85 times. Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which gives an idea of the speedup achieved for each individual stage.
Figure 6.3: Performance evolution of the 3D face scanner's C implementation. From top to bottom, the bars correspond to: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2.
Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file; (b) preprocessing; (c) normalization; (d) GMC; (e) decoding; (f) tessellation; (g) calibration; (h) vertex filtering; (i) hole filling.
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application that uses the structured lighting technique. A manual translation of the algorithms in charge of the reconstruction process was performed from MATLAB to C, using a file comparison tool to validate the results of both implementations. Thirteen different face scans were used to verify the correctness of the translated C implementation with respect to the original MATLAB code; the comparison of each corresponding model yielded no difference whatsoever. The C implementation resulted in a speedup of approximately 15 times over the original MATLAB code running on a desktop PC. However, running the C implementation on an embedded platform, namely a BeagleBoard-xM, increased the execution time by a factor of 27, i.e., an increase of approximately 14 seconds.
A wide range of optimizations were performed to reduce the execution time of the application. These include high-level optimizations, such as modifications to the algorithms and reordering of the execution flow; middle-level optimizations, such as avoiding redundant calculations and function call overhead; and low-level optimizations, such as reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed 3D models by means of the projector contained in the embedded device. However, given the high resolution of the reconstructed 3D models and the limited resources available on the embedded platform, a mesh simplification mechanism was implemented to reduce the resolution to a point where the visualization module could be used without lag.
Although the reconstruction process is only part of a broader project that aims to develop a technological means to assist sleep technicians in the selection of an adequate CPAP mask model and size, allowing this process to run directly on the device is a first step towards the goal of creating an autonomous, self-contained mask advice system. Moreover, the functionality of a 3D hand-held face scanner is an important topic that can easily be extended to different application fields, such as security or entertainment. Last but not least, the optimizations that allowed the execution time of the application to be reduced to approximately 5 seconds when processed on an embedded platform should serve as a reference point not only for other parts of the application where similar approaches can be adopted, but also for related projects where performance is of crucial interest.
7.1 Future work

Although a significant reduction of the application's execution time was achieved with the set of optimizations presented in this work, this is by no means the best result that can be obtained. On the contrary, this set of optimizations opens new possibilities for improving the application's performance, for example by applying similar approaches to other parts of the application. The first idea that comes to mind is to extend the use of NEON technology to other parts of the program that exhibit a high number of independent data calculations. The 5 × 5 filter involved in the calculation of the texture 1 frame, together with the sum of columns and the row shifting operations included in the GMC stage, are good candidates for implementation using NEON assembly instructions. Note, however, that further optimizing parts of the program that comprise a small percentage of the total execution time will not yield significant improvements in the overall application performance. This implies that an assessment of the distribution of the total execution time among the different tasks of the application is necessary to determine which parts are the current bottlenecks and hence worth optimizing. The last profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of the execution time is spent in three stages, namely decoding, calibration, and hole filling. Whereas the decoding stage was analyzed and partly optimized in this work, the latter two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage can be optimized in an important manner. First, note the significant increase of the execution time of this particular stage between the top and bottom profilings in Figure 6.1. Whereas such an increase of time is expected for stages that involve matrix operations (MATLAB usually performs well with this kind of operations), stages based on control structures, such as the nested for loops present in the calibration stage, are not expected to show a decrease of performance in this manner. Moreover, note how the first two optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning the compiler flags, had a significant impact on this stage's performance. Considering this series of observations, it is very probable that the current C implementation of this stage is not utilizing the available resources of the BeagleBoard-xM in the best possible manner. Analyzing how well this part of the program exploits spatial and temporal locality could reveal directions for further optimizations.
Finally, it is worth noting a few more ideas of how the performance of the application could still be improved. Tuning GCC's compiler flags was performed early in the overall optimization process. It is probable that the combination of flags found to be optimal at that moment is no longer optimal for the current state of the application; therefore, a new assessment of compiler flags should be performed. It is also important to mention that there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point application binary interface (ABI) to use. The permissible values are soft, softfp, and hard. Despite the fact that a hard-float ABI is expected to produce better performance results, the use of such a configuration was not possible in the current project. The reason is that part of the libraries provided by the underlying operating system were compiled with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling this configuration is just a matter of recompiling the OS and the other libraries used by the application with hard-float ABI support. Finally, it should be noted that there is a wide range of compilers available on the market that could produce better results than those of GCC. Despite the fact that a few of the other options were tested as part of the current project, GCC's results were always superior. However, it would be interesting to measure how the GCC compiler compares with the compilers produced by ARM, which are known to produce fast-running code.
Bibliography

[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, isbn: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, issn: 0146-664X. doi: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, issn: 0162-8828. doi: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. doi: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, issn: 0031-3203. doi: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, issn: 0162-8828. doi: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, issn: 0031-3203. doi: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, issn: 0734-189X. doi: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings., 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. doi: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, issn: 0932-8092. doi: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, issn: 0018-9162. doi: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. doi: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimel'farb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: Application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. doi: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lighting techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. doi: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, issn: 0031-3203. doi: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. doi: 10.1364/OE.15.012318 (cit. on p. 14).

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14).

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, isbn: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and Universitat de Girona. Departament d'Electrònica, Informàtica i Automàtica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, isbn: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, issn: 0882-4967. doi: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, issn: 0162-8828. doi: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, isbn: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1–4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schroder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th annual conference on Computer graphics and interactive techniques, ser. SIGGRAPH '99. New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, isbn: 0-201-48560-5. doi: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, isbn: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, Dec. 2009, p. 81 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization: an example," Tech. Rep., 2009 (cit. on p. 54).
Figure 1.1: A subset of the CPAP masks offered by Philips: (a) Amara, (b) ComfortClassic, (c) ComfortGel Blue, (d) ComfortLite 2, (e) FitLife, (f) GoLife, (g) ProfileLite Gel, (h) Simplicity, (i) ComfortGel.
amongst others. A subset of these models is shown in Figure 1.1. It is important to
mention that a poor selection of a CPAP mask might cause undesirable side effects to the
patient, such as marks or even pressure ulcers. Consequently, the physical dimensions
of each patient's face play a crucial role in the selection of the most appropriate CPAP
mask.

Unfortunately, the current practices used to assess the adequacy of CPAP masks based
on facial dimensions are quite error prone. They rely on trial-and-error procedures in
which the patient tries on different mask models and selects the one he thinks is the
most comfortable. In order to alleviate this problem, Philips Research launched the
3D Mask Sizing project, which aims to develop an automated embedded system capable
of assisting sleep technicians in prescribing the most appropriate CPAP mask for each
patient.
1.1 3D Mask Sizing project

The 3D Mask Sizing project is based on the initiative of Philips to develop technological
means that can assist sleep technicians in the selection of a proper CPAP mask model
for each patient. A series of algorithms, methods and hardware prototypes are the
result of several years of research carried out by the Smart Sensing & Analysis research
group in Philips Research Eindhoven. The resulting automated mask advising system
comprises four main parts:

1. An accurate 3D model reconstruction of the patient's face dimensions and geometry.

2. The extraction of facial landmarks from the reconstructed model by means of
computer vision algorithms.

3. The actual fit quality assessment, by virtually fitting a series of 3D mask models
to the reconstructed face.

4. The creation of a custom cushion that optimizes for uniform pressure along the
cushion contour.

The focus of this thesis project is on the first step.
As part of the progress made in the 3D Mask Sizing project at Philips Research Eind-
hoven, a first prototype of a 3D hand-held scanner using the structured lighting technique
was already developed and forms the basis for the present project. Figure 1.2a shows the
hardware setup of this device. In short, the scanner is capable of capturing a picture
sequence of a patient's face while illuminating it with specific structured light patterns.
This picture sequence is processed by means of a series of algorithms in order to re-
construct a 3D model of the face. An example of a resulting 3D model is presented in
Figure 1.2b. The reconstruction process and all other calculations are currently being
performed offline and are mostly implemented in MATLAB.
Figure 1.2: A 3D hand-held scanner developed in Philips Research: (a) hardware, (b) 3D model example.

1.2 Objectives

The main objective of this thesis project is to extend the functionality of the mentioned
scanner such that the 3D reconstruction is computed locally on the embedded platform.
This implies transforming the already developed methods and algorithms in such a
way that extra-functional requirements are taken into account. These extra-functional
requirements involve an optimal use of the available computational resources. Highest
priority should be given to the execution time of the application. Specifically, the 3D
reconstruction should run on the embedded device in less than 5 seconds on average.
Because the embedded processor contained in the final product will be similar to an
ARM Cortex-A8, the new implementation should be targeted to this processor in
particular, by making proper use of the specific features it provides. Moreover, the
visualization of the reconstructed face model should be made possible by means of the
embedded projector contained in the device.
1.3 Report organization

This report is organized as follows. Chapter 2 presents the basic principles that underlie
different technologies for surface reconstruction, placing special emphasis on structured
lighting techniques. In Chapter 3, an overview of the 3D face scanner application is
provided, which functions as the starting point for the current project. Chapter 4
details the most relevant aspects that pertain to the implementation of the 3D face
scanner application on an embedded device. In Chapter 5, a series of optimizations
used to reduce the execution time of the application are described. Chapter 6 highlights
the most important results of the development process, namely the MATLAB-to-C
translation, the visualization module and the set of optimizations. Finally, Chapter 7
concludes the thesis while delineating paths for further improvements of the presented
work.
Chapter 2

Literature study

This chapter presents a selective analysis of the state of the art in the field of surface
reconstruction, placing special emphasis on structured lighting techniques. A brief
overview of the three main underlying technologies used for depth estimation is pre-
sented first. This is followed by an example of stereo analysis, which serves as the basis
for the more specific structured lighting techniques. Moreover, this example helps to
illustrate why stereo analysis is considered less preferable for 3D face reconstruction
applications when compared with structured lighting techniques. Special emphasis
is placed on the scientific principles underlying structured lighting techniques. Further-
more, a classification of the different types of pattern coding strategies available in the
literature is given, along with an analysis of their suitability for our application. Fi-
nally, the chapter concludes with a brief discussion of camera calibration and its most
representative techniques.
2.1 Surface reconstruction

Surface reconstruction has a wide range of practical applications, such as computer mod-
eling of 3D objects (as found in areas like architecture, mechanical engineering or
surgery), distance measurements for vehicle control, surface inspections for quality
control, approximate or exact estimates of the location of 3D objects for automated
assembly, and fast location of obstacles for efficient navigation [4].

Technologies for surface reconstruction include contact and non-contact techniques, the
latter being our principal interest. Non-contact techniques may be further categorized
as echo-metric, reflecto-metric and stereo-metric, as proposed in [5]. Echo-metric tech-
niques use time-of-flight measurements to determine the distance to an object, i.e., they
are based on the time it takes for a wave (acoustic, micro, electromagnetic) to reflect
from an object's surface through a given medium. Reflecto-metric techniques process
one or more images of the object to determine its surface orientation and, consequently,
its shape. Finally, stereo-metric techniques determine the location of the object's surface
by triangulating each point with its corresponding projections in two or more images.

Echo-metric techniques suffer from a number of drawbacks. Systems employing such
techniques are heavily affected by environmental parameters such as temperature and
humidity [6]. These parameters affect the velocity at which waves travel through a
given medium, thus introducing errors in the depth measurement. On the other hand,
both reflecto-metric and stereo-metric techniques are less affected by environmental
parameters. However, reflecto-metric techniques entail a major difficulty, i.e., they
require an estimation of the model of the environment. In the remainder of this section
we will limit the discussion to the stereo-metric category and focus on structured
lighting techniques.
2.1.1 Stereo analysis

Considering that surface reconstruction by means of structured lighting can be regarded
as an extension of the more general stereo-vision technique, an introductory example of
stereo analysis is presented in this section. This example, taken from [4], intends to show
why the use of structured lighting becomes essential for our application.

Surface reconstruction can be achieved by means of the visual disparity that results
when an object is observed from different camera viewpoints. In its simplest form, two
cameras can be used for this purpose. Triangulation between a point on the object and
its respective projection in each of the camera projection planes can be used to calculate
the depth at which this point lies from a certain reference. Note, however, that in order
to calculate the triangulation, more parameters are required. These parameters refer, for
example, to the distance at which the cameras are located from one another (extrinsic
parameter) or to the focal length of each of the cameras (intrinsic parameter).

Figure 2.1 illustrates the so-called standard stereo geometry [4] of two cameras. In this
model, the origin of the XYZ-coordinate system O = (0, 0, 0) is located at the focal
point of the left camera. The focal point of the right camera lies at a distance b along
the X-axis from the left camera, i.e., at the point (b, 0, 0). Both cameras are assumed
to have the same focal length f. As a consequence, the images of both cameras are
located in the same image plane. The Z-axis coincides with the optical axis of the
left camera. Moreover, the optical axes of both cameras are parallel to each other and
oriented towards the scene objects. Also note that, because the x-axes of both images
are identically oriented, rows with the same row number in the two different images lie on
the same straight line.
Figure 2.1: Standard stereo geometry.
In this model, a scene point P = (X, Y, Z) is projected onto two corresponding image
points

p_{left} = (x_{left}, y_{left}) \quad \textrm{and} \quad p_{right} = (x_{right}, y_{right})

in the left and right images respectively, assuming that the scene point is visible from
both camera viewpoints. The disparity with respect to p_{left} is a vector given by

\Delta(x_{left}, y_{left}) = (x_{left} - x_{right}, \, y_{left} - y_{right})^T \qquad (2.1)

between two corresponding image points.

In the standard stereo geometry, pinhole camera models are used to represent the con-
sidered cameras. The basic idea of a pinhole camera is that it projects scene points P
onto image points p according to a central projection given by

p = (x, y) = \left( \frac{f \cdot X}{Z}, \, \frac{f \cdot Y}{Z} \right) \qquad (2.2)

assuming that Z > f.

According to the ideal assumptions considered in the standard stereo geometry of the
two cameras, it holds that y = y_{left} = y_{right}. Therefore, for the left camera the cen-
tral projection equation is given directly by Equation 2.2, considering that the pinhole
camera model assumes the Z-axis to be the optical axis of the camera. Furthermore,
given the displacement of the right camera by b along the X-axis, its central projection
equation is given by

(x_{right}, y) = \left( \frac{f \cdot (X - b)}{Z}, \, \frac{f \cdot Y}{Z} \right)
Rather than calculating a disparity vector, given by Equation 2.1, for all corresponding
pairs of points in the different images, the scalar disparity proves to be sufficient under
the assumptions made in the standard stereo geometry. The scalar disparity of two
corresponding points in each one of the images with respect to p_{left} is given by

\Delta_{ssg}(x_{left}, y_{left}) = \sqrt{(x_{left} - x_{right})^2 + (y_{left} - y_{right})^2}

However, because rows with the same row numbers in the two images have the same y
value, the scalar disparity of a pair of corresponding points reduces to

\Delta_{ssg}(x_{left}, y_{left}) = |x_{left} - x_{right}| = x_{left} - x_{right} \qquad (2.3)

Note that it is valid to remove the absolute value operator because of the chosen arrange-
ment of the cameras. A disparity map \Delta(x, y) is defined by applying Equation 2.3 to all
corresponding points in the two images. For those points that could not be associated
with a corresponding point in the other image (for example, because of occlusion), the
value "undefined" is recorded.

Finally, in order to come up with the equations that determine the 3D location of each
point in the scene, note that from the two central projection equations of the two cameras
it follows that

Z = \frac{f \cdot X}{x_{left}} = \frac{f \cdot (X - b)}{x_{right}}

and therefore

X = \frac{b \cdot x_{left}}{x_{left} - x_{right}}

Using the previous equation, it follows that

Z = \frac{b \cdot f}{x_{left} - x_{right}}

By substituting this result into the projection equation for y, it follows that

Y = \frac{b \cdot y}{x_{left} - x_{right}}

The last three equations allow the reconstruction of the coordinates of the projected
points P within the three-dimensional XYZ-space, provided that the parameters f and
b are known and that the disparity map \Delta(x, y) was measured for each pair of corre-
sponding points in the two images. Note that a variety of methods exist to calibrate
different types of camera configuration systems, i.e., to determine their intrinsic and ex-
trinsic parameters. More on these calibration procedures is discussed in Section 2.2.
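To make the use of these equations concrete, the following C sketch (our own illustration;
the function and type names are not taken from the thesis implementation) recovers the
3D coordinates of a scene point from a matched point pair in the standard stereo
geometry, given the focal length f and the base distance b:

/* 3D point reconstructed in the XYZ-coordinate system of the left camera. */
typedef struct { double X, Y, Z; } Point3D;

/* Applies X = b*x_left/d, Y = b*y/d and Z = b*f/d with d = x_left - x_right
 * (Equation 2.3). Returns 0 on success and -1 when the disparity is not
 * positive, i.e., when the correspondence is invalid or "undefined". */
int reconstruct_point(double x_left, double x_right, double y,
                      double f, double b, Point3D *out)
{
    double d = x_left - x_right;   /* scalar disparity */
    if (d <= 0.0)
        return -1;
    out->X = b * x_left / d;
    out->Y = b * y / d;
    out->Z = b * f / d;
    return 0;
}

Applying this function to every defined entry of the disparity map yields the recon-
structed surface as a cloud of 3D points.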
The process of determining corresponding point pairs is known as the correspondence
problem. A wide variety of techniques is used to solve the correspondence problem in
stereo image analysis. Such techniques generally involve the extraction and matching
of features between two or more images; these features are typically corners or edges
contained within the images. Although these techniques are found to be appropriate for
a certain number of applications, they present a number of drawbacks that make their
applicability unfeasible for many others. The main drawbacks are that (i) feature
extraction and matching is generally computationally expensive, (ii) features might not
be available depending on the nature of the environment or the placement of the
cameras, and (iii) low lighting conditions generally increase the complexity of the
matching procedure, thus making the system more error prone. Such problems in solving
the correspondence problem can generally be overcome by resorting to a different but
similar type of techniques, known as structured lighting techniques. While structured
lighting techniques involve a completely different methodology on how to solve the
correspondence problem, they share a large part of the theory presented in this section
regarding the depth reconstruction process.
2.1.2 Structured lighting

Structured lighting methods can be thought of as a modification of the previously de-
scribed stereo analysis approach, where one of the cameras is replaced by a light source
which actively projects a light pattern into the scene. The location of an object in space
can then be determined by analyzing the deformation of the projected light pattern.
The idea behind this modification is to simplify the complexity of the correspondence
analysis by actively manipulating the scene.

It is important to note that stereoscopic-based systems do not assume complex require-
ments for image acquisition, since they mostly rely on theoretical, mathematical and
algorithmic analyses to solve the reconstruction problem. On the other hand, the idea
behind structured lighting methods is to shift this complexity to another level, such as
the engineering prerequisites of the overall system [4].

A wide variety of light patterns has been proposed by the research community [5], [7]–
[17]. Their aim is to reduce the large number of images that would have to be captured
when using the most basic of all approaches, i.e., a light spot. In Section 2.1.2.2, a
classification of the available encoded patterns is presented. Nevertheless, the light spot
projection technique serves as a solid starting point to introduce the main principle
underlying the depth recovery of most other encoded light patterns: the triangulation
technique.
2.1.2.1 Triangulation technique

Triangulation refers to the process of determining the location of a point by measuring
angles formed from it to points at either end of a fixed baseline. Various approaches
have been proposed for accomplishing this task. An early analysis was described by Hall
et al. [18] in 1982; Klette also presented his own analysis in [4]. In the following, an
overview of Klette's triangulation approach is given.

Figure 2.2 shows the simplified model that Klette assumes in his analysis.

Figure 2.2: Assumed model for triangulation, as proposed in [4].

Note that the system can be thought of as a 2D object scene, i.e., it has no vertical
dimension. As a consequence, the object, light source and camera all lie in the same
plane. The angles α and β are given by the calibration. As in the previous example, the
base distance b is assumed to be known, and the origin of the coordinate system O
coincides with the projection center of the camera.
The goal is to calculate the distance d between the origin O and the object point
P = (X_0, Z_0). This can be done using the law of sines as follows:

\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\gamma)}

From \gamma = \pi - (\alpha + \beta) and \sin(\pi - \gamma) = \sin(\gamma), it holds that

\frac{d}{\sin(\alpha)} = \frac{b}{\sin(\pi - \gamma)} = \frac{b}{\sin(\alpha + \beta)}

Therefore, the distance d is given by

d = \frac{b \cdot \sin(\alpha)}{\sin(\alpha + \beta)}

which holds for any point P lying on the surface of the object.
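As a brief illustration, the formula translates directly into code. The following C sketch
(our own naming, not part of the scanner implementation) computes d from the two
calibrated angles and the base distance:

#include <math.h>

/* Distance from the camera's projection center O to the object point P,
 * following d = b * sin(alpha) / sin(alpha + beta). The angles are in
 * radians and are assumed to come from the system calibration. */
double triangulate_distance(double alpha, double beta, double b)
{
    return b * sin(alpha) / sin(alpha + beta);
}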
2.1.2.2 Pattern coding strategies

As stated earlier, there is a wide variety of pattern coding strategies available in the lit-
erature that aim to fulfill the requirements found in different scenarios and applications.
In coded structured light systems, every coded pixel in the pattern has its own codeword
that allows direct mapping, i.e., every codeword is mapped to the corresponding coordi-
nates of a given pixel or group of pixels in the pattern. A codeword can be represented
using grey levels, colors or even geometrical characteristics. The following classification
of pattern coding strategies was proposed by Salvi et al. in [19]:

• Time-multiplexing. This is one of the most commonly used strategies. The
idea is to project a set of patterns onto the scene, one after the other. The
sequence of illuminated values determines the codeword for each pixel. The main
advantage of this kind of pattern is that it can achieve high spatial resolution in
the measurements. However, its accuracy is highly sensitive to movement of either
the structured light system or objects in the scene during the time period in which
the acquisition process takes place. Previous research in this area includes the work
of [5], [7], [8]. An example of this coding strategy is the binary coded pattern shown
in Figure 2.3a; a small sketch of such a binary (Gray) coding is given after Figure 2.3.

• Spatial neighborhood. In this strategy, the codeword that is assigned to a given
pixel depends on its neighborhood. Codification is done on the basis of intensity
[9]–[11], color [12] or a unique structure of the neighborhood [13]. In contrast with
time-multiplexing strategies, spatial neighborhood strategies allow all coding
information to be condensed into a single projection pattern, making them highly
suitable for applications that involve timing constraints, such as autonomous nav-
igation. The compromise, however, is a deterioration in spatial resolution. Figure
2.3b is an example of this strategy, proposed by Griffin et al. [14].

• Direct coding. In direct coding strategies, every pixel in the pattern is labeled
by the information it represents. In other words, the entire codeword for a given
point is contained in a unique pixel, as explained in [19]. Basically, there are two
ways to achieve this: either by using a large range of color values [15], [16] or
by introducing periodicity [17]. Although in theory this group of strategies can
be used to reconstruct objects with high resolution, a major problem occurs in
practice: the colors imaged by the camera(s) of the system do not only depend on the
projected colors, but also on the intrinsic colors of the measured surface and the light
source. The consequence is that reference images become necessary. Figure 2.3c
shows an example of a direct coding strategy, proposed in [16].
Figure 2.3: Examples of pattern coding strategies: (a) time-multiplexing, (b) spatial neighborhood, (c) direct coding.
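To illustrate the time-multiplexing idea behind Figure 2.3a, the following C sketch (our
own illustration, not the pattern generator used in the scanner) derives binary stripe
patterns from the reflected binary (Gray) code, in which neighboring columns differ in
exactly one bit, a property that makes decoding more robust at stripe boundaries:

#include <stdio.h>

/* Reflected binary (Gray) code of a column index. */
static unsigned gray_code(unsigned column)
{
    return column ^ (column >> 1);
}

/* Pattern k (0 = most significant bit) is lit at a column iff the
 * corresponding bit of the column's Gray code is set; n patterns
 * distinguish 2^n columns. */
static int pattern_on(unsigned column, int k, int n)
{
    return (gray_code(column) >> (n - 1 - k)) & 1;
}

int main(void)
{
    const int n = 3;                /* 3 patterns -> 8 columns */
    for (int k = 0; k < n; k++) {   /* one row of output per pattern */
        for (unsigned c = 0; c < 8; c++)
            printf("%d", pattern_on(c, k, n));
        printf("\n");
    }
    return 0;
}

The sequence of on/off values observed at a given pixel across the n captured images is
that pixel's codeword, from which the projector column, and hence the triangulation
correspondence, is recovered.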
2123 3D human face reconstruction
Given the importance of face reconstruction in a wide range of fields such as security
forensics or even entertainment it is no surprise that special focus has been devoted
to this area by the research community over the last decades A comparative study
of three different 3D face reconstruction approaches is presented in [20] Here the
most representative techniques of three different domains are tested These domains are
binocular stereo structured lighting and photometric stereo The experimental results
show that active reconstruction techniques perform better than purely passive ones for
this application
The majority of analyses of vision-based reconstruction have focused on general performance for arbitrary scenes rather than on specific objects, as reported in [20]. Nevertheless, some effort has been made on evaluating structured lighting techniques with special
focus on human face reconstruction In [21] a comparison is presented between three
structured lighting techniques (Gray Code Gray Code Shift and Stripe Boundary) to
assess 3D reconstruction for human faces by using mono and stereo systems The results
show that the Gray Code shift coding performs best given the high number of emitted
patterns it uses A further study on this topic was performed by the same author in
[22] Again it was found that time-multiplexing techniques such as binary encoding
using Gray Code provide the highest accuracy With a rather different objective than
that sought by Woodward et al in [21] and [22] Fechteler et al [23] also focus their
effort on presenting a framework that captures 3D models of faces in high resolutions
with low computational load Here the system uses a single colored stripe pattern for
the reconstruction purpose plus a picture of the face illuminated with regular white light
that is used as texture
Particular aspects of 3D human face reconstruction such as proximity size and texture
involved make structured lighting a suitable approach On the contrary other recon-
struction techniques might be less suitable when dealing with these particular aspects
For example stereoscopic approaches fail to provide positive results when the textures
involved do not contain features that can be easily extracted and matched by means of
algorithms as in the case of the human face On the other hand the concepts behind
structured lighting make it very convenient to reconstruct these kind of surfaces given
the proximity involved and the size limits of the object in question (appropriate for
projecting encoded patterns)
With regard to the suitability of the different pattern coding strategies for our application
(3D human face reconstruction by means of a hand-held scanner) there are several
factors to consider Spatial neighborhood strategies do not offer high spatial resolution
which is needed by the algorithms that assess the fit quality of the various mask models
Direct coding strategies suffer from practical problems that affect their robustness to
different scenarios This centers the attention on the time-multiplexing techniques which
are known to provide high spatial resolution The problem with such techniques is
that they are highly sensitive to movement, which is likely to be present on a hand-held device Fortunately there are several approaches as to how such a problem can be solved Consequently it is a time-multiplexing technique that is employed in
our application
22 Camera calibration
Camera calibration is a crucial ingredient in the process of metric scene measurement
This section presents a review of some of the most popular techniques with special focus
on those that are regarded as adequate for our application
221 Definition
Camera calibration is the process of determining a mathematical approximation of the
physical and optical behavior of an imaging system by using a set of parameters These
parameters can be estimated by means of direct or iterative methods and they are divided
in two groups On the one hand intrinsic parameters determine how light is projected
through the lens onto the image plane of the sensor The focal length projection center
and lens distortion are all examples of intrinsic parameters On the other hand extrinsic
parameters measure the position and orientation of the camera with respect to a world
coordinate system as defined in [24] To better illustrate these ideas consider Figure
24 which corresponds to the optical system for the structured pattern projection and
triangulation considered in [25] The focal length fc and the projection center Oc are
examples of intrinsic parameters of the camera while the distance D between the camera
and the projector corresponds to an extrinsic parameter
Figure 24 A reference framework assumed in [25]
222 Popular techniques
In 1982 Hall et al [18] proposed a technique consisting of an implicit camera calibration
that uses a 3times4 transformation matrix which maps 3D object points to their respective
2D image projections Here the model of the camera does not consider any lens distor-
tion For a detailed description of this method refer to [18] Some years later in 1986
Faugeras improved Hall's work by proposing a technique that was based on extracting
the physical parameters of the camera from the transformation technique proposed in
[18] The description of this technique is given in [26] and [27] A non-linear explicit
camera calibration that included radial lens distortion was proposed by Salvi in his PhD
thesis [28], which, as he mentions, can be regarded as a simple adaptation of Faugeras' linear method However a method that would become much more popular and that is still
widely used was proposed by Tsai in 1987 [29] Here the author proposes a two-step
technique that models only radial lens distortion Also worth mentioning is the model
proposed by Weng [30] in 1992 which includes three different types of lens distortion
The calibration mechanism that is currently being used in our application is based on
the work performed by Peter-Andre Redert as part of his PhD thesis [31] Although
this mechanism focuses on stereo camera calibration it was generalized for a system
with one camera and one projector It involves imaging a controlled scene from different
positions and orientations The controlled scene consists of a rigid calibration chart with
several markers The geometric and photometric properties of such markers are known
precisely so that they can be detected After corresponding markers in the different
images are found an algorithm searches the optimal set of camera parameters for which
triangulation of all corresponding marker-point pairs gives an accurate reconstruction of
the calibration chart This calibration mechanism is discussed further in Section 37
Chapter 3
3D face scanner application
This chapter provides a general overview of the 3D face scanner application developed
by the Smart Sensing amp Analysis research group and provided as a starting point for the
current project Figure 31 presents the main steps involved in the 3D reconstruction
process
[Flow diagram: binary and XML input → Read binary file (31) → Preprocessing (32) → Normalization (33) → Global motion compensation (36) → Decoding (35) → Tessellation (34) → Calibration (37) → Vertex filtering (38) → Hole filling (39) → 3D model]
Figure 31 General flow diagram of the 3D face scanner application
The current scanner uses a total of 16 binary coded patterns that are sequentially pro-
jected onto the scene For each projection the scene is captured by means of the
embedded camera hence producing 16 different grayscale frames (Figure 32) that are
fed to the application in the form of a binary file This falls in line with the discussion
presented in Section 2123 of the literature study of why time-multiplexing strategies
result more suitable than spatial neighborhood or direct coding strategies for face recon-
struction applications In Sections 31 to 39 each of the steps shown in Figure 31 is
described
Figure 32 Example of the 16 frames that are captured by the embedded camera while the scene is being illuminated with binary structured light patterns This frame sequence is the input for the 3D face scanner application
31 Read binary file
The first step of the application is to read the binary file that contains the required
information for the 3D reconstruction The binary file is composed of two parts the
header and the actual data The header contains metadata of the acquired frames such
as the number of frames and the resolution of each one The second part contains the
actual data of the captured frames Figure 32 shows an example of such frame sequence
which from now on will be referred to as camera frames
32 Preprocessing
The preprocessing stage comprises the four steps shown in Figure 33 Each of these steps
is described in the following subsections
[Flow diagram: Parse XML file → Discard frames → Crop frames → Scale (convert to float, range 0-1)]
Figure 33 Flow diagram of the preprocessing stage
321 Parse XML file
In this stage the application first reads an XML file that is included for every scan
This file contains relevant information for the structured light reconstruction This
information includes (i) the type of structured light patterns that were projected when
acquiring the data (ii) the number of frames captured while structured light patterns
were being projected (iii) the image resolution of each frame to be considered and (iv)
the calibration data
322 Discard frames
Based on the number of frames value read from the XML file the application discards
extra frames that do not contain relevant information for the structured light approach
but that are provided as part of the input
323 Crop frames
The original resolution of each camera frame (480 × 768) is modified in order to obtain a new, more suitable resolution for the subsequent algorithms of the program (480 × 754) This is accomplished by cropping the pixels that are close to the top border
of the images Note that this operation does not imply a loss of information in this
application in particular This is because pixels near the frame borders do not contain
facial information and therefore can be safely removed
324 Scale
Each pixel of the camera frame sequence (as provided by the embedded camera) is
represented by an 8-bit unsigned integer value that ranges from 0 to 255 In this stage
the data type is transformed from unsigned integer to floating point while dividing each
pixel value by 255 The new set of values ranges between 0 and 1
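As an illustration, the scale step amounts to a single pass over the pixels; the function below is a minimal sketch with illustrative names:

/* Minimal sketch of the scale step: 8-bit pixels become
   single-precision floats in the range [0, 1]. */
void scale_frames(const unsigned char *in, float *out, int npixels)
{
    for (int i = 0; i < npixels; i++)
        out[i] = in[i] * (1.0f / 255.0f);
}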
33 Normalization
Even though this section is entitled Normalization a few more tasks are being performed
in this stage of the application as shown by the blue rectangles in Figure 34 Here wide
arrows represent flow of data whereas dashed lines represent the order of execution The
numbers inside the small data arrows pointing towards the different tasks represent the
number of frames used as input by each task The dashed line rectangle that encloses
the normalization and texture 2 tasks represents that there is not a clear sequential
execution between these two but rather that these are executed in an alternating fashion
This type of diagram will result particularly useful in Chapter 5 in order to explain the
[Flow diagram: the 16 camera frames feed four tasks: normalization (8 frames out), texture 2 (8 frames out), modulation (1 frame out) and texture 1 (1 frame out)]
Figure 34 Flow diagram of the normalization stage
modifications that were made to the application to improve its performance An example
of the different frames that are produced in this stage are visualized in Figure 35 A
brief description of each of the tasks involved in this stage follows
331 Normalization
The purpose of this stage is to extract the reflectivity component (texture information)
from the camera frames while aiming at enhancing the deformed illumination patterns
in the resulting frame sequence Figure 35a illustrates the result of this process The
deformed patterns are essential for the 3D reconstruction process
In order to understand how this process takes place we need to look back at Figure
32 Here it is possible to observe that the projected patterns in the top row frames are
equal to their corresponding frame in the bottom row with the only difference being
that the values of the projected pattern are inverted For each corresponding pair a
new image frame is generated according to the following equation
F_norm(x, y) = (F_camera(x, y, a) − F_camera(x, y, b)) / (F_camera(x, y, a) + F_camera(x, y, b))
where a and b correspond to aligned top and bottom frames in Figure 32 respectively
An example of the resulting frame sequence is shown in Figure 35a
Figure 35 Example of the 18 frames produced in the normalization stage: (a) normalized frame sequence; (b) texture 2 frame sequence; (c) modulation frame; (d) texture 1 frame
332 Texture 2
The calculation of the texture 2 frame sequence follows the same procedure as the one
used to calculate the normalized frame sequence In fact the output of this process is an
intermediate step in the calculation of the normalized frames being this the reason why
the two processes are said to be performed in an alternating fashion The mathematical
equation that describes the calculation of the texture 2 frame sequence is
F_texture2(x, y) = F_camera(x, y, a) + F_camera(x, y, b)
The resulting frame sequence (Figure 35b) is used later in the global motion compen-
sation stage
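Since the texture 2 sum is an intermediate result of the normalization, both frames can be produced in a single pass over each pair of camera frames. The following C sketch illustrates this; the epsilon guard against division by zero is an assumption, and all names are illustrative:

/* One pass produces both the texture 2 frame (the sum) and the
   normalized frame (difference over sum) for a pair (a, b) of
   corresponding camera frames. */
void normalize_pair(const float *a, const float *b,
                    float *norm, float *tex2, int npixels)
{
    for (int i = 0; i < npixels; i++) {
        float sum = a[i] + b[i];
        tex2[i] = sum;
        norm[i] = (sum > 1e-6f) ? (a[i] - b[i]) / sum : 0.0f;
    }
}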
333 Modulation
The purpose of this stage is to find the range of measured values for each (x y) pixel of
the camera frame sequence along the time dimension This is done in two steps First
two frames are generated by finding the maximum and minimum values along the time
(t) dimension (Figure 36) for every (x y) value in a frame
Figure 36 Camera frame sequence in a coordinate system
Second a modulation frame is produced by finding the difference between the previously
generated frames ie
F_mod(x, y) = F_max(x, y) − F_min(x, y)
Such a modulation frame (Figure 35c) is required later during the decoding stage
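A sketch of this two-step computation in C, with illustrative names, could look as follows:

/* The two frames holding the per-pixel maximum and minimum along the
   time dimension are built first; their difference is the modulation
   frame. Frames are stored consecutively in one buffer. */
void modulation_frame(const float *frames, int nframes, int npixels,
                      float *fmax, float *fmin, float *fmod)
{
    for (int i = 0; i < npixels; i++) {
        fmax[i] = fmin[i] = frames[i];
        for (int t = 1; t < nframes; t++) {
            float v = frames[t * npixels + i];
            if (v > fmax[i]) fmax[i] = v;
            if (v < fmin[i]) fmin[i] = v;
        }
        fmod[i] = fmax[i] - fmin[i];
    }
}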
334 Texture 1
Finally the last task in the Normalization stage corresponds to the generation of the
texture image that will be mapped onto the final 3D model In contrast to the previous
three tasks this subprocess does not take the complete set of 16 camera frames as input
but only the 2 with finest projection patterns Figure 37 shows the four processing
steps that are applied to the input in order to generate a texture image such as the one
presented in Figure 35d
[Flow diagram: Average frames → Gamma correction → 5×5 mean filter → Histogram stretch]
Figure 37 Flow diagram for the calculation of the texture 1 image
34 Global motion compensation
The major drawback of time-multiplexing strategies is its high sensitivity to movement
In fact if no measures are taken to correct the slight amount of movement of the scanner
or of the objects in the scene during the acquisition process the complete reconstruction
process fails Although the global motion compensation stage is only a minor part of
the mechanism that makes the entire application robust to motion it is not negligible
in the final result
Global motion compensation is an extensive field of research for which many different
approaches and methods have been contributed The approach used in this application
is amongst the simplest in level of complexity Nevertheless it suffices for the needs of the
current application
Figure 38 presents an overview of the algorithm used to achieve the global motion
compensation This process takes as input the normalized frame sequence introduced in
the previous section As noted at the bottom of the figure these steps are repeated for
every pair of consecutive frames As a first step the pixels in each column are added for
both frames This results in two vectors that hold the cumulative sums of each frame
The second step is to determine by how many pixels the second image is displaced with
respect to the first one In order to achieve this the sum of absolute differences between
elements of the two column-sum vectors is calculated while slowly displacing the two
vectors with respect to each other The result is a new vector containing the SAD value
for each displacement Subsequently the index of the smallest element in the SAD
values vector is searched in order to determine the number of pixels that the second
image needs to be shifted The process concludes by performing the actual shift of the
second frame
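The following C sketch captures the essence of this estimation for one frame pair; the search range MAX_SHIFT and the normalization of the SAD by the overlap size are assumptions, and names are illustrative:

#include <float.h>
#include <math.h>

#define MAX_SHIFT 16  /* assumed maximum displacement in pixels */

/* Cumulative sum of the pixels in each column of a frame. */
void sum_columns(const float *frame, int rows, int cols, float *col_sum)
{
    for (int c = 0; c < cols; c++)
        col_sum[c] = 0.0f;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            col_sum[c] += frame[r * cols + c];
}

/* Displacement that minimizes the SAD between the two column-sum
   vectors; frame B is then shifted by the returned amount. */
int estimate_shift(const float *sum_a, const float *sum_b, int n)
{
    float best_sad = FLT_MAX;
    int best_d = 0;
    for (int d = -MAX_SHIFT; d <= MAX_SHIFT; d++) {
        float sad = 0.0f;
        int count = 0;
        for (int j = 0; j < n; j++) {
            int k = j + d;
            if (k >= 0 && k < n) {
                sad += fabsf(sum_a[j] - sum_b[k]);
                count++;
            }
        }
        if (count > 0)
            sad /= (float)count; /* normalize by the overlap size */
        if (sad < best_sad) {
            best_sad = sad;
            best_d = d;
        }
    }
    return best_d;
}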
[Flow diagram: for every pair of consecutive normalized frames A and B, the columns of each frame are summed, the SAD between the two column-sum vectors is minimized, and frame B is shifted accordingly]
Figure 38 Flow diagram for the global motion compensation process
35 Decoding
In Section 211 of the literature study the correspondence problem was defined as the
process of determining corresponding point pairs between the captured images and the
projected patterns This is exactly what is being accomplished during the decoding
stage
A novel approach has been implemented in which the identification of the projector
stripes is based not on the values of the pixels themselves (as it is typically done) but
rather on the edges formed by the transitions of the projected patterns Figure 39
illustrates the different sets of decoded values that result with each of these methods
Here it is possible to observe that the pixel-based method produces a stair-casing effect
due to the decoding of neighboring pixels that lie on the same stripe of the projected
pattern On the other hand the edge-based method removes this undesirable effect by
decoding values for only parts of the image in which a transition occurs Furthermore
this approach enables sub-pixel accuracy for the determination of the positions where the
transitions occur meaning that the overall resolution of the 3D reconstruction increases
considerably
[Plot: decoded values versus pixels along the y dimension of the image, for edge-based and pixel-based decoding]
Figure 39 The stair-casing effect caused by pixel-based decoding is not present when edge-based decoding is used
The decoding process results in a set of vertices each one associated with a depth code
Note however that the unit of measurement used to describe the position and depth of
each vertex is based on camera pixels and code values respectively meaning that these
vertices still do not represent the actual geometry of the face The calibration process
explained in a later section is the part of the application that translates the pixel and
code values to standard units (such as millimeters) thus recreating the actual shape of
the human face
36 Tessellation
Tessellation refers to the process of covering a plane using different geometric shapes in
a manner such that no overlaps occur In computer graphics these geometric shapes
are generally chosen to be triangles also called "faces" The reason for using triangles is that they have, by definition, their vertices on the same plane This in turn avoids
the generation of non-simple convex polygons that are not guaranteed to be rendered
correctly A complete example illustrating this point can be found in [32]
A set of 3D vertices calculated in the decoding stage is the input to the tessellation
process Here however the third dimension does not play a role and hence the z
coordinate for each of the vertices can be thought of as being equal to 0 This implies
that the new set of vertices consists only of (x y) coordinates that lie on the same plane
as shown in Figure 310a This graph corresponds to a very close view of the nose area
in the reconstructed face example
[Plots: (a) vertices before applying the Delaunay triangulation; (b) result after applying the Delaunay triangulation]
Figure 310 Close view of the vertices in the nose area before and after the tessellation process
The question that arises here is how to connect the vertices in such a way that the com-
plete surface is covered with triangles The answer is to use the Delaunay triangulation
which is probably the most common triangulation used in computer vision The main
advantages that it has over other methods is that the Delaunay triangulation avoids
"skinny" triangles, reducing potential numerical precision problems [33] Moreover the
Delaunay triangulation is independent of the order in which the vertices are processed
Figure 310b shows the result of applying the Delaunay triangulation to the vertices
shown in Figure 310a
Although there exists a number of different algorithms used to achieve the Delaunay
triangulation the final outcome of each conforms to the following definition a Delaunay
triangulation for a set P of points in a plane is a triangulation DT(P) such that no
point in P is inside the circumcircle of any triangle in DT(P) [33] Such definition can
be understood by examining Figure 311
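The circumcircle condition itself reduces to the sign of a determinant. A minimal C sketch of this standard predicate (positive when point d lies inside the circumcircle of the counter-clockwise triangle a, b, c) is:

/* Standard in-circumcircle test: sign of the 3x3 determinant built
   from the coordinates of a, b, c relative to d. */
double in_circumcircle(const double a[2], const double b[2],
                       const double c[2], const double d[2])
{
    double adx = a[0] - d[0], ady = a[1] - d[1];
    double bdx = b[0] - d[0], bdy = b[1] - d[1];
    double cdx = c[0] - d[0], cdy = c[1] - d[1];
    double ad2 = adx * adx + ady * ady;
    double bd2 = bdx * bdx + bdy * bdy;
    double cd2 = cdx * cdx + cdy * cdy;
    return adx * (bdy * cd2 - cdy * bd2)
         - ady * (bdx * cd2 - cdx * bd2)
         + ad2 * (bdx * cdy - cdx * bdy);
}

A Delaunay triangulation is one for which this test is non-positive for every triangle and every remaining point.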
Figure 311 The Delaunay tessellation with all the circumcircles and their centers [33]
37 Calibration
The set of (x y) vertices with their corresponding depth code values that result from
the decoding process do not represent standard units of measure ie these still have to
be translated into standard units such as millimeters This is precisely the objective of
the calibration process
The calibration mechanism that is used in the application is based on the work of Peter-
Andre Redert as part of his PhD thesis [31] The entire process is divided into two parts
an offline and an online process Moreover the offline process consists of two stages
the camera calibration and the system calibration It is important to clarify that while
the offline process is performed only once (camera properties and distances within the
system do not change with every scan) the online process is carried out for every scan
instance The calibration stage referred to in Figure 31 is the latter
371 Offline process
As already mentioned the offline process comprises the two stages described below
Camera calibration This part of the process is concerned with the calculation of the
intrinsic parameters of the camera as explained in Section 22 of the literature
study In short the objective is to precisely quantify the optical properties of the
camera The manner in which the current approach accomplishes this is by imag-
ing the special calibration chart shown in Figure 312 from different orientations
and distances After corresponding markers in the different images are found an
algorithm searches the optimal set of camera parameters for which triangulation
of all corresponding marker-point pairs gives an accurate reconstruction of the
calibration chart
Figure 312 The calibration chart used to determine the intrinsic parameters of a camera and the extrinsic parameters of a projector-scanner system All absolute dimensions and photometric properties of the round markers are known precisely
System calibration The second part of the calibration process refers to the camera-
projector system calibration ie the determination of the extrinsic parameters
of the system Again this part of the process images the calibration chart from
different distances However this time structured light patterns are emitted by
the projector while the acquisition process takes place The result is that each
projector code is associated with a known depth and camera position
372 Online process
The result of the offline calibration is a set of parameters that model the optical proper-
ties of the scanner system These are passed to the application inside the XML file for
every scan Such parameters represent the coefficients of a fifth-order polynomial used
for translating the set of (x y) vertices with their corresponding depth code values into
standard units of measure In other words the online process consists of evaluating a
polynomial with all the x y and depth code values calculated in the decoding stage in
order to reconstruct the geometry of the face Figure 313 shows the state of the 3D
model before and after the reconstruction process
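By way of illustration, evaluating such a polynomial is typically done with Horner's scheme. The sketch below shows the one-dimensional case only; the actual mapping is a function of x, y and the depth code, with coefficients taken from the XML file:

/* Horner evaluation of a fifth-order polynomial in one variable:
   c[0] + v*(c[1] + v*(c[2] + ...)). The coefficient layout is
   illustrative; the real coefficients come from the calibration data. */
float eval_poly5(const float c[6], float v)
{
    float r = c[5];
    for (int i = 4; i >= 0; i--)
        r = r * v + c[i];
    return r;
}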
(a) Before reconstruction (b) After reconstruction
Figure 313 The 3D model before and after the calibration process
38 Vertex filtering
As it can be seen from Figure 313b there are a number of extra vertices (and faces)
that have not been correctly reconstructed and therefore should be removed from the
model Vertex filtering is applied to remove all these noisy vertices and faces based on
different criteria The process is divided in the following three steps
381 Filter vertices based on decoding constraints
First if the distance between consecutive decoded points is larger than a maximum
threshold in the (x) or (z) dimensions then these are removed Second in order to
avoid falsely decoded vertices due to camera noise (especially in the parts of the images
where light does not hit directly) a minimal modulation threshold needs to be exceeded
or else the associated decoded point is discarded Finally if the decoded vertices lie
outside a margin defined in accordance to the image dimensions then these are removed
as well
382 Filter vertices outside the measurement range
The measurement range defined during the offline calibration refers to the minimum
and maximum values that each decoded point can have in the z dimension These values
are read from the XML file The long triangles shown in Figure 313b that either extend
far into the picture or on the other hand come close to the camera are all removed in
this stage The resulting 3D model after being filtered with the two previously described
criteria is shown in Figure 314a
383 Filter vertices based on a maximum edge length
Several steps are involved in the removal of vertices based on the maximum edge length
criterion Initially the length of every edge contained in the model is calculated This
is followed by determining a new set of edges L that contains the longest edge in each
face After this operation the mean length value for the longest edge set is calculated
Finally only faces whose longest edge is less than seven times the mean value, i.e. L < 7 × mean(L), are kept Figure 314b shows the result after this operation
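A C sketch of this criterion, assuming the longest-edge lengths have already been collected into an array, could be:

/* Keep a face only if its longest edge is below 7 times the mean of
   the longest-edge set; keep[i] is 1 for faces that survive. */
void filter_faces(const float *longest, int nfaces, int *keep)
{
    float mean = 0.0f;
    for (int i = 0; i < nfaces; i++)
        mean += longest[i];
    mean /= (float)nfaces;

    for (int i = 0; i < nfaces; i++)
        keep[i] = longest[i] < 7.0f * mean; /* L < 7 * mean(L) */
}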
Figure 314 Resulting 3D models after various filtering steps: (a) after the filtering steps described in Subsections 381 and 382; (b) after the filtering step described in Subsection 383; (c) after the filtering step described in Section 39
39 Hole filling
In the last processing step of the 3D face scanner application two actions are performed
The first one is concerned with an algorithm that takes care of filling undesirable holes
that appear due to the removal of vertices and faces that were part of the face surface This
is accomplished by adding a vertex in the middle of the hole and then connecting every
surrounding edge with this point The second action refers to another filtering step of
vertices and faces In this last part of the application the program removes all but the
largest group of connected faces The final 3D model is shown in Figure 314c
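The first action can be sketched as follows: the new vertex is placed at the centroid of the hole's boundary loop, and each boundary edge is then connected to it to form a triangle fan. Data layout and names are illustrative:

/* Centroid of the boundary loop of a hole; loop holds the indices of
   the n boundary vertices in order. New faces would then be
   (loop[i], loop[(i+1) % n], new vertex index) for every i. */
void hole_centroid(const float (*v)[3], const int *loop, int n,
                   float centroid[3])
{
    centroid[0] = centroid[1] = centroid[2] = 0.0f;
    for (int i = 0; i < n; i++) {
        centroid[0] += v[loop[i]][0];
        centroid[1] += v[loop[i]][1];
        centroid[2] += v[loop[i]][2];
    }
    centroid[0] /= n;
    centroid[1] /= n;
    centroid[2] /= n;
}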
310 Smoothing
Taking into account that the smoothing process is beneficial for visualization purposes
but not for the overall goal of the 3D mask sizing project this process was not taken
into account as part of the 3D face scanner application This is also the reason why it
is not included in Figure 31 Nevertheless this section provides a brief explanation of
the smoothing process that is currently used along with an example
A complete explanation of the algorithm that is being used to achieve the smoothing
effect is given in [34] In short the algorithm is based on a scale-dependent Laplacian
operator that diffuses the vertices along the surface An example of the resulting model
before and after applying the smoothing process is shown in Figure 315
(a) The 3D model before smoothing (b) The 3D model after smoothing
Figure 315 Forehead of the 3D model before and after applying the smoothing process
Chapter 4
Embedded system development
Modern design of embedded systems requires hardware and software not to be seen as
two different domains but rather as two complementary parts of a whole There are two
important trends that have made such unified view possible First integrated circuit
(IC) technology has evolved to the point where multiple processors of different types
coexist in a single IC Second the increasing complexity and average size of programs, added to the evolution of compiler technologies, has made C compilers (and even C++ or Java compilers in some cases) commonplace in the development of embedded systems
[35]
This chapter discusses the embedded hardware and software implementation of the 3D
face scanner A brief account of the hardware and software tools that were used during
the development of the application is presented first Subsequently the first stage of the
development process is described which consists mainly of translating the algorithms
and methods described in Chapter 3 into a different programming language more suitable
for embedded systems Finally a preview of the developed visualization module that
displays the 3D reconstructed face is presented along with a brief description of its
functionality
41 Development tools
This section describes the set of tools used in the development of the embedded applica-
tion First an overview of the hardware is presented highlighting the most important
aspects that are of interest to the 3D face scanner application This is then followed by
a list of the software tools along with a short motivation for their selection A so called
remote development methodology was used for the compilation process The idea is to
run an integrated development environment (IDE) on a client system for the creation of
the project editing of the files and usage of code assistance features in the same manner
as done with local projects However when the project is built run or debugged the
process runs on a remote server with output and input transferred to the client system
411 Hardware
A current trend in the embedded world is the use of single-board computers (SBCs) as
development platforms SBCs combine most features of a conventional desktop computer
into a single board which can be as small as a credit card One or more processors of
different types memory on-board peripherals for multiple USB devices single or dual
gigabit Ethernet connections integrated graphics and audio capabilities amongst others
are common features included in these devices But perhaps what is most interesting
for embedded developers is the availability of several SBCs that fall under the open-source hardware category [36] Such SBCs are suitable for the implementation of a wide range
of applications on the basis of open operating systems
Two different hardware environments were used in the development of the current em-
bedded application a conventional desktop personal computer (PC) with an Intel x86
architecture and a SBC that was selected according to the following survey
4111 Single-board computer survey
A prior survey of popular SBCs available in the market was conducted with the intention
of finding the most suitable model for our application Table 41 presents a subset of the
considered models highlighting the most relevant characteristics for the 3D face scanner
application Refer to [37] for the complete survey
The model to be chosen has to comply with several requirements imposed by the 3D
face scanner application First support for both a camera and a projector had to be
offered While all of the considered models showed special support for video output
not all of them provided suitable characteristics for camera signal acquisition In fact
most of them rely on USB or Ethernet connections for this purpose The problem of
using USB technology for camera acquisition is that it is highly resource demanding On
the other hand Ethernet connections imply streaming video in formats such as MPEG
which require additional computational resources and buffering for decoding the video
stream Explicit periphery support for camera acquisition was only offered by two of
the considered models the BeagleBoard-xM and the PandaBoard
Table 41 Single-board computer survey

Board                | CPU                               | RAM    | Video output             | GPU                                          | Camera port
BeagleBoard-xM       | ARM Cortex-A8, 1000 MHz           | 512 MB | DVI-D, HDMI, S-Video     | PowerVR SGX, OpenGL ES 20                    | Yes
Raspberry Pi Model B | ARM1176, 700 MHz                  | 256 MB | Composite RCA, HDMI, DSI | Broadcom VideoCore IV, OpenGL ES 20          | No
Cotton Candy         | dual-core ARM Cortex-A9, 1200 MHz | 1 GB   | HDMI                     | quad-core 200 MHz Mali-400 MP, OpenGL ES 20  | No
PandaBoard           | dual-core ARM Cortex-A9, 1000 MHz | 1 GB   | HDMI, DVI-D, LCD         | PowerVR SGX540, OpenGL ES 20                 | Yes
Via APC              | ARM11, 800 MHz                    | 512 MB | HDMI, VGA                | built-in 2D/3D graphics, OpenGL ES 20        | No
MK802                | ARM Cortex-A8, 1000 MHz           | 1 GB   | HDMI                     | Mali-400 MP, OpenGL ES 20                    | No
Snowball             | dual-core ARM Cortex-A9, 1000 MHz | 1 GB   | HDMI, CVBS               | Mali-400 MP, OpenGL ES 20                    | No
A second issue in the selection of the SBC was concerned with the project objective of
developing a module capable of visualizing the 3D reconstructed model by means of the
embedded projector It was considered that the achievement of this objective could be
greatly simplified by selecting an SBC model that offered support for rendering of 3D
computer graphics by means of an API preferably OpenGL ES Nevertheless all of the
SBC models considered in the survey featured a graphical processor unit (GPU) with
such support
Finally one last important motivation for the selection came from the experience gath-
ered through related projects The BeagleBoard-xM had been used as the embedded
computing unit in other projects [6] at Philips Research Eindhoven and therefore valu-
able implementation effort could be saved if this option were adopted Consequently it
was the BeagleBoard-xM that was selected as the SBC model for the development of
the current project
4112 BeagleBoard-xM features
The BeagleBoard-xM (Figure 41) is an SBC produced by Texas Instruments It is a low-power open-source hardware system that was designed specifically to address the Open Source Community It measures 82.55 by 82.55 mm and offers most of the functionality of a desktop computer It is based on Texas Instruments' DM3730 system
on chip (SoC) At the heart of the SoC lies an ARM Cortex-A8 processor clocked at 1
GHz and 512 MB of LPDDR RAM Several open operating systems have been made
compatible with such processor including Linux FreeBSD RISC OS Symbian and
Android Moreover the BeagleBoard-xM features a TMS320C64x+ DSP for accelerated
video and audio decoding and an Imagination Technologies PowerVR SGX530 GPU to
provide accelerated 2D and 3D rendering that supports OpenGL ES 20 [38]
In addition to the previously mentioned characteristics the ARM Cortex-A8 processor
comes with a general-purpose SIMD (Single instruction Multiple data) engine known as
NEON This technology is based on a 128-bit SIMD architecture extension that provides
flexible and powerful acceleration for consumer multimedia products, as described in [39]
412 Software
The main factors involved in the selection of software tools were (i) available support by
a large development community and (ii) acquisition costs and licensing charges Open
source software was adopted where possible Moreover prior experience with the tools
was also taken into account The software can be divided in two categories (i) software
Figure 41 The BeagleBoard-xM offered by Texas Instruments
libraries that are used within the application and therefore are necessary for its execution
and (ii) software tools used specifically for the development of the application and hence
are not required for its execution In what follows each of these is briefly described
4121 Software libraries
The following software libraries are being used throughout the implementation of the
embedded application
libxml2 It is a software library used for parsing XML documents which was originally
developed for the Gnome project and was later made available for outside projects
as well The current application makes use of such tool for extracting the required
information from the XML file that is included for each scan
OpenCV It is an open-source computer vision and machine learning software library
initiated by Intel It provides the necessary functionality to construct the Delaunay
triangulation described in Chapter 3 Though it was used in the initial versions of
the application later optimizations replaced OpenCV implementations
CGAL Consists of a software library that aims to provide access to algorithms in
computational geometry It is being used in the current application as a means
to simplify the resulting mesh surface ie to reduce the number of faces used to
represent the surface while keeping the overall shape of the reconstructed model
OpenGL ES OpenGL ES is a subset of the more general OpenGL designed specifi-
cally for embedded systems It consists of a cross-language multi-platform Appli-
cation Programming Interface (API) for rendering 2D and 3D computer graphics
It is used in the current application as the means to visualize the 3D reconstructed
model
GLUT The OpenGL Utility Toolkit consists of a system independent API for OpenGL
used to create windows andor frame buffers It is being used in the visualization
module of the application as well
4122 Software development tools
The following list presents a description of the most important software tools used for
the development of the embedded application
GNU toolchain It refers to a collection of programming tools produced by the GNU
Project that provide developing facilities for applications and operating systems
Among the several projects that comprise the GNU toolchain the following were
used
GNU Make It is a utility that automates the building process of executable
programs by reading the so-called makefiles which specify how to create the
target program
GCC It is the official compiler of the GNU operating system and has been
adopted as standard by most modern Unix-like computer operating systems
GNU Binutils Involves a set of programming tools that are used in the develop-
ment process of creating and managing programs object files libraries profile
data and assembly source code The commands as (assembler) ld (linker)
and gprof (profiler) were used among the complete set of binutil commands
GNU Project debugger It is the standard debugger for the GNU operating
system which was made available for the development of applications outside
this project as well
Valgrind It is a programming tool that can automatically detect memory management
errors It also provides the functionality of a profiler
Ubuntu A Linux based operating system that is distributed as free and open source
software It was installed in both the desktop PC and the SBC
42 MATLAB to C code translation
This section describes the first stage of the embedded application development that
involves the translation of a series of algorithms originally written in MATLAB code to
C
Despite the fact that there are a number of available tools that automatically translate
MATLAB code to C language, such as MATLAB Coder by MathWorks, MATLAB-to-C Synthesis (MCS) by Catalytic Inc and AccelDSP by Xilinx, these have a number
of pitfalls that compromise their applicability specially when the performance aspect
is of ultimate importance Perhaps what is most concerning is that each one of these
tools only supports a subset of the MATLAB language and functions meaning that
the complete functionality of MATLAB is immediately constrained by this requirement
In many cases this would imply a modification to the MATLAB code prior to the
translation process in order to filter out any feature or function not included in the
subset which adds overhead to the development process Examples of features not
supported by automatic translation tools are, amongst others, objects, cell arrays, nested functions, visualization or try/catch statements The use of an automatic translation
tool was discarded for this project taking into account that several of these unsupported
features are present in the MATLAB code
421 Motivation for developing in C language
There are a number of reasons that explain why C is among the most popular pro-
gramming languages used for the development of embedded systems The first is that
C language lies at an intermediate point between higher- and lower-level languages, pro-
viding suitable characteristics for embedded system development from both sides The
problem with higher level languages relies on the fact that they do not provide suitable
characteristics for optimizing performance of the applications such as low-level memory
manipulation Furthermore unlike many of these higher level programming languages
C provides deterministic resource use which is an important feature when the target de-
vices contain limited resources On the other hand C outperforms lower level languages
in a number of aspects such as scalability and maintainability Two final motivations
for using C are (i) C compilers are available for almost all embedded devices which are
supported by a large pool of experienced C programmers and (ii) the vast majority of
hardware APIdrivers are written in C
422 Translation approach
As mentioned earlier a manual translation approach of the code was chosen over the
use of automatic translation tools A key part in the process of manually translating
MATLAB to C code is the verification process There are two major techniques used
to achieve such verification The first one consists of a systematic method of converting
the translated C code into a compiled MEX-file that can be merged into the original
MATLAB project Then by comparing the results generated by the MATLAB project
containing the C implementation wrapped in a MEX-file with those generated by the
original MATLAB project one should be able to verify the correctness of the translation
The second approach consists of writing corresponding intermediate results of both the
MATLAB and C implementations to external files and then using a file comparison tool
such as diff for Linux environments in order to validate equality of both results It was
the latter approach that was chosen for the development of the current application for
the following reason The former approach requires the C implementation to be wrapped
in a so called MEX wrapper which takes care of the communication between MATLAB
and C This task is considered to be error prone since crashes segmentation violations
or incorrect results can easily occur if the MEX wrapper does not allocate and access
the data properly as reported by Marc Barberis in [40] from Catalytic Inc
A number of pitfalls that add complexity to the manual translation process were iden-
tified throughout the development of this stage The most important are
• Array elements in MATLAB code are indexed starting with 1, whereas C indexing
starts with 0 Although this does not seem like a major difference it was found
that such simple change could easily introduce errors
• MATLAB uses column-major ordering whereas C uses a row-major approach (see the sketch after this list)
Special care must be taken to guarantee that spatial locality is maintained after
the translation process takes place ie the order in which data is processed should
correspond to the order in which it is laid out in memory Not complying with
this idea could induce a serious loss in performance of the resulting code
• MATLAB is an interpreted language, i.e. data types and variable dimensions are
only known at run-time thus these cannot be easily deduced from analyzing the
source code
• MATLAB supports dynamic sizing of arrays, whereas such operations in C require
explicit allocationreallocationdeallocation of memory using constructs such as
malloc realloc or free
• MATLAB features a rich set of libraries that are not available in C This can imply
a large overhead in the development process if many of these functions have to be
implemented
• Many of the vector-based operations available in MATLAB translate into nontrivial loop constructs in C language For example, mapping MATLAB's easy-to-use concatenation operation to C involves considerable effort
• Last but not least, MATLAB supports reusing the same variable for storing data
of different types dimensions and sizes On the contrary C language requires all
variables to be cast to a specific data type (or declared as known in the program-
ming field) before they can be used Furthermore MATLAB uses a wide variety
of generic types that are not available in C and hence requires the programmer
to implement them while relying on structure constructs of primitive types
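A small C sketch contrasting two of these points (0-based, row-major indexing and explicit memory management) is given below; sizes and names are illustrative:

#include <stdlib.h>

int main(void)
{
    int rows = 480, cols = 754;                       /* illustrative sizes */
    float *img = malloc((size_t)rows * cols * sizeof *img);
    if (img == NULL)
        return 1;

    /* 0-based, row-major traversal: the inner loop walks contiguous
       memory, unlike MATLAB's 1-based, column-major convention. */
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            img[r * cols + c] = 0.0f;

    free(img); /* allocation and deallocation are explicit in C */
    return 0;
}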
43 Visualization
This section describes the different steps involved in the visualization module developed
to display the reconstructed 3D models by means of the embedded projector contained
in the hand-held device Figure 42 extends the general overview of the application
presented in 31 by incorporating the visualization module This figure shows that a
resulting 3D model of the face reconstruction process consists of 4 different elements a
set of vertices a set of faces a set of UV coordinates and a texture image
[Diagram: the 3D face reconstruction block takes the camera frame sequence and XML file as input and produces faces, vertices, UV coordinates and a texture 1 image, which feed the visualization module]
Figure 42 Simplified diagram of the 3D face scanner application
Vertices and faces describe the geometry of the reconstructed model Each face consists
of three index values that determine the vertices that conform a triangle On the other
hand UV coordinates together with the texture image describe the texture of the model
Figure 43 shows how UV coordinates are used to map portions of the texture image
to individual parts of the model Each vertex is associated with a UV coordinate
When a triangle is rendered the corresponding UV coordinates of each vertex are used
to extract a portion of the texture image to place it on top of the triangle
Figure 43 UV coordinate system
Figure 44 presents an overview of the visualization module The first step of the process
is to simplify the 3D model ie to reduce the number of triangles (and vertices) used
to represent the surface Note that while a high resolution is needed for the algorithms
that determine the fit quality of the different mask models a much lower resolution can
be used for visualization purposes In fact due to the limited available resources in
embedded systems such simplification becomes necessary to avoid lag when zooming
rotating or panning the model Edge collapse is a common term used for the simpli-
fication process which is shown in Figure 44 Input vertices and faces of this block
are converted into a smaller set denoted as New vertices and New faces on the diagram
However since the new set of vertices and faces do not have a one-to-one correspondence
to the original set of UV coordinates such coordinates have to be updated as well The
manner in which this is accomplished is by using the Nearest Neighbor algorithm Every
new vertex is assigned the UV coordinate of its closest original vertex
The next stage of the process is to format the new set of vertices faces and UV co-
ordinates together with the texture 1 image such that OpenGL can render the model
Subsequently normal vectors are calculated for every triangle which are mainly used
by OpenGL for lighting calculations Every vertex of the model has to be associated
with one normal vector To do this an average normal vector is calculated for each
vertex based on the normal vectors of the triangles that are connected to it Moreover
the cross product of two triangle edges is used to calculate the normal vector of each triangle
Once these four elements that characterize the 3D model are provided to OpenGL the
program enters in an infinite running state where the model is redrawn every time a
timer expires or when an interactive operation is sent to the program
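A C sketch of this normal calculation, with illustrative data layout (one array of vertex positions and one of triangle indices), is given below:

#include <math.h>

/* Face normals are obtained from the cross product of two triangle
   edges; each face normal is accumulated into its three vertices and
   the per-vertex sum is normalized at the end. */
void compute_normals(const float (*v)[3], const int (*f)[3],
                     int nfaces, int nverts, float (*n)[3])
{
    for (int i = 0; i < nverts; i++)
        n[i][0] = n[i][1] = n[i][2] = 0.0f;

    for (int i = 0; i < nfaces; i++) {
        const float *a = v[f[i][0]], *b = v[f[i][1]], *c = v[f[i][2]];
        float e1[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
        float e2[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };
        float fn[3] = { e1[1] * e2[2] - e1[2] * e2[1],
                        e1[2] * e2[0] - e1[0] * e2[2],
                        e1[0] * e2[1] - e1[1] * e2[0] };
        for (int k = 0; k < 3; k++)
            for (int d = 0; d < 3; d++)
                n[f[i][k]][d] += fn[d];
    }

    for (int i = 0; i < nverts; i++) {
        float len = sqrtf(n[i][0] * n[i][0] + n[i][1] * n[i][1]
                        + n[i][2] * n[i][2]);
        if (len > 0.0f)
            for (int d = 0; d < 3; d++)
                n[i][d] /= len;
    }
}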
[Flow diagram: faces, vertices and UV coordinates are simplified by edge collapse; new UV coordinates are assigned by nearest neighbor; the new vertices, faces and UV coordinates are converted to OpenGL format, normals are calculated, and OpenGL renders the model with the texture 1 image]
Figure 44 Diagram of the visualization module
Chapter 5
Performance optimizations
This chapter presents various performance optimizations made to the 3D face scanner
application ranging from high-level optimizations such as modification of the algo-
rithms to low-level optimizations such as the implementation of time-consuming parts
in assembly language
In order to verify that the achieved optimizations were valid in general and not for
specific cases 10 scans of different persons were used for profiling the performance of the
application Every profile consisted of running the application 10 times for each scan and
then averaging the results in order to reduce the influence that external factors might
have in the measured times Figure 51 presents an example of the graphs that will be
used throughout this and the following chapters to represent the changes in performance
Here each bar is divided into different colors that represent the distribution of the total
execution time among the various stages of the application described in Chapter 3 and
summarized in Figure 31
The translation from MATLAB to C code corresponds to the first optimization per-
formed The top two bars in Figure 51 show that the C implementation resulted in
a speedup of approximately 15 times over the MATLAB implementation running on
a desktop computer On the other hand the bottom two bars reflect the difference
in execution time after running the C implementation in two different platforms The
much more limited resources available in the BeagleBoard-xM have a clear impact on
the execution time The C code was compiled with GCC's -O2 optimization level
The bottom bar in Figure 51 represents the starting point for a set of optimization
procedures that will be described in the following sections The order in which these are
presented corresponds to the same order in which they were applied to the application
[Bar chart: execution time, broken down by application stage, for the three implementations]
Figure 51 Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM
51 Double to single-precision floating-point numbers
The same representation format of floating-point numbers for the MATLAB and C
implementations were necessary to compare both results in each step of the translation
process The original C implementation was implemented using double-precision format
because this is the format used in the MATLAB code Taking into account that the
additional precision offered by double-precision format over single-precision was not
essential and that the ARM Cortex-A8 processor features a 32 bit architecture the
conversion from double to single-precision format was made Figure 52 shows that with this modification the total execution time decreased from 14.53 to 12.52 sec
[Bar chart: execution time with double-precision versus single-precision floating-point formats]
Figure 52 Difference in execution time when double-precision format is changed to single-precision
52 Tuned compiler flags
While the previous versions of the C code were compiled with the -O2 optimization level
the goal of this step was to determine a combination of compiler options that would
translate into faster running code A full list of the options supported by GCC can be
found in [41] Figure 53 shows that the execution time decreased by approximately 3
seconds (24% of the total time of 12.5 sec) after tuning the compiler flags The list of
compiler flags that produced best performance at this stage of the optimization process
were
-funroll-loops -Ofast -fsingle-precision-constant -ftree-loop-distribution
-mcpu=cortex-a8 -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp
[Bar chart: execution time with the -O2 optimization level versus the tuned flags]
Figure 53 Execution time before and after tuning GCC's compiler options
53 Modified memory layout
A different memory layout for processing the camera frames was implemented to further
exploit the concept of spatial locality of the program As noted in Section 33 many of
the operations in the normalization stage involve pixels from pairs of consecutive frames
ie first and second third and fourth fifth and sixth and so on Data of the camera
frames were placed in memory in a manner such that corresponding pixels between frame
pairs laid next to each other in memory The procedure is shown in Figure 54
However this modification yielded no improvement on the execution time of the appli-
cation as can be seen from Figure 55
54 Reimplementation of Crsquos standard power function
The generation of Texture 1 frame in the normalization stage starts by averaging the last
two camera frames followed by a gamma correction procedure The process of gamma
correction in this application consists of elevating each pixel to the 085 power After
profiling the application it was found that the power function from the standard math
C library was taking most of the time inside this process Taking into account that the
Figure 54 Modification of the memory layout of the camera frames The blue, red, green and purple circles represent pixels of the first, second, third and fourth frames, respectively
[Bar chart: execution time with the normal versus the modified memory layout]
Figure 55 The execution time of the program did not change with a different memory layout for the camera frames
high accuracy offered by such function was not required and that the overhead involved
in validating the input could be removed a different implementation of such function
was adopted
A novel approach was proposed by Ian Stephenson in [42] explained as follows The
power function is usually implemented using logarithms as
pow(a, b) = x^(log_x(a) * b)
where x can be any convenient value By choosing x = 2 the process of calculating the
power function reduces to finding fast pow2() and log2() functions Such functions can
be approximated with a few instructions For example the implementation of log2(a)
can be approximated based on the IEEE floating point representation of a
a = M * 2^E
where M is the mantissa and E is the exponent Taking log of both sides gives
log2(a) = log2(M) + E
and since M is normalized log2(M) is always small therefore
log2(a) ≈ E
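A C sketch of this idea, operating directly on the single-precision IEEE 754 bit pattern, is shown below; the constants follow the well-known bit-manipulation form of this approximation (2^23 for the mantissa width, plus an offset that folds in the exponent bias and an error-reducing correction), and the functions assume a strictly positive input, so pixel values would be clamped beforehand:

#include <stdint.h>
#include <string.h>

/* log2 approximated from the float bit pattern: the integer value of
   the bits is (E + 127) * 2^23 plus the mantissa, so scaling and
   offsetting it yields roughly E + log2(M). */
static float fast_log2(float a)
{
    uint32_t bits;
    memcpy(&bits, &a, sizeof bits);
    return (float)((int32_t)bits - 1064866805) * (1.0f / 8388608.0f);
}

/* pow2 reverses the operation: build a bit pattern whose exponent
   field encodes the integer part of p. */
static float fast_pow2(float p)
{
    uint32_t bits = (uint32_t)(p * 8388608.0f + 1064866805.0f);
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

float fast_pow(float a, float b) /* requires a > 0 */
{
    return fast_pow2(b * fast_log2(a));
}

For the gamma correction step, a pixel value p would then be mapped to fast_pow(p, 0.85f).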
This new implementation of the power function provides the improvement of the execu-
tion time shown in Figure 56
[Bar chart: execution time with the standard C power function versus the reimplemented power function]
Figure 56 Difference in execution time before and after reimplementing C's standard power function
5.5 Reduced memory accesses

The original order of execution was modified to reduce the number of memory accesses and
to increase the temporal locality of the program. Temporal locality is the principle that
recently referenced memory locations tend to be referenced again soon. Moreover,
the reordering made it possible to replace floating-point calculations with integer
calculations in the modulation stage, which typically execute faster on ARM processors.
Figure 5.7 shows the order in which the algorithms are executed before and after this
optimization. By moving the calculation of the modular frame to the preprocessing
stage, the values of the camera frames do not have to be re-read. Moreover, the processes
of discarding, cropping and scaling frames are now performed in an alternating
fashion together with the calculation of the modular frame. This loop merging, sketched
below, improves the locality of data and reduces loop overhead. Figure 5.8 shows the change
in execution time of the application for this optimization step.
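A sketch of the merged loop is given below; the buffer layout, the shift-based scaling and the parameter names are assumptions for illustration, not the application's actual interface.

#include <stddef.h>
#include <stdint.h>

/* One pass per frame: cropping, scaling and the min/max accumulation for the
 * modular frame are fused into a single traversal, so each pixel is read from
 * memory only once while it is hot in the cache. All arithmetic is integer. */
static void preprocess_frame(const uint8_t *src, int src_width,
                             int crop_x, int crop_y, int width, int height,
                             int scale_shift, uint8_t *dst,
                             uint8_t *min_frame, uint8_t *max_frame)
{
    for (int y = 0; y < height; y++) {
        const uint8_t *row = src + (size_t)(crop_y + y) * src_width + crop_x;
        for (int x = 0; x < width; x++) {
            int i = y * width + x;
            uint8_t v = (uint8_t)(row[x] >> scale_shift);  /* crop + scale    */
            dst[i] = v;
            if (v < min_frame[i]) min_frame[i] = v;        /* running minimum */
            if (v > max_frame[i]) max_frame[i] = v;        /* running maximum */
        }
    }
}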
Figure 5.7: Order of execution before and after the optimization. (a) Original order: preprocessing (parse XML file, discard frames, crop frames, scale) followed by normalization (texture 1, modulation, texture 2, normalize). (b) Modified order: the modulation step is moved into the preprocessing stage.
Figure 5.8: Difference in execution time before and after reordering the preprocessing stage.
5.6 GMC in y dimension only

A description of the global motion compensation (GMC) method used in the application
was presented in Chapter 3. Figure 3.8 shows the different stages of this process.
However, that figure does not reflect the manner in which GMC was initially implemented
in the MATLAB code; in fact, it describes the GMC implementation after being modified
with the optimization described in this section. A more detailed picture of the original
GMC implementation is given in Figure 5.9. Previous research found that optimal results
are achieved when GMC is applied in the y direction only. This was implemented by
estimating GMC for both directions but only performing the shift in the y direction.
The optimization consisted of removing all unnecessary calculations related to the
estimation of GMC in the x direction. It provides the improvement of the execution time
shown in Figure 5.10.
Figure 5.9: Flow diagram for the GMC process as implemented in the MATLAB code. For every pair of consecutive frames A and B in the normalized frame sequence, the row and column sums of both frames are computed, the SAD is minimized in x and y, and frame B is shifted in the y dimension only.
Figure 5.10: Difference in execution time before and after modifying the GMC stage.
5.7 Error in Delaunay triangulation

OpenCV was used to compute the Delaunay triangulation, with a series of examples available
in [43] serving as references for our implementation. Although OpenCV constructs the
triangulation while abstracting the complete algorithm from the programmer, a not so
straightforward approach is required to extract the triangles from a so-called subdivision.
OpenCV offers a series of functions that can be used to navigate through the edges that
form the triangulation; it is therefore the responsibility of the programmer to extract each
of the triangles while stepping through these edges. Moreover, care must be taken to avoid
repeated triangles in the final set. At this point of the optimization process, an error was
detected in the mechanism that was being used to avoid repeated triangles. Figure 5.11
shows the increase in execution time after this bug was resolved.
Figure 5.11: The execution time of the application increased after fixing an error in the tessellation stage.
5.8 Modified line shifting in GMC stage

This section explains a series of optimizations performed on the original line shifting
mechanism in the GMC stage. The MATLAB implementation uses the circular shift
function to perform the alignment of the frames (last step in Figure 3.8). Given that
there is no justification for applying a circular shift, a regular shift was implemented
instead, in which the last line of a frame is discarded rather than copied to the opposite
border. Initially this was implemented using a for loop; later it was optimized further
by replacing the loop with the more efficient memcpy function from the standard C
library, which in turn led to a faster execution time.

A further optimization was obtained in the GMC stage, which yielded better memory
usage and a faster execution time. The original shifting approach used two equally sized
portions of memory in order to avoid overwriting the frame that was being shifted.
The need for a second portion of memory was removed by adding some extra logic to the
shifting process. A conditional statement determines whether the shift has to be performed
in the positive or the negative direction. If the shift is negative, i.e., upwards, the shifting
operation traverses the image from top to bottom while copying each line a certain number
of rows above it. If the shift is positive, i.e., downwards, the shifting operation traverses
the image from bottom to top while copying each line a certain number of rows below it.
A sketch of this in-place shift is given below, and the result of this set of optimizations
is presented in Figure 5.12.
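The following sketch captures this in-place, direction-aware shift for an 8-bit frame; the names are illustrative.

#include <stdint.h>
#include <string.h>

/* Shift a frame vertically by 'shift' rows in place. A negative shift moves
 * the image upwards, a positive one downwards; the rows falling off the edge
 * are discarded instead of wrapped around (regular shift, not circular).
 * The traversal direction guarantees that no source row is overwritten
 * before it has been copied, so no second frame buffer is needed. */
static void shift_frame_rows(uint8_t *frame, int width, int height, int shift)
{
    if (shift < 0) {          /* upwards: top to bottom, copy from below */
        for (int y = 0; y < height + shift; y++)
            memcpy(frame + y * width, frame + (y - shift) * width, width);
    } else if (shift > 0) {   /* downwards: bottom to top, copy from above */
        for (int y = height - 1; y >= shift; y--)
            memcpy(frame + y * width, frame + (y - shift) * width, width);
    }
}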
Figure 5.12: Execution times of the application before and after optimizing the line shifting mechanism in the GMC stage.
5.9 New tessellation algorithm

A good motivation for using the Delaunay triangulation in a two-dimensional space is
presented by Rippa [44], who proves that such a triangulation minimizes the roughness of
the resulting model. Nevertheless, an important characteristic of the decoding process
used in our application allows the adoption of a different triangulation mechanism that
improves the execution time significantly while sacrificing only a very small amount of
smoothness. This characteristic is that the set of vertices resulting from the decoding
stage is already sorted, which removes the need to search for the nearest vertices and
therefore allows the triangulation to be greatly simplified. More specifically, the vertices
are ordered from left to right and bottom to top in the plane. Moreover, they are equally
spaced along the y dimension, which simplifies even further the algorithm needed to
connect them into triangles.

The developed algorithm traverses the set of vertices row by row from bottom to top,
creating triangles between every pair of consecutive rows. Each pair of consecutive rows
is in turn traversed from left to right while connecting the vertices into triangles.
The algorithm is presented in Algorithm 1, and a sketch in C follows it. Note that for each
pair of rows, the algorithm describes the connection of vertices only up to the moment in
which the last vertex of either row is reached. The unconnected vertices that remain in
the other, longer row are connected with the last vertex of the shorter row in a later step
(not included in Algorithm 1).
Algorithm 1 New tessellation algorithm
 1: for all pairs of rows do
 2:   find the left-most vertices in both rows and store them in vertex_row_A and vertex_row_B
 3:   while the last vertex in either row has not been reached do
 4:     if vertex_row_A is more to the left than vertex_row_B then
 5:       connect vertex_row_A with the next vertex on the same row and with vertex_row_B
 6:       change vertex_row_A to the next vertex on the same row
 7:     else
 8:       connect vertex_row_B with the next vertex on the same row and with vertex_row_A
 9:       change vertex_row_B to the next vertex on the same row
10:     end if
11:   end while
12: end for
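A C rendering of the inner sweep over one pair of rows might look as follows; the vertex layout and the emit_triangle() callback are hypothetical.

typedef struct { float x, y; } Vertex;

/* Connect one pair of consecutive vertex rows into triangles. Both index
 * arrays are sorted by x, so the sweep always advances in the row whose
 * current vertex is left-most, emitting one triangle per step. */
static void triangulate_row_pair(const Vertex *v,
                                 const int *row_a, int len_a,  /* lower row */
                                 const int *row_b, int len_b,  /* upper row */
                                 void (*emit_triangle)(int, int, int))
{
    int a = 0, b = 0;
    while (a < len_a - 1 && b < len_b - 1) {
        if (v[row_a[a]].x < v[row_b[b]].x) {
            emit_triangle(row_a[a], row_a[a + 1], row_b[b]);
            a++;   /* advance in the lower row */
        } else {
            emit_triangle(row_b[b], row_b[b + 1], row_a[a]);
            b++;   /* advance in the upper row */
        }
    }
    /* The vertices remaining in the longer row are fanned to the last vertex
     * of the shorter row in a later step, as noted for Algorithm 1. */
}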
Figure 5.13 shows the result of applying the two described triangulation methods to the
same set of vertices. The execution time of the application was reduced by approximately
1.4 seconds with this optimization, as shown in Figure 5.14. Furthermore, the new
triangulation algorithm resulted in a speedup of approximately 125 times over OpenCV's
Delaunay triangulation implementation.
Figure 5.13: The Delaunay triangulation was replaced with a different algorithm that takes advantage of the fact that the vertices are sorted. (a) Delaunay triangulation; (b) optimized triangulation (both shown over the same x-y region of the vertex set).
5.10 Modified decoding stage

A major improvement in the execution time of the application was achieved after
optimizing several time-consuming parts of the decoding stage.
Figure 5.14: Execution times of the application before and after replacing the Delaunay triangulation with the new approach.
As a first step, two frequently called functions of the standard C math library, namely
ceil() and floor(), were replaced with faster implementations that use preprocessor
directives to avoid the function call overhead. Moreover, the time spent validating the
input was saved, since such validation was not required. However, the property that
allowed the new implementations of ceil() and floor() to improve performance the
most is the fact that these functions only operate on index values. Given that index
values only assume non-negative numbers, the implementation of each of these functions
was simplified further, as sketched below.
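A sketch of such simplified replacements, valid only for non-negative inputs, is shown below (the macro names are hypothetical):

/* Truncation towards zero equals floor() for non-negative values; ceil()
 * additionally rounds up unless the value is already integral. No input
 * validation or handling of negative values is performed. */
#define FLOOR_POS(x) ((int)(x))
#define CEIL_POS(x)  ((int)(x) + ((float)(int)(x) < (x)))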
A second optimization applied to the decoding stage was to replace dynamically allocated
memory on the heap with statically allocated memory on the stack, while verifying that
the amount of memory involved would not cause a stack overflow. Stack allocation is
usually faster, since stack memory can be addressed more efficiently.
The last optimization consisted of detecting and removing several tasks that did not
contribute to the final result. Such tasks were present in the application because several
alternatives for achieving a common goal were implemented during the algorithmic design
stage; after the best option was assessed and chosen, the other ones were never entirely
removed.

The overall result of the optimizations described in this section is shown in Figure 5.15.
An important reduction of approximately 1 second was achieved. As a rough estimate,
half of this speedup can be attributed to the removal of the non-functional code.
5.11 Avoiding redundant calculations of column-sum vectors in the GMC stage

This section describes the last optimization performed on the GMC stage. The algorithm
presented in Figure 3.8 has the following shortcoming:
Figure 5.15: Execution time of the application before and after optimizing the decoding stage.
for every pair of consecutive frames, the sum of pixels in each column is calculated for both
frames. This means that the column-sum vector is calculated twice for every image except
the first and last frames (n = 1 and n = N). By reusing the column-sum vector calculated
in the previous iteration, this recalculation can be avoided; a sketch of the idea follows
below. An updated version of the GMC stage that incorporates it is shown in Figure 5.16.
The speedup achieved for the GMC stage after performing this optimization was
approximately 1.8 times. Figure 5.17 shows the execution times of the application before
and after removing the redundant calculations.
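A sketch of the reuse pattern is shown below; the function names and the SAD-minimization step are placeholders for the application's actual routines.

#include <stdint.h>
#include <string.h>

/* Sum the pixels of each column of an 8-bit frame into 'sums'. */
static void sum_columns(const uint8_t *frame, int width, int height,
                        uint32_t *sums)
{
    memset(sums, 0, width * sizeof(uint32_t));
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            sums[x] += frame[y * width + x];
}

/* Each frame's column-sum vector is computed exactly once: after processing
 * the pair (n-1, n), the vector of frame n is kept as the "previous" vector
 * of the next iteration by swapping the two buffer pointers. */
static void gmc_all_frames(uint8_t **frames, int num_frames,
                           int width, int height,
                           uint32_t *prev_sums, uint32_t *curr_sums)
{
    sum_columns(frames[0], width, height, prev_sums);
    for (int n = 1; n < num_frames; n++) {
        sum_columns(frames[n], width, height, curr_sums);
        /* minimize_sad_and_shift(prev_sums, curr_sums, frames[n], ...); */
        uint32_t *tmp = prev_sums;   /* swap instead of recomputing */
        prev_sums = curr_sums;
        curr_sums = tmp;
    }
}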
5.12 NEON assembly optimization 1

The ARM NEON general-purpose SIMD engine featured in the Cortex-A series processors
was exploited for the last series of optimizations performed on the 3D face scanner
application. The first step was to detect the stages of the application that exhibit a
large number of exploitable, independent data operations where the NEON technology could
be applied. The vast majority of the operations performed in the preprocessing,
normalization and global motion compensation stages are data independent and therefore
suitable for being computed in parallel on the ARM NEON architecture extension.

There are four major approaches to integrating NEON technology into an existing
application: (i) using a vectorizing compiler that automatically translates C/C++ code
into NEON instructions; (ii) using existing C/C++ libraries based on NEON technology;
(iii) using the NEON C/C++ intrinsics, which provide low-level access to NEON
instructions while the compiler does some of the work associated with writing assembly
instructions; and (iv) directly writing NEON assembly instructions that are linked into
the C/C++ project in the compilation process. A detailed explanation of each of these
approaches can be found in [45]. Based on the results achieved in [46], directly writing
NEON assembly instructions outperforms the other alternatives, and therefore this
approach was adopted; an intrinsics-based sketch of the kind of operation involved is
shown below for readability.
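The sketch below uses NEON C intrinsics rather than hand-written assembly (the thesis implementation uses the latter) to illustrate the 8-wide processing described in this section: the widening addition used for the texture 2 frame and the branch-free running minimum/maximum of the modulation stage. The buffer names are illustrative.

#include <arm_neon.h>
#include <stdint.h>

/* Process 8 pixels of a frame pair at once: store their widened sum
 * (texture 2) and update the per-pixel running minimum and maximum
 * (first step of the modulation stage) without any branches. */
static void process_8_pixels(const uint8_t *v1, const uint8_t *v2,
                             uint16_t *sum, uint8_t *pix_min, uint8_t *pix_max)
{
    uint8x8_t a = vld1_u8(v1);               /* 8 pixels of frame n     */
    uint8x8_t b = vld1_u8(v2);               /* 8 pixels of frame n + 1 */

    vst1q_u16(sum, vaddl_u8(a, b));          /* widening add -> 16 bits */

    uint8x8_t lo = vld1_u8(pix_min);
    uint8x8_t hi = vld1_u8(pix_max);
    vst1_u8(pix_min, vmin_u8(vmin_u8(a, b), lo));
    vst1_u8(pix_max, vmax_u8(vmax_u8(a, b), hi));
}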
Figure 5.16: Flow diagram for the optimized GMC process that avoids the recalculation of the image's column sums. The column sums of the first pair of frames are computed once; for every remaining pair of consecutive frames (from n = 3 to n = N), the column-sum vector of frame n−1 is reused, only the sums of frame n are computed, the SAD is minimized, and frame n is shifted.
Figure 5.17: Execution times of the application before and after avoiding redundant calculations of column-sum vectors in the GMC stage.
Figure 5.18 presents the basic principle behind the SIMD architecture extension, along
with the related terminology. Depending on the data type of the elements involved in
the operation, either 2, 4, 8 or 16 elements can be operated on with a single instruction.
The NEON register bank may be viewed either as sixteen 128-bit registers (Q0–Q15)
or as thirty-two 64-bit registers (D0–D31), where each Q register maps to a pair
of D registers. Figure 5.18 may thus be interpreted either as an operation on two Q
registers, where each of the 8 elements is 16 bits wide, or as an operation on two D
registers, where each of the 8 elements is 8 bits wide.
Figure 5.18: NEON SIMD architecture extension featured by Cortex-A series processors, along with the related terminology: elements and lanes of the source registers are operated on pairwise and written to the destination register.
An overview of the resulting execution flow of the preprocessing and normalization stages
after applying the first NEON assembly optimization is presented in Figure 5.19. Here,
green rectangles represent stages of the application that are now calculated with NEON
technology, whereas blue rectangles represent stages implemented in regular C code. In
Section 3.2 of Chapter 3 it was mentioned that each pixel in the input camera frame
sequence is represented with an 8-bit unsigned integer value. With the NEON
optimization, groups of 8 pixels are packed into D registers in order to process 8 elements
at a time. Note that each resulting element of the texture 2 frame is immediately reused in
the normalization process. Moreover, each of the 8 resulting values in both the texture
2 generation and the normalization stage is converted to a 32-bit floating-point value
that ranges from 0 to 1.
Figure 5.20 shows that the total execution time of the application actually increased
after this modification. There are two observations that may explain this increment.
First, note that the stage that contributed most to the increase in time was reading the
binary file; the execution time of this process is heavily affected by any other processes
that might be running in parallel. Moreover, the execution time of all stages other than
those involved in the NEON optimization also increased. This suggests that another
process was indeed probably running in parallel,
using resources of the board and hence affecting the performance of the application.
Nevertheless, the overall time reduction for the preprocessing and normalization stages
after the optimization was small. One very probable reason can be found in the modulation
stage. The first step of that process is to find the smallest and largest values of every
camera frame pixel in the time dimension by means of if statements. When such a task is
implemented in conventional C, the processor makes use of its branch prediction mechanism
to speed up the instruction pipeline. The use of NEON assembly instructions, however,
forces the processor to perform the comparison for every single pack of 8 values, so the
branch prediction mechanism cannot be exploited.
5.13 NEON assembly optimization 2

After successfully implementing several stages of the application with NEON assembly
instructions, the possibility of applying a similar approach to other parts of the
application was analyzed. The averaging and gamma correction processes involved in the
calculation of texture 1 were found to be good targets. The absence of a NEON instruction
to calculate the power of a number can be overcome by using a lookup table (LUT). To
explain how the LUT was implemented, a hypothetical example with 2-bit camera frame
pixels is presented in Figure 5.21. Here, the first two rows represent the values that
corresponding pixels in the two frames can assume. The third row of the table contains
the 7 possible values that can result from averaging two pixels; in general there are
2^(n+1) − 1 such values, where n is the number of bits used to represent a pixel. Finally,
the fourth row corresponds to the actual LUT, which is the average value raised to the
0.85 power. What is interesting is that the sum of the two pixels, pixel A + pixel B, which
in our application is already determined during the texture 2 stage, can be used to index
the table. A sketch of this construction for 8-bit pixels is given below.
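The following sketch builds the LUT for the application's 8-bit pixels; the normalization of the output to floats in [0, 1] matches the range mentioned in Section 5.12, but the names and exact scaling are assumptions.

#include <math.h>
#include <stdint.h>

#define PIXEL_BITS 8
#define LUT_SIZE (2 * ((1 << PIXEL_BITS) - 1) + 1)  /* 511 possible sums */

static float gamma_lut[LUT_SIZE];

/* The table is indexed directly by pixel_A + pixel_B (0..510), a value that
 * is already available from the texture 2 computation, so no power function
 * has to be evaluated per pixel at run time. */
static void init_gamma_lut(void)
{
    const float max_avg = (float)((1 << PIXEL_BITS) - 1);     /* 255 */
    for (int sum = 0; sum < LUT_SIZE; sum++) {
        float avg = sum / 2.0f;                 /* average of the two pixels */
        gamma_lut[sum] = powf(avg / max_avg, 0.85f);
    }
}

/* Usage (frame names hypothetical): texture1[i] = gamma_lut[frame15[i] + frame16[i]]; */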
As a final step in the optimization process, a further improvement to the execution flow
presented in Figure 5.19 was made. From that diagram it is possible to observe that the
application has to re-read the last 2 camera frames to calculate the texture 1 frame. In
order to avoid this overhead, the processing of the camera frames was divided into two
stages. The first one involves the calculation of the modulation, texture 2 and
normalization processes for the first 14 frames, whereas the second stage additionally
calculates the averaging and gamma correction processes for the last two frames. Merging
these 5 processes for the last two frames is convenient, since the addition of
corresponding pixels needed in the averaging and gamma correction stage is already being
calculated as part of the other processes.
Figure 5.19: Modified execution flow of the application after implementing several of the tasks involved in the preprocessing and normalization stages with NEON assembly instructions. Green rectangles represent stages that were implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Figure 5.20: Execution times of the application before and after applying the first NEON assembly optimization.
Figure 5.21: Example of how to construct a LUT to apply gamma correction to the average of two 2-bit pixels:

pixel A + pixel B:   0      1      2      3      4      5      6
average:             0      0.5    1      1.5    2      2.5    3
average^0.85:        0      0.555  1      1.411  1.803  2.179  2.544
These modifications of the order in which the different processes are executed are
illustrated in Figure 5.23, which corresponds to the definitive execution flow diagram for
the preprocessing and normalization stages. The resulting improvement of the execution
time is shown in Figure 5.22.

This final optimization concludes the embedded system development of the 3D face
reconstruction application.
Figure 5.22: Execution times of the application before and after applying the second NEON assembly optimization.
Figure 5.23: Final execution flow of the tasks involved in the preprocessing and normalization stages that were implemented using NEON assembly instructions. Green rectangles represent stages of the application that are implemented with NEON technology, whereas blue rectangles represent stages implemented in regular C code.
Chapter 6
Results
This chapter presents the results of the various stages involved in the implementation
of the 3D face scanner application capable of running on an embedded device. The first
section focuses on the results obtained after translating the MATLAB implementation
to C. This is followed by a brief account of the visualization module developed to display
the reconstructed model by means of the embedded device. Finally, the last section
provides a summary of the performance improvements made to the C implementation by
means of different optimization techniques.
6.1 MATLAB to C code translation

In order to measure the correctness of the conversion from MATLAB to C, 13 different
face scans were processed with both the MATLAB and C implementations. A qualitative
comparison of the corresponding reconstructed models yielded no difference in results.
Linux's diff tool was used to perform the comparison between corresponding models with
a precision of 4 decimal places.
In what follows, a series of graphs show the execution times for various versions of the
application. Each bar corresponds to the average execution time required to process 10
scans of different people; moreover, each of the different scans was run 10 times and
averaged. The bars are divided into different colors that represent the distribution of the
total execution time among the various stages of the application described in Chapter 3
and summarized in Figure 3.1. The top and middle bars in Figure 6.1 correspond to the
average execution times of the original MATLAB and C implementations, respectively,
when processed on a desktop computer. The C implementation resulted in a speedup of
approximately 15 times over the MATLAB implementation (from 8.17 to 0.54 seconds).
On the other hand, the last bar in Figure 6.1 corresponds to the average execution time
of the initial C implementation when processed on the embedded device, a BeagleBoard-xM.
The execution time increased by approximately 14 seconds with respect to the time
spent when processing on a PC. The C code was compiled with GCC's O2 optimization
level.
Figure 6.1: Execution times of (top) the MATLAB implementation on a desktop computer, (middle) the C implementation on a desktop computer, and (bottom) the C implementation on the BeagleBoard-xM.
6.2 Visualization

A visualization module was developed to display the resulting 3D models by means of the
projector contained in the embedded device. Figure 6.2 presents an example. The two
images in the top row show a high-resolution 3D model composed of 64k faces, rendered
in two different modes. The bottom two images show the same 3D model after being
processed with a mesh simplification mechanism that results in a much lower resolution
model (1,229 faces), suitable for being rendered by means of an embedded device. It is
interesting to note that even though the lower resolution model contains approximately 2%
of the faces of the high-resolution model, the quality degradation is hardly visible when
comparing the two textured models.
6.3 Performance optimizations

Figure 6.3 presents the performance evolution of the 3D face scanner's C implementation
using a BeagleBoard-xM as the processing platform. The wide range of optimizations
described in Chapter 5 reduced the execution time of the application from 14.5 to 5.1
seconds, which translates into a speedup of approximately 2.85 times.
Figure 6.2: Example of the visualization module developed. (a) High-resolution 3D model with texture (63,743 faces); (b) high-resolution 3D model wireframe (63,743 faces); (c) low-resolution 3D model with texture (1,229 faces); (d) low-resolution 3D model wireframe (1,229 faces).
Furthermore, Figure 6.4 presents individual graphs for each stage of the process, which
gives an idea of the speedup achieved for each individual stage.
Figure 6.3: Performance evolution of the 3D face scanner's C implementation. From top to bottom, the bars correspond to: no optimizations; doubles to floats; tuned compiler flags; modified memory layout; pow function reimplemented; reduced memory accesses; GMC in y direction only; Delaunay bug; line shifting in GMC; new tessellation algorithm; modified decoding stage; no recalculations in GMC; ASM + NEON implementation 1; ASM + NEON implementation 2.
Figure 6.4: Execution time for each stage of the application before and after the complete optimization process: (a) read binary file; (b) preprocessing; (c) normalization; (d) GMC; (e) decoding; (f) tessellation; (g) calibration; (h) vertex filtering; (i) hole filling.
Chapter 7
Conclusions
This thesis presented the embedded implementation of a 3D face scanner application
that uses the structured lighting technique. A manual translation of the algorithms in
charge of the reconstruction process was performed from MATLAB to C, using a file
comparison tool to validate the results of both implementations. Thirteen different face
scans were used to verify the correctness of the translated C implementation with respect
to the original MATLAB code; the comparison of each corresponding pair of models yielded
no difference whatsoever. The C implementation resulted in a speedup of approximately 15
times over the original MATLAB code running on a desktop PC. However, running the
C implementation on an embedded platform, namely a BeagleBoard-xM, increased the
execution time by a factor of 27, i.e., by approximately 14 seconds.
A wide range of optimizations was performed to reduce the execution time of the
application. These include high-level optimizations such as modifications to the algorithms
and reordering of the execution flow, middle-level optimizations such as avoiding
redundant calculations and function call overhead, and low-level optimizations such as
reimplementing sections of code with NEON assembly instructions.
A visualization module based on OpenGL ES was developed to display the reconstructed
3D models by means of the projector contained in the embedded device. However, given
the high resolution of the reconstructed 3D models and the limited resources available
on the embedded platform, a mesh simplification mechanism was implemented to reduce
the resolution to a point where the visualization module could be used without lag.
Although the reconstruction process is only part of a broader project that aims to
develop a technological means to assist sleep technicians in the selection of an adequate
CPAP mask model and size, allowing this process to run directly on the device is a first
step towards the goal of creating an autonomous, self-contained mask advice system.
Moreover, the functionality of a 3D hand-held face scanner is an important topic that
can easily be extended to different application fields, such as security or entertainment.
Last but not least, the optimizations that allowed the execution time of the application
to be reduced to approximately 5 seconds when processed on an embedded platform
should serve as a reference point, not only for other parts of the application where similar
approaches can be adopted, but also for related projects where performance is of crucial
interest.
7.1 Future work

Although a significant reduction of the application's execution time was achieved with
the set of optimizations presented in this work, this is by no means the best result that
can be obtained. On the contrary, this set of optimizations opens new possibilities for
improving the application's performance, for example by applying similar approaches
to other parts of the application. The first idea that comes to mind is to extend the
use of NEON technology to other parts of the program that exhibit a high number of
independent data calculations. The 5 × 5 filter involved in the calculation of the texture
1 frame, together with the sum of columns and the row shifting operations included in
the GMC stage, are good candidates for implementation with NEON assembly instructions.

Note, however, that further optimizing parts of the program that comprise a small
percentage of the total execution time will not yield significant improvements to the
overall performance. This implies that an assessment of the distribution
of the total execution time among the different tasks of the application is necessary to
determine which parts are the current bottlenecks, and hence worth optimizing. The last
profiling of the application (bottom bar in Figure 6.3) reveals that a large fraction of
the execution time is spent in three stages, namely decoding, calibration and hole filling.
Whereas the decoding stage was analyzed and partly optimized in this work, the latter
two were not considered for optimization.
According to several observations, there is a high probability that the calibration stage
can be optimized in an important manner. First, note the significant increase of the
execution time of this particular stage between the top and bottom profilings in Figure
6.1. Whereas such an increase is expected for stages that involve matrix operations
(MATLAB usually performs well with this kind of operations), stages based on control
structures, such as the nested for loops present in the calibration stage, are not expected
to show a decrease of performance in this manner. Moreover, note how the first two
optimizations in Figure 6.3, i.e., changing the data type from double to float and tuning
the compiler flags, had a significant impact on this stage's performance. Considering
this series of observations, it is very probable that the current C implementation of this
stage is not utilizing the available resources of the BeagleBoard-xM in the best possible
manner. Analyzing how well this part of the program exploits spatial and temporal
locality could reveal directions for further optimizations.
Finally, it is worth noting a few more ideas on how the performance of the application
could still be improved. Tuning GCC's compiler flags was performed early in the overall
optimization process; it is probable that the combination of flags found to be optimal at
that moment is no longer optimal for the current state of the application. Therefore, a new
assessment of compiler flags should be performed. It is also important to mention that
there is a specific compiler flag, namely -mfloat-abi, that specifies which floating-point
application binary interface (ABI) to use; the permissible values are soft, softfp and
hard. Despite the fact that a hard-float ABI is expected to produce better performance
results, the use of such a configuration was not possible in the current project. The reason
is that part of the libraries provided by the underlying operating system were compiled
with the soft-float ABI, and these two ABIs are not link-compatible. Nevertheless, enabling
this configuration is just a matter of recompiling the OS and the other libraries used by
the application with hard-float ABI support. Finally, it should be noted that there is a
wide range of compilers available on the market that could produce better results than
those of GCC. Although a few of the other options were tested as part of the current
project, GCC's results were always superior. However, it would be interesting to measure
how the GCC compiler compares with the compilers produced by ARM, which are known
to produce fast running code.
Bibliography
[1] F. J. Nieto, T. B. Young, B. K. Lind, E. Shahar, J. M. Samet, S. Redline, R. B. D'Agostino, A. B. Newman, M. D. Lebowitz, T. G. Pickering, et al., "Association of sleep-disordered breathing, sleep apnea, and hypertension in a large community-based study," JAMA: The Journal of the American Medical Association, vol. 283, no. 14, pp. 1829–1836, 2000. [Online]. Available: http://jama.ama-assn.org/content/283/14/1829.short (cit. on p. 1).

[2] J. Bruysters, "Large Dutch sleep survey reveals that 4 out of 5 people suffering from sleep apnea are unaware of it," University of Twente, Tech. Rep., Mar. 2013. [Online]. Available: http://www.utwente.nl/en/archive/2013/03/large_dutch_sleep_survey_reveals_that_4_out_of_5_people_suffering_from_sleep_apnea_are_unaware_of_it.docx (cit. on p. 1).

[3] S. Garrigue, P. Bordier, S. S. Barold, and J. Clementy, "Sleep apnea," Pacing and Clinical Electrophysiology, vol. 27, no. 2, pp. 204–211, 2004. [Online]. Available: http://onlinelibrary.wiley.com/doi/10.1111/j.1540-8159.2004.00411.x/full (cit. on p. 1).

[4] R. Klette, K. Schluns, and A. Koschan, Computer Vision: Three-Dimensional Data from Images. Springer, 1998, isbn: 9789813083714. [Online]. Available: http://books.google.nl/books?id=qOJRAAAAMAAJ (cit. on pp. 5, 6, 9, 10).

[5] J. Posdamer and M. Altschuler, "Surface measurement by space-encoded projected beam systems," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 1–17, 1982, issn: 0146-664X. doi: 10.1016/0146-664X(82)90096-X. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0146664X8290096X (cit. on pp. 5, 9, 11).

[6] M. Rocque, "3D map creation using the structured light technique for obstacle avoidance," Master's thesis, Eindhoven University of Technology, Den Dolech 2 - 5612 AZ Eindhoven - The Netherlands, 2011. [Online]. Available: http://alexandria.tue.nl/extra1/afstversl/wsk-i/rocque2011.pdf (cit. on pp. 6, 34).

[7] S. Inokuchi, K. Sato, and F. Matsuda, "Range imaging system for 3-D object recognition," in International Conference on Pattern Recognition, 1984 (cit. on pp. 9, 11).

[8] M. Minou, T. Kanade, and T. Sakai, "A method of time-coded parallel planes of light for depth measurement," Trans. Institute of Electronics and Communication Engineers of Japan, vol. E64, no. 8, pp. 521–528, Aug. 1981 (cit. on pp. 9, 11).

[9] M. Maruyama and S. Abe, "Range sensing by projecting multiple slits with random cuts," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, no. 6, pp. 647–651, Jun. 1993, issn: 0162-8828. doi: 10.1109/34.216735 (cit. on pp. 9, 11).

[10] N. Durdle, J. Thayyoor, and V. Raso, "An improved structured light technique for surface reconstruction of the human trunk," in Electrical and Computer Engineering, 1998. IEEE Canadian Conference on, vol. 2, May 1998, pp. 874–877. doi: 10.1109/CCECE.1998.685637 (cit. on pp. 9, 11).

[11] M. Ito and A. Ishii, "A three-level checkerboard pattern (TCP) projection method for curved surface measurement," Pattern Recognition, vol. 28, no. 1, pp. 27–40, 1995, issn: 0031-3203. doi: 10.1016/0031-3203(94)E0047-O. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0031320394E0047O (cit. on pp. 9, 11).

[12] K. L. Boyer and A. C. Kak, "Color-encoded structured light for rapid active ranging," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-9, no. 1, pp. 14–28, Jan. 1987, issn: 0162-8828. doi: 10.1109/TPAMI.1987.4767869 (cit. on pp. 9, 11).

[13] C.-S. Chen, Y.-P. Hung, C.-C. Chiang, and J.-L. Wu, "Range data acquisition using color structured lighting and stereo vision," Image Vision Comput., pp. 445–456, 1997 (cit. on pp. 9, 11).

[14] P. M. Griffin, L. S. Narasimhan, and S. R. Yee, "Generation of uniquely encoded light patterns for range data acquisition," Pattern Recognition, vol. 25, no. 6, pp. 609–616, 1992, issn: 0031-3203. doi: 10.1016/0031-3203(92)90078-W. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132039290078W (cit. on pp. 9, 12).

[15] B. Carrihill and R. Hummel, "Experiments with the intensity ratio depth sensor," Computer Vision, Graphics, and Image Processing, vol. 32, no. 3, pp. 337–358, 1985, issn: 0734-189X. doi: 10.1016/0734-189X(85)90056-8. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900568 (cit. on pp. 9, 12).

[16] J. Tajima and M. Iwakawa, "3-D data acquisition by rainbow range finder," in Pattern Recognition, 1990. Proceedings, 10th International Conference on, vol. 1, Jun. 1990, pp. 309–313. doi: 10.1109/ICPR.1990.118121 (cit. on pp. 9, 12).

[17] C. Wust and D. Capson, "Surface profile measurement using color fringe projection," Machine Vision and Applications, vol. 4, no. 3, pp. 193–203, 1991, issn: 0932-8092. doi: 10.1007/BF01230201. [Online]. Available: http://dx.doi.org/10.1007/BF01230201 (cit. on pp. 9, 12).

[18] E. Hall, J. Tio, C. McPherson, and F. Sadjadi, "Measuring curved surfaces for robot vision," Computer, vol. 15, no. 12, pp. 42–54, Dec. 1982, issn: 0018-9162. doi: 10.1109/MC.1982.1653915 (cit. on pp. 10, 14).

[19] J. Salvi, J. Pagès, and J. Batlle, "Pattern codification strategies in structured light systems," Pattern Recognition, vol. 37, pp. 827–849, 2004 (cit. on pp. 11, 12).

[20] A. Woodward, D. An, G. Gimel'farb, and P. Delmas, "A comparison of three 3-D facial reconstruction approaches," in Multimedia and Expo, 2006 IEEE International Conference on, Jul. 2006, pp. 2057–2060. doi: 10.1109/ICME.2006.262619 (cit. on p. 12).

[21] D. An, A. Woodward, P. Delmas, G. Gimel'farb, and J. Morris, "Comparison of active structure lighting mono and stereo camera systems: application to 3D face acquisition," in Computer Science, 2006. ENC '06. Seventh Mexican International Conference on, Sep. 2006, pp. 135–141. doi: 10.1109/ENC.2006.8 (cit. on pp. 12, 13).

[22] A. Woodward, D. An, P. Delmas, and C.-Y. Chen, "Comparison of structured lightning techniques with a view for facial reconstruction," in Proc. Image and Vision Computing New Zealand Conf., Dunedin, New Zealand, 2005, pp. 195–200. [Online]. Available: http://pixel.otago.ac.nz/ipapers/35.pdf (cit. on p. 13).

[23] P. Fechteler, P. Eisert, and J. Rurainsky, "Fast and high resolution 3D face scanning," in Image Processing, 2007. ICIP 2007. IEEE International Conference on, vol. 3, Oct. 2007, pp. III-81–III-84. doi: 10.1109/ICIP.2007.4379251 (cit. on p. 13).

[24] J. Salvi, X. Armangué, and J. Batlle, "A comparative review of camera calibrating methods with accuracy evaluation," Pattern Recognition, vol. 35, no. 7, pp. 1617–1635, 2002, issn: 0031-3203. doi: 10.1016/S0031-3203(01)00126-1. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320301001261 (cit. on p. 14).

[25] H. J. Chen, J. Zhang, D. J. Lv, and J. Fang, "3-D shape measurement by composite pattern projection and hybrid processing," Optics Express, vol. 15, p. 12318, 2007. doi: 10.1364/OE.15.012318 (cit. on p. 14).

[26] O. D. Faugeras and G. Toscani, "The calibration problem for stereo," in Proceedings CVPR '86 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, June 22–26, 1986), ser. IEEE Publ. 86CH2290-5, IEEE, 1986, pp. 15–20 (cit. on p. 14).

[27] G. Toscani, Systèmes de calibration et perception du mouvement en vision artificielle. Institut de recherche en informatique et en automatique, 1987, isbn: 9782726105726. [Online]. Available: http://books.google.nl/books?id=Rrz5OwAACAAJ (cit. on p. 14).

[28] J. Mas and Universitat de Girona, Departament d'Electrònica, Informàtica i Automàtica, An Approach to Coded Structured Light to Obtain Three Dimensional Information, ser. Tesis doctorals. Universitat de Girona, 1998, isbn: 9788495138118. [Online]. Available: http://books.google.nl/books?id=mmM5twAACAAJ (cit. on p. 15).

[29] R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses," Robotics and Automation, IEEE Journal of, vol. 3, no. 4, pp. 323–344, Aug. 1987, issn: 0882-4967. doi: 10.1109/JRA.1987.1087109. [Online]. Available: http://dx.doi.org/10.1109/JRA.1987.1087109 (cit. on p. 15).

[30] J. Weng, P. Cohen, and M. Herniou, "Camera calibration with distortion models and accuracy evaluation," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 14, no. 10, pp. 965–980, Oct. 1992, issn: 0162-8828. doi: 10.1109/34.159901 (cit. on p. 15).

[31] P. Redert, "Multi-viewpoint systems for 3-D visual communication," Master's thesis, Delft University of Technology, Stevinweg 1 - 2628 CN Delft - The Netherlands, 2000 (cit. on pp. 15, 26).

[32] M. Woo, J. Neider, T. Davis, and D. Shreiner, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, 3rd ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999, isbn: 0201604582 (cit. on p. 25).

[33] L. P. Chew, "Constrained Delaunay triangulations," Algorithmica, vol. 4, no. 1-4, pp. 97–108, 1989. [Online]. Available: http://link.springer.com/article/10.1007/BF01553881 (cit. on pp. 25, 26).

[34] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr, "Implicit fairing of irregular meshes using diffusion and curvature flow," in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, ser. SIGGRAPH '99, New York, NY, USA: ACM Press/Addison-Wesley Publishing Co., 1999, pp. 317–324, isbn: 0-201-48560-5. doi: 10.1145/311535.311576. [Online]. Available: http://dx.doi.org/10.1145/311535.311576 (cit. on p. 30).

[35] F. Vahid, Embedded System Design: A Unified Hardware/Software Introduction. Wiley India Pvt. Limited, 2006, isbn: 9788126508372. [Online]. Available: http://books.google.nl/books?id=HloqCOqcHvoC (cit. on p. 31).

[36] S. Dhadiwal Baid, "Single-board computers for embedded applications," Electronics For You, Tech. Rep., 2010. [Online]. Available: http://www.efymagonline.com/pdf/single-board-computers_aug10.pdf (cit. on p. 32).

[37] M. Roa Villescas, "Thesis preparation," Eindhoven University of Technology, Tech. Rep., Jan. 2013 (cit. on p. 32).

[38] G. Coley, "BeagleBoard system reference manual," BeagleBoard.org, December, p. 81, 2009 (cit. on p. 34).

[39] V. G. Reddy, "NEON technology introduction," ARM Corporation, 2008 (cit. on p. 34).

[40] M. Barberis and L. Semeria, "How-to: MATLAB-to-C translation," Catalytic, Tech. Rep., 2008 (cit. on p. 38).

[41] W. Von Hagen, The Definitive Guide to GCC. Apress, 2006 (cit. on p. 45).

[42] I. Stephenson, Production Rendering: Design and Implementation. Springer, 2005 (cit. on p. 46).

[43] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly, 2008 (cit. on p. 50).

[44] S. Rippa, "Minimal roughness property of the Delaunay triangulation," Computer Aided Geometric Design, vol. 7, no. 6, pp. 489–497, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/016783969090011F (cit. on p. 51).

[45] ARM, "Cortex-A series version 3.0 programmer's guide," Tech. Rep., 2012 (cit. on p. 54).

[46] N. Pipenbrinck, "ARM NEON optimization: an example," Tech. Rep., 2009 (cit. on p. 54).