A Biophysical 3D Morphable Model of Face Appearanceopenaccess.thecvf.com/content_ICCV_2017_workshops/papers/... · 2017-10-20 · A Biophysical 3D Morphable Model of Face Appearance

A Biophysical 3D Morphable Model of Face Appearance

Sarah Alotaibi and William A. P. Smith

Department of Computer Science, University of York, UK

{ssma502,william.smith}@york.ac.uk

Abstract

Skin colour forms a curved manifold in RGB space. The

variations in skin colour are largely caused by variations

in concentration of the pigments melanin and hemoglobin.

Hence, linear statistical models of appearance or skin

albedo are insufficiently constrained (they can produce im-

plausible skin tones) and lack compactness (they require ad-

ditional dimensions to linearly approximate a curved man-

ifold). In this paper, we propose to use a biophysical model

of skin colouration in order to transform skin colour into

a parameter space where linear statistical modelling can

take place. Hence, we propose a hybrid of biophysical and

statistical modelling. We present a two parameter spec-

tral model of skin colouration, methods for fitting the model

to data captured in a lightstage and then build our hybrid

model on a sample of such registered data. We present face

editing results and compare our model against a pure sta-

tistical model built directly on textures.

1. Introduction

The quest to understand and model “face space” dates

back to the 1980s. A universal face model, capable of de-

scribing any human face in all its detail would have appli-

cation in many areas. Faces are key to realistic animation

and visual effects, their dynamics provide a natural means

for interaction and they form the most familiar and accessi-

ble biometric. Many disciplines besides computer science

study faces. For example, psychologists want to understand

how humans represent and recognise faces; surgeons want

to detect deviations from facial growth norms and plan sur-

gical interventions to correct abnormalities.

It is not surprising then that faces are the most well stud-

ied object in computer vision and graphics and arguably

also in statistical modelling and machine learning. The

state-of-the-art in face capture [13] allows measurement of

very high resolution texture (diffuse/specular albedo) and

shape information that can be used for photorealistic ren-

dering (note however that even albedo maps are not truly

intrinsic properties of the face since they are a function of

camera spectral sensitivities and the spectral power distribu-

tion of the illumination). On the other hand, face modelling

(i.e. building parametric models that can generalise to novel

face appearances) has failed to keep pace with the quality of

data that can be captured from real faces.

Clearly faces are not arbitrary objects with arbitrary ap-

pearance. They are composed of bone, muscle and skin with

a spatially-varying distribution of pigmentation and facial

hair. These biophysical components give rise to appear-

ance in well-understood ways. For example, skin appear-

ance forms a curved manifold in colour space [7] and hence

any linear warp between valid skin colours will result in im-

plausible skin tones. Our hypothesis is that neglecting these

causal factors leads to models that can produce implausi-

ble instances whilst not making best use of the training data

available. In almost all previous work, face appearance is

treated as a black box and face appearance models are learnt

using generic machine learning tools such as PCA [4,8,10]

or deep learning [24].

In this paper, we present methods for constructing mod-

els of face appearance that are a hybrid of principled bio-

physical modelling and statistical learning. Specifically, we

propose a biophysical, spectral model of skin colouration

and then perform learning (in this case simply PCA) within

the parameter space of this model. The result is a nonlin-

ear model that is guaranteed to produce only biophysically

plausible skin colours and is more compact than models ob-

tained by applying linear methods directly to the raw data.

In other words, we use a model-based transformation which

provides a new space in which a linear model better approx-

imates the data. This shares something in common with

Kernel PCA [32] however in our case the transformation to

the feature space can be performed explicitly and the trans-

formation itself is biophysically motivated. We build a hy-

brid model on data collected in a lightstage, demonstrate

biophysical editing results and compare our model statisti-

cally against a PCA model built directly on RGB textures.

2. Related Work

The realistic rendering of faces has been an objective for

several decades in the computer graphics community. As

824

such, numerous models of light interaction with skin have

been developed. The most sophisticated parametric models

of skin reflectance [23] use biophysically meaningful pa-

rameters and model in detail the behaviour of subsurface

scattering within the layers of the skin. Such models are

highly complex to evaluate and as such are not suitable for

face analysis tasks.

The two dominant approaches to modelling the appear-

ance of faces are statistical models [4, 8–10] and biophysi-

cal models [7, 18, 22, 23, 27, 29, 30]. These two areas have,

however, been almost entirely divergent. Statistical mod-

els are predominantly used in computer vision because they

provide constraints and present a robust way in which to

analyse an image. Biophysical models have been popu-

lar in computer graphics and medical imaging as they pro-

vide physically meaningful parameters and produce a real-

istic simulation of the light interaction with the skin, which

leads to a realistic face image. The idea of a hybrid model

has received very limited attention. Recently, success in

photorealistic face synthesis from photos has been achieved

by combining a dictionary of high resolution textures with

deep learning [31].

Statistical Face Modelling Popular early methods us-

ing statistical models for face shape and appearance were

the Point Distribution Model (PDM), Active Shape Model

(ASM) and Active Appearance Model (AAM), all devel-

oped by Cootes et al. [8–10]. In PDM and ASM, the shape

of each image in the training dataset is represented by a

set of landmark points which are modelled using PCA after

Procrustes alignment. In AAM, shape variation and inten-

sity information are combined into a single statistical ap-

pearance model. A new image can be interpreted through

fitting optimisation techniques by minimising the difference

between the new image and the image synthesised by AAM.

Blanz and Vetter [4] introduced the first parametric sta-

tistical model for textured 3D face analysis and synthesis.

Again, linear PCA was used, this time to build models of

dense 3D shape and per-vertex colours. Besides the linear-

ity assumption, the other weakness is that the textures used

to build the colour model are not diffuse albedo and so are

dependent upon the lighting and viewpoint under which the

data was captured.

More recently, nonlinear statistical modelling techniques

have been applied to modelling face shape and appearance.

For example, Bolkart and Wuhrer [5] build multilinear mod-

els of 3D face shape and Nhan et al. [24] use deep Boltz-

mann machines to learn 2D face appearance models.

Biophysical Skin Modelling Modelling the appearance

of human skin is still a challenging problem due to the op-

tical complexity of skin properties. Small variations in skin

colour significantly influence a person’s appearance, which

conveys information about their biophysical state such as

their health, ethnicity and age.

Claridge and co-workers [7, 29, 30] followed a line of

work in which a two or three parameter model based on

Kubelka-Munk theory was combined with a calibrated cam-

era (usually with a near infrared channel in addition to

RGB) in order to measure skin parameters. Their goal was

robust, non-contact measurement of parameters for use in

medical imaging applications so their model was relatively

simple. However, it does not account for subsurface scatter-

ing, specular reflectance or variation in surface geometry.

One line of investigation [30] was to show how to select op-

timal multispectral filters to maximise the accuracy of the

parameter estimates.

In graphics, far more sophisticated models have been

considered. The earliest work in computer graphics that fo-

cused on light scattering in skin was carried out by Han-

rahan and Krueger [15], in which they produced a Bidi-

rectional Reflectance Distribution Functions (BRDF) skin

model using single scattering of light and diffusion. Krish-

naswamy and Baranoski [23] proposed the BioSpec model

to simulate the interaction of light with the five layers of hu-

man skin. A brute-force Monte Carlo method is applied to

simulate the scattering on the skin model, which makes the

model significantly more costly and very difficult to invert

compared with other diffusion methods. Jimenez et al. [22]

sought to model dynamic effects such as changes in blood

flow caused by expressions.

More recently, some efforts have been made to develop

a predictive skin model in the hyperspectral domain to in-

vestigate the effect of skin spectral signatures. Chen et

al. [6] introduced a novel hyperspectral skin appearance

model named HyLIoS, “Hyperspectral Light Impingement

on Skin”, based on a first-principles and simulation. This

model is able to simulate the spatial and spectral distribu-

tion of all interacting light absorbers and scatterers within

the cutaneous tissues in three domains, visible, ultraviolet

and infra-red.

Hybrid models There have been very few attempts to

combine statistical and biophysical models. To our knowl-

edge, the following studies are the only previous works

undertaken to build a combined statistical and biophysi-

cal model. Tsumura et al. [35] used a statistical method

called independent component analysis (ICA) to extract two

chromatic components, hemoglobin and melanin pigments,

which represent different colour components of normal hu-

man skin present in a single skin colour image. However,

This work did not address shading on the face caused by

directional light, and there was no biophysical model.

The work of Jimenez et al. [22] can be viewed as using

a hybrid model. They used a very simple statistical model

based on local histogram-matching to compute the distribu-

825

Epidermis: modelled by the Lambert-Beer law

Dermis: modelled by Kueblka-Munk Reflection

Incoming Light Remitted Light

Figure 1. The layered skin refelectance model.

tions of hemoglobin and melanin over the face. Their work

mainly focused on capturing and rendering changes in skin

colour due to emotions and ignores long-term changes in

the skin or variation due to identity.

3. A Biophysical Model of Skin Colouration

In this section we propose a biophysical spectral model

for skin colouration. We take inspiration from a number

of previous models [7, 22, 23, 30] but adapt their ideas to

arrive at a novel model suited to our purposes. Specifically,

we seek a model with only two free parameters so that the

model can be fitted to colour RGB data (although we in

principle have 3 measurements per pixel, we can only solve

for two model parameters since we are always working with

an unknown scale factor). In addition, for reproducibility,

all physical quantities that we use are either from publicly

available measured data or previously validated functional

approximations (Tables 1 to 3).

Human skin has a complex layered structure. We sim-

plify this considerably by modelling only two layers (see

Figure 1 for a schematic diagram). The epidermis contains

the pigment melanin which absorbs some light and the re-

mainder is mostly forward scattered. Hence, we ignore re-

flections from the epidermis and assume all light is either

absorbed or forward scattered. Melanin mainly absorbs

light in the blue wavelengths and comes in two varieties.

Eumelanin is responsible for giving skin its black to dark

brown colour and pheomelanin its yellow to reddish brown

colour. The dermis contains blood that contains the pigment

hemoglobin. This absorbs light in the green and blue wave-

lengths and is responsible for giving skin its pinkish colour.

We model only backscattering and absorption in the dermis

and assume that any forward scattered light is absorbed by

deeper layers.

Our model depends on numerous biophysical parameters

that are fixed or variable scalars (shown in Table 1), wave-

length dependent quantities that are approximated function-

ally (Table 2) or wavelength dependent quantities that are

measured (Table 3). Note that the only two free parame-

ters in our model are fblood and fmel. We later use these

Parameter Description Value/range Source

Ceum eumelanin concentration 80.0 g/L [34]

Cphm pheomelanin concentration 12.0 g/L [34]

feum eumelanin blend ratio 61% [16]

Chem Hemoglobin concentration 150 g/L [11]

g gram molecular weight of hemoglobin 64,500 g/mol [20]

foxy oxy-hemoglobin ratio 75% [25]

depd thickness of epidermis 0.021 cm [1]

dpd thickness of papillary dermis 0.2 cm [1]

fblood blood volume fraction 2 - 7% [11, 19]

fmel melanosomes volume fraction 1 - 43% [20]

Table 1. Histological parameters of skin used in our model. Vari-

able parameters shown in the bottom two rows.

Parameter Description Function Source

eeum(λ) eumelanin molar extinction coefficient 6.6× 1011λ−3.33 [20]

ephm(λ) pheomelanin molar extinction coefficient 2.9× 1010λ−4.75 [20]

µsp.Mie(λ) Mie scattering 2× 105 · λ−1.5 [19]

µsp.Rayleigh(λ) Rayleigh scattering 2× 1012 · λ−4 [19]

µskinbaseline(λ) Baseline skin absorption coefficient 0.244 + 85.3 exp(−λ−15466.2 ) [20]

Table 2. Quantities with functional approximations.

Parameter Description Source

eoxy(λ) molar extinction coefficient of oxy-hemoglobin [28]

edoxy(λ) molar extinction coefficient of deoxy-hemoglobin [28]

Table 3. Measured quantities (all have units of L · mol−1 · cm−1).

to control the concentrations of hemoglobin and melanin in

spatially varying parameter maps.

3.1. Spectral Image Formation

To model skin colouration in a way that is independent

of scene lighting or the camera used to capture an image,

we need to work in the spectral domain. Hence, our bio-

physical colouration model relates biophysical parameters

to spectral reflectance. However, when fitting to RGB data,

we must integrate the spectral reflectance into colour values

and at this point must know or estimate the spectral power

distribution (SPD) of the light and the spectral sensitivity of

the camera. The spectral model for image formation is then

given by:

iC =

∫

∞

0

E(λ)SC(λ)R(λ)dλ, (1)

where E is the SPD of the light source, SC is the spectral

sensitivity of the camera in colour channel C ∈ {R,G,B},

R is the spectral reflectance of the material, λ is wavelength

and iC is the image intensity in colour channel C.

3.2. Epidermis: Lambert-Beer Law

In the epidermis layer, there is little backscattering and

all the light not absorbed by the melanin in this layer is di-

rectly forwarded to the dermis [7]. An appropriate model

for such an assumption is given by the Lambert-Beer law

[30]:

Tepidermis(λ) = e−µa.epidermis(λ), (2)

where µa.epidermis(λ) is the absorption coefficient of the epi-

dermis. This can be modelled as a convex combination of

826

absorption due to melanin and baseline absorption simply

due to the skin tissue:

µa.epidermis(λ) = fmelµa.mel(λ) + (1− fmel)µskinbaseline(λ),

where µa.mel(λ) is the absorption coefficient of melanin

which formed by a combination of the absorption coeffi-

cient of eumelanin µa.eum and pheomelanin µa.phm:

µa.mel(λ) = feumµa.eum(λ) + (1− feum)µa.phm(λ),

where the absorption coefficients are given by µa.eum(λ) =eeum(λ)depdCeum, and µa.phm(λ) = ephm(λ)depdCphm.

3.3. Dermis: Kubelka-Munk Reflection

Kubelka-Munk theory is a simple model to compute re-

flectance and transmission for layered surfaces with high

scattering [17]. In our model, we use Kubelka-Munk the-

ory to model reflection from the dermis layer. Any light not

reflected is assumed not to be remitted. The proportion of

light that is remitted from a layer is given by:

Rdermis(λ)=(1− β(λ)2)(eK(λ)dpd − e−K(λ)dpd)

(1+β(λ)2)eK(λ)dpd−(1−β(λ))2e−K(λ)dpd

where dpd is the thickness of the dermis, k(λ) ∝µa.dermis(λ) model absorption, s(λ) ∝ µsp.dermis(λ) models

scattering,

K(λ) =√

k(λ)(k(λ) + 2× s(λ)) and

β(λ)2 =k(λ)

k(λ) + 2× s(λ).

As for the epidermis, the absorption coefficient for the

dermis is a convex combination of baseline absorption and

absorption by the medium contained within the layer (in this

case blood):

µa.dermis(λ) = fbloodµa.blood(λ)+ (1− fblood)µskinbaseline(λ).

The absorption coefficient of blood is given by a convex

combination of the absorption coefficients of oxygenated,

µoxy(λ), and de-oxygenated, µdoxy(λ), hemoglobin:

µa.blood(λ) = foxyµoxy(λ) + (1− foxy)µdoxy(λ).

The absorption coefficients can be computed from the mea-

sured molar extinction coefficients using:

µoxy(λ) =2.303× eoxy(λ)× Chem

g,

µdoxy(λ) =2.303× edoxy(λ)× Chem

g.

µsp.dermis(λ) is the reduced scattering coefficient of the

dermis which we approximate as a combination of Mie and

Rayleigh scattering:

µsp.dermis(λ) = µsp.Mie(λ) + µsp.Rayleigh(λ).

400 450 500 550 600 650 700 750

wavelength (nm)

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Tota

l re

flecta

nce

fmel

=0.02, fblood

=0.05

fmel

=0.06, fblood

=0.05

fmel

=0.10, fblood

=0.05

400 450 500 550 600 650 700 750

wavelength (nm)

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Tota

l re

flecta

nce

fmel

=0.02, fblood

=0.02

fmel

=0.02, fblood

=0.039

fmel

=0.02, fblood

=0.05

(a) (b)

Figure 2. Reflectance spectra predicted by our model with (a)

varying melanin concentration and fixed hemoglobin; (b) varying

hemoglobin concentration and fixed melanin.

3.4. Layered Skin Reflectance Model

Our model uses the Lambert-Beer law for transmis-

sion through the epidermis and the Kubelka-Munk theory

for reflection from the dermis and another application of

Lambert-Beer law for light exiting back through the epider-

mis. Therefore, our complete model is given by:

Rtotal(fmel, fblood, λ) = Tepidermis(fmel, λ)2Rdermis(fblood, λ).

Note that the first term is squared because the light is trans-

mitted through the epidermis twice.

Figure 2(a) shows the total spectral reflectance of

our model for three different melanin concentrations

(2%, 6%, 10%) and constant concentration of hemoglobin

(5%). As the melanin increases the overall reflectance de-

creases. Figure 2(b) shows the total spectral reflectance for

constant concentration of melanin (2%) with three different

hemoglobin concentrations (2%, 3.9%, 5%). In general, our

model predicts that reflectance is greater in the red wave-

lengths and declines towards the blue wavelengths. It also

predicts the characteristic “W” shape around 550nm. Qual-

itatively, the shape of the spectra predicted by our model

appears similar to previous biophysical models [7].

3.5. Wavelength-Discrete Model

In practice, we discretise (1) over wavelength. Hence,

light source SPD, camera spectral sensitivities and spectral

reflectance are discretised for a fixed set of wavelengths. We

model wavelength from 400 to 720nm at 10nm increments

λ = [400, 410, 420, .., 720] ∈ Rn, n = 33. The discrete

skin colour model is therefore given by:

r(fmel, fblood) =

Rtotal(fmel, fblood, λ1)...

Rtotal(fmel, fblood, λn)

. (3)

If we have discrete approximations to E, SR,SG and SB

stored as vectors e, sR,sG and sB of length n, then an RGB

827

0

1

0.05

0.8

0.1

Blue

0.15

0.6

Green

0.5

Red

0.2

0.4

0.20 0

Melanin43%1%

2%

7%

Hemoglobin

(a) (b)

Figure 3. Visualisations of our biophysical skin colour model as (a)

a manifold in RGB space, (b) a colour image (non-uniform axes

to enable better visualisation of different skin colours and white

balancing has been applied).

colour value can be computed from the discrete model as:

iR(fmel, fblood, e, sR) =

n∑

j=1

ejsR,jrj(fmel, fblood)

= eT diag(sR)r(fmel, fblood),

similarly for iG and iB . Our model is formed by concate-

nating the three colour channels into a single vector:

i(fmel, fblood, e,S) =

iR(fmel, fblood, e, sR)iG(fmel, fblood, e, sG)iB(fmel, fblood, e, sB)

(4)

We can visualise the range of skin colours predicted by

our model. In Figure 3 we vary the two parameters over

their plausible ranges (i.e. the melanosomes and blood

volume fraction), transform to RGB space using the light

source SPD and camera sensitivities described in Section

4.3 and plot as (a) a manifold in RGB space and (b) a colour

visualisation. Note that the model predicts a smooth, curved

manifold in RGB space, implying that skin colour is a non-

linear entity. Again, qualitatively, the shape of our RGB

colouration model agrees with previous work [7].

4. Biophysical Model Fitting

We build our statistical model from 3D meshes aug-

mented by texture maps containing diffuse albedo estimates

(see first panel of Figure 4). In order to do this, we must first

establish dense correspondence between captured samples

and transform the albedo maps into the biophysical param-

eter space by inverse rendering. The fitting pipeline from a

raw captured mesh to the inverse rendered parameter maps

in a normalised texture space is shown in Figure 4. Each

step of this pipeline is described in the following subsec-

tions.

4.1. 3D Face Model Fitting

To establish correspondence between faces, we propose

a simple but efficient method to fit a deformable template

to each 3D mesh. Our approach is a 3D extension of the

2D fitting method proposed by Bas and Smith [3]. Specifi-

cally, we fit a 3D morphable model (3DMM). A 3DMM is

a deformable mesh whose vertex positions, s(α), are deter-

mined by the shape parameters α ∈ RS . Shape is described

by a linear subspace model learnt from a sample of faces

using PCA (we use the Basel Face Model (BFM) [26] com-

prising 53,490 vertices). So, the shape of any face can be

approximated as: s(α) = Qα + s, where Q ∈ R3N×S

contains the S retained principal components, s ∈ R3N is

the mean shape and the vector s(α) ∈ R3N contains the co-

ordinates of the N vertices, stacked to form a long vector:

s = [u1 v1 w1 . . . uN vN wN ]T. Hence, the ith vertex is

given by: vi = [s3i−2, s3i−1, s3i]T.

Suppose that we have L correspondences between our

data, x1, . . . ,xL (xi = [xi, yi, zi]T ), and the 3DMM (we

explain later how these correspondences are obtained in

practice). Without loss of generality, we assume that the ith

data point corresponds to the ith vertex in the morphable

model. Fitting the model amounts to estimating the pose

(rotation, translation and scale) and shape parameters that

minimise the error between model and data:

ε(r, t, s,α) =

L∑

i=1

‖xi − s(R(r)Qiα+ si) + st‖2, (5)

where R(r) ∈ R3×3 is a rotation matrix computed from the

axis-angle vector r ∈ R3, s is scale and t ∈ R

3 a transla-

tion. The residuals are linear in α and t and nonlinear in r

and s. Hence, (5) can be written in separable nonlinear least

squares (SNLS) form [14] as

ε(r, t, s,α) =

∥

∥

∥

∥

A(r, s)

[

α

t

]

− y(r, s)

∥

∥

∥

∥

2

(6)

where A(r, s) ∈ R3L×S+3 is given by

A(r, s) = s[

(IL ⊗R(r))QL 1L ⊗ I3]

,

and y(r, s) ∈ R3L is given by

y(r, s) = s [(IL ⊗R(r)) s]−[

x1 y1 z1 . . . zL]T

.

We use ⊗ to denote the Kronecker product. Note that this

objective is exactly equivalent to the original one. The opti-

mal solution to (6) in terms of the linear parameters is:

[

α∗

t∗

]

= A+(r, s)y(r, s) (7)

where A+(r, s) is the pseudoinverse. Substituting (7) into

(6) we get an equivalent objective to (5) but which depends

only on the nonlinear parameters:

ε(r, s) =∥

∥A(r, s)A+(r, s)y(r, s)− y(r, s)∥

∥

2. (8)

828

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07Rawscan Fi*edtemplate Religh3ng Inverserenderedparametermaps

Melanin Hemoglobin

Figure 4. Pipeline for fitting the biophysical model to captured data. From left to right: mesh and diffuse albedo map captured in a

lightstage [33]; the fitted template mesh and albedo map in normalised texture space; relighting of the fitted template; inverse rendered

melanin and hemoglobin concentration maps.

This is a nonlinear least squares problem of very low di-

mensionality ([r s] is only 4D) that can be solved with

Gauss-Newton minimisation or similar methods. In prac-

tice, SNLS formulations can be solved more efficiently than

general least squares problems and may converge when the

original problem would diverge [14].

To establish correspondence between the model and

data, we alternate between fitting and updating correspon-

dences in a non-rigid, trimmed ICP fashion. We initialise

using 20 manually labelled landmarks. We update corre-

spondences using nearest neighbour search, retaining corre-

spondences for the 80% of model vertices with the closest

matches. See the second panel in Figure 4 for an example

of a fitted 3D template.

4.2. Texture space normalisation

Having fitted a 3DMM to a 3D face mesh, we now warp

the albedo map stored in the original UV texture space into

the texture space of the 3DMM. The BFM is not supplied

with a UV embedding so we use the texture embedding of

Bas et al. [2]. This is based on a Tutte embedding [12] of

the mean shape and is symmetric.

For each pixel in the warped texture, we compute the

barycentric coordinates in the template mesh (in UV space)

and then interpolate the colour in the original texture using

the barycentric coordinate transformed back into a Carte-

sian coordinate in the original texture space. This amounts

to a piecewise affine warp. This only requires us to spec-

ify the resolution of the warped texture, for which we use

1024× 1024. The second panel in Figure 4 shows a diffuse

albedo map warped to the normalised texture space. Note

that this establishes dense correspondence between albedo

maps of different subject, enabling the subsequent statisti-

cal modelling. With shape and albedo to hand, the mesh is

relightable and we show a rendering of the fitted template

in the third panel of Figure 4.

4.3. Calibration

Our diffuse albedo maps are captured in a lightstage [33].

Hence, they directly measure skin colour and factor out dif-

fuse shading and specular reflectance. We consider any

400 500 600 700

Wavelength (nm)

0

0.2

0.4

0.6

0.8

1

Rela

tive p

ow

er

400 500 600 700

Wavelength (nm)

0

0.2

0.4

0.6

0.8

1

Rela

tive s

ensitiv

ity

(a) (b)

Figure 5. (a) The measured SPD of our light source, (b) Camera

spectral sensitivites from [21].

residual shading from ambient occlusion effects to be small

enough to discount. Hence, in order to fit the biophysical

colouration model to the albedo maps we require estimates

of 1. the SPD of the light source (LEDs in the lightstage),

2. the camera spectral sensitivity and 3. a global colour

transformation that accounts for the unknown scale factor

between model and data as well as colour transformations

introduced by the polarising filter on the camera in the light-

stage that is not accounted for in the measured spectral sen-

sitivities. While it may be possible to estimate all three from

data, we choose to measure the first two and only estimate

the unknown transformation.

We measured the SPD of the light source using a cal-

ibrated spectroradiometer (model: B&W Tek BSR111E-

VIS). The spectrum of our light sources is plotted in Fig-

ure 5(a). Our texture maps are collected by a Nikon D200

camera and its spectral sensitivity is included in a public

database of measured sensitivities [21]. We show the mea-

sured sensitivities for our camera in Figure 5(b).

To estimate the overall unknown global colour transfor-

mation, we select a sample of diffuse albedo maps covering

a range of skin types. We assume that the colours in this

sample span the complete range that we expect our model

to produce. We then solve a nonlinear optimisation problem

to compute the 3 × 3 transformation matrix that minimises

the squared residual errors between model and data. Cor-

respondence between model colours and data are iteratively

updated using bidirectional nearest neighbours. By using

bidirectional NN, we ensure that the overlap between model

and data is maximised since we penalise model colours be-

829

ing far from their closest data colour and vice versa.

4.4. Inverse Rendering

With a calibrated model to hand, we propose a simple

but efficient method for inverse rendering biophysical pa-

rameters from RGB colours. We precompute a 2D lookup

table in which we sample over the allowable range of fblood

and fmel, compute a spectrum for each combination of pa-

rameters and then convert this to an RGB value using the

light source, camera and colour transform calibrations. The

visualisation in Figure 3(b) shows a (white balanced) visu-

alisation of the look up table. Inverse rendering is then per-

formed by a nearest neighbour lookup between a data RGB

value and the RGB values in the lookup table.

Our biophysical model is only able to characterise skin

colours. So, other facial features such as facial hair and

eyes are not well explained. For this reason, for each pixel

we also store an RGB offset between the model best fit and

the actual colour. This enables us to recreate features not

well explained by our biophysical model.

5. Statistical modelling

To build our hybrid model, we transform a diffuse albedo

map of M RGB pixels into a vector:

x =

fblood

fmel

δRδGδB

∈ R5M , (9)

where fblood, fmel, δR, δG, δB ∈ RM are vectors containing

the inverse rendered hemoglobin and melanin concentra-

tions and the RGB offsets respectively. A linear model in

the parameter space is guaranteed to produce colours lying

on the skin colour manifold and any non-zero offsets are as-

sumed to explain non-skin features. We build a PCA model

on these transformed features such that any parameter vec-

tor can be approximated by:

x ≈ Pβ + x, (10)

where P ∈ R5M×K contains the K retained principal

components, x ∈ R5M is the mean parameter vector and

β ∈ RK is the hybrid model parameters.

To reconstruct an albedo map A ∈ RM×3 from a param-

eter vector, we first reconstruct the feature vector using (10),

then use the forward biophysical model to compute colours:

Aj = i(fmel,j , fblood,j , e,S) +

δR,j

δG,j

δB,j

. (11)

Since the biophysical model is nonlinear, the hybrid model

as a whole is nonlinear.

Hemoglobin×2

Melanin×1.5

Figure 6. Biophysically-based image editing.

For comparison, we also build a linear PCA model di-

rectly on the RGB albedo values. This is equivalent to the

texture model used in a classical 3DMM such as [26].

6. Experimental Results

We begin by demonstrating an application of our bio-

physical skin colouration model and inverse rendering

pipeline. In Figure 6 we show results of biophysically-

based image editing. To do this we take two captured mod-

els of real faces and fit the shape model and resample the

albedo map as described in Sections 4.1 and 4.2. Render-

ing these normalised models gives the unedited appearance

shown on the left hand side of Figure 6. We then perform

biophysically-based editing by inverse rendering parameter

maps using the method described in Section 4.4, editing the

maps and then re-rendering new albedo maps. We show the

editing results on the right hand side of Figure 6. In the

top row, editing was performed by scaling the hemoglobin

map by 2×. This introduces a flushed appearance as if the

face were over heating. In the bottom row, editing was per-

formed by scaling the melanin map by 1.5×. This gives

the appearance of darker skin, as if the face had been sun-

tanned. In all images, the light source SPD, camera sensi-

tivities and white balancing used for rendering are identical.

We now evaluate a hybrid biophysical and statistical

model built using the method described in Section 5. We

train our model on 25 faces, as captured by Seck et al. [33].

For statistical modelling, we use a manually drawn mask

to exclude non-skin regions from the model. We show a

830

100

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

100

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

Mean PC1

+3σ

-3σ

PC2 PC3

Melanin

Hemoglobin

Figure 7. Visualisation of the mean and first three principal components of a biophysical 3D morphable model.

5 10 15 20

Number of model dimensions

0.4

0.5

0.6

0.7

0.8

0.9

1

Cum

ula

tive v

ariance c

aptu

red

Hybrid

Linear

5 10 15 20

Number of model dimensions

0

0.005

0.01

0.015

0.02

0.025

0.03

0.035

Genera

lisation e

rror

HybridLinear

(a) (b)

Figure 8. (a) Model compactness, (b) model generalisation.

visualisation of the mean and principal components of our

model in Figure 7 (the linear model is shown in supplemen-

tary material for comparison). In each case we show the

melanin (top) and hemoglobin (bottom) map and a render-

ing of the resulting appearance on the mean shape. It is

clear that the principal components are capturing distinct

skin types. The first component captures the difference be-

tween dark, melanin rich skin and pinkish skin with very

little melanin. The second component captures reddish skin

versus pale, whiteish skin. Component three in the nega-

tive direction captures rosy cheeks, i.e. high hemoglobin

concentration in the cheek region.

We now consider two quantitative measures of model

quality and compare to a linear model built directly on RGB

colours. In Figure 8(a) we show compactness. The hybrid

model captures more of the cumulative variance for all num-

bers of model dimensions. In Figure 8(b) we show generali-

sation. This is computed using leave-one-out and shows the

RMS error in RGB space averaged over all samples. The

generalisation of the linear model is better here. Our pre-

diction is that this is at a cost of worse specificity, i.e. the

linear model can explain more of the space but some of this

will not correspond to plausible skin colours.

7. Conclusions

This work has presented a first attempt at constructing

a hybrid biophysical and statistical model of face appear-

ance. We have shown in principle that it is possible, that

the model has attractive properties in terms of compactness,

biophysical editing and capturing meaningful variations in

skin type. However, there are many limitations to this work

and it should be seen as only a first step in this direction.

First, acquiring the necessary data and calibration infor-

mation required to build such a model is highly complex.

For this reason, our training set size was very small and so

we could not meaningfully evaluate specificity (where we

expect a biophysically-constrained model to outperform a

model built directly on colours). With additional data we

could also investigate building dynamic models, as in [22].

Second, we have no model for the appearance of eyes or fa-

cial hair, both important aspects of face appearance. Third,

we have only shown how to fit our model to data cap-

tured in controlled conditions where we have access to a 3D

mesh, diffuse albedo map and camera/light source spectra

are known. In future work, we intend to investigate how our

model could be fitted directly to uncontrolled data such as

2D images with no calibration information. Fourth, we have

modelled appearance independently from shape. There are

likely to be correlations so a joint model could be more ef-

ficient. Finally, in a more ambitious direction, we note that

our biophysical model is differentiable and that spectral im-

age formation can be viewed as a convolution between re-

flectance and camera/light source spectra. Hence, it may

be possible to incorporate our model into a convolutional

neural network that learns to estimate biophysical parame-

ters directly from 2D images and train it in an unsupervised

fashion by using our forward model to compute an appear-

ance loss.

831

References

[1] R. R. Anderson and J. A. Parrish. The optics of human skin.

Journal of Investigative Dermatology, 77(1):13 – 19, 1981.

3

[2] A. Bas, P. Huber, W. A. P. Smith, M. Awais, and J. Kittler.

3D morphable models as spatial transformer networks. arXiv

preprint arXiv:1708.07199, 2017. 6

[3] A. Bas and W. A. P. Smith. What does 2D geometric infor-

mation really tell us about 3D face shape? arXiv preprint

arXiv:1708.06703, 2017. 5

[4] V. Blanz and T. Vetter. A morphable model for the synthesis

of 3D faces. In Proc. SIGGRAPH, pages 187–194, 1999. 1,

2

[5] T. Bolkart and S. Wuhrer. A robust multilinear model learn-

ing framework for 3D faces. In Proc. CVPR, pages 4911–

4919, 2016. 2

[6] T. F. Chen, G. V. G. Baranoski, B. W. Kimmel, and E. Mi-

randa. Hyperspectral modeling of skin appearance. ACM

Trans. Graph., 34(3):31:1–31:14, May 2015. 2

[7] E. Claridge, S. Cotton, P. Hall, and M. Moncrieff. From

colour to tissue histology: Physics-based interpretation of

images of pigmented skin lesions. Medical Image Analysis,

7(4):489 – 502, 2003. 1, 2, 3, 4, 5

[8] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active ap-

pearance models. IEEE Trans. Pattern Anal. Mach. Intell.,

23(6):681–685, 2001. 1, 2

[9] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham.

Training models of shape from sets of examples. In Proc.

BMVC, pages 9–18, 1992. 2

[10] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Ac-

tive shape models-their training and application. Computer

vision and image understanding, 61(1):38–59, 1995. 1, 2

[11] R. Flewelling. Noninvasive Optical Monitoring. In The

Biomedical Engineering Handbook, Second Edition. 2 Vol-

ume Set, Electrical Engineering Handbook. CRC Press, dec

2000. 3

[12] M. S. Floater. Parametrization and smooth approxima-

tion of surface triangulations. Comput. Aided Geom. Des.,

14(3):231–250, 1997. 6

[13] A. Ghosh, G. Fyffe, B. Tunwattanapong, J. Busch, X. Yu,

and P. Debevec. Multiview face capture using polar-

ized spherical gradient illumination. ACM Trans. Graph.,

30(6):129, 2011. 1

[14] G. Golub and V. Pereyra. Separable nonlinear least squares:

the variable projection method and its applications. Inverse

problems, 19(2), 2003. 5, 6

[15] P. Hanrahan and W. Krueger. Reflection from layered sur-

faces due to subsurface scattering. In Proc. SIGGRAPH,

pages 165–174, 1993. 2

[16] A. Hennessy, C. Oh, B. Diffey, K. Wakamatsu, S. Ito, and

J. Rees. Eumelanin and pheomelanin concentrations in hu-

man epidermis before and after uvb irradiation. Pigment Cell

Research, 18(3):220–223, 2005. 3

[17] T. Igarashi, K. Nishino, and S. K. Nayar. The appearance of

human skin: A survey. Foundations and Trends in Computer

Graphics and Vision, 3(1):1–95, 2007. 4

[18] J. A. Iglesias-Guitian, C. Aliaga, A. Jarabo, and D. Gutierrez.

A biophysically-based model of the optical properties of skin

aging. Computer Graphics Forum, 34(2):45–55, 2015. 2

[19] S. L. Jacques. Origins of tissue optical properties in the UVA,

visible, and NIR regions. OSA TOPS on advances in optical

imaging and photon migration, 2:364–369, 1996. 3

[20] S. L. Jacques. Skin optics summary. Oregon Medical Laser

Center News, January 1998. 3

[21] J. Jiang, D. Liu, J. Gu, and S. Susstrunk. What is the space

of spectral sensitivity functions for digital color cameras? In

Proc. WACV, pages 168–179, 2013. 6

[22] J. Jimenez, T. Scully, N. Barbosa, C. Donner, X. Alvarez,

T. Vieira, P. Matts, V. Orvalho, D. Gutierrez, and T. Weyrich.

A practical appearance model for dynamic facial color. ACM

Trans. Graphic., 29(6):141:1–141:10, 2010. 2, 3, 8

[23] A. Krishnaswamy and G. V. Baranoski. A biophysically-

based spectral model of light interaction with human skin.

Computer Graphics Forum, 23(3):331–340, 2004. 2, 3

[24] C. Nhan Duong, K. Luu, K. Gia Quach, and T. D. Bui. Be-

yond principal components: Deep Boltzmann machines for

face modeling. In Proc. CVPR, pages 4786–4794, 2015. 1,

2

[25] P. A. Oberg. Optical sensors in medical care. Sensors Up-

date, 13(1):201–232, 2003. 3

[26] P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vet-

ter. A 3D face model for pose and illumination invariant face

recognition. In Proc. AVSS, 2009. 5, 7

[27] G. Poirier. Human skin modelling and rendering. Master’s

thesis, University of Waterloo, 2004. 2

[28] S. Prahl. Optical absorption of hemoglobin. Oregon Medical

Laser Center News, December 1999. 3

[29] S. Preece, S. Cotton, and E. Claridge. Imaging the pig-

ments of human skin with a technique which is invariant

to changes in surface geometry and intensity of illuminat-

ing light. In Medical Image Understanding & Analysis, page

P1103, 2003. 2

[30] S. J. Preece and E. Claridge. Spectral filter optimization

for the recovery of parameters which describe human skin.

IEEE Trans. Pattern Anal. Mach. Intell., 26(7):913–922, July

2004. 2, 3

[31] S. Saito, L. Wei, L. Hu, K. Nagano, and H. Li. Photorealistic

facial texture inference using deep neural networks. In Proc.

CVPR, 2017. 2

[32] B. Scholkopf, A. Smola, and K.-R. Muller. Kernel principal

component analysis. In International Conference on Artifi-

cial Neural Networks, pages 583–588. Springer, 1997. 1

[33] A. Seck, W. A. Smith, A. Dessein, B. Tiddeman, H. Dee,

and A. Dutta. Ear-to-ear capture of facial intrinsics. arXiv

preprint arXiv:1609.02368, 2016. 6, 7

[34] A. J. Thody, E. M. Higgins, K. Wakamatsu, S. Ito, S. A.

Burchill, and J. M. Marks. Pheomelanin as well as eume-

lanin is present in human epidermis. Journal of Investigative

Dermatology, 97(2):340 – 344, 1991. 3

[35] N. Tsumura, H. Haneishi, and Y. Miyake. Independent-

component analysis of skin color image. JOSA A,

16(9):2169–2176, 1999. 2

832

Documents

A Biophysical 3D Morphable Model of Face Appearanceopenaccess.thecvf.com/content_ICCV_2017_workshops/papers/... · 2017-10-20 · A Biophysical 3D Morphable Model of Face Appearance