



Image Matting and Applications

Dylan Swiggett

Abstract

Image matting is a practical and heavily applied technique in image recognition, useful both on its own and as an intermediate stage in image and video processing. In this paper, I explain the basics of image matting theory, then discuss several specific techniques documented in Scalable Matting: A Sub-linear Approach [7]. I conclude with a few applications of the techniques described, and demonstrate some results of my own implementation.

Contents

1 Introduction
  1.1 Image Matting
  1.2 Topics Covered

2 Definitions
  2.1 Graph Laplacian
  2.2 Matting Laplacian
  2.3 Iterative Methods and Relaxation
  2.4 Interpolation and Restriction

3 Techniques
  3.1 Example Matting Laplacian
  3.2 Gauss-Seidel Relaxation
  3.3 Multigrid Methods and the V-Cycle Algorithm
  3.4 Alternate Approaches
    3.4.1 Conjugate Gradient Descent
    3.4.2 Multigrid Conjugate Gradient Descent

4 Applications
  4.1 Dehazing
  4.2 Deblurring
  4.3 Tracking

5 Experiments

6 Conclusion



1 Introduction

1.1 Image Matting

Image matting is an extensive field, but in this paper I focus on foreground and background extraction. Given an image I, we wish to produce a matte (a transparency value α at each pixel, with α_i ∈ [0, 1]) such that at each pixel i we can deconstruct the color value I_i into a sum of two samples, one from a foreground color F_i and one from a background color B_i. To produce the color value at each pixel i in our original image, we then take

I_i = α_i F_i + (1 − α_i) B_i

This problem is constrained with a “sketch” by the user, often called a prior, indicating regions of the image which are known to be either in the foreground or in the background. With enough such constraints, the problem can be sufficiently defined such that an accurate and useful matte is produced.

1.2 Topics Covered

Scalable Matting: A Sub-linear Approach, by Philip G. Lee and Ying Wu, documents the steps of several modern image matting techniques, and compares them both analytically and practically. Lee and Wu provide a few different methods for each step of the matting algorithm, but in this paper I focus on specific examples. I give brief explanations of most algorithms tested, and focus in particular on the Matting Laplacian from [8] and the V-Cycle algorithm detailed in [2] and [1]. I then give previews of a few of the applications of this method, before concluding with some experiments performed with my own implementation.

2 Definitions

2.1 Graph Laplacian

We have a non-looping, unweighted, and undirected graph. We take v_i to be an indexing of its vertices. We first define the degree matrix of our graph, D, to be a diagonal matrix where D_ii is the degree of v_i (the number of connecting edges). Next, we define the adjacency matrix of our graph, W, a symmetric matrix where W_ij = 1 if and only if there is a direct edge between v_i and v_j. Note that since our graph is non-looping, W_ii = 0. The Graph Laplacian, L, of our graph is now defined as L = D − W. Diagonal elements then indicate how strongly connected each vertex is to the rest of the graph, and off-diagonal elements are the negation of how strongly connected specific vertices are to each other.
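To make the construction concrete, here is a small sketch (my own, not from the paper) that assembles L = D − W for a 4-connected pixel grid, with dense storage purely for readability:

```cpp
#include <vector>

// Build the graph Laplacian L = D - W of a w x h 4-connected grid,
// stored densely for clarity (real mattes need sparse storage).
std::vector<std::vector<int>> gridLaplacian(int w, int h) {
    int n = w * h;
    std::vector<std::vector<int>> L(n, std::vector<int>(n, 0));
    auto id = [w](int x, int y) { return y * w + x; };
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int i = id(x, y);
            // Right and down neighbors; symmetry covers left and up.
            if (x + 1 < w) { int j = id(x + 1, y); L[i][j] = L[j][i] = -1; }
            if (y + 1 < h) { int j = id(x, y + 1); L[i][j] = L[j][i] = -1; }
        }
    for (int i = 0; i < n; ++i) {
        int deg = 0;
        for (int j = 0; j < n; ++j)
            if (i != j && L[i][j] != 0) ++deg;
        L[i][i] = deg;  // D_ii = degree; off-diagonals already hold -W_ij
    }
    return L;
}
```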

2.2 Matting Laplacian

When dealing with image matting, we can construct a graph such that each pixel is a vertex and adjacent pixels share an edge. Such a graph is naturally undirected and non-looping. However, a large part of image matting is determining appropriate weights for the edges, so we cannot immediately construct



a Graph Laplacian. Instead, we use a variant of the Graph Laplacian called the Matting Laplacian [8]. Although identical in purpose, our D and W must be produced differently from simple adjacency. The distance between pixels must come into play, but it is weighted by differences in color. For example, two red pixels a slight distance apart would likely be more “adjacent” than a red and a blue pixel directly next to each other. The way we produce these numbers is called the affinity function, and it has been defined in many different ways. We use the affinity function from [8], which is described in Techniques.

In general, we can solve for our values of α by minimizing the quadratic form

J(α) = α^T L α = (1/2) Σ_{i,j} W_ij (α_i − α_j)^2

To develop an intuition for why this works, we consider the second of the forms above. The (α_i − α_j)^2 term is small in regions with relatively uniform transparency, and large where transparency shifts suddenly. In contrast, the W_ij term is large where colors change slowly and small where colors change quickly. This means that, if colors are changing slowly, we want the changes in α to be slow as well, or else large terms will add up. A good solution to the equation therefore makes regions of similar transparency align roughly with regions of similar color. Since this means that objects (generally of similar color) will be given a roughly fixed alpha, this result is desirable. Note that a trivial solution to this equation is to have every term of α equal. This solution is prevented by our user constraints (the “sketch”), so that we can instead converge towards some minimal solution that has regions of high α and low α where the user desires.

We can minimize our quadratic form to a non-trivial solution by solving a constrained sparse system of linear equations. To do this, we use Lagrange multipliers and our sketch. We suppose that we have some vector of values, g, such that the values of g_i are user constraints at each pixel, i.e. 1 where the user has indicated a pixel is in the foreground, and 0 everywhere else. We then define a diagonal matrix C where C_ii is 1 if the user has constrained pixel i, and 0 elsewhere. Taking the gradient of our quadratic form, and adding in our sketch, we now have the constrained system of linear equations

∇J(α) ∝ Lα = 0, subject to Cα = g

From Lagrange's method, we can now minimize our quadratic form J(α) by instead solving

Lα + γ(Cα − g) = 0, or equivalently (L + γC)α = γg

so we have our system of linear equations. By picking some large, positive γ, we can then solve for our desired minimal solution under our strict constraints. The majority of this paper is devoted to methods for efficiently solving this system of equations for any given Matting Laplacian and sufficient constraints.
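As a small illustration (my own naming, not from [7]), assembling the penalized system is only a diagonal update of L:

```cpp
#include <vector>

using Mat = std::vector<std::vector<double>>;
using Vec = std::vector<double>;

// Assemble the constrained system (L + gamma*C) * alpha = gamma * g.
// 'constrained[i]' marks sketched pixels (C_ii = 1); 'g[i]' holds the
// user's value there (1 = foreground, 0 = background/unconstrained).
void assembleSystem(const Mat& L, const std::vector<bool>& constrained,
                    const Vec& g, double gamma, Mat& A, Vec& b) {
    size_t n = g.size();
    A = L;
    b.assign(n, 0.0);
    for (size_t i = 0; i < n; ++i) {
        if (constrained[i]) {
            A[i][i] += gamma;   // L + gamma*C touches only the diagonal
            b[i] = gamma * g[i];
        }
    }
}
```

Any solver for symmetric positive semi-definite systems, including the relaxation methods developed below, can then be applied to Aα = b.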

2.3 Iterative Methods and Relaxation

A common problem in computation, and one we will have to address for foreground/background matting, is the solution of large systems of linear equations.



We denote these, as in [2], by

Au = f

As shown later, the systems of linear equations involved are massive for even reasonably sized images, and although direct solutions are possible, they are difficult to approach both in theory [2, pg. 4] and in practice [7]. Instead, we take an iterative approach. We take an approximation of u, denoted v. This approximation might be initially very rough, but through successive iterations our approximation should converge towards u. To formalize this, we define the error (e) and the residual (r) by

e = u − v,  r = f − Av

We then note that, in combination with our original system of equations, we have that

Ae = r

This is known as the residual equation [2]. While the error is not immediately available unless the system of equations is already solved, the residual can be calculated at any intermediate step, so this gives us a first clue as to how an iterative step might be produced:

u = v + e = v + A^{-1} r

Since A^{-1} is usually very difficult to compute, the first step in relaxation (the general name for these iterative solutions) is to find some similar matrix. We denote the residual and approximate solution after n steps by r^(n) and v^(n), and rephrase our goal as finding a B ≈ A^{-1}, or a matrix P and a vector g, such that

v^(n+1) = v^(n) + B r^(n) = P v^(n) + g,  lim_{n→∞} ‖r^(n)‖ = lim_{n→∞} ‖e^(n)‖ = 0

These notations are mathematically equivalent, but going forward we will use the P and g notation. Now, we can see that if some solution u does exist, it satisfies

u = Pu + g

So we have that

e^(1) = u − v^(1) = (Pu + g) − (Pv^(0) + g) = P e^(0)

Extending this through multiple iterations,

e^(n) = P^n e^(0),  ‖e^(n)‖ ≤ ‖P‖^n ‖e^(0)‖

So our constraint to guarantee convergence is that the spectral radius of P (the maximum absolute value of its eigenvalues) is less than 1. We will demonstrate an example for which this is true in Techniques.
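The discussion above translates directly into a generic relaxation driver. The following sketch (my own, with hypothetical applyA and sweep callbacks) iterates until the computable residual norm is small:

```cpp
#include <cmath>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Generic relaxation driver: repeatedly apply one smoothing sweep
// (e.g. Gauss-Seidel, defined later) until the residual r = f - A*v
// is small. 'applyA' computes A*x; 'sweep' performs v <- P*v + g.
Vec relax(const std::function<Vec(const Vec&)>& applyA,
          const std::function<void(Vec&)>& sweep,
          const Vec& f, Vec v, double tol, int maxIters) {
    for (int it = 0; it < maxIters; ++it) {
        sweep(v);
        Vec Av = applyA(v);
        double norm2 = 0.0;
        for (size_t i = 0; i < f.size(); ++i) {
            double r = f[i] - Av[i];  // residual, computable at any step
            norm2 += r * r;
        }
        if (std::sqrt(norm2) < tol) break;  // ||r|| -> 0 implies ||e|| -> 0
    }
    return v;
}
```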

2.4 Interpolation and Restriction

As we will later see, it is sometimes necessary to upsample or downsample a matrix. That is, we need to create higher or lower resolution versions of a matrix that resemble zoomed in or zoomed out versions of the original. This is



done by creating two operators, known as the interpolation operator, which performs upsampling (making a larger matrix), and the restriction operator, which performs downsampling (making a smaller matrix). How they are defined is generally context specific. Throughout this paper, the notation used in [7] and [2] will be used. An interpolation operator maps a grid of values spaced 2h apart to a grid of values spaced h apart (with twice as many values both vertically and horizontally). This operator is denoted by I^h_{2h}. A restriction operator maps a grid of values spaced h apart to a grid spaced 2h apart (with half as many on each dimension), and is similarly denoted by I^{2h}_h. As shorthand, wherever a value such as v_i is used to denote the ith element of a grid, v^h_i will be used to denote the ith element of the grid with spacing h. For example, I^h_{2h} v^{2h} = v^h and v^{2h} = I^{2h}_h v^h.

The most basic example of an interpolation operator is the linear interpolation operator, here presented for a vector of values v_j:

v^h_{2j} = v^{2h}_j,  v^h_{2j+1} = (1/2)(v^{2h}_j + v^{2h}_{j+1})

The restriction operator, which approximately reverses interpolation, is most easily calculated by v^{2h}_j = v^h_{2j}, known as injection. However, a weighted restriction operator often performs better. For example:

v^{2h}_j = (1/4)(v^h_{2j−1} + 2v^h_{2j} + v^h_{2j+1})

Although these operators have only been defined on a one dimensional grid, similar constructions can be used for n-dimensional grids. The choice of operator is completely context dependent.

3 Techniques

3.1 Example Matting Laplacian

I here outline the Matting Laplacian (and derivation) performed in [8]. Although many alternate (and often simpler) affinity functions are documented, this one is both common and rigorous. This solution makes a few small simplifying assumptions. The first is that the foreground and background images are locally smooth. This leads to sudden changes in the image color being attributed to sudden changes in α, a reasonable expectation. The second is that the image is greyscale, which significantly simplifies calculations. For a version that supports colored images, see [8]. Given this second assumption, greyscale values will throughout this paper be treated as real numbers in the interval [0, 1], with 0 mapping to black and 1 to white.

From the smoothness assumption, we can locally treat our values of α as a linear function, conditioned on known values for F and B, the foreground and background images. Taking a = 1/(F − B) and b = −B/(F − B), we define our matte by

α_i = a I_i + b

for all values of i in a small section of the image. These values of a and b can be justified by noting that I_i ∈ [B_i, F_i], and the chosen a and b map [B_i, F_i] onto α_i ∈ [0, 1], as expected.

Solving for F and B is now equivalent to solving for a and b. This is done by minimizing a cost function that is chosen in [8] to be

J(α, a, b) = Σ_{j∈I} ( Σ_{i∈w_j} (α_i − a_j I_i − b_j)^2 + ε a_j^2 )

with w_j a small box around the pixel j, and ε some small, positive, user-chosen number. The last term of the sum serves to minimize a_j where possible, and acts both to smooth the matte as a whole and to resolve convergence where results might otherwise be indeterminate (e.g. a box where the image color is constant). Since the minimal values in each box are determinate, and since the boxes overlap at nearby pixels, this cost function allows local changes to propagate through their neighbors. Cost minimization is solved in [8] as follows.

Theorem 1. We define J(α) to be min_{a,b} J(α, a, b). Then

J(α) = α^T L α

where L is an N × N matrix with

L_ij = Σ_{k | (i,j) ∈ w_k} ( δ_ij − (1/|w_k|) [ 1 + (I_i − μ_k)(I_j − μ_k) / (ε/|w_k| + σ_k^2) ] )

where δ_ij is the Kronecker delta (1 if i = j, 0 otherwise), μ_k and σ_k are the mean and standard deviation of the color in box w_k, and |w_k| is the number of pixels in w_k. Note that the sum is over each small box (usually of fixed size) that contains both pixels i and j.

Proof. For each w_k, we define the (|w_k| + 1) × 2 matrix G_k, where the first |w_k| rows are [I_i, 1] for each i ∈ w_k, and the last row is [√ε, 0]. Next we define the (|w_k| + 1) × 1 vector ᾱ_k, where the first |w_k| entries are the values of α_i for i ∈ w_k and the last entry is 0. We can then rewrite J(α, a, b) as

J(α, a, b) = Σ_k ‖ G_k [a_k, b_k]^T − ᾱ_k ‖^2

We want to find the optimal values of a and b, namely a*_k and b*_k for each window k, but by least squares minimization this is simply

(a*_k, b*_k) = argmin ‖ G_k [a_k, b_k]^T − ᾱ_k ‖ = (G_k^T G_k)^{-1} G_k^T ᾱ_k

We now define Ḡ_k = I − G_k (G_k^T G_k)^{-1} G_k^T, and can rewrite

J(α) = Σ_k ᾱ_k^T Ḡ_k^T Ḡ_k ᾱ_k

Note that in this form, we need only minimize J in one variable rather than three. Finally, we expand the middle terms, Ḡ_k^T Ḡ_k, and find by algebra that

(Ḡ_k^T Ḡ_k)_ij = δ_ij − (1/|w_k|) [ 1 + (I_i − μ_k)(I_j − μ_k) / (ε/|w_k| + σ_k^2) ]



Plugging this back into our formula for J(α) and rewriting the sum over windows as a single quadratic form, we have our desired equation.

L is our matting Laplacian. Note that, since we sum over more values of k when two pixels are closer together, the weights on edges depend on both color differences and distance apart. Further, when looked at in a slightly different way, L is actually just the graph Laplacian with a different affinity function, as we stated previously. We define our adjacency matrix W by weights rather than by simple edge existence, so that for i ≠ j, W_ij = −L_ij, and W_ii = 0. Then we define our degree matrix accordingly as a diagonal matrix with

D_ii = Σ_j W_ij

and our matting Laplacian is recovered by L = D − W, exactly as for a graph Laplacian.

3.2 Gauss-Seidel Relaxation

Two of the most common relaxation algorithms are the Jacobi method and the Gauss-Seidel method. Both are of the simple matrix form

v^(n+1) = P v^(n) + g

but the Gauss-Seidel method is the one used both in [7] and in this paper. This choice is made because the Gauss-Seidel method is easier to apply computationally [2] and converges about twice as fast as the Jacobi method [7]. We start, as with any other relaxation method, with the system of linear equations

Au = f

where A is an N × N matrix. We begin by splitting A into a sum of two triangular matrices, which we denote by L (lower, including the diagonal) and U (strictly upper).

L = [ A_00   0     0    ···   0   ]
    [ A_10  A_11   0    ···   0   ]
    [ A_20  A_21  A_22  ···   0   ]
    [  ⋮     ⋮     ⋮     ⋱    ⋮   ]
    [ A_N0  A_N1  A_N2  ···  A_NN ]

U = [  0   A_01  A_02  ···  A_0N ]
    [  0    0    A_12  ···  A_1N ]
    [  0    0     0    ···  A_2N ]
    [  ⋮    ⋮     ⋮     ⋱    ⋮   ]
    [  0    0     0    ···   0   ]

We can now return to our original equation, but rewritten as

(L + U)u = f  ⇒  Lu = −Uu + f  ⇒  u = −L^{-1}Uu + L^{-1}f

This immediately resembles a relaxation formula, with P = −L^{-1}U and g = L^{-1}f. Further, applying L^{-1} is computationally simple via forward-substitution, since L is lower triangular. All that remains is to show convergence when applied to approximate solutions. In fact, in this situation the spectral radius satisfies ρ(P) ≤ 1 − a/N^2 < 1, for some positive constant a [7]. Then we know that convergence will occur, although perhaps quite slowly. Unfortunately, slow convergence is usually the case. To demonstrate this, an example used in both [7] and [2] is supplied here:



Example. Consider a system of linear equations with boundary values on the one dimensional grid u_0, u_1, . . . , u_N:

−u_{j−1} + 2u_j − u_{j+1} = 0  (1 ≤ j ≤ N − 1),  u_0 = u_N = 0

The grid is exactly 1 wide, so the points are spaced h = 1/N apart. This problem is selected due to its trivial, unique solution:

u_0 = u_1 = · · · = u_N = 0

and due to its simple A:

A = [  2  −1   0   0  ···   0 ]
    [ −1   2  −1   0  ···   0 ]
    [  0  −1   2  −1  ···   0 ]
    [  ⋮    ⋱   ⋱   ⋱   ⋱   ⋮ ]
    [  0   0  ···  0  −1    2 ]

This matrix has eigenfunctions and eigenvalues

v^h_{k,i} = sin(kπih),  λ^h_k = 2 − 2 cos(kπh)

Although this is a simplified model, such oscillatory eigenfunctions are found in all graph Laplacians [7]. Similarly, when we produce our relaxation matrix P from Gauss-Seidel, its eigenvalues are

μ^h_k = cos^2(kπh)

This does not bode well for convergence. We take v to be our initial estimate for u, and as previously we define e = u − v. From our eigenfunctions, we now see that highly oscillatory components of e converge to 0 rapidly, but less oscillatory components converge at a rate close to 1. Further, since our eigenfunctions are exactly these oscillatory components, only the amplitude is damped, while the frequency is left unchanged. In practice, this means that after a few iterations the high frequency components vanish and the low frequency components are left behind, requiring an immense number of iterations to satisfactorily remove. This behavior is known as smoothing, and is an unfortunate property of most relaxation methods [2].

Although we now have in the Gauss-Seidel method a convergent relaxation matrix, another step is clearly needed to make its use computationally feasible (both in theory and in practice). No known iterative matting Laplacian solvers avoid this smoothing behavior, so a workaround is necessary [7].
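For reference, one sweep of the method on a dense system might be sketched as follows (my own illustration; an efficient implementation would exploit the sparsity of the matting Laplacian):

```cpp
#include <vector>

using Mat = std::vector<std::vector<double>>;
using Vec = std::vector<double>;

// One Gauss-Seidel sweep on A*u = f. Solving L*u_new = f - U*u_old row
// by row is exactly forward substitution: each updated u[i] is reused
// immediately in later rows, which is what distinguishes it from Jacobi.
void gaussSeidelSweep(const Mat& A, const Vec& f, Vec& u) {
    size_t n = u.size();
    for (size_t i = 0; i < n; ++i) {
        double sum = f[i];
        for (size_t j = 0; j < n; ++j)
            if (j != i) sum -= A[i][j] * u[j];  // u[j] is already new for j < i
        u[i] = sum / A[i][i];
    }
}
```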

3.3 Multigrid Methods and the V-Cycle Algorithm

Multigrid methods were invented as a solution to poor convergence behavior on low frequency error components. The idea is to downsample the system of linear equations, such that the low frequency modes which occur at high resolution become high frequency modes at coarse resolution, and hence converge



quickly. Fortunately, the mathematical basis built up for our system of equations translates well into this framework. We refer back to the notation used in the interpolation section, v^h referring to v at resolution h, as this notation is used extensively in this section.

The general algorithm for a multigrid is to produce a high resolution error, then downsample it, correct as much error as possible (through iteration) at this coarse level, then upsample and apply the error corrections to the higher level of detail. Naively, we can summarize this as starting with a good v^h by first solving (or approximating) v^{16h} or some variant, then applying our interpolation operator several times to produce our more detailed error. Unfortunately, this simple algorithm is not easy to apply when we are given an initial guess, v^(0), at the highest level of resolution. However, the residual equation here comes to the rescue:

Ae = r,  A^h e^h = r^h

Our A^h and r^h are known, and the initial guess for e^h is 0, regardless of the v^(0) chosen. Further, this is yet another system of linear equations, involving smaller values, and with a simpler convergent case. This leads us to the first candidate for a multigrid algorithm:

Nested Iteration [7]

1. Calculate r^h from f^h − A^h v^(n),h (note that v^(n),h combines the notations v^(n) and v^h).

2. Calculate r^{2h} from the restriction operation I^{2h}_h r^h.

3. Solve A^{2h} e^{2h} = r^{2h} for e^{2h} by Nested Iteration, with e^(0),2h = 0.

4. Calculate v^(n+1),h from v^(n),h + I^h_{2h} e^{2h}.

This method is quite effective, but only if the error is relatively smooth. Namely, the interpolation operation at the final step must make a good approximation of the actual error. We now have an algorithm that preserves corrections to smooth error but not oscillatory error. Clearly, combining this with a relaxation method such as Gauss-Seidel, which damps oscillatory error quickly, could have good results.

This combination leads to what is known as the V-Cycle Method, so called for the V-like shape in which it first restricts repeatedly, then interpolates repeatedly. Following is the pseudocode for this algorithm in [7]:



V-Cycle Pseudocode

1: function VCYCLE_H(u^h, f^h):
2:   if h = H:
3:     return final solution u^h
4:   Relax on A^h u^h = f^h
5:   r^h ← f^h − A^h u^h
6:   e^{2h} ← VCYCLE_H(e^{2h} = 0, I^{2h}_h r^h)
7:   u^h ← u^h + I^h_{2h} e^{2h}
8:   Relax on A^h u^h = f^h
9:   return u^h

In fact, this algorithm is guaranteed to converge in a fixed number of iterations (conditioned on n_2, the number of post-correction relaxation sweeps) [5].
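For concreteness, here is how the recursion might look in C++, reusing the gaussSeidelSweep, interpolate, and restrict_ sketches from earlier sections. Grid sizes are assumed to halve cleanly at each level, and all names are my own rather than from [7]:

```cpp
#include <vector>

using Mat = std::vector<std::vector<double>>;
using Vec = std::vector<double>;

// Recursive V-cycle on a hierarchy of operators A[0..coarsest], where
// level 0 is the finest grid; nu1/nu2 are pre/post relaxation sweeps.
Vec vcycle(const std::vector<Mat>& A, const Vec& f, Vec u,
           size_t level, int nu1, int nu2) {
    if (level + 1 == A.size()) {
        // Coarsest grid: relax until (nearly) exact.
        for (int s = 0; s < 50; ++s) gaussSeidelSweep(A[level], f, u);
        return u;
    }
    for (int s = 0; s < nu1; ++s) gaussSeidelSweep(A[level], f, u);  // pre-smooth
    // Residual r^h = f^h - A^h u^h, restricted to the coarse grid.
    Vec r(u.size());
    for (size_t i = 0; i < u.size(); ++i) {
        double Au = 0.0;
        for (size_t j = 0; j < u.size(); ++j) Au += A[level][i][j] * u[j];
        r[i] = f[i] - Au;
    }
    Vec rc = restrict_(r);
    // Solve A^{2h} e^{2h} = r^{2h} recursively, starting from e = 0.
    Vec ec = vcycle(A, rc, Vec(rc.size(), 0.0), level + 1, nu1, nu2);
    Vec e = interpolate(ec);
    for (size_t i = 0; i < u.size(); ++i) u[i] += e[i];  // coarse-grid correction
    for (int s = 0; s < nu2; ++s) gaussSeidelSweep(A[level], f, u);  // post-smooth
    return u;
}
```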

3.4 Alternate Approaches

[7] also spends some time discussing conjugate gradient descent (CG) and its recent variant, multigrid conjugate gradient descent, although it ultimately finds that the V-Cycle algorithm performs much better than either of these alternatives. Brief (and informal) explanations of these methods are given here.

3.4.1 Conjugate Gradient Descent

Gradient descent is another iterative method for solving systems of linear equations. The general idea is that we take each alpha value in our matte to be a component of a large vector. We then have some n-dimensional space, D, that contains all possible configurations of our alpha matte, one (or more) of which we want to find. We start with some guess vector, u, and at each iteration we find a correction vector d such that u^(k+1) = u^(k) + d (where k indicates the iteration, as before) takes us closer to our desired solution. The problem is then finding our d. To do this, we note that the negative gradient of the function we are minimizing points towards local minima. Then by taking steps in the direction of the negative gradient, we can approach our minimum with each step.

We now recall that our original problem, prior to creating a system of linear equations, is in fact the quadratic form J(α) = α^T L α. Then solving our system of linear equations is identical to minimizing this quadratic form, and the negative gradient of the quadratic form is exactly the residual. Since any linear optimization problem like this can be rewritten similarly as minimizing a quadratic form, we can take our correction vector for linear systems in general to be d = f − Au.

Conjugate gradient descent is a slight modification of this algorithm. In traditional gradient descent, many steps might be taken in the same direction, and as such a large number of steps might be redundant. CG fixes this by requiring that each step must be conjugate (A-orthogonal) to the previous step. For search directions d^(k), this means that

⟨d^(k+1), d^(k)⟩_A = (d^(k+1))^T A d^(k) = 0



Then subsequent steps don't make corrections in the same direction, so many redundant steps are eliminated. CG converges at a much faster rate than normal gradient descent, as it has more information to work with at each iterative step. To produce the values of d for CG, an application of Gram-Schmidt orthogonalization using A-orthogonality rather than the normal inner product suffices. For more information on CG, see [10].
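A compact sketch of the textbook algorithm (following the presentation in [10]; the dense matrix-vector products are used only for brevity) might look like:

```cpp
#include <cmath>
#include <vector>

using Mat = std::vector<std::vector<double>>;
using Vec = std::vector<double>;

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate gradient for a symmetric positive definite system A*u = f.
// Each search direction d is A-orthogonal to all previous directions.
Vec conjugateGradient(const Mat& A, const Vec& f, Vec u, double tol) {
    size_t n = u.size();
    Vec r(n), d(n), Ad(n);
    for (size_t i = 0; i < n; ++i) {          // r = f - A*u
        double Au = 0.0;
        for (size_t j = 0; j < n; ++j) Au += A[i][j] * u[j];
        r[i] = f[i] - Au;
    }
    d = r;
    double rr = dot(r, r);
    for (size_t it = 0; it < n && std::sqrt(rr) > tol; ++it) {
        for (size_t i = 0; i < n; ++i) {      // Ad = A*d
            Ad[i] = 0.0;
            for (size_t j = 0; j < n; ++j) Ad[i] += A[i][j] * d[j];
        }
        double alpha = rr / dot(d, Ad);       // exact minimizer along d
        for (size_t i = 0; i < n; ++i) { u[i] += alpha * d[i]; r[i] -= alpha * Ad[i]; }
        double rrNew = dot(r, r);
        double beta = rrNew / rr;             // keeps d A-orthogonal to past steps
        for (size_t i = 0; i < n; ++i) d[i] = r[i] + beta * d[i];
        rr = rrNew;
    }
    return u;
}
```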

This method is very common in optimization problems, and as such was tried on its own in [7]. However, its results were barely comparable to the V-Cycle algorithm, generally taking at least an order of magnitude more iterations before reasonable convergence. A multigrid variant of this technique had more promising results.

3.4.2 Multigrid Conjugate Gradient Descent

Multigrid conjugate gradient descent is based on the idea that we can converge more efficiently by taking into account the residuals of downscaled versions of our system of linear equations. As with our previous use of multigrid methods, the goal is to remove low frequency components of our error at about the same pace as high frequency components. Since the correction steps found at lower resolutions of our problem do more to address the low frequency error, we can take our correction step at each iteration to be a linear combination of the correction steps from each resolution of our problem. This algorithm is explored in detail in [9].

Although [7] finds this method to be much closer in performance to V-Cycle than standard CG, it still takes more iterations in general. As such, the conclusion of [7] is that V-Cycle is still the best option for solving the large systems of linear equations necessary in image matting.

4 Applications

While image matting for foreground and background is useful as a standalone technique in image processing, it has garnered significant attention from researchers due to its far-reaching applications in other areas of graphics study. Several of these applications are discussed here.

4.1 Dehazing

Image dehazing and its similarities to image matting are explored in detail by [6]. The basic problem is that, in images taken over some distance, floating particles and moisture can lead to dulling of color for more distant objects. This can significantly reduce the quality of the image taken. Dehazing attempts to determine how much this “haze” has altered the color at each pixel of an image, and to restore the color to what it would have been if unobscured. The general statement of the problem is to find a solution of the form

I_i = J_i t_i + A(1 − t_i)

As in image matting, I is the color value at each pixel of the original image, and t is a transparency value at each pixel. In the case of dehazing, t is referred to as



the medium transmission, specifying how much of the light from the original source reaches the camera without being scattered by ambient particles. J is the color of the desired image at each pixel, referred to as the scene radiance, while A is the global atmospheric light.

A prior for this problem is produced by noting the existence of a so-called dark channel. [6] observes from a set of haze-free daytime images that natural images almost always have at least one color channel (red, green, or blue) close to zero at every pixel. In contrast, the ambient light in images with haze increases every pixel's color value uniformly, preventing the existence of a dark channel where significant haze occurs. By checking how strongly a dark channel exists at each pixel, an initial guess between 0 and 1 can be produced at every pixel in the image (in contrast to the sparse sketches used in image matting). The image matting problem is then solved using these constraints and the matting Laplacian from [8]. The only difference here is that when Lagrange's method is applied to incorporate the constraints, the γ used is taken to be a small positive number. This is known as soft matting, and makes the initial constraints weak (necessary, since otherwise we would start with every t_i already specified!).
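As a sketch of the prior's first step (my own reading of [6]; the window radius and all names are my choice), the dark channel is a windowed minimum over the per-pixel minimum color channel:

```cpp
#include <algorithm>
#include <vector>

// Dark channel of an RGB image: at each pixel, the minimum over a small
// window of the minimum color channel (after [6]). Pixels with a large
// dark-channel value are likely hazy.
std::vector<double> darkChannel(const std::vector<double>& rgb,  // 3*w*h, interleaved
                                int w, int h, int radius) {
    std::vector<double> minChan(w * h), dark(w * h);
    for (int i = 0; i < w * h; ++i)
        minChan[i] = std::min({rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]});
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double m = 1.0;
            for (int dy = -radius; dy <= radius; ++dy)
                for (int dx = -radius; dx <= radius; ++dx) {
                    int nx = std::clamp(x + dx, 0, w - 1);
                    int ny = std::clamp(y + dy, 0, h - 1);
                    m = std::min(m, minChan[ny * w + nx]);
                }
            dark[y * w + x] = m;
        }
    return dark;
}
```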

As an interesting side effect, once matting is complete and the medium transmission is known at every pixel, the haze from the image can be used to determine approximately how far from the camera each pixel is. This is because the amount that atmosphere scatters light is fairly constant at any given place. If the rate of scattering is denoted by a scattering coefficient β, and the depth at each pixel is denoted by d_i, then the depth can be found from the equation

t_i = e^{−β d_i}

that is, d_i = −ln(t_i)/β.

In practice, the results of this approximation are surprisingly good!

4.2 Deblurring

In images taken of moving scenes, objects that move a visible distance during the image exposure have visible blur along the direction in which they move. In [3], this blur is both removed and analyzed, to determine how objects in a single picture were moving when the picture was taken. The key innovation of [3] is to begin analyzing motion blur by matting the image for alpha (exactly as described in this paper) in order to determine how much a given object has blurred into each pixel. See [3] for further details, as matting is unrelated to deblurring beyond this initial step.

4.3 Tracking

A common problem in video analysis is to track a specific object over a period of time. The relationship between this and image matting is immediately apparent, in that both techniques aim to determine which part of an image is a specific object. The key difference is that in image matting a user must give constraints, whereas in tracking the goal is to automatically determine which part of a frame is the object, given only the same knowledge for the previous frames. This issue is overcome in [4] by building a model of the object being



tracked, kicked off by actual user constraints on the first frame. This model is then used to produce a “user sketch” entirely programmatically at each subsequent frame, before applying the standard image matting algorithm.

The model of the object generated is divided into three major components. First, a set of salient points (points recognizable as lying inside the object) are generated. A sketch can be produced by drawing lines between these salient points. However, as an object moves or deforms, these points can quickly become unrecognizable by simple algorithms. As such they are generated as the program goes on, and are a purely short-term part of the model. Next, a set of discriminative colors are selected. These are specific colors which occur far more frequently in either the foreground or the background, and are generally a much better long-term indicator of where a given object lies. However, they are also updated as frames are processed, since new parts of an object might become visible. Finally, the region of the image known to be foreground is cut up into square regions, and each of these regions is stored individually in a bag of patches. At each frame, each patch is checked against the contents of the scene. If some patch is found to be unobscured, it is assumed to be part of the foreground. These patches are the most long-term model of the object, and can be used to determine new salient points even when the tracked object is severely occluded (i.e. something passes in front of it) or deformed.

With these three models, each of which is sequentially updated by methods described in [4], a sufficiently accurate and dense user sketch can be generated to apply image matting effectively at each frame. The results far outperform more traditional video matting techniques for tracking.

5 Experiments

In order to better understand the algorithm, and to test its effectiveness, I created my own implementation. I used Successive Over-Relaxation (SOR) as opposed to Gauss-Seidel, since this exhibited better results (SOR is the same as Gauss-Seidel, except that each correction is scaled by a user-chosen relaxation factor between 0 and 2). I also used a variant on the Matting Laplacian from [8] that takes into account the RGB channels of the image, rather than just the greyscale intensity at each pixel. I used the interpolation and restriction operators described in [7], as well as the V-Cycle algorithm for multigrid. Since speed was an important factor in testing my implementation, I chose to write it in C++ (a decision made also by the authors of [7]).
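The change from Gauss-Seidel to SOR is a one-line scaling of the correction; a sketch of one sweep (my own code, not from [7]):

```cpp
#include <vector>

using Mat = std::vector<std::vector<double>>;
using Vec = std::vector<double>;

// One SOR sweep: identical to Gauss-Seidel except that the correction is
// scaled by a user-chosen relaxation factor omega in (0, 2).
void sorSweep(const Mat& A, const Vec& f, Vec& u, double omega) {
    for (size_t i = 0; i < u.size(); ++i) {
        double sum = f[i];
        for (size_t j = 0; j < u.size(); ++j)
            if (j != i) sum -= A[i][j] * u[j];
        double gs = sum / A[i][i];     // plain Gauss-Seidel value
        u[i] += omega * (gs - u[i]);   // over- (or under-) relax the correction
    }
}
```

With omega = 1 this reduces exactly to Gauss-Seidel; values above 1 over-relax and often converge faster.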

In general, my results were quite impressive. My implementation does not exhibit convergence as rapidly as [7], so I believe there may still be errors in the code, but this does not prevent most images from converging to fairly accurate mattes. One of the most prominent features of this method is just how much of a difference the multigrid optimization makes. Consider the following image and sketch:

[Figure: test image with user sketch]



The following images were both produced with 50 iterations of SOR, but the first did not have multigrid steps enabled, while the second did. The difference is startling!

[Figure: matte after 50 SOR iterations without multigrid, and with multigrid]

This difference is just as pronounced on other images. Clearly relaxation has similar problems on real images as it did on the sample problem explained earlier! In the second image, it is apparent that there are issues with matting around the edges of the image. Although this is likely mostly an implementation issue, I did notice that many of the images shown in papers used in-focus foregrounds and blurry, relatively uniform backgrounds. When applied to images of this type, my implementation also had quite impressive results. With 30 iterations, the following sketch produced an excellent matte:

[Figure: input image, sketch, and resulting matte]

In instances such as this, the practical applications of image matting in simple image editing become quite apparent. Complex shapes can be cut out of images with minimal effort and high accuracy. When results are not satisfactory, adding a few lines to the sketch is almost always sufficient. For example, the above matte was used to produce the image below.

[Figure: composite produced using the matte above]



With a bit of effort to produce a good sketch, very good mattes can be produced. However, one drawback of the strict constraints of the user sketch can become apparent on images with large semi-transparent regions. Consider the following matting application:

[Figure: image of a bee, sketch, and resulting matte]



The complex shape of the bee is selected very accurately with the sketch given, but the wings are a problem. They are everywhere transparent, but any places marked as part of the foreground then become completely opaque. In the above matte, it is clear that the correct transparency was determined for most of the wing, but the places marked in the sketch are unnaturally white.

The code for this implementation can be found at https://github.com/dylanswiggett/image-matting/tree/master/impl.

6 Conclusion

Although image matting is not perfect in practice, and some effort is often required to produce an adequate sketch, it is clearly a very powerful and widely applicable technique. On top of that, it is relatively straightforward to explain mathematically, and takes much less effort to implement than many similar image processing algorithms (although its implementation is by no means trivial).

Although [7] is a recent paper, it is far from the cutting edge of matting research. None of the methods described here date later than 2011, and most of the mathematics dates back to the mid to late 20th century. However, the techniques described here have not lost their power in the intervening years, and a quick look around is all it takes to see how abundantly they are applied at the forefront of image and video processing research.



References

[1] James H. Bramble, Multigrid Methods, Vol. 294, CRC Press, 1993.

[2] William L. Briggs, Steve F. McCormick, et al., A Multigrid Tutorial, Vol. 72, SIAM, 2000.

[3] Shengyang Dai and Ying Wu, Motion from blur, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008, pp. 1–8.

[4] Jialue Fan, Xiaohui Shen, and Ying Wu, Closed-loop adaptation for robust tracking, Computer Vision – ECCV 2010, 2010, pp. 411–424.

[5] Jayadeep Gopalakrishnan and Joseph E. Pasciak, Multigrid convergence for second order elliptic problems with smooth complex coefficients, Computer Methods in Applied Mechanics and Engineering 197 (2008), no. 49, 4411–4418.

[6] Kaiming He, Jian Sun, and Xiaoou Tang, Single image haze removal using dark channel prior, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (2011), no. 12, 2341–2353.

[7] Philip G. Lee and Ying Wu, Scalable matting: A sub-linear approach, arXiv preprint arXiv:1404.3933 (2014).

[8] Anat Levin, Dani Lischinski, and Yair Weiss, A closed-form solution to natural image matting, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (2008), no. 2, 228–242.

[9] Christoph Pflaum, A multigrid conjugate gradient method, Applied Numerical Mathematics 58 (2008), no. 12, 1803–1817.

[10] Jonathan Richard Shewchuk, An introduction to the conjugate gradient method without the agonizing pain, Carnegie Mellon University, Pittsburgh, PA, 1994.
