14
340 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 3, MARCH 2008 Blind Separation of Superimposed Shifted Images Using Parameterized Joint Diagonalization Efrat Be’ery and Arie Yeredor, Senior Member, IEEE Abstract—We consider the blind separation of source images from linear mixtures thereof, involving different relative spatial shifts of the sources in each mixture. Such mixtures can be caused, e.g., by the presence of a semi-reflective medium (such as a window glass) across a photographed scene, due to slight movements of the medium (or of the sources) between snapshots. Classical separation approaches assume either a static mixture model or a fully convolutive mixture model, which are, respectively, either under- or over-parameterized for this problem. In this paper, we develop a specially parameterized scheme for approximate joint diagonalization of estimated spectrum matrices, aimed at estimating the succinct set of mixture parameters: the static (gain) coefficients and the shift values. The estimated parameters are, in turn, used for convenient frequency-domain separation. As we demonstrate using both synthetic mixtures and real-life pho- tographs, the advantage of the ability to incorporate spatial shifts is twofold: Not only does it enable separation when such shifts are present, but it also warrants deliberate introduction of such shifts as a simple source of added diversity whenever the static mixing coefficients form a singular matrix—thereby enabling separation in otherwise inseparable scenes. Index Terms—Approximate joint diagonalization, blind source separation (BSS), image reflections, image separation, spatial shifts. I. INTRODUCTION Q UITE commonly in photography, when scenes are recorded through window glass under poorly balanced lighting conditions, a reflection of the room interior appears (linearly) superimposed over the outside scene. Sep- arating such reflections from the scene using a single image (snapshot) is obviously an ill-posed problem. 1 However, when (at least) two such snapshots are available, each taken under slightly different conditions, it is conceivable that successful (blind) separation can be attained by proper exploitation of the diversity in the different snapshots. The diversity sources Manuscript received February 8, 2007; revised November 6, 2007. This work was supported in part by MUSCLE (Multimedia Understanding through Se- mantics, Computation, and Learning), a European-Council sponsored Network of Excellence (NoE). Parts of this work were presented at the IEEE Interna- tional Conference on Acoustic, Speech, and Signal Processing (ICASSP) 2006, Toulouse, France, May 2006. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Stanley J. Reeves. The authors are with the Department of Electrical Engineering—Systems, Tel-Aviv University, Tel-Aviv 69978 Israel (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2007.915548 1 Nevertheless, some computationally intensive approaches have been attempted, e.g., by Levin [2], using morphologically based separation criteria. can be divided into two categories according to the type of the images mixtures: static mixtures or convolutive mixtures. In static mixtures, only the (relative) intensities of the source images change from one snapshot to another. The static model can be regarded as an extension of the classic (static) 1-D blind source separation (BSS) problem for 2-D signals (1) where at each pixel , is the vector of source images, is the mixtures vector and is the mixing matrix, containing the linear mixture coefficients. Intensity attenuation may be achieved by changing the lighting conditions between snapshots (e.g., [3]). However, uniform modification of the lighting conditions over different areas of the source images may often be practically unfeasible, which might undermine the static (shift-invariable) mixing model (1). Thus, an alternative way for tampering with the source images’ relative intensities (without changing the lighting conditions) was considered in [4]–[6], where the polarization conditions were changed between snapshots. By introducing a linear polarizer in front of the camera, the relative coefficients of the mixed images may be altered. This optical setup is usually more true to the static model in (1) than a change in illumination conditions. However, even in this approach, the linear model is not fully accurate if the polarizer angle is changed by mechanical rotation (see [7]). In convolutive mixtures, convolutions of the source images with different (unknown) filters are assumed to occur prior to the mixing. In many 2-D convolutive BSS (CBSS) algorithms, the estimation of the mixing filters (as well as the subsequent separation) is executed in the frequency domain, where the con- volutive mixing transforms to different static mixtures at each frequency, forming models similar to (1). Convolutive image mixtures occur, for example, when light rays reflected from the objects go through optical devices (spatial filters), such as lenses. In this case, the optical transfer function is usually called a blur kernel. Sometimes, the convolutive image mixtures result from defocus, i.e., one (or more) of the source images is out of focus (blurred) in one (or more) of the mixtures. In this case, the change in focus conditions may be exploited as an alternative, or additional source of diversity (e.g., [8] and [9]; see also [10]). In this paper, we introduce another possible source of (con- volutive) diversity in the form of spatial-shifts. Different rela- tive spatial shifts of the sources between snapshots are the in- evitable outcome of possible movement of the reflective medium 1057-7149/$25.00 © 2008 IEEE

340 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, …arie/Files/tip08.pdfBlind Separation of Superimposed Shifted Images Using Parameterized Joint Diagonalization Efrat Be’ery

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

  • 340 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 3, MARCH 2008

    Blind Separation of Superimposed Shifted ImagesUsing Parameterized Joint Diagonalization

    Efrat Be’ery and Arie Yeredor, Senior Member, IEEE

    Abstract—We consider the blind separation of source imagesfrom linear mixtures thereof, involving different relative spatialshifts of the sources in each mixture. Such mixtures can be caused,e.g., by the presence of a semi-reflective medium (such as a windowglass) across a photographed scene, due to slight movements ofthe medium (or of the sources) between snapshots. Classicalseparation approaches assume either a static mixture model ora fully convolutive mixture model, which are, respectively, eitherunder- or over-parameterized for this problem. In this paper,we develop a specially parameterized scheme for approximatejoint diagonalization of estimated spectrum matrices, aimed atestimating the succinct set of mixture parameters: the static (gain)coefficients and the shift values. The estimated parameters are,in turn, used for convenient frequency-domain separation. Aswe demonstrate using both synthetic mixtures and real-life pho-tographs, the advantage of the ability to incorporate spatial shiftsis twofold: Not only does it enable separation when such shifts arepresent, but it also warrants deliberate introduction of such shiftsas a simple source of added diversity whenever the static mixingcoefficients form a singular matrix—thereby enabling separationin otherwise inseparable scenes.

    Index Terms—Approximate joint diagonalization, blind sourceseparation (BSS), image reflections, image separation, spatialshifts.

    I. INTRODUCTION

    QUITE commonly in photography, when scenes arerecorded through window glass under poorly balancedlighting conditions, a reflection of the room interior

    appears (linearly) superimposed over the outside scene. Sep-arating such reflections from the scene using a single image(snapshot) is obviously an ill-posed problem.1 However, when(at least) two such snapshots are available, each taken underslightly different conditions, it is conceivable that successful(blind) separation can be attained by proper exploitation ofthe diversity in the different snapshots. The diversity sources

    Manuscript received February 8, 2007; revised November 6, 2007. This workwas supported in part by MUSCLE (Multimedia Understanding through Se-mantics, Computation, and Learning), a European-Council sponsored Networkof Excellence (NoE). Parts of this work were presented at the IEEE Interna-tional Conference on Acoustic, Speech, and Signal Processing (ICASSP) 2006,Toulouse, France, May 2006. The associate editor coordinating the review ofthis manuscript and approving it for publication was Prof. Stanley J. Reeves.

    The authors are with the Department of Electrical Engineering—Systems,Tel-Aviv University, Tel-Aviv 69978 Israel (e-mail: [email protected];[email protected]).

    Color versions of one or more of the figures in this paper are available onlineat http://ieeexplore.ieee.org.

    Digital Object Identifier 10.1109/TIP.2007.915548

    1Nevertheless, some computationally intensive approaches have beenattempted, e.g., by Levin [2], using morphologically based separation criteria.

    can be divided into two categories according to the type of theimages mixtures: static mixtures or convolutive mixtures.

    In static mixtures, only the (relative) intensities of the sourceimages change from one snapshot to another. The static modelcan be regarded as an extension of the classic (static) 1-D blindsource separation (BSS) problem for 2-D signals

    (1)

    where at each pixel ,is the vector of

    source images,is the mixtures vector and is the mixing matrix,containing the linear mixture coefficients. Intensity attenuationmay be achieved by changing the lighting conditions betweensnapshots (e.g., [3]). However, uniform modification of thelighting conditions over different areas of the source imagesmay often be practically unfeasible, which might underminethe static (shift-invariable) mixing model (1). Thus, analternative way for tampering with the source images’ relativeintensities (without changing the lighting conditions) wasconsidered in [4]–[6], where the polarization conditions werechanged between snapshots. By introducing a linear polarizerin front of the camera, the relative coefficients of the mixedimages may be altered. This optical setup is usually moretrue to the static model in (1) than a change in illuminationconditions. However, even in this approach, the linear modelis not fully accurate if the polarizer angle is changed bymechanical rotation (see [7]).

    In convolutive mixtures, convolutions of the source imageswith different (unknown) filters are assumed to occur prior tothe mixing. In many 2-D convolutive BSS (CBSS) algorithms,the estimation of the mixing filters (as well as the subsequentseparation) is executed in the frequency domain, where the con-volutive mixing transforms to different static mixtures at eachfrequency, forming models similar to (1). Convolutive imagemixtures occur, for example, when light rays reflected fromthe objects go through optical devices (spatial filters), such aslenses. In this case, the optical transfer function is usually calleda blur kernel. Sometimes, the convolutive image mixtures resultfrom defocus, i.e., one (or more) of the source images is out offocus (blurred) in one (or more) of the mixtures. In this case, thechange in focus conditions may be exploited as an alternative,or additional source of diversity (e.g., [8] and [9]; see also [10]).

    In this paper, we introduce another possible source of (con-volutive) diversity in the form of spatial-shifts. Different rela-tive spatial shifts of the sources between snapshots are the in-evitable outcome of possible movement of the reflective medium

    1057-7149/$25.00 © 2008 IEEE

  • BE’ERY AND YEREDOR: BLIND SEPARATION OF SUPERIMPOSED SHIFTED IMAGES 341

    or of the recorded objects from one snapshot to another. More-over, such relative shifts can be introduced deliberately (e.g., byslightly slanting the window between snapshots), so as to imposethe desired diversity. The accommodation of spatial shifts allowsthe exploitation of this further diversity between snapshots, and,thus, enables us to attain improved separation, especially whenthe static mixing coefficients alone give rise to ill-conditionedmixing. Although considered convolutive in general, the mixingin our model is far more succinctly parameterized than a gen-eral convolutive model, since only a small number of unknownshift-parameters is added to the unknown static mixing coeffi-cients. Yet, our simple model of relative spatial shifts often pro-vides an accurate description of the mixtures.

    Naturally, for the model to be exact, the two scenes must re-main unchanged between snapshots (besides the possible relativeshifts and uniform intensity changes). Usually, this would implythat the scenes should either contain only still, rigid and nonde-formable objects with no movements in the direction towards/fromthecamera,or,alternatively, that the timebetweensnapshotsshould be sufficiently short, such that any internal displacementis negligible with respect to the transversal spatial shift of the en-tire scene. Note, however, that these basic requirements are alsoshared by other (aforementioned) separation scenarios. If no in-tensity changes are present between snapshots (we would refer tothis condition as a “singular mixing matrix” in the sequel), then tostill enable separation, the relative shifts of the sources betweenthe snapshots must be different. For example, in the common caseof two-snapshots-two-sources, at least one source must appearshifted betweensnapshots, and the other source, if shifted as well,must be shifted in a different direction.

    To the best of our knowledge, the problem of blind sepa-ration of image mixtures consisting of pure unknown relativespatial-shifts in addition to unknown scalar mixing coefficientshas not been considered explicitly (to date) in open literature.2

    Separation of dynamic images from a static background overseveral frames (an underdetermined problem) has been consid-ered in [11]. Separation of two layers from multiple snapshots,taken from various known camera positions and involvingtransparencies and reflections was considered by Tsin et al.[12] and also by Szeliski et al. [13]. The problem of separating1-D time-domain mixtures with different relative time-delayshas been considered in [14], where a separation approach basedon specially parameterized approximate joint diagonalization(AJD) was proposed. The AJD approach exploited the extended“alternating-columns—diagonal-centers” (AC-DC) algorithm[15]. In this paper, we address a model similar to the 1-Dmodel considered in [14], but extend the approach of [14] toaccommodate 2-D image signals, with the 1-D time-delayssubstituted by 2-D spatial-shifts.

    This paper is structured as follows. Following the problemformulation in Section II, we outline an AJD-based solution forestimating the mixtures’ parameters in Section III. We describethe frequency-domain separation procedure (estimation of thesource images) in Section IV. The overall computational com-plexity is evaluated and discussed in Section V. The suggested

    2With the exception of our conference paper [1], which presented some inter-mediate results of this work.

    2-D AC-DC algorithm was applied to synthetic mixtures, as wellas to real-life photographs of reflective scenes. The results ofthe separation are presented in Section VI, where we also dis-cuss the applicability of our method for separating overlappingvideo sequences. Concluding remarks are summarized in Sec-tion VII.

    The following notation is used. Boldface uppercase letters de-note matrices, boldface lowercase letters denote column vec-tors, and standard lowercase letters denote scalars. The super-scripts , , , and denote the transpose, conju-gate transpose, conjugate and inverse, respectively; de-notes the concatenation of a matrix’ columns into one vector,

    denotes the vector of diagonal values when operatingon a matrix, or a diagonal matrix with the specified diagonalvalues when operating on a vector; denotes the Frobeniusnorm, denotes the trace; and denote Kronecker’sproduct and Hadamard’s (element-wise) product, respectively.Parentheses/brackets are used to enclose continuous-/discrete-space indices, respectively. The symbol (not ) is used to de-note , so as to avoid conflict with the simple index .

    II. PROBLEM FORMULATION

    We consider the following model of continuous 2-D sourcesand sensors (to be later transformed into a discrete model):

    (2)

    where are the 2-D sources, are the observa-tions (mixtures), are the mixing coefficients and , arethe shifts in , directions of source image in mixture withrespect to its position in mixture 1. Without loss of generality,we use as a “working assumption” zero shifts of the source im-ages in the first mixture, i.e., for .Note that this is equivalent to determining (without loss of gen-erality) that the origin of the scene’s coordinate system coin-cides with the position of each source’s origin in the first mix-ture. In this continuous model, , , , and are all mea-sured in some physical length units, e.g., microns. The shifts areassumed to have occurred prior to the sampling (digital imaging)process and are, therefore, not necessarily an integer multipleof the pixels’ physical dimensions. We generally assume that

    , but we shall mainly concentrate on the case .The (presampled, continuous-space) source signals (im-

    ages) are assumed to be zero-mean, mutually uncorrelated,wide-sense stationary (WSS) processes with unknown spectra.Yet, it has to be stressed that the stationarity assumption is onlyused for simplifying the derivations, and is not instrumentalfor the resulting performance. The more important conditionfor successful separation in this context is that the sources bemutually uncorrelated. Luckily, this property is widely satis-fied by images of independent sources, unlike the stationarityassumption, which is rarely satisfied in natural images.

    The available data are samples of the continuous-space ob-servations, , ,

    , where , are the sampling intervals, typically the

  • 342 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 3, MARCH 2008

    pixel-lengths, in the , directions (respectively), assumed tocomply with the Nyquist rate.

    The goal is to blindly recover the (sampled) sourcesfrom the observations without

    any prior knowledge regarding the mixture parameters or thesources’ spectra. The main thrust in this paper is directed atestimation of the unknown mixing parameters, namely the rel-ative shifts values, as well as the static mixing gains. Accurateestimation of these parameters would, in turn, enable us toseparate the sources by “undoing” the mixing operation.

    III. ESTIMATION OF THE MIXING PARAMETERS

    A. Formulation as a Joint Diagonalization Problem

    The observations’ correlation functions are given by (3),shown at the bottom of the page, where denotesthe correlation between the th and the th mixed images atlags , and denotes the autocorrelation of the thsource image, such that the last equality is due to the statisticaldecorrelation between sources. Fourier-transforming (3), weobtain , the cross-spectrum between the th and thmixtures at (angular) frequencies (measured in units of

    )

    (4)

    where is the th source’s (unknown) spectrum. Now,assuming that the imaging (sampling) process does not intro-duce any aliasing, we obtain that the relation in (4) also holdsfor the spectra of the sampled images , with propersubstitution of the “physical” angular frequencies withunit-less angular frequencies and , and ofthe “physical” shifts with their unit-less counter-

    parts , (as mentioned earlier,these are not assumed to be integers).

    In addition to this substitution, (4) can also be expressed inmatrix-form as

    (5)

    where is a matrix consisting of asthe th element, is the sources’ joint spectra ma-trix, which is an diagonal matrix (since the sources are

    uncorrelated) consisting of as its th elements, andis the matrix given by

    (6)

    Here, is the constant matrix of mixing coefficients, whoseth element is , and the matrices ,

    contain the exponential terms depending (respec-tively) on the shifts-matrices , , which, in turn, con-tain all shifts , . More specifically, the th elementsof , are given by

    (7)

    The cross-spectral matrices are unknown, but canbe estimated from the available data, possibly by using the 2-Ddiscrete-space Fourier transform (DSFT) of a truncated series ofcross-correlations estimates (Blackman–Tukey’s method, e.g.,[16]). Specifically, to estimate the th element of ,we may use

    (8)

    where , are (resp.) the truncation-window length in the -,- directions, and

    (9)

    (under the convention that whenever theindices extend beyond the support) are the(slightly biased) correlation estimates.

    Note that if the sources are not stationary, the estimated“spectra” are nearly meaningless. Nevertheless, ifthe zero correlation between sources is persistent, then theexpression in (5) still holds (asymptotically), with the spectrasubstituted by spatial averages (over the entire images) of thelocation-dependent “local spectra.” Nevertheless, the diago-nality of is maintained, due to the persistent absenceof correlation between sources.

    When estimated values, rather than true values ofare used, (5) usually can no longer be satis-

    fied simultaneously at all frequencies. Nevertheless, onceis estimated at several frequencies-pairs (vectors)

    (3)

  • BE’ERY AND YEREDOR: BLIND SEPARATION OF SUPERIMPOSED SHIFTED IMAGES 343

    , an estimate of theunknown parameters of interest can be obtained by resortingto AJD, e.g., seeking to minimize the following least-squares(LS) criterion:

    (10)

    where is an tensor (three-way array) con-taining the respective sources’ spectra

    . Note that it is also possible to usea weighted LS criterion by introducing some positive weights

    into the sum; however, to simplify the exposition, we shallnot pursue this possibility in here. The rest of this Section is con-cerned with efficient minimization of , leading to estimatesof the mixing parameters (shifts and gains).

    It is important to note an essential difference from thecommon frequency-domain AJD approach, which is oftenapplied in the context of general convolutive mixtures. In thegeneral approach, several spectral matrices (“target matrices”)are estimated at each frequency (e.g., from different segmentsof the images), and then AJD is applied separately to therespective set at each frequency. Such an approach is prone tosuffer frequency-dependent permutation and scaling ambigui-ties inflicted by the solutions of each AJD at each frequency.Unless properly resolved, frequency-dependent ambiguitiesobviously give rise to severe distortions. However, in ourproposed framework, a single AJD is applied to all frequenciestogether, with one “target matrix” at each frequency, relyingon a specified frequency dependence of the diagonalizingmatrices. This avoids the possibility of frequency-dependentambiguities, since the same ambiguity would apply to allfrequencies, resulting in an overall (single) permutation andscale ambiguity—which is quite acceptable.

    Several AJD algorithms have been proposed in recent year;however, they all assume a constant diagonalization matrix ,rather than which depends on the indices . In [14]an extension of one particular AJD algorithm (AC-DC, [15])was proposed to address the 1-D problem. In Section III-B, wepropose further extension of the extended AC-DC, adapted tothis 2-D minimization problem. For simplicity of the exposi-tion we shall assume, from now on, that the number of mixturesequals the number of sources, namely (further simplifi-cation, to the case , will follow thereafter).

    B. AJD via Extended 2-D AC-DC

    AC-DC [15] is an alternating-directions algorithm for mini-mizing a LS joint-diagonalization criterion such as of (10).It is originally intended for finding a constant diagonalizing ma-trix . In our case, the matrix is not constant; Nevertheless, itcan be expressed [recall (6)] in terms of the three constant ma-trices, , and containing the mixing parameters. It isthen possible to minimize with respect to (w.r.t.) each columnof and each pair of matching columns (columns with the same

    index) of and separately, thus alternating between min-imizations w.r.t.:

    • (in the “DC” phase);• each column of (in the “AC-1” phase);• each pair of matching columns of and (in the

    “AC-2” phase).We shall now describe each phase in detail. To simplify nota-

    tions, we shall omit the “hat” from above the estimated spectrafrom now on.

    1) “DC” Phase: In the DC phase, we wish to minimizew.r.t. , with , and fixed. Since the th columnof is the diagonal of , it participates only in the

    th term of the sum in (10). Thus, the overall minimizationcan be decomposed into distinct minimizationproblems, which are all linear in the unknown parameters. Assuch, each of these problems admits the well-known linear LSsolution. Specifically, note that each th term in the sum canbe expressed as

    (11)

    where is the th column of , ,and

    (12)

    where denotes an vector of 1-s. Note that this expressionis sometimes also referred to as the Khatri–Rao product ofand . The well-known minimizer of each linear LS term in(11) is

    (13)

    Note that for simplifying the computational load, one may usethe relations

    (14a)

    (14b)

    with which the computation of and of requiresonly multiplications (per frequency-point), for a total of

    .2) “AC-1” Phase: We now wish to minimize w.r.t., the th column of , assuming the other

    columns, as well as , and , are fixed. To this end, let usdefine

    (15)where is the th column of . Note that since

    are the implied estimates of the sources’ spectra, theyare all real-valued [this can also be verified by observing thatthe expressions in (14a) and (14b) are real-valued, so allin (13), which contain all , are real-valued, as well].

  • 344 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 3, MARCH 2008

    We may, thus, express the LS criterion in (10) as (16), shownat the bottom of the page, where is an independent con-stant. Observe now (from 6), that can be written as

    where

    (17)

    Consequently, can be further simplified

    (18)

    To simplify this expression even further, let us decomposeinto a scale times a unit-norm vector , namely ,such that . Note that since the mixing coefficients areall real-valued, we require that both and be real-valued. Wemay now reduce (18) into

    (19)

    where is the Hermitian matrix

    (20)

    and is a positive constant

    (21)

    Differentiating (19) w.r.t. and equating to zero yields threepotential solutions for the minimizing : either or

    (22)

    Since is Hermitian, the argument of the square-root in (22)is real-valued for all real-valued ; however, it is not guaran-teed to be positive. Thus, a real-valued square root does notalways exist. Indeed, if is negative-definite, the argumentof the square-root would always be negative, so the only pos-sible (real-valued) solution is , and minimization ofw.r.t. is attained in this case by . Normally, however,

    is not negative-definite, and, therefore, the argument of thesquare-root can be made positive, at least with some . Then, itis easy to observe that any real-valued nonzero satisfying (22)is preferable to , since it attains a smaller value forin (19).

    Indeed, let us assume that a real-valued square-root ex-ists in (22). Substituting back into (19), we observe that theminimization problem reduces into maximization w.r.t. of

    Real , subject to . The de-sired solution is well-known to be obtained as the eigenvectorof Real associated with the largest (positive) eigenvalue.Combining this solution with (19), we obtain

    (23)

    where and are (resp.) the largest eigenvalue of Realand the associated eigenvector.

    For evaluating the computational load, we assume one “fullsweep” (namely, minimization with respect to each of thecolumns) per iteration. At each frequency-point, the main com-putational burden is that of calculating [in (15)], whichis . Apparently, it can be argued that since this is requiredfor each column, the total load (per frequency-point) is ,but noting that the elements in(15) may be calculated just once (per iteration, per frequency-point) and then used for all of the columns, the load is in-deed , for a total of multiplications per itera-tion. Note that the eigenvalue decomposition of Rreal canbe considered negligible here, since it requires multipli-cations and is done only once for the entire frequency range (percolumn).

    3) “AC-2” Phase: It is now desired to minimize w.r.t.the th columns of and denoted and

    (16)

  • BE’ERY AND YEREDOR: BLIND SEPARATION OF SUPERIMPOSED SHIFTED IMAGES 345

    , respectively, assuming the other columns, as well as and, are fixed. Since the dependence of in (19) on the shifts, appears only through , we can rewrite (19) as

    (24)

    where is another constant. Therefore, minimization of

    requires maximization of w.r.t. , which

    are all the shifts in the th columns, excluding , whichwere arbitrarily set to zero. More explicitly, we seek to maxi-mize

    (25)

    Here, denotes the th element of the matrix

    (26)

    For evaluating the computational load, let us assume that themaximization involves searching a -dimensional gridof shift-values, where each of the unknown and (for

    except ) takes and (resp.) poten-tial values on a scalar grid. , are determined by the user,considering the range of possible shifts in each direction andthe desired search resolution. For a sweep over the columnsof and , this would require multi-plications per iteration (including the negligible computation ofall , which requires ), which may be ratherlarge.

    However, if we focus on the relatively simple (but quitecommon) case , we notice significant simplifications.First, we note that in this case only two elements (out of four) inthe outer summation in (25) depend on the unknown shifts—theelements corresponding to . Thus, for , we seek tomaximize

    (27)

    w.r.t. and . Observe now, that due to the conjugate-sym-metric structure of the two terms in this sum forma conjugate pair, and, therefore, the sum is given by twice thereal-part of these terms. Thus, depending on the sign of ,we either need to minimize or to maximize this real-part. A rea-sonable assumption for mixtures of images is that all the staticmixture coefficients are non-negative. Consequently, the desiredshifts are given by the maximization

    (28)

    where

    (29)

    As further simplification, we note that if the frequenciesand are chosen as

    (30)

    (with and some selected constants), then is the 2-Dinverse Fourier series (2-D IFS) of the sequence .Again, the search in each direction may be conducted overscalar grids of sizes , giving rise to a 2-D search overa -sized grid.

    For (looking for , ), we wish to maximize realwith .

    This term can be similarly minimized by searching the 2-D IFSof .

    Thus, the overall computational load per iteration in the AC-2phase for the case is . This appears equiva-lent to the expression obtained above for general (substituting

    ), yet due to the simplifications noted above, the implicitconstant in the term is about four times smaller. Note fur-ther, that if and , a computationally efficient2-D fast Fourier transform (2-D FFT) can be used to computethe required 2-D DFS, thereby further reducing3 the loadto .

    C. Initial Guess

    For such alternating-directions algorithms, an “intelligent”initial guess is required, so as to avoid convergence to spuriousminima. Luckily, a reasonable guess for the relative-shifts couldbe readily obtained from the data. From (3), we have

    (31)

    A basic property of an auto-correlation function, , is

    , , . Hence,

    achieves its maximal value at . Underreasonable conditions on the spatial-shifts, on the mixingcoefficients and on the sources’ auto-correlation functionsshape, the peaks related to the different sources are roughlydistinguishable (see Fig. 1) to within accuracy which, althoughinsufficient for separation, may be sufficient as an initial guessfor the AJD. Consequently, rough estimates of the (unit-less)shifts pairs , (for ) can be extracted from thediscrete estimate of . Note, however, that

    3Assuming log (IJ) < K K .

  • 346 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 3, MARCH 2008

    Fig. 1. Mesh of the cross-correlation function between images, used forextracting rough locations of the peaks as initial guesses for the shift-values.Left: Without edge-enhancement preprocessing; Right: With preprocessing.Two peaks are clearly observed in the preprocessed version. Rough estimatesobtained from the raw version are less accurate, but are still adequate as initialguesses for AC-DC.

    since the peaks are not labeled, their consistent association withthe sources may be ambiguous for .

    For the mixing coefficients, we currently do not have a simpleautomatic procedure for obtaining an “intelligent” initial guessin the general case. However, a priori information on the changein lighting or polarization conditions between the snapshots, ifavailable, could be exploited to obtain an initial guess. For ex-ample, if we know that the lighting or polarization conditionswere not deliberately changed between snapshots, we can as-

    sume a singular as an initial guess.

    D. Preprocessing Stage

    Since the important condition of uncorrelated sources isgenerally better satisfied between edge-enhanced images thanby their original counterparts, we attained considerably betterestimation (hence, separation) when a linear shift-invariant(LSI) (zero-phase) edge-sharpening filter was applied to bothmixtures as a preprocessing stage.4 Note that such LSI filteringis merely equivalent to the introduction of similar spectralshaping to both sources. The result simply implies some specialweighting of the LS criterion (10) along the frequency-plane.Evidently, more weight is attributed in the AJD process to fre-quencies which correspond to dominant regions in the spectralshape of the filter.

    In addition, the edge-enhancement preprocessing can behelpful for generating the initial guess using the cross-correla-tion between the edge-enhanced mixtures. The peaks exhibitedby this cross-correlation function are more easily distinguish-able than those exhibited by the cross-correlation of the rawimages. To illustrate, we demonstrate typical cross-correlationfunctions with and without the preprocessing stage in Fig. 1.These plots were obtained with images taken from the firstexperiment, as described later on in Section VI-B.

    We note in passing that our edge-enhancement preprocessingprocedure might also be considered as a sparsifying transforma-tion; see, e.g., [5] on the use of sparsity for separation.

    4The problem of finding “good” edge-sharpening filters is out of scope ofthis work. Hence, we did not put much effort in optimizing the edge-sharpeningfilter, and settled for a filter that was “good enough” for our purposes.

    IV. SEPARATION OF THE SOURCES

    Once all of the mixing parameters (the spatial-shifts, as wellas the static mixing coefficients) have been estimated, theycan be used to “reverse” the mixing operation and recoverthe source images. Generally, there are at least two possibleapproaches for separation: spatial-domain separation and fre-quency-domain separation.

    A. Spatial-Domain Separation

    Since the mixing model (2) was conveniently expressed in thespatial-domain, it might appear intuitively appealing to “undo”the mixing in the spatial-domain, as well. However, spatial-do-main inversion of (2) would generally take a form which isconsiderably more complex than (2) itself. A recovered sourcecannot, in general, be expressed as a simple combination of the

    mixture-frames, in which each mixture-frame is (inversely)shifted.

    Consider the simplest case of sources/mixtures first.By “back-shifting” the second mixture by the estimated shiftsof, say, the first source (namely, by ), the posi-tion of the first source in the second mixture would be alignedwith its position in the first mixture. Then, if the second, back-shifted mixture-frame is multiplied by and subtractedfrom the first mixture-frame, the first source is thereby elim-inated from the result. However, this result now contains twosuperimposed, differently shifted versions of the second source,namely a filtered version thereof. In order to recover the secondsource, this filtering operation has to be inverted—which re-quires to apply an infinite impulse response (IIR) filter in thespatial domain. A similar operation can be used to recover thefirst source.

    When the number of sources/mixtures is larger than 2, spatial-domain separation becomes even more complicated, since thesources have to first be eliminated one-by-one, creating “new”mixtures along the way. After such a “deflation” approach isapplied, the remaining source would be recovered by applyingthe respective IIR filtering to the result, so as to invert the linearcombinations of multiple-shifts thereof, incurred along the way.

    An alternative approach for spatial-domain separation istaken by Szeliski et al. [13] in the case where the numberof observations is much larger than the number of sources.The method relies on the statistical variation of the sourcesin the spatial-domain, and on the fact that pixel intensityvalues are non-negative. In their “Min-Composite” approach,Szeliski et al. propose to first “undo” the shifts of all mixturessuch that one of the sources is aligned, and then take, in eachpixel, the minimum intensity value (among all shifted mixtures)as an estimate of the desired source. Subsequent iterations withmaximum-values are also considered.

    Tsin et al. [12] use the “Min-Composite” approach as an ini-tial estimate for an iterative procedure: Assume that a good esti-mate of the first source is available. It can then be aligned to fit itsposition in each mixture and eliminated (subtracted with propergain), leaving estimates of only the second source in each mix-ture. These estimates are then shifted so as to align the replicaof the second source, and averaged. The averaging effect refinesthe estimate of the second source, and the iterative algorithm

  • BE’ERY AND YEREDOR: BLIND SEPARATION OF SUPERIMPOSED SHIFTED IMAGES 347

    may proceed for successive refinement, switching roles betweenthe two sources.

    All of these spatial-domain approaches require the applica-tion of noninteger spatial shifts along the way. However, this canbe usually attained (assuming sufficient sampling-rate) usingrelatively simple interpolation filters (even a two-taps linear in-terpolator may be sufficient). The drawback of these approachesin the context of our problem formulation (with an equal numberof sources and mixtures) lies mainly with the subsequent pro-cessing. The rigorous approach mentioned earlier requires theapplication of IIR filtering. The Min-Composite approach, aswell as the iterative approach by Tsin et al. rely on the avail-ability of many more mixtures than sources for statistical sta-bility. All of these approaches become considerably more in-volved when there are more than two sources.

    On the other hand, when transformed to the frequency-do-main, the mixing model becomes purely multiplicative in eachfrequency. The observation vector at each frequency is givenby a matrix (which depends on the mixing parameters and onthe frequency) times the source vector at that frequency. There-fore, straightforward, closed-form (noniterative) reconstructionof the sources can be attained simply by inverting this operationat each frequency.

    With the exception of marginal end-effects, this fre-quency-domain approach can be easily shown to be equivalentto the (rigorous) spatial-domain approach. In fact, if null-mar-gins, wider than the maximum shift-value, are added to themixture-frames before transforming5 to the frequency-do-main, exact equivalence can be attained, and yet, we find thefrequency-domain approach both computationally and concep-tually more simple than its spatial-domain counterpart.

    B. Frequency-Domain Separation

    Using the (2-D) discrete Fourier transform (2-D DFT) of theobservations, we obtain 2-D series of size

    (32)

    Denoting and , the estimated mixing coefficientsand (unit-less) spatial shifts, we construct matrices ,whose th elements are given by

    (33)

    (34)

    where .

    5Note that such margins should not be added before the parameters estimationstage, since they may introduce a bias in the estimation, towards a no-shiftscondition.

    The source images can now be separated (estimated) in thefrequency domain using

    (35)

    where , and isthe estimate of matrix [see (6)] at the frequencies

    , (assuming the inverse exists).Transforming back to the space-domain, we obtain

    (36)

    where is the th element of . The reconstructed im-ages may possibly be subject to arbitrary (but frequency-inde-pendent!) permutations, scaling and shift, due to the inherentambiguities of the blind separation problem.

    This frequency-domain separation scheme implies exact in-version of the mixing (assuming exact estimates of the param-eters) only for cyclic shifts (namely, for shifts in which anycontent shifted beyond the margins “returns” into the imagefrom within the opposite margins). In reality, the shifts are ob-viously not cyclic, so even if the true parameters were used,our unmixing scheme would introduce some marginal end-ef-fects. Nevertheless, these end-effects are rather tolerable whenthe frequency-domain mixing matrix is well-conditioned at allfrequencies.

    However, in the case of a singular (or nearly singular) staticmixing matrix , the resulting frequency-domain mixingmay be singular (or nearly singular) at several frequency-pairs6

    (vectors). For example, for the case (later used in the experi-

    ments section) of the singular mixing matrix

    with shifts and ,

    the “singularity lines” in the frequency domain are illustrated inFig. 2 (left): For the entire 374 374 frequency-grid used in theexperiment, we marked all grid-points where the con-dition-number of the matrix is higher than 100, makingit nearly singular.7 The corresponding “singularity stripes” areevident.

    As a result, the inverse of the (nearly) singular mayunduly enhance the mismatch between the cyclic shift and thephysical shift models along these lines, resulting in a corre-sponding, severe “stripes effect” on the reconstructed image. Wedemonstrate this effect on one of the reconstructed images (inthe same mixing example) in Fig. 2 (right).

    It is important to note here that when the mixing-matrix is(nearly) singular, the spatial-domain reconstruction approach

    6In fact, for L > 2, the matrix BBB[r; t] may be singular at some frequencieseven if the mixing matrix AAA is regular.

    7Exact singularity (infinite condition number) occurs along straight lines inthe continuous frequency-domain; However, points along the discrete grid mightbe “nearly” but not “exactly” singular, depending on their distance from theselines.

  • 348 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 3, MARCH 2008

    Fig. 2. Left: Example of “singularity stripes” in the frequency-domain, occur-ring when the static mixing matrix A is singular (see Section VI-B for more de-tails): grid-points where the condition-number of the frequency-domain mixingB[r; t] is larger than 100 are marked. Right: The resulting “stripes effect” in thereconstructed image (when MMSE reconstruction is not used).

    discussed earlier cannot offer a remedy to the problem. The sin-gularity will be evident when trying to construct the IIR filter(see the previous subsection), which would not be stable in thiscase. This filter would have infinite gain along the same fre-quency-stripes, and would amplify the differences (present atthe edges) between the true mixture and the presumed mixture.Even if null-margins are added to the mixture images before sep-aration is applied, so as to avoid the cyclic-shifts, the resultingmixture would still be different, since the null margins cannotaccount for missing data of the true shifted sources, lost beyondthe imaging frame. Moreover, we shall now show, that in ourfrequency-domain approach we would be able to offer a fre-quency-selective remedy, which cannot be applied in a purelyspatial-domain approach.

    In order to mitigate the “stripes effect,” we need to apply amore careful inversion (reconstruction) scheme. We propose toregard the cyclic mismatch as additive noise with some pre-scribed small variance (proportional to the extent of the shift),and apply a minimum mean square error (MMSE) inversion ap-proach. In so doing, we shall employ several simplifying as-sumptions, in order to obtain a relatively simple result. Thismeans that we might not end up with the true MMSE estimate.Nevertheless, MMSE estimation is not our prime goal here; weare merely seeking to mitigate the “stripes effect,” by regular-izing the inversion at the relatively small number of problematicfrequencies, with minimal effect on the separation at all otherfrequencies. This would be done automatically, without needfor prior distinction between “problematic” and “benign” fre-quencies.

    The ”noisy” mixture at each frequency-bin is modeledas

    (37)

    where are the respective “noise” components. Assuming,for simplicity, that these noise terms are uncorrelated betweendifferent frequency-bins, we may apply a separate MMSE esti-mate at each frequency, decoupled of all other frequencies. TheMMSE estimator of from is well known to be givenby

    (38)

    where

    (39)

    is the covariance matrix between the sources-term and themixtures-term , and

    (40)

    is the (auto-)covariance of the mixture-term . Here,and denote the auto-covariances of the

    sources-term and of the noises-term, respectively. Since thesources are uncorrelated, is diagonal. We also assumethat is diagonal. For further simplicity we assume that

    and do not depend on the frequency .With all these simplifying assumptions, the MMSE estimatetakes the form

    (41)

    Our last simplifying assumption is that has a constant diag-onal. We also make a similar assumption regarding , withthe exception that the element of this matrix is null: This isdue to the “working assumption” that the sources are unshiftedin the first mixture, so that the cyclic-shift mismatch noise for

    has zero variance. Finally, using the estimatedinstead of the true , our reconstruction is formed as a reg-ularized version of the direct inversion used in (33)

    (42)

    where denotes the identity matrix with the (1,1) element setto zero, and is a small constant value, reflecting the presumedratio between the energy of the “noise” (mismatch) and the en-ergy of the source images. Obviously, when approaches zero,(40) reduces to (33) if is invertible.

    As in the nonsingular case, we then apply the inverse trans-form in order to return to the space domain

    (43)

    This approach reduces the “stripes effect” caused by the (nearly)singular frequency points (if any), yet has nearly no effect atnonsingular points, as long as is small enough. See the im-proved experimental results (for the conditions of Fig. 2) in Sec-tion V.

    For separation of color images, the same separation schememay be applied to each color layer separately. The mixing pa-rameters, which are assumed common to all color layers, areestimated just once, from the intensity signals only. Then, theseparation process in each color layer uses the same estimatedmixing parameters.

    V. COMPUTATIONAL LOAD

    Let us now summarize the overall computational load of theproposed separation scheme. We remind, for convenience, that

    denotes the number of sources/mixtures; , denote the

  • BE’ERY AND YEREDOR: BLIND SEPARATION OF SUPERIMPOSED SHIFTED IMAGES 349

    image size; , are the dimensions of the spectral frequency-grid; , are the numbers of points along the horizontal andvertical grids (resp.) used for grid-based maximization in theAC-2 phase; and , are the dimensions of truncated, esti-mated correlation sequence. In addition, we denote by ,the dimensions of the preprocessing edge-enhancement filter.We express the load in terms of orders of multiplications-countsas a function of these parameters. Note that this count excludes,e.g., additions and search operations, but they are roughly of ei-ther comparable or negligible complexity in each stage, so theyhardly affect the order of total operations counts.

    • The preprocessing filtering requiresmultiplications (for a separable preprocessing filter thiscan be reduced to ).

    • The estimation of the correlation matrices requiresmultiplications.

    • Transformation into the frequency domain for obtainingthe spectral matrices requires multiplica-tions; If a uniform frequency-grid is used, then this can bedone using 2-D FFT in , which may besignificantly lower.

    • The DC phase requires multiplications per iter-ation (recall Section III-B-I for details).

    • The AC-1 phase also requires multiplications periteration (recall Section III-B-II for details).

    • The AC-2 phase requires multipli-cations per iteration (recall Section III-B-III for details).

    • The separation in the frequency-domain requires twotransformations of all images to/from the frequency-do-main ( ), in addition to the cal-culation of separation matrices and separation re-sults at each frequency ( ) for a total of

    multiplications.The total multiplications count naturally depends on the

    number of iterations required for convergence of the 2-DAC-DC algorithm (typically 15-20 in our experiments, seebelow). Denoting that number by , the total count is

    (44)

    Things simplify when we take the common case of . If weuse a uniform frequency-grid and further assume that isof the same order as (or smaller than) , and that

    , (thus, using a 2-D FFT in AC-2), the overall countreduces (and simplifies) into

    .Note that the iterative 2-D AC-DC estimation phase is not a

    major part in the overall computational load (assuming a reason-ably small number of iterations). The estimation of the spectralmatrices and the frequency-domain separation may be consid-erably heavier. Thus, other frequency-domain based separationschemes (e.g., [9]) would essentially be of comparable compu-tational load.

    Considerably cheaper methods for image separation certainlyexist (e.g., [3], [6], and [17]). However, these methods do notaccount for relative spatial shifts and are, therefore, inapplicablein the context of our problem.

    VI. SEPARATION RESULTS

    In this section, we present image reflections separation resultsof the 2-D AC-DC algorithm. First, we present some results ob-tained with synthetic mixtures. Then, we demonstrate resultsobtained with real-life photographs containing reflections. Fi-nally, we present another application for our 2-D AC-DC algo-rithm, in the form of separation of dissolved video sequences.

    Although we used color images, all images in this section arepresented (in print) in grayscale colors. Color versions of allimages in this section can be found online in [18].

    A. Creating Synthetic Mixtures

    In order to be able to apply fractional shifts, the image mix-tures were constructed in the frequency domain. First, the 2-DDFT of the source images was calculated

    (45)

    Next, given the desired mixture parameters in the form of thematrices , , , we constructed two shifts-matrices

    , whose , th elements are given by

    (46)

    .(47)

    Then, the vector , a cyclic-shifted version of the mixtures,was created (in the frequency domain)

    (48)

    where is the sources’ 2-DDFT vector. Next, we applied the inverse 2-D DFT to (44)

    (49)

    Finally, the mixtures margins were removed to obtain linearlyshifted versions of the mixtures

    (50)

    where the margins width , , satisfy the condition, . The frame-size of the

    observed images was then redefined accordingly

    (51)

    As mentioned above, thanks to this frequency-domain tech-nique, we were able to create fractional shifts, without using 2-Dinterpolation and 2-D decimation of the images. If the sourceimages sampling rate complies with the Nyquist rate (as we

  • 350 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 3, MARCH 2008

    Fig. 3. Left: Sources. Middle: Mixture with nonsingular coefficients.Right: Mixture with a singular coefficients matrix.

    assumed), the proposed technique is equivalent to interpola-tion and shift of the images, followed by decimation. Note thatsource images reconstruction process (described in Section IV)is the (estimated) inverse procedure of the mixtures construction(excluding the margins removal step). The proposed procedureis a general mixing process for the sources— sensors case.In the simulations, we addressed the scenario of .

    B. Synthetic Mixtures—Results

    We picked a set of two source images: an image of a fish andan image of chips (Fig. 3, left column). Both are color images,containing three color layers: red, green, and blue (RGB). Eachof these layers was individually mixed, with identical linear co-efficients and spatial-shifts. As mentioned earlier, in the estima-tion stage we first transformed the RGB mixture images intograyscale (intensity) images. Since this transformation is linear,the obtained grayscale mixture images admit the same linear-mixing and shifts model as the associated RGB components.We then subtracted the mean from each mixture (to complywith the zero-mean assumption) and applied the preprocessingedge-sharpening filter8 prior to estimation of the mixing param-eters via the 2-D AC-DC algorithm. Using the estimated mixingparameters we then reconstructed each of the color layers sep-arately.

    We first demonstrate the performance with a nonsin-gular mixing matrix . For this experiment, we used

    and .

    We applied 2-D AC-DC to the mixture, extracting initialguesses for , from the estimated correlation (as describedin Section III-C) and using the identity matrix as an initial guessfor . For the separation process we set in (40). Thesources and the mixture are presented in Fig. 3, and separationresults are presented in Fig. 4.

    Aside from some minor edge effects, both sets of color im-ages are nicely reconstructed, with no visible cross-talk betweenthe images. As explained before, the edge effects are the resultof the noncircular shifts.

    We also compared to the result of “clairvoyant” separationusing the true (inverse) mixing matrix, but ignoring the spa-tial-shifts. The reconstructed images in Fig. 4 demonstrate the

    8We used a 7� 7 filter, whose impulse response is proportional to h[n;m] =�nm exp(�(m + n )=2), jnj, jmj � 3.

    Fig. 4. Left: Zero-shift clairvoyant separation. Middle: Nonsingular case sep-aration. Right: Singular case separation.

    importance of shifts-estimation. Even when the true mixing ma-trix is known, the separation results are poor if the spatial-shiftsare not accounted for.

    In a second experiment, we simulated a shifts-only mixturescenario for the same source images, using a singular mixing

    matrix , so as to demonstrate the attainable

    separability induced by the shifts in a case that would otherwisebe inseparable. We used the same shifts matrices and the sameinitialization as in the first experiment. We stress out that the 2-DAC-DC algorithm used no a priori knowledge of this shifts-onlyscenario in estimating the mixtures (i.e., both the spatial-shiftsand the coefficients matrix were estimated from the data, as inthe case of a nonsingular coefficients matrix). In this experi-ment, we used . The mixture is presented in Fig. 3and the reconstructed images are in Fig. 4.

    As explained in Section IV, the inversion problem in the “sin-gular” scenario is inherently ill-conditioned along “singularitylines” in the frequency domain. Direct inversion of these sin-gularities enhances (along these lines) the difference betweenthe implicitly assumed circular shifts and the actual noncircularshifts, and can severely impair the reconstruction, as demon-strated in Fig. 2. We, therefore, used the MMSE reconstructionapproach (40). Note, however, that some residual artifacts arestill visible in the respective separation in Fig. 4 (right), thoughconsiderably weaker than in Fig. 2. They are (mainly) causedby this inherent inversion problem, and are not due to inaccu-rate parameters estimation in the 2-D AC-DC algorithm.

    In all these experiments the image sizes were, the correlation window size was ,

    and the 2-D AC-DC frequency-grid was of size, equally spaced in . A breakdown of the

    empirical running-times (on Matlab 6.5, 2.99GHz PC) for theseexperiments is as follows:

    • preprocessing time (edge-enhancement): 0.2 s;• estimation of correlations and spectral matrices: 6.0 s;• estimation of the mixing parameters (2-D AC-DC): around

    20 iterations, each taking 0.13 s, for a total of 2.6 s;• frequency-domain separation and image reconstruction

    (three color-layers): 3.3 s;for a total of around 12.1 s. As noted earlier, the iterative 2-DAC-DC algorithm is not the most time-consuming phase of theentire scheme.

  • BE’ERY AND YEREDOR: BLIND SEPARATION OF SUPERIMPOSED SHIFTED IMAGES 351

    C. Noisy Synthetic Mixtures

    To demonstrate the separation performance in the presenceof additive noise, we also present some experiments with noisysynthetic mixtures. We applied noise to both “nonsingular”and “singular” mixtures, in a total of three experiments. WhiteGaussian noise was added9 independently to each color-layerof each mixture. The standard-deviation of the noise (out of255 levels for each color layer) was 10 in the first experiment(with a “nonsingular” mixture), and 5 and 10 (respectively) inthe second and third experiments (with a “singular” mixture).

    The mixtures, as well as the respective separated images, arepresented in Figs. 5 and 6. It is evident that although the noise isalso present in the reconstructed images, the separation is nearlyunaffected. It is to be noted that the separation scheme does notattempt to denoise the images—only separation is attempted,and successfully attained with these noise levels. For signifi-cantly higher noise levels, or for noise which is correlated be-tween the snapshots, some preprocessing denoising may be re-quired to enhance the performance, but this issue is beyond thescope of this work.

    D. Real-World Conditions Results

    To further demonstrate the performance of the algorithm,we used some real-world images reflections. We photographed(using a FUJIFILM F401, 4Mpixels Digital Camera) a roominterior (library) through a glass window. Outside the roomwe placed some object (teddy-bear, camera). The object’sreflection was superimposed on the room interior scene. Inorder to obtain spatial-shifts, the window was slightly slantedfrom one snapshot to the other. The lighting conditions werenot changed between snapshots (at least not deliberately).

    The real-life images (in Fig. 7) contain not only the sourceimages mixture, but also some natural “additive noise.” We referto any diversion from the assumed mixture model (2) (linearwith spatial shifts) as additive noise. Possible noise-sourcesare: thermal noise, quantization, saturation and more. Thethermal noise may be considered as additive white Gaussiannoise (AWGN), uncorrelated with the images. However, allother noise sources are not necessarily uncorrelated with thephotographed images, or spatially white.

    Thus, to mitigate any potential effect on the estima-tion accuracy, we incorporated some smoothing into theedge-sharpening preprocessing as follows: After subtrac-tion of the mean we applied a Gaussian filter (7 7 with

    , in Matlab) for noisereduction and then a Laplacian filter (3 3 with ,

    in Matlab) for the edge enhance-ment. We stress again, that the combined effect of these LSIfilters is merely equivalent to introduction of frequency-depen-dent weights into the AJD process. To mitigate the noise effectson the separation we employed the MMSE reconstructionscheme with (we would have employed it anyway,since the true mixture is very likely to have been singular).

    The initial guess for was ,

    reflecting the (reasonable) a priori knowledge that there was al-

    9Clipping the result to the 0-255 range when required.

    Fig. 5. Noisy mixtures. Left: “Noningular mixing”; noise standard-deviation:ten gray-levels. Middle, Right: “Singular mixing,” noise standard-deviations:five and ten ray-levels, respectively.

    Fig. 6. Separation of the noisy mixtures in Fig. 5 (respectively).

    most no change in lighting conditions between the two snap-shots. The presumed slight differences in intensities are arbi-trary and are not based on any prior knowledge. They are merelyaimed at shifting the initial guess of from singularity, sincesingularity in the first iterations sometimes slows down the con-vergence.

    For each experiment (“teddy-bear”/“camera”) we used twosnapshots. Image sizes were for the “teddy-bear” and for the “camera.” The correlationwindow size for both was , and the 2-D AC-DCfrequency-grid was of size , equally spacedin . The reconstructed images are presentedin Fig. 8. Although the separated images are noisier than thoseattained in the synthetic mixtures, the separation is easily seen tobe perceptually correct. The running time for the entire schemewas roughly the same as with the synthetic images, about 13.0 s.

    E. Separation of Dissolved Video Sequences

    Sometimes in movies, when passing from one scene to an-other, a crossfade effect is used (i.e., one scene fades-out andthe other fades-in). Our 2-D AC-DC is applicable also for sepa-rating this kind of dissolved video sequences, especially if thereis panoramic camera movement in one (or both) of the dissolvedscenes.

    Each frame of the video sequence may be considered as amixture. The th frame of the video sequence can be modeledas

    (52)

  • 352 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 17, NO. 3, MARCH 2008

    Fig. 7. Real-world conditions reflections—images mixture.

    Fig. 8. Real-world conditions reflections—Reconstructed images (respectivelyto Fig. 7).

    where is the number of frames comprising the effect. Theseparation algorithm may be applied to any two frames of themovie. Hence, we can divide the movie into frame pairs, andseparate each such pair, using the 2-D AC-DC algorithm.

    Quite commonly, however, the crossfade effect is (nearly)linear; i.e., the fade-in/out coefficients , change lin-early in time (that is, linearly in the frame index, ). If thecamera movement is also (nearly) linear, we may use this apriori knowledge to improve the separation results. Insteadof dividing the frames into frame pairs, and applying the 2-DAC-DC algorithm to all of the frame pairs, we can pick asmall number of frame pairs. Then, we may estimate the linearmixing coefficients and spatial shifts only for each of theseselected frame pairs. Based on the estimated mixing parametersof the selected frame pairs, we can then estimate the mixingparameters of the remaining frame pairs, using linear interpola-tion/extrapolation. In fact, this may be also done using a single

    Fig. 9. Several frames (no. 5; 15; 25; 35; 45 of 50) from the separated cross-fadevideo sequence. Upper row: Mixture. Middle row: Separated source 1. LowerRow: Separated source 2.

    selected pair. Using this simplified procedure, we separated asynthetically generated dissolved video sequence.

    A sequence of five representative frames (out of 50 framescomprising the entire sequence) of the mixed and separated se-quence appears in Fig. 9. The full video sequences can be down-loaded/viewed (in the form of an file using Cinepak com-pression) at [18].

    VII. CONCLUSION

    This work addressed the problem of blind separation of imagereflections. In most of the related BSS algorithms, some kindof diversity between the observed images is exploited in orderto attain separation. Common diversity sources are polarizationand change in lighting conditions. In this work, we introducedan additional (newly considered) source of diversity, in the formof relative spatial-shifts. Relative spatial-shifts of the sourcesbetween snapshots may be an unintentional outcome of move-ment of either the reflective medium or recorded objects. Alter-natively, they may be introduced deliberately, e.g., by slightlyslanting the window between snapshots (the tilt-angle must bekept small, so as not to deform the shifted scene by contractiondue to projection).

    The 2-D AC-DC algorithm is an integrated method to esti-mate both the mixing coefficients and the spatial-shifts. Sinceit exploits more than one source of diversity, the algorithm per-forms reasonably well even when either one of these diversitysources is deficient.

    We formulated the problem of blind separation of shiftedand linearly mixed images, as a specially parameterized AJDproblem of 2-D spectra matrices. AJD was applied at all fre-quencies together, with one target-matrix at each frequency,where the diagonalizing matrices obey the model-prescribedfrequency dependence. Thus, our approach is free of fre-quency-dependent permutation/scale ambiguities, sometimesencountered in other frequency-domain approaches (for generalconvolutive mixtures).

    The underlying “working assumption” of our approach is thatthe sources are WSS and mutually uncorrelated. The stationarityassumption is usually quite unrealistic for images. Fortunately,however, for our approach, the only truly essential stationarityis with respect to the zero cross-correlations; stationarity withrespect to the autocorrelations is not as important.

    We demonstrated the successful performance of the proposedalgorithm both with synthetic mixtures and with real-life reflec-

  • BE’ERY AND YEREDOR: BLIND SEPARATION OF SUPERIMPOSED SHIFTED IMAGES 353

    tion images. Although admittedly not suitable for the reflectionremoval problem in its most general form, we believe that ourmodel and associated solution approach can successfully covera variety of restrictive, yet realistic scenarios.

    REFERENCES

    [1] E. Be’ery and A. Yeredor, “Blind separation of reflections with relativespatial shifts,” in Proc. ICASSP, 2006, pp. V625–V628.

    [2] A. Levin, A. Zomet, and Y. Weiss, “Separating reflections from a singleimage using local features,” in Proc. CVPR, 2004, vol. 1, pp. 306–313.

    [3] K. I. Diamantaras and T. Papadimitriou, “Blind separation of reflec-tions using the image mixtures ratio,” in Proc. Int. Conf. Image Pro-cessing, 2005, pp. 1034–1037.

    [4] Y. Y. Schechner, J. Shamir, and N. Kiryati, “Polarization and statisticalanalysis of scenes containing a semireflector,” J. Opt. Soc. Amer., vol.17, no. 2, pp. 276–284, 2000.

    [5] A. M. Bronstein, M. M. Bronstein, M. Zibulevski, and Y. Y. Zeevi,“Blind separation of reflections using sparse ICA,” in Proc. Int. Work-shop on Independent Component Analysis and Blind Signal Separa-tion, 2003, pp. 227–232.

    [6] H. Farid and E. H. Adelson, “Separating reflections from images usingindependent component analysis,” J. Opt. Soc. Amer., vol. 16, no. 9,pp. 2136–2145, 1999.

    [7] T. W. Cronin, N. Shashar, and L. Wolff, “Portable imaging polarime-ters,” Proc. ICPR A, pp. 606–609, 1994.

    [8] Y. Y. Schechner, N. Kiryati, and R. Basri, “Separation of transparentlayers using focus,” Int. J. Comput. Vis., vol. 39, no. 1, pp. 25–39, 2000.

    [9] S. Shwartz, Y. Y. Schechner, and M. Zibulevsky, “Efficient separationof convolutive image mixtures,” in Proc. ICA, 2006, pp. 246–253.

    [10] M. Castella and J.-C. Pesquet, “An iterative blind source separationmethod for convolutive mixtures of images,” in Proc. ICA, 2004, pp.922–929.

    [11] A. M. Bronstein, M. M. Bronstein, and M. Zibulevski, “On separationof semitransparent dynamic images from static background,” in Proc.ICA, 2006, pp. 934–940.

    [12] Y. Tsin, S. B. Kang, and R. Szeliski, “Stereo matching with linear su-perposition of layers,” IEEE Trans. Pattern Anal. Mach. Learn., vol.28, no. 2, pp. 290–301, Feb. 2006.

    [13] R. Szeliski, S. Avidan, and P. Anandan, “Layer extraction frommultiple images containing reflections and transparency,” in Proc.IEEE Conf. Computer Vision and Pattern Recognition, 2000, vol. 1,pp. 246–253.

    [14] A. Yeredor, “Blind source separation with pure delays mixture,” inProc. ICA, 2001, pp. 522–527.

    [15] A. Yeredor, “Non-orthogonal joint diagonalization in the least-squaressense with application in blind source separation,” IEEE Trans. SignalProcess., vol. 50, no. 7, pp. 1545–1553, Jul. 2002.

    [16] S. M. Kay, Fundamentals of Statistical Signal Processing. Engle-wood Cliffs, NJ: Prentice-Hall, 1993.

    [17] Y. Y. Schechner, J. Shamir, and N. Kiryati, “Vision through semi-re-flecting media: Polarization analysis,” Opt. Lett., vol. 24, no. 16, pp.1088–1090, 1999.

    [18] [Online]. Available: http://www.eng.tau.ac.il/~ arie/imsep.htm

    Efrat Be’ery was born in Petach-Tikva, Israel,in 1981. She received the B.Sc. degree (magnacum laude) in electrical engineering and the M.Sc.(magna cum laude) degree from Tel-Aviv University,Tel-Aviv, Israel, in 2005 and 2007, respectively.

    Since 2006, she has been with POLYCOM, Inc.,Petach-Tikva, as an Algorithm Engineer in the fieldsof video processing and audio processing.

    Arie Yeredor (M’99–SM’02) Was born in Haifa, Is-rael in 1963. He received the B.Sc. degree in elec-trical engineering (summa cum laude) and the Ph.D.degree from Tel-Aviv University (TAU), Tel-Aviv, Is-rael, in 1984 and 1997, respectively.

    He is currently a Senior Lecturer in the Departmentof Electrical Engineering—Systems, School of Elec-trical Engineering, TAU, where he teaches coursesin statistical and digital signal processing. He alsoholds a consulting position with NICE Systems, Inc.,Ra’anana, Israel, in the fields of speech and audio

    processing, video processing, and emitter location algorithms. His research in-terests include estimation theory, statistical signal processing, and blind sourceseparation.

    Dr. Yeredor has been awarded the yearly Best Lecturer of the Faculty of En-gineering Award six times by TAU. He is an Associate Editor for the IEEESIGNAL PROCESSING LETTERS and the IEEE TRANSACTIONS ON CIRCUITS ANDSYSTEMS—II: EXPRESS BRIEFS, and a member of the Signal Processing So-ciety’s Signal Processing Theory and Methods Technical Committee.