A Markov Random Field model of contamination source identi

A Markov Random Field model of contamination source

identification in porous media flow

Jingbo Wang and Nicholas Zabaras1

Materials Process Design and Control Laboratory, Sibley School of Mechanical andAerospace Engineering, 188 Frank H.T. Rhodes Hall, Cornell University, Ithaca, NY 14853-3801, USA

Abstract

A contamination source identification problem in constant porous media flow is addressed by solving the advection-dispersion

equation (ADE) with a hierarchical Bayesian computation method backward through time. The contaminant concentration

is modeled as a pair-wise Markov Random Field (MRF) and the distribution is updated using current concentration mea-

surements at finite locations. Hierarchical Bayesian analysis is used to derive the posterior distribution of the contaminant

concentration at past time points. The posterior mean estimate is computed using a modified single-component Gibbs algo-

rithm. The methodology is first tested via examples of contaminant identification in a homogeneous porous medium using

both diffusion-dominated and convection-dominated conditions. A heterogeneous porous media flow case is also examined.

In all the numerical studies reported, the anisotropic dispersion effect is considered. It is verified that the MRF model

can effectively model the spatial correlation of the concentration field, and the presented approach can provide accurate

solutions to the ill-posed inverse problem.

Keywords: Porous media; Advection; Dispersion; Pollution source; Source identification; Bayesian inference; Inverse Prob-

lems; Markov Random Field

1 Introduction

The contamination source identification problemhas received significant research interest due toits applications in groundwater and soil cleanup.Addressing this problem requires solving the par-tial differential equations (PDEs) of contaminantpropagation in porous media flow backwards intime. Namely, the objective is to compute thehistory of contaminant concentration from cur-rent concentration data. The ill-posedness ofthis inverse problem and the difficulties in sim-ulating the contaminant propagation have beenwell-recognized. To facilitate the solution to thischallenging problem, a variety of methods havebeen developed over the past several decades,which have been reviewed by Atmadja and Bagt-zoglou [1] and Michalak and Kitanidis [2].

Among the solution methods considered,Bayesian computation has a number of distinctiveattributes. The Bayesian approach restates theill-posed inverse problem as a well-posed problemin an expanded stochastic space. Instead of com-puting only deterministic point estimates, it com-putes the probability distribution (often calledposterior distribution) of contaminant concentra-tion history conditional on current concentrationdata. The Bayesian approach can quantify ran-dom errors in the data and uncertainties in theconcentration field. The posterior distribution iscomposed of the likelihood, which is the probabil-ity of the data given the concentration history, andthe prior distribution, which models the regularityof the unknown concentration field. Incorporationof the prior distribution allows flexible regulariza-tion and accurate estimation from sparse data.

The Bayesian approach was introduced forsolving contamination source identification by

Snodgrass and Kitanidis [3]. In [4, 5], Micha-lak and Kitanidis developed a confined Brownianmotion model to enforce nonnegativity of concen-tration estimates and to select structure parame-ters of the Bayesian posterior distribution. In ourwork, a pair-wise Markov Random Field is intro-duced to regularize the prior distribution of con-centration history. A hierarchical Bayesian analy-sis is introduced, allowing the uncertainty (struc-ture) parameters to be quantified as part of theposterior distribution and to be explored using amixed Metropolis-Hastings (MH) algorithm [6].

In contrast to the relatively straightforwardformulation of the likelihood, modeling the priordistribution in the Bayesian approach is more diffi-cult. Considering the structure of a spatially vary-ing concentration field, a pair-wise MRF modelis introduced from spatial statistics [7]. TheMRF models the spatial correlation among con-centration variables at adjacent locations, whichsmoothes the concentration field. This modelhas been used in a variety of spatial applica-tions [8, 9]. Very recently, the model was intro-duced to inverse heat transfer problems by Wangand Zabaras [10, 11], where time is treated asan extra spatial dimension in modeling dynamicfields. In [11], the canonical MRF model was en-hanced to model random fields with different tem-poral and spatial length scales and to resolve dis-continuities in the unknown fields.

In this work, in addition to the concentrationfield, the standard deviation of measurement er-rors and the scaling parameter of the prior distri-bution are also treated as random variables. Werefer to these parameters as ‘structure variables’following the terminology by Michalak and Kitani-dis [4]. These parameters are often called hyper-parameters in Bayesian statistics. Hierarchical

1

Nomenclature

A acceptance probability Greek symbols

c concentration α Gamma distribution parameterc known concentration at well αm molecular diffusivityc0 initial concentration αl longitude dispersion coefficientD dispersion tensor αt transverse dispersion coefficientf linear FE basis function β Gamma distribution parameterH sensitivity matrix γ scaling constant of MRFI identity matrix Γ Gamma functionK permeability θ parameter form of unknown c

L length of Markov chain θ estimate of θ

L2 square-integrable function space λ scaling constant of MRFm dimension of θ µ dynamic viscosityN total number of data ξ parameter of hyper-priorn unit normal direction vector σ standard deviation of ωm

p pressure τ upwind parameterprobability density function φ porosity

Pe local Peclet number Φ kernel function of MRFq volume flux rate ωm random measurement error

proposal distribution Ω spatial domaint time Ωe finite element domain∆t time step size ∂Ω spatial domain boundaryT maximum time of simulationu random number Superscripts

u Darcy velocity T transposeU uniform distribution (i) ith iteration or ith time stepw test function of concentration (*) candidatew test function of velocity Subscripts

W precision matrix i ith componentx spatial vector -i full conditionalY concentration data vector i ∼ j site neighborhood

Bayesian analysis [12] is used to derive the jointdistribution of structure parameters with the un-known concentration field. The joint posteriorstate space is then explored using a mixed Markovchain Monte Carlo (MCMC) sampler that samplesthe concentration variables using the Gibbs algo-rithm [13] and the structure parameters using theMH algorithm [14].

Simulation data are used in this study totest the presented inverse computation method.The Darcy equation for porous media flow is firstsolved using the global gradient post-processingmethod (Loula et al. [15]). The velocity fieldis then used to solve the advection-dispersionequation of concentration using a streamline-upwind/Petrov-Galerkin (SUPG) finite elementmethod. All equations are solved on a rather finegrid to generate simulation data, thus avoiding theso called ‘inverse crimes’ [16].

The rest of this article is organized as follows.Section 2 introduces the mathematical definitionof the problem. The direct simulation of the con-tamination propagation is discussed in Section 3.It is then followed by the hierarchical Bayesianformulation of the inverse computation. The pos-terior exploration algorithms are presented in Sec-

tion 5. Section 6 contains numerical examplesto demonstrate the developed methodology. Fi-nally, conclusions of the current work and futureresearch issues are summarized in Section 7.

2 Problem definition

Propagation of contaminant in an impermeableporous medium can be described by the followingadvection-dispersion equation (ADE) [17]:

φ∂c

∂t+∇·(cu)−∇·(D∇c) = cq, in Ω×(0, T ], (1)

with prescribed initial and (Neuman) boundaryconditions,

c(x, 0) = c0(x), in Ω, (2)

D∇c · n = 0, on ∂Ω × (0, T ]. (3)

In the above equations, c is the concentration(mass fraction) of the contaminant and c is theprescribed concentration values at the injectionand production wells. Also q denotes the volumeflux rate at the wells and φ and D are the mediumporosity and dispersion tensor, respectively. Fi-nally, Ω is the spatial domain and (0, T ] is the

2

total time span. The anisotropic dispersion coef-ficient D can be modeled as follows:

D = φαmI + ‖u‖[αlE(u) + αtE⊥(u)], (4)

with

E(u) =1

‖u‖2u ⊗ u, E⊥(u) = I − E(u), (5)

where I is the identity matrix, and αm, αl andαt are the molecular diffusivity, longitudinal dis-persion coefficient and transverse dispersion coef-ficient, respectively. The Darcy velocity u can becomputed from the following equations:

∇ · u = q, in Ω × (0, T ], (6)

u = −K(x)

µ(c)∇p, in Ω × (0, T ], (7)

u · n = 0, on ∂Ω × (0, T ], (8)

where p is the hydrodynamic pressure and K andµ are the permeability and dynamic viscosity, re-spectively. In this study, we assume the variationof viscosity can be neglected, i.e. µ is a constantequal to the dynamic viscosity of the resident fluid(water). Therefore, the ADE Eq. (1) is decoupledfrom the flow Eqs. (6) and (7).

A direct (or forward) contaminant propaga-tion problem is defined as the computation of theconcentration distribution at all times t ∈ (0, T ],given initial condition Eq. (2) and boundary con-dition Eq. (3). In the inverse problem of inter-est considered, the contamination concentrationat current time t = T can be measured at finitelocations inside Ω. However, the history of conta-minant distribution is not known. Namely, c0 andthe time span T between releasing time t = 0 ofthe contaminant and measurement time t = T areboth unknown. The inverse problem is to com-pute the concentration backward in time, namelyc(t) with t < T , on a finer scale grid than themeasurement scale grid. It is assumed that noprior knowledge of releasing time and location ofthe contaminant is available. The releasing timeis defined as the time point corresponding to abackward computed concentration of 1.0 at anylocation.

3 The direct simulation and

sensitivity analysis

3.1 Solution to the flow equations

Solution to the direct problem is required for theinverse computation. The direct simulation canbe separated into two parts: solution to the flowequations and solution to the concentration equa-tion. In the first part, the constant flow velocityfield is obtained by solving Eqs. (6) to (8). Thevelocity is then used in solving Eqs. (1)-(5).

In the context of the finite element (FE)method, the most common approaches to solvingthe flow Eqs. (6)-(8) are the stabilized finite el-ement method, in which the pressure and veloc-ity are determined simultaneously, and the gra-dient post-processing method, in which the pres-sure is found first and then the velocity is cal-culated via gradient post-processing. The gradi-ent post-processing method is easier to implementand computationally less costly. It solves a diffu-sion equation derived by substituting Eq. (7) intoEq. (6) for pressure first. The velocity is thencomputed as the smoothed gradient of the pres-sure field. In this work, the flow equations aresolved using a global post-processing method asdiscussed in [15].

In the gradient post-processing approach, thepressure is solved using

∇ ·(

K

µ∇p

)

= −q, (9)

which is derived by substituting Eq. (7) intoEq. (6). The finite element technique to solve thissteady state diffusion equation is trivial. Once thepressure field is obtained, Eq. (7) can be used tocompute the velocity. However, since the finite el-ement solution of pressure is usually not smoothacross element boundaries, the velocity obtainedby directly computing the gradient of pressure isdiscontinuous across element boundaries, whichis not physically feasible. To achieve a continu-ous velocity solution, a global L2-smoothing post-processing problem is usually solved with the fol-lowing weak formulation:

(u,w) = (−K

µ∇p,w), (10)

where (·, ·) is the L2(Ω) inner product and w isthe test function for velocity. According to Loulaet al. [15], to further increase the accuracy of thepost-processing result, Eq. (10) is often modifiedas:

(u,w) + (δh)α(∇ · u,∇ · w) = (−K

µ∇p,w)

+(δh)α(q,∇ · w), (11)

in which h is the finite element grid size. Theparameters δ and α are here taken as 0.1 and 1,respectively [15]. Let

U = u|u ∈ (L2(Ω))dim,∇ ·u ∈ L2(Ω),u ·n = 0.(12)

The problem can be stated as to find u ∈ U , suchthat, for all w ∈ U , Eq. (11) holds.

3.2 Solution of the concentration

equation

After computing the velocity from the above ap-proaches, one can evaluate the concentration us-ing Eq. (1). To solve this advection-dispersion

3

equation, the SUPG finite element formulation isused [19]:

∫

Ω

φ∂c

∂twdΩ +

∫

Ω

(u · ∇c)wdΩ +

∫

Ω

qcwdΩ

+

∫

Ω

D∇c ·∇wdΩ+

nel∑

e=1

∫

Ωe

τue∇w(φ∂c

∂t+ue ·∇c

+qc)dΩe =

∫

Ω

cqwdΩ +nel∑

e=1

∫

Ωe

τue∇wcqdΩe.

(13)where w is the test function for concentration.The weak problem is to find c ∈ H1(Ω) such that,for all w ∈ H1(Ω) Eq. (13) holds. The elementbased integrals (the 5th term on the left hand sideand the 2nd term on the right hand side) are theSUPG stabilizing terms, in which τ is the upwindparameter. In this formulation, it is assumed thatthe gradient of the test function w is discontinuousacross element boundaries. The stabilization pa-rameter τ is computed via the following formula:

τ =1

2

h

‖ue‖min(

Pe

3, 1.0), (14)

where Pe is the local (element) Peclet numberthat is defined as:

Pe =1

2h

‖ue‖3

uTe Due

. (15)

With the finite element formulations intro-duced above, the direct problem is solved usingtwo-dimensional bi-linear finite elements. Thesimulator was implemented for parallel machinesusing PETSc [18] and has been tested by compar-ing the results to solutions of various numericalexamples documented in [19, 20, 21].

3.3 Sensitivity analysis

A discussion of sensitivity analysis is necessary toimprove understanding of the Bayesian formula-tion. To present the sensitivity analysis, a sim-pler inverse problem is temporarily considered inthis section. By assuming a known releasing timeof contaminant, the inverse problem introduced inSection 2 is reduced to the estimation of a spatiallyvarying function c0(x). This function estimationproblem is further transformed into a parameterestimation problem by introducing the followingapproximation:

c0(x) =

m∑

j=1

θjfj(x), (16)

where fj(x)’s are the linear finite element basisfunctions and θj ’s are the nodal values of finiteelement approximation of c0. The problem nowis to estimate an unknown m-dimensional vector

θ with θ(j)=θj being the nodal value associatedwith the jth basis function.

Let c(x, T ) be the concentration at measure-ment time t = T that is computed from Eq. (1)using c0 as initial condition. Due to the linearityof the direct problem,

c(x, T ) =

m∑

j=1

θjcj(x, T ) (17)

with cj(x, T ) being the direct solution of concen-tration at t = T using fj(x) as the initial condi-tion.

Let the N -dimensional vector Y denote theconcentration measurement data at t = T withY (i) being the measurement at the ith sensor loca-tion (xi) and N being the total number of sensors.Furthermore, let C be the N -dimensional vectorwith C(i) = c(xi, T ). Using Eq. (17), C can berepresented as:

C = Hθ, (18)

in which H is a N × m matrix with H(i, j) =cj(xi, T ). H is often called the sensitivity matrix.It component H(i, j) reflects the sensitivity of theconcentration C(i) at each sensor location i withrespect to small variations in the parameter θ(j).

In the remaining part of this article, the fol-lowing system relationship is assumed:

Y = Hθ + ω, (19)

where ω denotes the error between the measureddata and the concentration computed using thetrue initial condition. Therefore, ω contains boththe random measurement error and the numericalerror. The objective of the simpler inverse prob-lem is to find an estimate of θ such that the dis-crepancy between Y and C is minimized in somesense.

With the capability to simulate the direct prob-lem and compute the sensitivity matrix, we arenow ready to investigate the Bayesian formula-tion.

4 Bayesian backward compu-

tation

4.1 Bayesian inverse formulation

For the parameter estimation problem in Eq. (19),Bayesian inference computes the probability den-sity function of random unknown θ conditional onthe data Y using the Bayes’ formula:

p(θ|Y ) =p(Y |θ)p(θ)

p(Y ), (20)

where the conditional density function p(Y |θ) isthe likelihood and p(θ) is the prior density func-tion. The conditional density function p(θ|Y )

4

is often called the posterior probability density.The difference between the Bayesian approach andother inverse methods is that the Bayesian ap-proach determines the distribution of unknownparameters instead of point estimates, and as aresult, the inverse problem is formulated as a well-posed problem in a stochastic space (state spacedefined by the prior distribution). Even in seek-ing point estimates, the Bayesian approach en-ables more flexible regulation than other methods.Based upon the posterior distribution, credible in-tervals at arbitrary levels can be computed for allBayesian point estimates, which is another advan-tage of the Bayesian approach.

Despite the fact that a number of point esti-mates can be computed according to different lossfunction definitions, the posterior mean estimate(minimum mean square error (MMSE) estimate)is deemed a reliable estimate and is computed inthe current study as the inverse solution. The pos-terior mean estimate is defined as:

θpostmean = E θ|Y (21)

There are two points worth emphasizing here:first, it is only meaningful to study the probabilityof a random variable existing within an interval,rather than having a particular value; therefore,estimation of the distribution makes more practi-cal sense that computing point estimates. Second,the actual value of the parameter θ is fixed insteadof being random. The rationality in modelling itas a random variable is that it is inferred from thenoise-polluted data; hence, uncertainties exist inour knowledge of this quantity.

As seen from Eq. (20), to obtain the posteriordistribution, one needs to compute p(Y ), p(Y |θ)and p(θ). However, since Y is the known data, therole of p(Y ) is nothing more than a normalizingconstant in the posterior distribution. It is notneeded to compute the posterior estimates usingsampling algorithms introduced later in this arti-cle. Therefore, it is enough to evaluate Eq. (20)as follows:

p(θ|Y ) ∝ p(Y |θ)p(θ). (22)

If the random errors in Eq. (19) are assumedto be independent identically distributed (i.i.d.)Gauss random noise with zero mean and variancev (standard deviation σ =

√v), the likelihood can

be formulated as:

p(Y |θ) =1

(2π)N/2vN/2exp− (Y − Hθ)T (Y − Hθ)

2v.

(23)It should be noticed that even though other distri-butions can be used to model random errors, theGaussian distribution is the most commonly usedmodel. It is well recognized that in large data setwith random errors, the Gaussian distribution fitsquite well the actual distribution.

To obtain the posterior distribution, the nexttask is to formulate the prior distribution p(θ).

4.2 Markov Random Field as prior

distribution

The prior distribution represents the distributioninformation of θ in advance of Bayesian inference.It defines a prior state space with a certain degreeof regularity that cures the ill-posedness of theinverse problem. There are a number of standardtechniques to model the prior distribution, some ofwhich are known as conjugate priors and Jeffery’spriors [12].

For the main unknown θ in the current prob-lem, a pair-wise Markov Random Field model isselected as the prior. The consideration is that θ

represents the concentration values at a large setof spatial locations (finite element nodes), whilethe pair-wise MRF model is designed for modelingthe correlation between spatially adjacent randomvariables on finite lattice system [7]. By model-ing the correlation among spatially indexed ran-dom variables θi’s, regularity is posed on the statespace of θ.

The general form of the pair-wise MRF is asfollows:

p(θ) ∝ exp−∑

i∼j

WijΦ(γ(θi − θj)) (24)

where γ is a scaling parameter, Φ(·) is an evenfunction that determines the specific form of theMRF, the summation is over all pairs of sites i ∼ j

that are defined as neighbors, and Wij are speci-fied non-zero weights. Let Φ(u) = 1

2u2, the MRFused herein is of the form

p(θ) ∝ λm/2 exp(−1

2λθT Wθ). (25)

In the above one-parameter model, the entries ofthe m×m matrix W are determined as Wij = ni ifi=j, Wij = −1 if sites i and j are adjacent (termedas neighbor sites) and as 0 otherwise. The variableni denotes the number of neighbors adjacent tosite i and λ is a scaling parameter.

There are several benefits of the MRF modelEq. (25). First of all, it is shift invariant, namely,p(θ) = p(θ + b) with b being a vector with iden-tical components. This ensures that the prior dis-tribution will not over constrain the unknown pa-rameter. In addition, Eq. (25) is much less com-putationally expensive to explore than a full-entryGauss Random Field. Furthermore, the methodis rather flexible in modeling various correlationstructures by changing the order of Markov fieldand weight specification.

It is important to mention that the above pair-wise MRF model can be easily extended to the dis-continuity adaptive MRF model that enables au-tomatic resolution of discontinuities in unknown

5

random fields. Interested readers are encouragedto consult our previous study in [11].

With the likelihood Eq. (23) and prior distri-bution Eq. (25), the posterior can be tentativelywritten as:

p(θ|Y ) ∝ exp− (Y − Hθ)T (Y − Hθ)

2v

· exp(−1

2λθT Wθ). (26)

However, this posterior distribution depends onpre-fixed values of v and λ. In reality, the mag-nitude of the actual noise can only be roughlyestimated. Selection of λ is even more non-deterministic. These two structure parameters arekey to the estimation of the posterior distributionand to the degree of smoothness of all point es-timates. Unlike earlier methods that try to se-lect such structure parameters before the inversecomputation, a hierarchical Bayesian approach isconsidered here to estimate the distribution ofthe structure parameters simultaneously with thecomputation of the concentration distribution.

4.3 The hierarchical posterior dis-

tribution

The hierarchical Bayesian posterior is usually usedwhen the prior distribution of the primary un-known variables (e.g. of the concentration historyin the current problem) depends on some uncer-tain parameters, also known as hyper-parametersor structure parameters. The canonical form of ahierarchical posterior can be written as:

p(θ, ξ|Y ) ∝ p(Y |θ)p(θ|ξ)p(ξ), (27)

where ξ is the structure parameter and p(ξ) isthe prior distribution of ξ. It is assumed inthis formulation that the likelihood solely dependson the primary parameter θ. The advantagesof using Bayesian hierarchical posterior are: (i)the uncertainty of hyper-parameters is quantified,and (ii) the effect of poor knowledge of structureparameters on the posterior is reduced so thatthe posterior estimates are less likely to be over-constrained.

The structure parameter is generally assumedto have a nearly non-informative distribution overits support. For instance, in the current exam-ple, the structure parameters λ and v are bothassumed a priori to be nearly uniformly distrib-uted over (0,∞]. However, the functional form ofthe nearly non-informative prior varies for differ-ent structure parameters. In this study, conjugatepriors [12] are used to model the prior distributionof λ and v.

Conjugate priors are the prior distributionsthat ensure the corresponding posterior distrib-utions having the same functional form as the pri-ors. For Eq. (26), Gamma and inverse Gamma

distributions are chosen as priors for λ and v, re-spectively:

p(λ) ∝ βα1

1

Γ(α1)λα1−1e−β1λ, λ ∈ (0,∞] (28)

p(v−1) ∝ βα2

2

Γ(α2)v−(α2+1)e−β2v−1

, v ∈ (0,∞]

(29)where Γ(·) is the standard Gamma function. Asmall value 1.0e−3 is selected for the Gamma dis-tribution constants α1, α2, β1 and β2. Thus, thedistributions Eqs. (28) and (29) are nearly non-informative over (0,∞].

With the hyper priors defined above, a hierar-chical Bayesian posterior distribution can be com-puted as follows:

p(θ, λ, v|Y ) ∝ p(Y |θ, v)p(θ|λ)p(λ)p(v)

∝ v−N/2 exp− (Y − Hθ)T (Y − Hθ)

2v

·λm/2 exp−1

2λθT Wθ

·λα1−1 exp−β1λv−(1+α2) exp−β2v−1,

λ ∈ (0,∞] ∩ v ∈ (0,∞]. (30)

4.4 The backward marching scheme

Equation (30) models the posterior distributionof the initial concentration field when T is known.In the problem of interest in this study, T is anunknown variable as well. Therefore, a backwardmarching scheme is used to reconstruct the entirehistory of the concentration field. The procedureis as follows:

1. Select a time step ∆t and set T = ∆t andt = tcurrent where tcurrent is the current time(arbitrary reference time).

2. Formulate a posterior in the form of Eq. (30)at t = tcurrent − T .

3. Compute the posterior mean estimate of con-centration at t.

4. If the computed concentration reaches a value1 at any location, exit the iteration. Else, setT = T + ∆t and return to step 2.

In this approach, the concentration prior to themeasurement time is reconstructed backward intime until the releasing time is reached. Note thatthe sensitivity problems only need to be solvedonce in this approach (solve the sensitivity prob-lem over a large time span and record concentra-tion values at sensor locations at all time steps).

Computing integrals of the hierarchical pos-terior distribution Eq. (30) is not a trivial task.More importantly, one is often interested in thehighest density region of the posterior distri-bution. Based upon these considerations, aGibbs sampling based Markov chain Monte Carlo(MCMC) simulation method is used to computethe posterior mean estimate of concentration.

6

5 Numerical exploration of

the posterior distribution

Monte Carlo simulation is used in this study tocompute the posterior estimates of concentration.The estimator in Eq. (21) is approximated as:

EL(θ|Y ) =1

L

L∑

i=1

θ(i), (31)

where θ(i)’s (i=1:L) are L samples randomly gen-erated from the posterior distribution. By the lawof large numbers, this sample mean converges tothe true expectation as L goes to infinity. Hence,a large enough sample set will ensure the accuracyof the Monte Carlo approximation.

Key to Monte Carlo simulation is how to gen-erate a reasonably large sample set from the highdimensional distribution Eq. (30). Markov chainMonte Carlo (MCMC) simulation algorithms gen-erate samples from the posterior distribution whilespending most of the sampling steps in the highestdensity regions of the posterior state space.

The essence of MCMC algorithms is to explorethe state space of a random parameter using theMarkov chain mechanism [6]. One of the key ad-vantages of using MCMC is that one can drawsamples even if p(x) can only be evaluated up tothe normalizing constant. The MCMC samplerdesigned for exploring Eq. (30) is based on thebasic MCMC algorithm, the Metropolis-Hastingsalgorithm and the Gibbs algorithm. The pseudo-code is the following:

1. Initialize θ(0), λ(0) and v(0)

2. For i = 0 : Nmcmc − 1

— sample θ(i+1)1 ∼ p(θ1|θ

(i+1)−1 , λ(i), v(i), Y )

— sample θ(i+1)2 ∼ p(θ2|θ

(i+1)−2 , λ(i), v(i), Y )

—...

— sample θ(i+1)m ∼ p(θm|θ

(i+1)−m

, λ(i), v(i), Y ).

— sample u ∼ U(0, 1)

— sample λ(∗) ∼ qλ(λ(∗)|λ(i))

— if u < A(λ(∗), λ(i))

λ(i+1) = λ(∗)

— else

λ(i+1) = λ(i),

— sample u ∼ U(0, 1)

— sample v(∗) ∼ qv(v(∗)|v(i))

— if u < A(v(∗), v(i))

v(i+1) = v(∗)

— else

v(i+1) = v(i).

In the above algorithm, Nmcmc is the to-tal number of sampling steps, and θ(i), λ(i),and v(i) are the samples generated in the ith

iteration for θ, λ, and v, respectively. Also,

θ(i)j is the jth component of θ(i). The no-

tation θ(i+1)−j denotes a m − 1 dimensional

vector θ(i+1)1 , ..., θ

(i+1)j−1 , θ

(i)j+1, ..., θ

(i)m . Also,

p(·|θ(i+1)−j , λ(i), v(i)) is the full conditional distri-

bution of θj in the ith iteration and u is a randomnumber generated from the standard uniform dis-tribution U(0, 1). Finally, qλ(·|λ(i)) and qv(·|v(i))are the proposal distributions for λ and v in theith iteration, respectively.

This algorithm updates one component of θ

at each sampling step. The proposal distributionfor each component θj is its full conditional dis-tribution, which is the propability distribution ofθj conditional on all other components. This fullconditional distribution is derived as follows:

p(θj |θ−j , λ, v, Y ) ∼ N(µj , σ2j ), (32)

µj =bj

2aj, σj =

√

1

aj, (33)

aj =

N∑

s=1

H2sj

v+ λWjj , bj = 2

N∑

s=1

µsHsj

v− λµp,

(34)

µs = Ys −∑

t6=j

Hstθt, µp =∑

i6=j

Wijθi +∑

k 6=j

Wjkθk

(35)The acceptance probability of every Gibbs sampleis 1; hence all the samples of θ generated in thisway are accepted.

The second part of this algorithm uses an MHsampler to update λ and v. The proposal distri-butions used to generate new samples of λ and v

are both normal distributions as follows:

qλ(λ(∗)|λ(i)) ∼ N(λ(i), σ2λ), (36)

andqv(v(∗)|v(i)) ∼ N(v(i), σ2

v). (37)

There are two notes to this sampling process.First, the physical limits of θ are 0 and 1. How-ever, if such limits are posed to the samplingprocess (i.e. rejecting negative samples and sam-ples greater than 1) the posterior mean estimatecan never reach the limits, which causes biased-ness. Therefore, in this study, no constraint is ap-plied to the sampling process. By doing this, somephysically unfeasible samples are generated, butthe posterior mean estimates are feasible. Second,design of the proposal distributions qλ(λ(∗)|λ(i))and qv(v(∗)|v(i)) must ensure that the effective reg-ularization parameter ( 1

2λσ2) is not too large.Once the designed Markov chain converges,

the samples recorded thereafter are from the tar-get posterior distribution Eq. (30). The posterior

7

mean estimates can then be computed using thesesamples.

6 Numerical examples

In this section, the above introduced methodol-ogy is demonstrated via three numerical examples.Without loss of generality, the examples are stud-ied in dimensionless form.

6.1 Example 1: 1D advection-

dispersion in a homogeneous

porous medium

The first example is a one-dimensional problemadopted from [1]. Inside the spatial domain [0, 28],Eq. (1) holds with unit constant velocity, porosityand dispersion coefficient (u = 1.0, φ=1.0, D =1.0). The concentration values at x = 0 and x =28 are kept at zero. The initial concentration is arectangular pulse:

c0(x) =

1, 13.5 ≤ x ≤ 14.50, 0 ≤ x ≤ 13.5, 14.5 ≤ x ≤ 28

(38)The concentration data are collected at t = 2.0,while the objective is to estimate the concentra-tion at t = 1.1 and t = 1.9.

0 5 10 15 20 25 30−0.02

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

x

Co

nce

ntr

atio

n,

C

true concentrationposterior mean estimate

Figure 1: True and posterior mean estimate of con-centration at t = 1.1.

Following [1], the direct problem is solvedon a grid with 112 elements with time step 0.02.The true and estimated concentration profiles att = 1.1 and 1.9 are plotted in Figs. 1 and 2, respec-tively. It is noticed that the estimate at t = 1.1is slightly better than the estimate at t = 1.9,which is not expected since the data are collectedat t = 2.0. This is due to the fact that the poste-rior mean estimate is affected by the sample valuesof λ. Since the samples at these two time pointswere drawn from separate chains, the difference insample values of λ causes the larger estimate er-ror at t = 1.9. However, these two estimates areboth rather accurate. This example is different

0 5 10 15 20 25 30−0.02

0

0.02

0.04

0.06

0.08

0.1

0.12

x

Co

nce

ntr

atio

n,

C

true concentrationposterior mean estimate

Figure 2: True and posterior mean estimate of con-centration at t = 1.9.

1000 3000 5000 7000 90000.0000

0.0001

0.0002

0.0003

0.0004

0.0005

Figure 3: Posterior density of structure parameter λ

in obtaining concentration estimate at t = 1.1.

from the one in [1] in that (i) the concentrationdata are assumed to be measured at 27 locationsinstead of at all element nodes and (ii) randomnoise with magnitude of 5% of the true concen-tration is added to the data. Still the estimatesare quite accurate. The posterior density of thestructure parameter λ in obtaining concentrationestimate at t = 1.1 is plotted in Fig. 3 (Gammadistribution).

6.2 Example 2: 2D concentration

reconstruction

In the following examples, we simulate aquarter area of the classical 5-spot problem. Aschematic of the problem is shown in Fig. 4. In-side the square domain (unit side length), Eqs. (1)through (8) hold. The injection and productionwells are located at (0, 0) and (1, 1), respectively,both having volume flux rate q (varying in differ-ent examples).

The actual initial concentration used to gener-ate the simulation data has a normal distributionpeaked at (0.375, 0.75) with standard deviation0.1. The direct problem is solved on a 128 × 128

8

u = 0 u = 0

v = 0

v = 0

qin

qout

0=∂∂

nc 0=

∂∂

nc

Figure 4: Schematic of Example 2.

finite element grid with a time step 0.02. Randommeasurement errors are simulated from a normaldistribution with zero mean and standard devia-tion 0.005 (5% - 15% of the maximum recordeddata in examples) in the homogeneous cases and0.002 in the heterogeneous case. The simulationconcentration data are generated by adding ran-dom measurement errors to the direct simulationsolution at sensor locations. The data are used torecover the contaminant concentration history ona 64× 64 grid. The number of sensors used variesin the following examples.

6.2.1 Case I: Diffusion-dominated trans-

port in homogeneous porous media

In the first two-dimensional example, adiffusion-dominated mode is considered by settinga very small value (q = 0.001) for the well fluxrate. The permeability and viscosity are bothtaken as 1.0. To ensure the molecular diffusionis the dominant transport mechanism, φ, αm, αl,αt take values 0.1, 0.1, 0.01 and 0.001, respec-tively. The concentration data are measured att = T = 1.0 in this case. The sensors are evenlydistributed on nodes of an 8 × 8 grid.

The true concentration profiles and cor-responding reconstructed concentration profiles(posterior mean estimates) at different time pointsare plotted in Fig. 5. The time indices are ob-tained by setting tcurrent = 1.0. Since the concen-tration data are only collected at sparse sites att = 1.0, it is of interest to reconstruct the entireconcentration field at this time. This is performedhere by solving the direct problem from t = 0 tot = 1.0 using the reconstructed concentration att = 0 as the initial condition. The same back-ward step size is used as in the direct simulation(∆t = 0.02). It is seen that the estimated concen-tration profiles are rather close to the true concen-tration. The peak concentration value in posteriormean estimate at t = 0 is 0.9311, indicating thatin this case, the backward marching procedure willbe continued even after the true initial releasingtime.

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

t=0

t=0.2

t=0.4

t=0.6

t=0.8

t=1.0(a) (b)

Figure 5: Reconstruction of the history of contami-nant concentration for diffusion-dominated transportin a homogeneous porous medium: (a) The true con-centrations at different past time steps; (b) the recon-structed concentrations.

9

0

0.5

1

0

0.5

1x

0

0.5

1

y

0

0.5

1

0

0.5

1x

0

0.5

1

y

(a) (b)

Figure 6: Reconstruction of the concentrationfor diffusion-dominated transport in a homogeneousporous medium at t = 0: (a) data are collected at9× 9 sensor locations at t = 0.2 (b) data are collectedat 5 × 5 sensor locations at t = 1.0.

It has also been observed in the study thatthe posterior mean estimates at time points closeto the measurement time are the most accurate.In Fig. 6(a), the concentration estimate at t = 0(the true releasing time) using data measured att = 0.2 is plotted. The peak value in this case is0.9916. The estimation of releasing time is veryaccurate in this case.

To test if the number of sensors can be furtherreduced, the above estimation is repeated to usedata at t = 1.0 from a 5 × 5 sensor network. Theposterior mean estimate of the concentration att = 0 (true releasing time) is plotted in Fig. 6(b).The peak value is only 0.8 in this case. There-fore, although the peak location and initial con-centration profile can be identified in this case, theestimation of releasing time is not acceptable.

6.2.2 Case II: Advection-dominated

transport in homogeneous porous

media

In the second numerical experiment, we recon-sider the earlier example by changing the follow-ing parameters: q = 0.04, αm = 0, αl = 0.04 andαt = 0.004. Convection and dispersion are themain mechanisms of contaminant propagation inthis case. Fig. 7 shows the true concentration pro-files and posterior mean estimates at different timepoints using data at t = T = 1.0. In this exam-ple, the data are measured using a 16× 16 sensornetwork.

In Fig. 8, the estimated profile was generatedusing the data collected from a 9 × 9 sensor net-work. It is seen that more fluctuations exist in theestimates. However, the peak location and profileof concentration can still be resolved.

6.2.3 Case III: Advection-dispersion in

heterogeneous porous media

In this example, we extend our earlier studiesto heterogeneous porous media. All the quantitiesremain the same as in Example 2 in Section 6.2.2

0.0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 1.0

t=0

t=0.2

t=0.4

t=0.6

t=0.8

t=1.0(a) (b)

Figure 7: Reconstruction of the history of polluteconcentration for advection-dominated transport in ahomogeneous porous medium: (a) shows the true con-centrations at different past time steps and (b) showsthe reconstructions.

10

0.0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 1.0

t=0 t=0.2

t=0.4 t=0.6

t=0.8 t=1.0

Figure 8: Reconstruction of the contamination his-tory for advection-dominated transport in a homoge-neous porous medium when data are collected at 9×9sensor locations at t = 1.0.

except the permeability, which in this case is gen-erated randomly from a joint log-normal distribu-tion on a 32 × 32 finite lattice. The permeabilitymean at each site is 1.0 and the standard deviationof log permeability is 1.5. An uncorrelated perme-ability field is assumed in this case. The largestpermeability and smallest permeability values inthis example differ by the magnitude of 105.

Fig. 9 shows the true concentration profiles andposterior mean estimates at different time pointsusing data at t = T = 1.0. In this example, thedata are measured from a 32× 32 sensor network.The estimates obtained using data from a 16× 16sensor network are also presented in Fig. 10. It isobserved that the estimates using less sensor dataare comparable to the estimates in Fig. 9. Consid-ering the heterogeneity and uniformly distributedsensor network, estimates in Fig. 10 are quite im-pressive.

7 Conclusions and discussion

A hierarchical Bayesian computation method ispresented to solve the contaminant history recon-struction problem in porous media. The regularityof the solution to this inverse problem is enforced

by a pair-wise Markov random field model. Com-plete mathematical models are used in obtainingthe direct simulation results of contaminant prop-agation in porous media flow. The attributes ofthe method are demonstrated via numerical ex-amples in both homogeneous and heterogeneousporous media flows.

The current computational method success-fully estimates instantaneously released contam-ination source in mixed fluids flow with constantviscosity. When the mobility ratio of contami-nant to resident fluid (water) deviates largely fromunity, a more complicated model is required tosimulate the direct physical process. Furthermore,the estimation of continuously releasing contam-ination source is not addressed here and it is aproblem of current research interest.

Acknowledgement

The work presented here was was conducted us-ing the resources of the Cornell Theory Center,which receives funding from Cornell University,New York State, federal agencies, and corporatepartners.

References

[1] J. Atmadja, A. C. Bagtzoglou, Pollutionsource identification in heterogeneous porousmedia, Water Resources Research 37 (2001)2113-2125.

[2] A.M. Michalak and P.K. Kitanidis, Esti-mation of historical groundwater contam-inant distribution using the adjoint statemethod applied to geostatistical inverse mod-eling, Water Resources Research, 40 (2004),W08302.

[3] M. F. Snodgrass, P. K. Kitanidis, A geostatis-tical approach to contaminant source identi-fication, Water Resources Research 33 (1997)537-546.

[4] A. M. Michalak, P. K. Kitanidis, A methodfor enforcing parameter nonnegativity inBayesian inverse problems with an appli-caiton to contaminant source identification,Water Resources Research 39 (2003) 1033-1046.

[5] A. M. Michalak, P. K. Kitanidis, Applicationof geostatistical inverse modeling to conta-minant source identification at Dover AFB,Delaware, Journal of Hydraulic Research 42(2004) 9-18.

[6] C. Andrieu, N. D. Freitas, A. Doucet, M. I.Gordan, An introduction to MCMC for ma-

11

0.0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 1.0

t=0

t=0.2

t=0.4

t=0.6

t=0.8

t=1.0(a) (b)

Figure 9: Reconstruction of the history of polluteconcentration for advection-dispersion transport in aheterogeneous porous medium (data are collected at32 × 32 grid): (a) the true concentrations at differentpast time steps and (b) the computed reconstructions.

0.0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 1.0

t=0 t=0.2

t=0.4 t=0.6

t=0.8 t=1.0

Figure 10: Reconstruction of the history of polluteconcentration for advection-dispersion transport in aheterogeneous porous medium (data are collected ona 16 × 16 grid).

chine learning, Machine Learning 50 (2003)5-43.

[7] J. Besag, P. Green, D. Higdon, K. Mengersen,Bayesian Computation and Stochastic Sys-tems, Statistical Science 10 (1995) 3-41.

[8] D. Higdon, H. Lee, Z. Bi, A Bayesian ap-proach to characterizing uncertainty in in-verse problems using coarse and fine-scaleinformation, IEEE Transactions on SignalProcessing 50 (2002) 389-399.

[9] J. Besag, P. J. Green, Spatial statistics andBayesian computation, Journal of the RoyalStatistical Society, Series B, Methodological55 (1993) 25-37.

[10] J. Wang, N. Zabaras, A Bayesian inferenceapproach to the stochastic inverse heat con-duction problem, International Journal ofHeat and Mass Transfer 47 (2004) 3927-3941.

[11] J. Wang, N. Zabaras, Hierarchical Bayesianmodels for inverse problems in heat conduc-tion, Inverse Problems 21 (2005) 183-206.

[12] P. Congdon, Bayesian Statistical Modelling,John Wiley & Sons, New York, 2001.

12

[13] S. Geman, D. Geman, Stochastic relaxation,Gibbs distributions and the Bayesian restora-tion of images, Transactions on PatternAnalysis and Machine Intelligence 6 (1984)721-741.

[14] W. R. Gilks (ed.), S. Richardson (ed.), D.J. Spiegelhalter (ed.), Markov Chain MonteCarlo in Practice, Chapman & Hall Ltd, NewYork, 1996.

[15] A. F.D. Loula, F. A. Rochinha, M. A. Mu-rad, Higher-order gradient post-processingfor second-order elliptic problems, ComputerMethods in Applied Mechanics and Engineee-ing 128 (1995) 361-381.

[16] J. P. Kaipio and E. Somersalo, Computa-tional and Statistical Methods for InverseProblems, Springer-Verlag, New York, 2005.

[17] H. Wang, D. Liang, R. E. Ewing, S. L. Lyons,G. Qin, An ELLAM-MFEM solution tech-nique for compressible fluid flows in porousmedia with point sources and sinks, Jour-nal of Computational Physics 159 (2000) 344-376.

[18] S. Balay, K. Buschelman, V. Eijkhout, W.D.Gropp, D. Kaushik, M.G. Knepley, L.C.McInnes, B.F. Smith, H. Zhang, PETSc

Users Manual, ANL-95/11 - Revision 2.1.5,Argonne National Laboratory, 2004.

[19] A. L.G.A. Coutinho, J. L.D. Alves, Parallelfinite element simulation of miscible displace-ment in porous media, SPE Journal 1 (1996)487-500.

[20] A.L.G.A. Coutinho, C.M. Dias, J.L.D. Alves,L. Laudau, A.F.D. Loula, S.M.C. Malta,R.G.S. Castro, E.L.M. Garcia, Stabilizedmethods and post-processing techniques formiscible displacements, Computer Methodsin Applied Mechanics and Engineering 193(2004) 1421-1436.

[21] R.G. Sanabria Castro, S.M.C. Malta, A.F.D.Loula, L. Landau, Numerical analysis ofspace-time finite element formulations formiscible displacements, Computational Geo-sciences 5 (2001) 301-330.

13

Documents

A Markov Random Field model of contamination source identi