Sensor Integration by Joint PDF Construction using the Exponential Family

STEVEN KAY, Fellow, IEEE

QUAN DING, Member, IEEE

University of Rhode Island

MURALIDHAR RANGASWAMY, Fellow, IEEE

Air Force Research Laboratory

We investigate the problem of sensor integration to combine

all the available information in a multi-sensor setting from a

statistical standpoint. Specifically, we propose a novel method

of constructing the joint probability density function (pdf) of

the measurements from all the sensors based on the exponential

family and small signal assumption. The constructed pdf only

requires knowledge of the joint pdf under a reference hypothesis

and, hence, is useful in many practical cases. Examples and

simulation results show that our method requires less information

compared with existing methods but attains comparable

detection/classification performance.

Manuscript received January 13, 2011; revised July 10, 2011 and

March 19, 2012; released for publication July 11, 2012.

IEEE Log No. T-AES/49/1/944371.

Refereeing of this contribution was handled by P. Willet.

This work was supported by the Sensors Directorate of the

Air Force Research Laboratory (AFRL/RYRT) under Contract

FA8650-08-D-1303 to Dynetics, Inc.

Authors’ addresses: S. Kay and Q. Ding, Department of Electrical,

Computer, and Biomedical Engineering, University of Rhode

Island, 4 East Alumni Ave., Kingston, RI 02881, E-mail:

([email protected]); M. Rangaswamy, Sensors Directorate, Air Force

Research Laboratory (AFRL/RYRT), WPAFB, OH 45433.

0018-9251/13/$26.00 © 2013 IEEE

I. INTRODUCTION

Distributed systems and information fusion have

been widely studied and used in engineering, finance,

and scientific research. Applications include

radar, sonar, biomedical analysis, stock prediction,

weather forecasting, and chemical, biological,

radiological, and nuclear (CBRN) detection, to

name a few. If the joint probability density function

(pdf) under each candidate hypothesis is known,

we would easily obtain the optimal performance

by the Neyman-Pearson rule (using frequentist

inference) for detection (binary hypothesis testing)

or by the maximum a posteriori probability (MAP)

rule (using Bayesian inference) for both detection

and classification (multiple hypothesis testing)

[1]. However, in practice, this information may

not be available. This usually happens when the

dimensionality of the sample space is high and when

we do not have enough training samples to have an

accurate estimate of the joint pdf. The problem is

exacerbated by onerous environmental and systems

constraints in radar and sonar applications. This is

also recognized as the “curse of dimensionality” in

pattern recognition and machine learning. Hence, it

is important to efficiently approximate the unknown

joint pdf using limited training data. One common

approach is to assume that the measurements from

different sensors are conditionally independent

[2, 3]. This approach has been widely used due to its

simplicity, since the joint pdf is then the product of

the marginal pdfs. This is also known as the “product

rule” in combining classifiers [4]. In spite of its

popularity, the independence assumption may not be a

good one if the measurements are actually correlated.

Furthermore, as stated in [4], the product rule is

severe because “it is sufficient for a single recognition

engine to inhibit a particular interpretation by

outputting a close-to-zero probability for it.” Hence,

researchers have studied other methods that consider

the correlation among the measurements. However,

the problem does not have a unique solution when

the data is non-Gaussian. A copula-based framework

is proposed in [5], [6] to construct the joint pdf. The

exponentially embedded families (EEFs) are used in

[7] to estimate the joint pdf that is asymptotically

closest to the true one in Kullback-Leibler (KL)

divergence.

Note that the above methods all require the

knowledge of marginal pdfs. In this paper we consider

the case when the marginal pdfs are not available or

accurate, which can happen due to a high-dimensional

sample space and insufficient training data. We

present a new way of constructing the joint pdf

without the knowledge of marginal pdfs but only a

reference pdf. In our method this reference pdf is the

pdf under the null hypothesis $H_0$, and we assume that it is completely known. The constructed joint

580 IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS VOL. 49, NO. 1 JANUARY 2013


pdf takes the form of the exponential family and

incorporates all the available information. Based on

moment matching the parameters in the constructed

joint pdf are equivalent to the maximum likelihood

estimator (MLE) [8] of the unknown parameters of the

exponential family. Hence, they can be easily found

via convex optimization based on the properties of

the exponential family. Since there is no Gaussian

distribution assumption on the reference pdf, this

method can be very useful when the underlying

distributions are non-Gaussian. We start with the

detection problem, and then we extend our method

to the classification problem. For detection it is shown

that under some conditions, our detection statistics

are the same as the clairvoyant generalized likelihood

ratio test (GLRT). For classification our classifier

also has the same performance as the estimated

MAP classifier. Both the clairvoyant GLRT and the

estimated MAP classifier assume that the true pdfs

under each candidate hypothesis are known except for

the usual unknown parameters.

The paper is organized as follows. In Section II

we introduce a distributed detection/classification

problem. In Section III we construct the joint pdf by

an exponential family and apply it to the problem in

Section II. The KL divergence between the true pdf

and the constructed pdf is examined in Section IV,

and the result shows that within the exponential

family, the constructed pdf is asymptotically the

closest one to the true pdf. Examples for distributed

detection are given in Section V, and examples for

distributed classification are given in Section VI.

Simulation results to compare the performance of

our method with existing methods are shown in

Section VII. In Section VIII we draw conclusions.

II. PROBLEM STATEMENT

Consider the distributed detection/classification

problem when we observe the outputs of two sensors

T1(x) and T2(x), which are transformations of the

underlying samples x. The latter are unobservable at

the central processor as shown in Fig. 1. We choose

two sensors for simplicity. All the results in this

paper are valid for multiple sensors. For detection we

want to distinguish between two hypotheses $H_0$ and $H_1$ based on the outputs of the two sensors, and for classification, we have $M$ candidate hypotheses $H_i$ for $i = 1, 2, \ldots, M$.

Assume that we have sufficient training data

$T_1^{(n)}(\mathbf{x})$'s and $T_2^{(n)}(\mathbf{x})$'s under $H_0$, i.e., when there is no signal present. Hence, we have a good estimate of the joint pdf of $T_1$ and $T_2$ under $H_0$ [9], and thus, we assume $p_{T_1,T_2}(t_1,t_2;H_0)$ is completely known. Under $H_1$ for detection or $H_i$ for $i = 1, 2, \ldots, M$ for classification when a signal is present, we may

not even have enough training data to estimate the

marginal pdfs at each sensor, let alone the joint pdf.

Fig. 1. Distributed detection/classification system with two

sensors.

This is especially the case in the radar scenario, where

the target is present for only a small portion of the

time. So our goal is to use the available information

to construct an appropriate $p_{T_1,T_2}(t_1,t_2;H_1)$ under $H_1$ for detection or $p_{T_1,T_2}(t_1,t_2;H_i)$ under each $H_i$ for classification. A simple illustration is shown in Fig. 1.

III. JOINT PDF CONSTRUCTION BY EXPONENTIAL FAMILY AND ITS APPLICATION IN DISTRIBUTED SYSTEMS

First, we consider the detection problem, where we

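wish to construct $p_{T_1,T_2}(t_1,t_2;H_1)$. The result is then extended to the classification problem.

To simplify the notation let

$$\mathbf{T} = \begin{bmatrix} T_1 \\ T_2 \end{bmatrix}$$

so that we can write the joint pdf $p_{T_1,T_2}(t_1,t_2;H_i)$ as $p_{\mathbf{T}}(\mathbf{t};H_i)$ for $i = 0, 1$.

Since $p_{\mathbf{T}}(\mathbf{t};H_1)$ cannot be uniquely specified based on $p_{\mathbf{T}}(\mathbf{t};H_0)$, we assume that 1) $p_{\mathbf{T}}(\mathbf{t};H_1)$ is close to $p_{\mathbf{T}}(\mathbf{t};H_0)$ (small signal assumption), and 2) the expected value of $\mathbf{T}$ under $H_1$, denoted $E_1(\mathbf{T})$, is known. The reason that we assume small signal is because,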

in practice, we are really interested in the small

signal case since for large signals, even a nonoptimal

detector would have acceptable performance. Now, we

want to find a pdf pT(t) such that the KL divergence

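$$D(p_{\mathbf{T}}(\mathbf{t})\,\|\,p_{\mathbf{T}}(\mathbf{t};H_0)) = \int p_{\mathbf{T}}(\mathbf{t}) \ln \frac{p_{\mathbf{T}}(\mathbf{t})}{p_{\mathbf{T}}(\mathbf{t};H_0)}\, d\mathbf{t} \tag{1}$$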

KAY, ET AL.: SENSOR INTEGRATION BY JOINT PDF CONSTRUCTION USING THE EXPONENTIAL FAMILY 581


is minimized with the constraint that

$$E_p(\mathbf{T}) = E_1(\mathbf{T}). \tag{2}$$

Then the pdf $p_{\mathbf{T}}(\mathbf{t})$ is used as our constructed pdf of $\mathbf{T}$ under $H_1$. Note that the constraint in (2) is considered as moment matching since the constructed pdf has the

same moment as the true pdf. Here, we consider the

KL divergence because by Stein’s lemma [10], the KL

divergence determines the asymptotic performance

for detection. An extended result to classification

has been recently presented in [11]. Therefore, this

is a worst case approach obtained by minimizing

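$D(p_{\mathbf{T}}(\mathbf{t})\,\|\,p_{\mathbf{T}}(\mathbf{t};H_0))$.

The Kullback theorem in [12] shows that the solution of the above problem is

$$p_{\mathbf{T}}(\mathbf{t}) = \exp[\boldsymbol{\mu}^T\mathbf{t} - K(\boldsymbol{\mu}) + \ln p_{\mathbf{T}}(\mathbf{t};H_0)] \tag{3}$$

where

$$K(\boldsymbol{\mu}) = \ln E_0[\exp(\boldsymbol{\mu}^T\mathbf{T})] = \ln \int \exp(\boldsymbol{\mu}^T\mathbf{t})\, p_{\mathbf{T}}(\mathbf{t};H_0)\, d\mathbf{t} \tag{4}$$

is the cumulant generating function of $p_{\mathbf{T}}(\mathbf{t};H_0)$, and it normalizes the pdf to integrate to one. Here, $\boldsymbol{\mu}$ is not a free parameter; it has to satisfy the constraint in (2). It can be easily shown that

$$\frac{\partial K(\boldsymbol{\mu})}{\partial \boldsymbol{\mu}} = E_p(\mathbf{T}) \tag{5}$$

where

$$\frac{\partial K(\boldsymbol{\mu})}{\partial \boldsymbol{\mu}} = \left[ \frac{\partial K(\boldsymbol{\mu})}{\partial \mu_1} \;\; \frac{\partial K(\boldsymbol{\mu})}{\partial \mu_2} \;\; \cdots \;\; \frac{\partial K(\boldsymbol{\mu})}{\partial \mu_p} \right]^T$$

with $p$ being the length of $\boldsymbol{\mu}$. Therefore, the constraint in (2) is equivalent to

$$\frac{\partial K(\boldsymbol{\mu})}{\partial \boldsymbol{\mu}} = E_1(\mathbf{T}). \tag{6}$$

Note that in practice, we may not know $E_1(\mathbf{T})$, and therefore, we use $\mathbf{t}$ as an estimate of $E_1(\mathbf{T})$. Finally, the constructed pdf under $H_1$ is given by (3), where $\boldsymbol{\mu}$ satisfies

$$\frac{\partial K(\boldsymbol{\mu})}{\partial \boldsymbol{\mu}} = \mathbf{t}. \tag{7}$$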
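The identity (5) is stated without proof. One way to check it, sketched here under the assumption that differentiation and integration may be interchanged in (4), is to differentiate the cumulant generating function directly and recognize the constructed pdf (3) inside the integral:

```latex
\begin{aligned}
\frac{\partial K(\boldsymbol{\mu})}{\partial \boldsymbol{\mu}}
&= \frac{\int \mathbf{t}\,\exp(\boldsymbol{\mu}^T\mathbf{t})\,p_{\mathbf{T}}(\mathbf{t};H_0)\,d\mathbf{t}}
        {\int \exp(\boldsymbol{\mu}^T\mathbf{t})\,p_{\mathbf{T}}(\mathbf{t};H_0)\,d\mathbf{t}} \\
&= \int \mathbf{t}\,\exp[\boldsymbol{\mu}^T\mathbf{t} - K(\boldsymbol{\mu}) + \ln p_{\mathbf{T}}(\mathbf{t};H_0)]\,d\mathbf{t} \\
&= \int \mathbf{t}\,p_{\mathbf{T}}(\mathbf{t})\,d\mathbf{t} = E_p(\mathbf{T}).
\end{aligned}
```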

Now suppose we have an exponential family

parameterized by μ as

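$$p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\mu}) = \exp[\boldsymbol{\mu}^T\mathbf{t} - K(\boldsymbol{\mu}) + \ln p_{\mathbf{T}}(\mathbf{t};H_0)] \tag{8}$$

which is the same as in (3) except that the elements of $\boldsymbol{\mu}$ are free parameters. It is also shown in [13] that families of pdfs with small statistical curvature enjoy, nearly, the good statistical properties of the EEF in (8). Since $K(\boldsymbol{\mu})$ is convex by Hölder's inequality [14], the MLE of $\boldsymbol{\mu}$ can be obtained by taking the derivative of $\boldsymbol{\mu}^T\mathbf{t} - K(\boldsymbol{\mu})$ with respect to $\boldsymbol{\mu}$ and setting it to zero. This results in

$$\mathbf{t} - \frac{\partial K(\boldsymbol{\mu})}{\partial \boldsymbol{\mu}} = \mathbf{0} \tag{9}$$

which is identical to (7). Assume that $K(\boldsymbol{\mu})$ is strictly convex and that the solution is unique. This shows that the MLE produces the same $\boldsymbol{\mu}$ as moment matching does in (7). Hence, we can write the constructed pdf $p_{\mathbf{T}}(\mathbf{t})$ in (3) as

$$p_{\mathbf{T}}(\mathbf{t}) = p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\mu}}) = \exp[\hat{\boldsymbol{\mu}}^T\mathbf{t} - K(\hat{\boldsymbol{\mu}}) + \ln p_{\mathbf{T}}(\mathbf{t};H_0)] \tag{10}$$

where $\hat{\boldsymbol{\mu}}$ is the MLE of $\boldsymbol{\mu}$. Also, note that since $K(\boldsymbol{\mu})$ is convex, finding the MLE becomes a convex optimization problem, and many existing methods can be readily utilized [15, 16]. If $\boldsymbol{\mu} = \mathbf{0}$, then the pdf in (8) becomes $p_{\mathbf{T}}(\mathbf{t};H_0)$ or

$$p_{\mathbf{T}}(\mathbf{t};\mathbf{0}) = p_{\mathbf{T}}(\mathbf{t};H_0). \tag{11}$$

This shows that $p_{\mathbf{T}}(\mathbf{t};H_0)$ also belongs to the exponential family in (8).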
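As a rough numerical sketch of how the stationarity condition (9) might be solved, the Newton iteration below searches for the $\hat{\boldsymbol{\mu}}$ at which the gradient of $K$ equals the observed $\mathbf{t}$. The function name `solve_mle`, the callables `grad_K` and `hess_K`, and the zero-mean Gaussian reference pdf used in the demonstration are illustrative assumptions, not something specified in the paper.

```python
import numpy as np

def solve_mle(t, grad_K, hess_K, tol=1e-10, max_iter=100):
    """Newton iteration for grad_K(mu) = t, i.e. maximising mu^T t - K(mu)."""
    mu = np.zeros_like(t, dtype=float)
    for _ in range(max_iter):
        residual = t - grad_K(mu)          # gradient of mu^T t - K(mu)
        if np.linalg.norm(residual) < tol:
            break
        # Newton step; hess_K(mu) is positive definite when K is strictly convex
        mu = mu + np.linalg.solve(hess_K(mu), residual)
    return mu

# Assumed reference pdf for illustration: T ~ N(0, C) under H0,
# for which K(mu) = 0.5 * mu^T C mu, grad K = C mu, Hessian = C.
C = np.array([[2.0, 0.5], [0.5, 1.0]])
t = np.array([0.3, -0.2])
mu_hat = solve_mle(t, grad_K=lambda mu: C @ mu, hess_K=lambda mu: C)
statistic = mu_hat @ t - 0.5 * mu_hat @ C @ mu_hat   # mu_hat^T t - K(mu_hat)
```

For this quadratic $K$ a single Newton step lands on the solution, so the example mainly shows the shape of the computation; a non-Gaussian reference pdf would need its own `grad_K` and `hess_K`, and possibly a step-size safeguard.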

This constructed pdf in (10) looks similar to the

Edgeworth expansion (see [17, eqs. (2.1) and (2.17)]).

The Edgeworth expansion is an approximation of the

cumulative distribution function (CDF) of the sum of

independent and identically distributed (IID) samples

starting from the CDF of the standard Gaussian

distribution. Note that the pdf in (8) belongs to the

exponential family. Since T is a sufficient statistic for

the exponential pdf in (8), this pdf incorporates all

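the information from the two sensors. Since $p_{\mathbf{T}}(\mathbf{t};H_0)$ is required to construct $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\mu}})$, it is assumed that $p_{\mathbf{T}}(\mathbf{t};H_0)$ is available or that it can be estimated with reasonable accuracy. Also, note that if $T_1$, $T_2$ are statistically dependent under $H_0$, they will also be dependent for the constructed pdf under $H_1$.

With the small signal assumption, $p_{\mathbf{T}}(\mathbf{t};H_1)$ is close to $p_{\mathbf{T}}(\mathbf{t};H_0)$. Under this constraint it has been shown in [18] that if $\mathbf{t}$ is the score function, i.e.,

$$\mathbf{t} = \left.\frac{\partial \ln p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\mu})}{\partial \boldsymbol{\mu}}\right|_{\boldsymbol{\mu}=\mathbf{0}} \tag{12}$$

and $E_0(\mathbf{T}) = \mathbf{0}$, by using a first-order Taylor expansion on the log-likelihood function $\ln p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\mu})$ about $\boldsymbol{\mu} = \mathbf{0}$, we obtain the same pdf as in (8). This small signal analysis is similar to the locally optimum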

detector (LOD) [19], but there are some fundamental

differences between our method and the LOD.

1) Our method produces a pdf, not only a test

statistic.

2) $\mathbf{t}$ need not be a score function for the constructed pdf to be a valid pdf. If it were, the

constructed pdf in (8) could be interpreted as a

first-order Taylor expansion and normalized to be a

pdf.

Finally, since (10) is the constructed pdf under $H_1$, we decide $H_1$ if

$$\ln \frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\mu}})}{p_{\mathbf{T}}(\mathbf{t};H_0)} = \hat{\boldsymbol{\mu}}^T\mathbf{t} - K(\hat{\boldsymbol{\mu}}) > \tau \tag{13}$$

where $\tau$ is a threshold. Note that Kullback also had similar ideas (see [12, ch. 5]), where $\hat{\boldsymbol{\mu}}^T\mathbf{t} - K(\hat{\boldsymbol{\mu}})$ can be considered as the estimated KL divergence between $p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\mu})$ and $p_{\mathbf{T}}(\mathbf{t};H_0)$.

This method can be extended to the classification

problem. Similar to (10), as shown in [20], we can

construct the pdf of T under Hi as

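$$p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\mu}}_i) = \exp[\hat{\boldsymbol{\mu}}_i^T\mathbf{t} - K(\hat{\boldsymbol{\mu}}_i) + \ln p_{\mathbf{T}}(\mathbf{t};H_0)] \tag{14}$$

where

$$K(\boldsymbol{\mu}_i) = \ln E_0[\exp(\boldsymbol{\mu}_i^T\mathbf{T})] \tag{15}$$

is the cumulant generating function of $p_{\mathbf{T}}(\mathbf{t};H_0)$ that normalizes the constructed pdf. $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\mu}}_i)$ is considered as our estimate of $p_{\mathbf{T}}(\mathbf{t} \mid H_i)$, where $\hat{\boldsymbol{\mu}}_i$ is the MLE of $\boldsymbol{\mu}_i$. Hence, similar to the MAP rule [1], we decide $H_i$, for which the following is maximum over $i$

$$p(H_i \mid \mathbf{t})\, p_{\mathbf{T}}(\mathbf{t}) = p_{\mathbf{T}}(\mathbf{t} \mid H_i)\, p(H_i) = p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\mu}}_i)\, p(H_i). \tag{16}$$

When we assume that the prior probabilities of each candidate hypothesis are equal, i.e., $p(H_1) = \cdots = p(H_M) = 1/M$, $p(H_i)$ cancels, and we can equivalently decide $H_i$, for which the following is maximum over $i$

$$\ln \frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\mu}}_i)}{p_{\mathbf{T}}(\mathbf{t};H_0)} = \hat{\boldsymbol{\mu}}_i^T\mathbf{t} - K(\hat{\boldsymbol{\mu}}_i). \tag{17}$$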

IV. KL DIVERGENCE BETWEEN THE TRUE PDF AND THE CONSTRUCTED PDF

The KL divergence is a nonsymmetric measure of

difference between two pdfs. For two pdfs p1 and p0,

it is defined as

$$D(p_1 \,\|\, p_0) = \int p_1(\mathbf{x}) \ln \frac{p_1(\mathbf{x})}{p_0(\mathbf{x})}\, d\mathbf{x}.$$

It is well known that $D(p_1 \,\|\, p_0) \ge 0$ with equality if and only if $p_1 = p_0$ almost everywhere [12]. As

we mentioned in Section III, the KL divergence

determines the asymptotic performance for both

detection and classification. Next we show that under

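$H_0$, $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\mu}}) = p_{\mathbf{T}}(\mathbf{t};H_0)$ asymptotically, and similarly under $H_1$, within the family of pdfs in (8), $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\mu}})$ is asymptotically the closest one to the true pdf in KL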

divergence. Similar results and arguments have been

shown in [7], [21].

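Assume that we observe IID $\mathbf{T}^{(n)}$'s with

$$\mathbf{T}^{(n)} = \begin{bmatrix} T_1^{(n)} \\ T_2^{(n)} \end{bmatrix}$$

for $n = 1, 2, \ldots, L$. Shortening the notation we write $p_{\mathbf{T}^{(1)},\mathbf{T}^{(2)},\ldots,\mathbf{T}^{(L)}}(\mathbf{t}^{(1)},\mathbf{t}^{(2)},\ldots,\mathbf{t}^{(L)};\boldsymbol{\mu})$ as $p(\mathbf{t}^{(1)},\mathbf{t}^{(2)},\ldots,\mathbf{t}^{(L)};\boldsymbol{\mu})$.

The constructed pdf can be easily extended to (see (10))

$$p(\mathbf{t}^{(1)},\mathbf{t}^{(2)},\ldots,\mathbf{t}^{(L)};\hat{\boldsymbol{\mu}}) = \exp\left[\hat{\boldsymbol{\mu}}^T \sum_{n=1}^{L} \mathbf{t}^{(n)} - L K(\hat{\boldsymbol{\mu}}) + \ln p(\mathbf{t}^{(1)},\mathbf{t}^{(2)},\ldots,\mathbf{t}^{(L)};H_0)\right] \tag{18}$$

where the MLE $\hat{\boldsymbol{\mu}}$ is obtained by solving

$$\frac{1}{L}\sum_{n=1}^{L} \mathbf{t}^{(n)} = \frac{\partial K(\boldsymbol{\mu})}{\partial \boldsymbol{\mu}}. \tag{19}$$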

Now we consider two cases. First, for the true pdf

under H0, by the law of large numbers, it follows that

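$$\frac{1}{L}\sum_{n=1}^{L} \mathbf{t}^{(n)} \xrightarrow{P} E_0(\mathbf{T})$$

as $L \to \infty$. Note that

$$\left.\frac{\partial K(\boldsymbol{\mu})}{\partial \boldsymbol{\mu}}\right|_{\boldsymbol{\mu}=\mathbf{0}} = E_0(\mathbf{T}).$$

Since the solution of (19) is unique, asymptotically we have

$$\hat{\boldsymbol{\mu}} = \mathbf{0}$$

and hence, $p(\mathbf{t}^{(1)},\mathbf{t}^{(2)},\ldots,\mathbf{t}^{(L)};\hat{\boldsymbol{\mu}}) = p(\mathbf{t}^{(1)},\mathbf{t}^{(2)},\ldots,\mathbf{t}^{(L)};H_0)$.

Second, for the true pdf under $H_1$, by the law of large numbers, it follows that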

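$$\frac{1}{L}\sum_{n=1}^{L} \mathbf{t}^{(n)} \xrightarrow{P} E_1(\mathbf{T})$$

as $L \to \infty$. Therefore, the MLE $\hat{\boldsymbol{\mu}}$ asymptotically maximizes

$$\boldsymbol{\mu}^T E_1(\mathbf{T}) - K(\boldsymbol{\mu}). \tag{20}$$

We denote the underlying true pdf under $H_1$ as $p(\mathbf{t}^{(1)},\mathbf{t}^{(2)},\ldots,\mathbf{t}^{(L)};H_1)$ and the pdf within the exponential family in (8) as $p(\mathbf{t}^{(1)},\mathbf{t}^{(2)},\ldots,\mathbf{t}^{(L)};\boldsymbol{\mu})$. Since, from (18)

$$\ln \frac{p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_1)}{p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};\boldsymbol{\mu})} = -\left(\boldsymbol{\mu}^T \sum_{n=1}^{L} \mathbf{t}^{(n)} - L K(\boldsymbol{\mu})\right) + \ln \frac{p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_1)}{p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_0)}$$

the KL divergence between the true pdf and the one in the exponential family is

$$\begin{aligned}
D(p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_1) \,\|\, p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};\boldsymbol{\mu}))
&= E_1\left[-\left(\boldsymbol{\mu}^T \sum_{n=1}^{L} \mathbf{t}^{(n)} - L K(\boldsymbol{\mu})\right) + \ln \frac{p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_1)}{p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_0)}\right] \\
&= -L[\boldsymbol{\mu}^T E_1(\mathbf{T}) - K(\boldsymbol{\mu})] + D(p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_1) \,\|\, p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_0)).
\end{aligned} \tag{21}$$

Since $D(p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_1) \,\|\, p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_0))$ is fixed, $D(p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_1) \,\|\, p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};\boldsymbol{\mu}))$ is minimized by maximizing (20). This shows that given the exponential family in (8), $p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};\hat{\boldsymbol{\mu}})$ is asymptotically the closest to $p(\mathbf{t}^{(1)},\ldots,\mathbf{t}^{(L)};H_1)$ in KL divergence.

V. EXAMPLES-DISTRIBUTED DETECTION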

In this section we compare our method with the

clairvoyant GLRT for a specific detection problem.

Since the true pdf does not necessarily belong to the

exponential family in (8), for the clairvoyant GLRT,

assume that we know that the true pdf of $\mathbf{T}$ under $H_1$ belongs to a family of pdfs parameterized by some unknown parameters $\boldsymbol{\alpha}$. Note that the $\boldsymbol{\mu}$ parameters in our method are in the constructed exponential pdf, while $\boldsymbol{\alpha}$ are parameters in the true pdf. Therefore, the clairvoyant GLRT provides an upper bound on GLRT performance, and it decides $H_1$ if

$$\ln \frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\alpha}})}{p_{\mathbf{T}}(\mathbf{t};H_0)} > \tau. \tag{22}$$

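A. Partially Observed Linear Model with Gaussian Noise

Suppose we have the linear model with

$$\mathbf{x} = \mathbf{H}\boldsymbol{\alpha} + \mathbf{w} \tag{23}$$

with

$$H_0: \boldsymbol{\alpha} = \mathbf{0}, \qquad H_1: \boldsymbol{\alpha} \ne \mathbf{0}$$

where $\mathbf{x}$ is an $N \times 1$ vector of the underlying unobservable samples, $\mathbf{H}$ is an $N \times p$ observation matrix with full column rank, $\boldsymbol{\alpha}$ is a $p \times 1$ vector of the unknown signal amplitudes, and $\mathbf{w}$ is an $N \times 1$ vector of white Gaussian noise samples with known variance $\sigma^2$. We observe two sensor outputs

$$\mathbf{T}_1(\mathbf{x}) = \mathbf{H}_1^T\mathbf{x}, \qquad \mathbf{T}_2(\mathbf{x}) = \mathbf{H}_2^T\mathbf{x} \tag{24}$$

where $\mathbf{H}_1$ is $N \times q_1$ and $\mathbf{H}_2$ is $N \times q_2$. Note that $[\mathbf{H}_1 \; \mathbf{H}_2]$ does not have to be $\mathbf{H}$. This model is called the partially observed linear model.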

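Let $\mathbf{G} = [\mathbf{H}_1 \; \mathbf{H}_2]$. We assume that $\mathbf{G}$ has full column rank so that there are no perfectly redundant measurements of the sensors. Then we have

$$\mathbf{T} = \begin{bmatrix} \mathbf{T}_1(\mathbf{x}) \\ \mathbf{T}_2(\mathbf{x}) \end{bmatrix} = \begin{bmatrix} \mathbf{H}_1^T\mathbf{x} \\ \mathbf{H}_2^T\mathbf{x} \end{bmatrix} = \mathbf{G}^T\mathbf{x}. \tag{25}$$

Thus, $\mathbf{T}$ is also Gaussian, and

$$\mathbf{T} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{G}^T\mathbf{G}) \quad \text{under } H_0.$$

Let $q = q_1 + q_2$, so that $\mathbf{T}$ is $q \times 1$. As a result we construct the pdf as in (8), with

$$K(\boldsymbol{\mu}) = \ln E_0[\exp(\boldsymbol{\mu}^T\mathbf{T})] = \tfrac{1}{2}\sigma^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}. \tag{26}$$

Hence, the constructed pdf is

$$\begin{aligned}
p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\mu}) &= \exp[\boldsymbol{\mu}^T\mathbf{t} - K(\boldsymbol{\mu}) + \ln p_{\mathbf{T}}(\mathbf{t};H_0)] \\
&= \frac{1}{(2\pi\sigma^2)^{q/2}\det^{1/2}(\mathbf{G}^T\mathbf{G})} \exp\left(-\frac{\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}}{2\sigma^2}\right) \cdot \exp\left[\boldsymbol{\mu}^T\mathbf{t} - \tfrac{1}{2}\sigma^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}\right]
\end{aligned} \tag{27}$$

which can be simplified to

$$\mathbf{T} \sim \mathcal{N}(\sigma^2\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}, \sigma^2\mathbf{G}^T\mathbf{G}) \quad \text{under } H_1. \tag{28}$$

Note that $\boldsymbol{\mu}$ is the vector of the unknown parameters in the constructed pdf and that it is different from the truly unknown parameters $\boldsymbol{\alpha}$. From (7) and (26) the MLE of $\boldsymbol{\mu}$ satisfies

$$\mathbf{t} = \left.\frac{\partial K(\boldsymbol{\mu})}{\partial \boldsymbol{\mu}}\right|_{\boldsymbol{\mu}=\hat{\boldsymbol{\mu}}} = \sigma^2\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\mu}}.$$

So

$$\hat{\boldsymbol{\mu}} = \frac{1}{\sigma^2}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$$

and the test statistic becomes

$$\hat{\boldsymbol{\mu}}^T\mathbf{t} - K(\hat{\boldsymbol{\mu}}) = \frac{1}{2\sigma^2}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}. \tag{29}$$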
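The closed form in (29) can be checked numerically. The sketch below draws one noise-only realization of the partially observed linear model, computes the MLE from $\mathbf{t} = \sigma^2\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\mu}}$, and compares $\hat{\boldsymbol{\mu}}^T\mathbf{t} - K(\hat{\boldsymbol{\mu}})$ with the right-hand side of (29). The dimensions, noise variance, random seed, and the randomly generated `G` are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, q1, q2, sigma2 = 8, 2, 1, 1.5                  # hypothetical sizes and noise variance
G = rng.standard_normal((N, q1 + q2))             # G = [H1 H2]; full column rank with probability one
x = np.sqrt(sigma2) * rng.standard_normal(N)      # one noise-only draw of the unobserved samples
t = G.T @ x                                       # stacked sensor outputs, eq. (25)

GtG = G.T @ G
mu_hat = np.linalg.solve(GtG, t) / sigma2         # MLE from t = sigma^2 G^T G mu
K_hat = 0.5 * sigma2 * mu_hat @ GtG @ mu_hat      # eq. (26) evaluated at mu_hat
stat_constructed = mu_hat @ t - K_hat             # left-hand side of eq. (29)
stat_closed_form = t @ np.linalg.solve(GtG, t) / (2 * sigma2)   # right-hand side of eq. (29)
```

The two quantities agree to floating-point precision, which is simply the algebra of (26) to (29) carried out in code.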

Next we consider the clairvoyant GLRT. It is

defined as the GLRT when the true pdf of T under

$H_1$ is known except for the underlying unknown parameters $\boldsymbol{\alpha}$. It is considered as the suboptimal test by plugging the MLE of $\boldsymbol{\alpha}$ into the true pdf parameterized by $\boldsymbol{\alpha}$. Since the constructed pdf may not be the true pdf, the clairvoyant GLRT requires

more information than our method. From (25) we

know that the true pdf is

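$$\mathbf{T} \sim \mathcal{N}(\mathbf{G}^T\mathbf{H}\boldsymbol{\alpha}, \sigma^2\mathbf{G}^T\mathbf{G}) \quad \text{under } H_1. \tag{30}$$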

Note that (28) is the constructed pdf, while (30) is the

true pdf. For small signal they are equivalent since

their means are close to zero and since they have the

same covariance matrix. This verifies that our method

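works for the small signal case. We need to estimate $\boldsymbol{\mu}$ in (28) or $\boldsymbol{\alpha}$ in (30) to implement the pdf. We write the true pdf under $H_1$ as $p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\alpha})$. The MLE of $\boldsymbol{\alpha}$ is found by maximizing the true pdf given by (30)

$$\ln \frac{p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\alpha})}{p_{\mathbf{T}}(\mathbf{t};\mathbf{0})} = -\frac{1}{2\sigma^2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T(\mathbf{G}^T\mathbf{G})^{-1}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha}) + \frac{1}{2\sigma^2}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}.$$

If $q \le p$, i.e., the length of $\mathbf{t}$ is less than or equal to the length of $\boldsymbol{\alpha}$, then the MLE $\hat{\boldsymbol{\alpha}}$ may not be unique. However, since $(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T(\mathbf{G}^T\mathbf{G})^{-1}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha}) \ge 0$, we can always find $\hat{\boldsymbol{\alpha}}$ such that $\mathbf{t} = \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}}$, and hence, $(\mathbf{t} - \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}})^T(\mathbf{G}^T\mathbf{G})^{-1}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}}) = 0$. Hence, the clairvoyant GLRT statistic becomes

$$\ln \frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\alpha}})}{p_{\mathbf{T}}(\mathbf{t};H_0)} = \frac{1}{2\sigma^2}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t} \tag{31}$$

which is the same as our test statistic (see (29)) when $q \le p$.

If $q > p$ it can be shown that

$$\hat{\boldsymbol{\alpha}} = (\mathbf{H}^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$$

and the clairvoyant GLRT statistic becomes

$$\ln \frac{p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\alpha}})}{p_{\mathbf{T}}(\mathbf{t};H_0)} = \frac{\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{H}(\mathbf{H}^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}}{2\sigma^2} \tag{32}$$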

while the constructed GLRT statistic is shown in (29).
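To make the $q > p$ comparison concrete, the following sketch evaluates the clairvoyant GLRT statistic (32) and the constructed statistic (29) on one simulated observation. It also checks that (32) equals (29) minus the weighted residual term of the log-likelihood ratio, so that (32) can never exceed (29). The sizes, the amplitude vector `alpha`, the seed, and the random `H` and `G` are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, q, sigma2 = 10, 2, 4, 1.0            # hypothetical sizes with q > p
H = rng.standard_normal((N, p))            # observation matrix (assumed full column rank)
G = rng.standard_normal((N, q))            # G = [H1 H2]
alpha = np.array([0.5, -0.3])              # illustrative signal amplitudes
x = H @ alpha + np.sqrt(sigma2) * rng.standard_normal(N)
t = G.T @ x

W = np.linalg.inv(G.T @ G)                 # (G^T G)^{-1}
A = G.T @ H                                # mean of T under H1 is A @ alpha, eq. (30)
alpha_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ t)   # weighted least-squares MLE
resid = t - A @ alpha_hat

stat_glrt = (t @ W @ A @ alpha_hat) / (2 * sigma2)      # eq. (32)
stat_ours = (t @ W @ t) / (2 * sigma2)                  # eq. (29)
stat_glrt_check = stat_ours - (resid @ W @ resid) / (2 * sigma2)
```

The gap between the two statistics is the residual energy of $\mathbf{t}$ outside the column space of $\mathbf{G}^T\mathbf{H}$, which the constructed pdf cannot exploit because it does not know $\mathbf{H}$.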

B. Partially Observed Linear Model with Gaussian Mixture Noise

The partially observed linear model remains the

same as in the previous subsection except, instead of

assuming that w is white Gaussian noise, we assume

that w has a Gaussian mixture distribution with two

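components, i.e.

$$\mathbf{w} \sim \pi\,\mathcal{N}(\mathbf{0}, \sigma_1^2\mathbf{I}) + (1-\pi)\,\mathcal{N}(\mathbf{0}, \sigma_2^2\mathbf{I}) \tag{33}$$

where $\pi$, $\sigma_1^2$, and $\sigma_2^2$ are known ($0 < \pi < 1$). The following derivation can be easily extended when $\mathbf{w} \sim \sum_{i=1}^{L} \pi_i\,\mathcal{N}(\mathbf{0}, \sigma_i^2\mathbf{I})$.

Since $\mathbf{w}$ has a Gaussian mixture distribution, $\mathbf{T} = \mathbf{G}^T\mathbf{x}$ is also Gaussian mixture distributed, and

$$\mathbf{T} \sim \pi\,\mathcal{N}(\mathbf{0}, \sigma_1^2\mathbf{G}^T\mathbf{G}) + (1-\pi)\,\mathcal{N}(\mathbf{0}, \sigma_2^2\mathbf{G}^T\mathbf{G}) \quad \text{under } H_0.$$

So we have

$$K(\boldsymbol{\mu}) = \ln E_0[\exp(\boldsymbol{\mu}^T\mathbf{T})] = \ln\left(\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}} + (1-\pi) e^{\frac{1}{2}\sigma_2^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}}\right). \tag{34}$$

Hence the constructed pdf is

$$\begin{aligned}
p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\mu}) &= \exp[\boldsymbol{\mu}^T\mathbf{t} - K(\boldsymbol{\mu}) + \ln p_{\mathbf{T}}(\mathbf{t};H_0)] \\
&= \left[\frac{\pi}{(2\pi\sigma_1^2)^{q/2}\det^{1/2}(\mathbf{G}^T\mathbf{G})} \exp\left(-\frac{\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}}{2\sigma_1^2}\right) + \frac{1-\pi}{(2\pi\sigma_2^2)^{q/2}\det^{1/2}(\mathbf{G}^T\mathbf{G})} \exp\left(-\frac{\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}}{2\sigma_2^2}\right)\right] \\
&\quad \cdot \frac{\exp(\boldsymbol{\mu}^T\mathbf{t})}{\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}} + (1-\pi) e^{\frac{1}{2}\sigma_2^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}}}.
\end{aligned} \tag{35}$$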

Although this constructed pdf cannot be further

simplified, we can still find the MLE by solving

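$$\mathbf{t} = \frac{\partial K(\boldsymbol{\mu})}{\partial \boldsymbol{\mu}} = \frac{\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}} \cdot \sigma_1^2\mathbf{G}^T\mathbf{G}\boldsymbol{\mu} + (1-\pi) e^{\frac{1}{2}\sigma_2^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}} \cdot \sigma_2^2\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}}{\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}} + (1-\pi) e^{\frac{1}{2}\sigma_2^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}}}. \tag{36}$$

Our test statistic is just

$$\hat{\boldsymbol{\mu}}^T\mathbf{t} - K(\hat{\boldsymbol{\mu}}) = \hat{\boldsymbol{\mu}}^T\mathbf{t} - \ln\left(\pi e^{\frac{1}{2}\sigma_1^2\hat{\boldsymbol{\mu}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\mu}}} + (1-\pi) e^{\frac{1}{2}\sigma_2^2\hat{\boldsymbol{\mu}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\mu}}}\right) \tag{37}$$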

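where $\hat{\boldsymbol{\mu}}$ satisfies (36). Although no analytical solution of the MLE of $\boldsymbol{\mu}$ exists, it can be found using convex optimization techniques [15, 16]. Moreover, an analytical solution exists when $\|\boldsymbol{\mu}\| \to 0$. To see this we show that

$$\lim_{\|\boldsymbol{\mu}\| \to 0} \frac{\partial K(\boldsymbol{\mu})}{\partial \boldsymbol{\mu}} \mathbin{./} \left(\pi\sigma_1^2\mathbf{G}^T\mathbf{G}\boldsymbol{\mu} + (1-\pi)\sigma_2^2\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}\right) = \mathbf{1} \tag{38}$$

where $\partial K(\boldsymbol{\mu})/\partial \boldsymbol{\mu}$ is shown in (36) and $./$ means element-by-element division.

To prove (38) we have

$$\lim_{\|\boldsymbol{\mu}\| \to 0} \left(\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}} + (1-\pi) e^{\frac{1}{2}\sigma_2^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}}\right) = 1 \tag{39}$$

and

$$\lim_{\|\boldsymbol{\mu}\| \to 0} \left(\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}} \cdot \sigma_1^2\mathbf{G}^T\mathbf{G}\boldsymbol{\mu} + (1-\pi) e^{\frac{1}{2}\sigma_2^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}} \cdot \sigma_2^2\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}\right) \mathbin{./} \left(\pi\sigma_1^2\mathbf{G}^T\mathbf{G}\boldsymbol{\mu} + (1-\pi)\sigma_2^2\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}\right) = \mathbf{1} \tag{40}$$

by L'Hospital's rule. Dividing (40) by (39) and from (36), (38) is proved. As a result of (36) and (38), the MLE of $\boldsymbol{\mu}$ satisfies

$$\mathbf{t} = \pi\sigma_1^2\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\mu}} + (1-\pi)\sigma_2^2\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\mu}}$$

as $\|\boldsymbol{\mu}\| \to 0$ and $\hat{\boldsymbol{\mu}}$ can be easily found as

$$\hat{\boldsymbol{\mu}} = \frac{1}{\pi\sigma_1^2 + (1-\pi)\sigma_2^2}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}. \tag{41}$$

Since

$$\lim_{\|\boldsymbol{\mu}\| \to 0} \frac{K(\boldsymbol{\mu})}{\frac{1}{2}\pi\sigma_1^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu} + \frac{1}{2}(1-\pi)\sigma_2^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}} = 1$$

by using L'Hospital's rule twice, as $\|\boldsymbol{\mu}\| \to 0$, our test statistic becomes (see (37))

$$\hat{\boldsymbol{\mu}}^T\mathbf{t} - \left(\tfrac{1}{2}\pi\sigma_1^2\hat{\boldsymbol{\mu}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\mu}} + \tfrac{1}{2}(1-\pi)\sigma_2^2\hat{\boldsymbol{\mu}}^T\mathbf{G}^T\mathbf{G}\hat{\boldsymbol{\mu}}\right) = \frac{1}{2(\pi\sigma_1^2 + (1-\pi)\sigma_2^2)}\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}. \tag{42}$$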
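Since (36) has no closed-form solution, the MLE has to be computed numerically. The sketch below uses a damped Newton ascent on $\boldsymbol{\mu}^T\mathbf{t} - K(\boldsymbol{\mu})$ and compares the result with the small-signal approximation (41) for a deliberately small $\mathbf{t}$. The function names, the step-halving safeguard, the Hessian expression (derived here from (34), not quoted from the paper), and all numerical values are assumptions made for illustration; any convex solver could be substituted.

```python
import numpy as np

def mixture_K(mu, M, weight, var1, var2):
    """Eq. (34) with M = G^T G, mixing weight `weight`, variances var1, var2."""
    s = mu @ M @ mu
    return np.logaddexp(np.log(weight) + 0.5 * var1 * s,
                        np.log(1 - weight) + 0.5 * var2 * s)

def objective(mu, t, M, weight, var1, var2):
    return mu @ t - mixture_K(mu, M, weight, var1, var2)

def mixture_mle(t, M, weight, var1, var2, iters=50):
    """Damped Newton ascent on mu^T t - K(mu)."""
    mu = np.zeros_like(t, dtype=float)
    variances = np.array([var1, var2])
    for _ in range(iters):
        s = mu @ M @ mu
        a = np.array([np.log(weight) + 0.5 * var1 * s,
                      np.log(1 - weight) + 0.5 * var2 * s])
        w = np.exp(a - np.logaddexp(a[0], a[1]))   # normalised exponential weights in (36)
        c = w @ variances                          # weighted average variance
        v = w @ variances**2 - c**2                # weighted spread of the variances
        Mmu = M @ mu
        grad = t - c * Mmu                         # t - dK/dmu, see eq. (36)
        if np.linalg.norm(grad) < 1e-12:
            break
        hess = c * M + v * np.outer(Mmu, Mmu)      # Hessian of K (positive definite)
        step = np.linalg.solve(hess, grad)
        f0, lam = objective(mu, t, M, weight, var1, var2), 1.0
        while objective(mu + lam * step, t, M, weight, var1, var2) < f0 and lam > 1e-8:
            lam *= 0.5                             # halve the step until the objective does not drop
        mu = mu + lam * step
    return mu

rng = np.random.default_rng(2)
G = rng.standard_normal((8, 3))                    # hypothetical G = [H1 H2]
M = G.T @ G
weight, var1, var2 = 0.9, 1.0, 50.0                # hypothetical mixture parameters
t = 1e-3 * rng.standard_normal(3)                  # small observation (small-signal regime)
mu_hat = mixture_mle(t, M, weight, var1, var2)
mu_small_signal = np.linalg.solve(M, t) / (weight * var1 + (1 - weight) * var2)   # eq. (41)
```

Starting from $\boldsymbol{\mu} = \mathbf{0}$, the first Newton step coincides with (41), so for small $\mathbf{t}$ the numerical MLE and the approximation are nearly identical; for larger $\mathbf{t}$ the later iterations move away from (41).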


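TABLE I
Comparison of our Test Statistic and the Clairvoyant GLRT

Gaussian Noise
  Our Method: $\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$
  Clairvoyant GLRT ($q \le p$): $\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$
  Clairvoyant GLRT ($q > p$): $\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{H}(\mathbf{H}^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$

Uncorrelated Non-Gaussian Noise
  Our Method: $\max_{\boldsymbol{\mu}}\left[\boldsymbol{\mu}^T\mathbf{t} - \ln\left(\pi e^{\frac{1}{2}\sigma_1^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}} + (1-\pi) e^{\frac{1}{2}\sigma_2^2\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{G}\boldsymbol{\mu}}\right)\right]$
  Clairvoyant GLRT ($q \le p$): $\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$
  Clairvoyant GLRT ($q > p$): $\frac{\pi}{(\sigma_1^2)^{q/2}} \exp\left[-\frac{1}{2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}})^T \frac{(\mathbf{G}^T\mathbf{G})^{-1}}{\sigma_1^2} (\mathbf{t} - \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}})\right] + \frac{1-\pi}{(\sigma_2^2)^{q/2}} \exp\left[-\frac{1}{2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}})^T \frac{(\mathbf{G}^T\mathbf{G})^{-1}}{\sigma_2^2} (\mathbf{t} - \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}})\right]$

Correlated Non-Gaussian Noise
  Our Method: $\max_{\boldsymbol{\mu}}\left[\boldsymbol{\mu}^T\mathbf{t} - \ln\left(\pi e^{\frac{1}{2}\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{C}_1\mathbf{G}\boldsymbol{\mu}} + (1-\pi) e^{\frac{1}{2}\boldsymbol{\mu}^T\mathbf{G}^T\mathbf{C}_2\mathbf{G}\boldsymbol{\mu}}\right)\right]$
  Clairvoyant GLRT ($q \le p$): $-\ln\left(\frac{\pi}{\det^{1/2}(\mathbf{C}_1)} \exp\left[-\frac{1}{2}\mathbf{t}^T(\mathbf{G}^T\mathbf{C}_1\mathbf{G})^{-1}\mathbf{t}\right] + \frac{1-\pi}{\det^{1/2}(\mathbf{C}_2)} \exp\left[-\frac{1}{2}\mathbf{t}^T(\mathbf{G}^T\mathbf{C}_2\mathbf{G})^{-1}\mathbf{t}\right]\right)$
  Clairvoyant GLRT ($q > p$): $\max_{\boldsymbol{\alpha}}\left[\frac{\pi}{\det^{1/2}(\mathbf{G}^T\mathbf{C}_1\mathbf{G})} \exp\left[-\frac{1}{2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T(\mathbf{G}^T\mathbf{C}_1\mathbf{G})^{-1}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})\right] + \frac{1-\pi}{\det^{1/2}(\mathbf{G}^T\mathbf{C}_2\mathbf{G})} \exp\left[-\frac{1}{2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T(\mathbf{G}^T\mathbf{C}_2\mathbf{G})^{-1}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})\right]\right]$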

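To find the clairvoyant GLRT statistic, we know that under $H_1$ the true pdf is

$$\begin{aligned}
p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\alpha}) &= \frac{\pi}{(2\pi)^{q/2}\det^{1/2}(\sigma_1^2\mathbf{G}^T\mathbf{G})} \exp\left[-\frac{1}{2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T \frac{(\mathbf{G}^T\mathbf{G})^{-1}}{\sigma_1^2} (\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})\right] \\
&\quad + \frac{1-\pi}{(2\pi)^{q/2}\det^{1/2}(\sigma_2^2\mathbf{G}^T\mathbf{G})} \exp\left[-\frac{1}{2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T \frac{(\mathbf{G}^T\mathbf{G})^{-1}}{\sigma_2^2} (\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})\right].
\end{aligned} \tag{43}$$

Note the difference between (35) and (43) since (35) is the constructed pdf and (43) is the true pdf. The MLE of $\boldsymbol{\alpha}$ is found by maximizing (43) over $\boldsymbol{\alpha}$.

When $q \le p$ the MLE of $\boldsymbol{\alpha}$ may not be unique but satisfies $\mathbf{t} = \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}}$. As a result $p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\alpha}})$ is a constant, and the clairvoyant GLRT statistic becomes

$$-\ln p_{\mathbf{T}}(\mathbf{t};\mathbf{0}).$$

Note that since $p_{\mathbf{T}}(\mathbf{t};\mathbf{0})$ is decreasing as $\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$ increases, the clairvoyant GLRT statistic becomes

$$\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t} \tag{44}$$

which is the same as our test statistic (with only a positive scale factor) as $\|\boldsymbol{\mu}\| \to 0$ (see (29)). However,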

for large signal, our test statistic in (37) is not

equivalent to the clairvoyant GLRT statistic in (44).

This example shows that our method may not offer

the suboptimal performance as the clairvoyant GLRT

does for large signal.

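When $q > p$ it can be shown that

$$\hat{\boldsymbol{\alpha}} = (\mathbf{H}^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}$$

and the clairvoyant GLRT statistic becomes

$$\frac{\pi}{(\sigma_1^2)^{q/2}} \exp\left[-\frac{1}{2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}})^T \frac{(\mathbf{G}^T\mathbf{G})^{-1}}{\sigma_1^2} (\mathbf{t} - \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}})\right] + \frac{1-\pi}{(\sigma_2^2)^{q/2}} \exp\left[-\frac{1}{2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}})^T \frac{(\mathbf{G}^T\mathbf{G})^{-1}}{\sigma_2^2} (\mathbf{t} - \mathbf{G}^T\mathbf{H}\hat{\boldsymbol{\alpha}})\right]. \tag{45}$$

Note that the noise in (33) is uncorrelated but not independent. We next consider a general case when the noise can be correlated with a Gaussian mixture pdf

$$\mathbf{w} \sim \pi\,\mathcal{N}(\mathbf{0}, \mathbf{C}_1) + (1-\pi)\,\mathcal{N}(\mathbf{0}, \mathbf{C}_2). \tag{46}$$

It can be shown that similar to (37), our test statistic is

$$\hat{\boldsymbol{\mu}}^T\mathbf{t} - \ln\left(\pi e^{\frac{1}{2}\hat{\boldsymbol{\mu}}^T\mathbf{G}^T\mathbf{C}_1\mathbf{G}\hat{\boldsymbol{\mu}}} + (1-\pi) e^{\frac{1}{2}\hat{\boldsymbol{\mu}}^T\mathbf{G}^T\mathbf{C}_2\mathbf{G}\hat{\boldsymbol{\mu}}}\right) \tag{47}$$

and the clairvoyant GLRT statistic is

$$-\ln\left(\frac{\pi}{\det^{1/2}(\mathbf{C}_1)} \exp\left[-\frac{1}{2}\mathbf{t}^T(\mathbf{G}^T\mathbf{C}_1\mathbf{G})^{-1}\mathbf{t}\right] + \frac{1-\pi}{\det^{1/2}(\mathbf{C}_2)} \exp\left[-\frac{1}{2}\mathbf{t}^T(\mathbf{G}^T\mathbf{C}_2\mathbf{G})^{-1}\mathbf{t}\right]\right) \tag{48}$$

when $q \le p$.

When $q > p$ the MLE of $\boldsymbol{\alpha}$ is not in closed form, and hence we write the clairvoyant GLRT statistic as

$$\max_{\boldsymbol{\alpha}}\left[\frac{\pi}{\det^{1/2}(\mathbf{G}^T\mathbf{C}_1\mathbf{G})} \exp\left[-\frac{1}{2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T(\mathbf{G}^T\mathbf{C}_1\mathbf{G})^{-1}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})\right] + \frac{1-\pi}{\det^{1/2}(\mathbf{G}^T\mathbf{C}_2\mathbf{G})} \exp\left[-\frac{1}{2}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})^T(\mathbf{G}^T\mathbf{C}_2\mathbf{G})^{-1}(\mathbf{t} - \mathbf{G}^T\mathbf{H}\boldsymbol{\alpha})\right]\right]. \tag{49}$$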


C. Summary

We have considered the partially observed linear

model with both Gaussian and non-Gaussian noise.

Table I compares our test statistic with the clairvoyant

GLRT.

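1) In Gaussian noise, $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$. The test statistics are exactly the same for $q \le p$.

2) In uncorrelated non-Gaussian noise, $\mathbf{w} \sim \pi\,\mathcal{N}(\mathbf{0}, \sigma_1^2\mathbf{I}) + (1-\pi)\,\mathcal{N}(\mathbf{0}, \sigma_2^2\mathbf{I})$. The test statistics are the same as $\boldsymbol{\mu} \to \mathbf{0}$ for $q \le p$.

3) In correlated non-Gaussian noise, $\mathbf{w} \sim \pi\,\mathcal{N}(\mathbf{0}, \mathbf{C}_1) + (1-\pi)\,\mathcal{N}(\mathbf{0}, \mathbf{C}_2)$. Although we cannot show the equivalence between these two test statistics, we see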

in Section VII that their performances appear to be the

same.

VI. EXAMPLES-DISTRIBUTED CLASSIFICATION

In this section we compare our method with the

estimated MAP classifier for some classification

problems. The estimated MAP classifier assumes

that the pdf of $\mathbf{T}$ under $H_i$ is known except for some unknown underlying parameters $\boldsymbol{\alpha}_i$. We assume equal prior probability of the candidate hypothesis, i.e., $p(H_1) = \cdots = p(H_M) = 1/M$. So, the estimated MAP classifier reduces to the estimated maximum likelihood classifier [1], which finds the MLE of $\boldsymbol{\alpha}_i$ and chooses $H_i$ for which the following is maximum over $i$

$$p_{\mathbf{T}}(\mathbf{t};\hat{\boldsymbol{\alpha}}_i) \tag{50}$$

where $\hat{\boldsymbol{\alpha}}_i$ is the MLE of $\boldsymbol{\alpha}_i$.

A. Linear Model with Known Variance

Consider the following classification model:

$$\mathcal{H}_i: \mathbf{x} = A_i\mathbf{s}_i + \mathbf{w} \tag{51}$$

where $\mathbf{s}_i$ is an $N \times 1$ known signal vector with the same length as $\mathbf{x}$ and depends upon the class, $A_i$ is the unknown signal amplitude, and $\mathbf{w}$ is white Gaussian noise with known variance $\sigma^2$. Assume that instead of observing $\mathbf{x}$, we can only observe the measurements of two sensors

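$$\mathbf{T}_1 = \mathbf{H}_1^T\mathbf{x}, \qquad \mathbf{T}_2 = \mathbf{H}_2^T\mathbf{x} \tag{52}$$

where $\mathbf{H}_1$ is $N \times q_1$ and $\mathbf{H}_2$ is $N \times q_2$. Here $q_1$ and $q_2$ are the lengths of the vectors $\mathbf{T}_1$ and $\mathbf{T}_2$, respectively. We can write (52) as

$$\mathbf{T} = \mathbf{G}^T\mathbf{x} \tag{53}$$

by letting

$$\mathbf{T} = \begin{bmatrix}\mathbf{T}_1 \\ \mathbf{T}_2\end{bmatrix} \quad\text{and}\quad \mathbf{G} = [\mathbf{H}_1 \;\; \mathbf{H}_2]$$

where $\mathbf{G}$ is $N \times (q_1+q_2)$ with $q_1 + q_2 \le N$. We assume that $\mathbf{G}$ has full column rank so that there are no perfectly redundant measurements of the sensors. Note that $\mathbf{G}$ can be any matrix with full column rank.

Let $\mathcal{H}_0$ be the reference hypothesis when there is noise only, i.e.,

$$\mathcal{H}_0: \mathbf{x} = \mathbf{w}. \tag{54}$$

Since $\mathbf{x}$ is Gaussian under $\mathcal{H}_0$, according to (53), $\mathbf{T}$ is also Gaussian, and

$$\mathbf{T} \sim \mathcal{N}(\mathbf{0},\sigma^2\mathbf{G}^T\mathbf{G})$$

under $\mathcal{H}_0$. We construct the pdf under $\mathcal{H}_i$ as in (8) with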

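$$K(\boldsymbol{\theta}_i) = \ln E_0[\exp(\boldsymbol{\theta}_i^T\mathbf{T})] = \frac{1}{2}\sigma^2\boldsymbol{\theta}_i^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}_i. \tag{55}$$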

Hence the constructed pdf is

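$$\begin{aligned} p_{\mathbf{T}}(\mathbf{t};\boldsymbol{\theta}_i) &= \exp[\boldsymbol{\theta}_i^T\mathbf{t} - K(\boldsymbol{\theta}_i) + \ln p_{\mathbf{T}}(\mathbf{t};\mathcal{H}_0)] \\ &= \frac{1}{(2\pi\sigma^2)^{(q_1+q_2)/2}\det^{1/2}(\mathbf{G}^T\mathbf{G})} \cdot \exp\left(-\frac{\mathbf{t}^T(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}}{2\sigma^2}\right) \cdot \exp\left[\boldsymbol{\theta}_i^T\mathbf{t} - \frac{1}{2}\sigma^2\boldsymbol{\theta}_i^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}_i\right] \end{aligned} \tag{56}$$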

which can be simplified as

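$$\mathbf{T} \sim \mathcal{N}(\sigma^2\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}_i,\,\sigma^2\mathbf{G}^T\mathbf{G}) \quad \text{under } \mathcal{H}_i. \tag{57}$$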
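The step from (56) to (57) can be checked by completing the square in the exponent. The short derivation below is added for illustration and uses the shorthand $\mathbf{C} = \mathbf{G}^T\mathbf{G}$, which the paper introduces later, in Section VI-B.

```latex
\begin{aligned}
-\frac{\mathbf{t}^T\mathbf{C}^{-1}\mathbf{t}}{2\sigma^2}
  + \boldsymbol{\theta}_i^T\mathbf{t}
  - \frac{1}{2}\sigma^2\boldsymbol{\theta}_i^T\mathbf{C}\boldsymbol{\theta}_i
&= -\frac{1}{2\sigma^2}\left[\mathbf{t}^T\mathbf{C}^{-1}\mathbf{t}
  - 2\sigma^2\boldsymbol{\theta}_i^T\mathbf{t}
  + \sigma^4\boldsymbol{\theta}_i^T\mathbf{C}\boldsymbol{\theta}_i\right] \\
&= -\frac{1}{2\sigma^2}
  (\mathbf{t}-\sigma^2\mathbf{C}\boldsymbol{\theta}_i)^T\mathbf{C}^{-1}
  (\mathbf{t}-\sigma^2\mathbf{C}\boldsymbol{\theta}_i)
\end{aligned}
```

Together with the normalizing constant in (56), this is the exponent of a Gaussian pdf with mean $\sigma^2\mathbf{C}\boldsymbol{\theta}_i$ and covariance $\sigma^2\mathbf{C}$, which is (57).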

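The next step is to find the MLE of $\boldsymbol{\theta}_i$. Note that the MLE of $\boldsymbol{\theta}_i$ is found by maximizing $\boldsymbol{\theta}_i^T\mathbf{t} - K(\boldsymbol{\theta}_i)$ over $\boldsymbol{\theta}_i$. If this optimization procedure is carried out without any constraints, then $\hat{\boldsymbol{\theta}}_i$ would be the same for all $i$. Hence we need some implicit constraints in finding the MLE. Since $\boldsymbol{\theta}_i$ represents the signal under $\mathcal{H}_i$, we should have

$$\boldsymbol{\theta}_i = A_i\mathbf{G}^T\mathbf{s}_i = E_{\mathcal{H}_i}(\mathbf{T}) \tag{58}$$

which is the mean of $\mathbf{T}$ under $\mathcal{H}_i$. As a result (57) can be written as

$$\mathbf{T} \sim \mathcal{N}(\sigma^2 A_i\mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i,\,\sigma^2\mathbf{G}^T\mathbf{G}) \quad \text{under } \mathcal{H}_i. \tag{59}$$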

Thus, instead of finding the MLE of μi by maximizing

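$$\boldsymbol{\theta}_i^T\mathbf{t} - K(\boldsymbol{\theta}_i) = \boldsymbol{\theta}_i^T\mathbf{t} - \frac{1}{2}\sigma^2\boldsymbol{\theta}_i^T\mathbf{G}^T\mathbf{G}\boldsymbol{\theta}_i \tag{60}$$

with the constraint in (58), we can find the MLE of $A_i$ in (59) (since $\mathbf{s}_i$ is assumed known) and then plug it into (58). It can be shown that

$$\hat{A}_i = \frac{\mathbf{s}_i^T\mathbf{G}\mathbf{t}}{\sigma^2\mathbf{s}_i^T\mathbf{G}\mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i} \tag{61}$$

and

$$\hat{\boldsymbol{\theta}}_i = \frac{\mathbf{G}^T\mathbf{s}_i\mathbf{s}_i^T\mathbf{G}\mathbf{t}}{\sigma^2\mathbf{s}_i^T\mathbf{G}\mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i}. \tag{62}$$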

Hence by removing the constant factors, the test

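statistic of our classifier for $\mathcal{H}_i$ is

$$\frac{(\mathbf{s}_i^T\mathbf{G}\mathbf{t})^2}{(\mathbf{G}^T\mathbf{s}_i)^T\mathbf{G}^T\mathbf{G}(\mathbf{G}^T\mathbf{s}_i)} \tag{63}$$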

according to (17).
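The estimates in (61) and (62) and the statistic in (63) can be sketched numerically as follows. The function name and the random test inputs are illustrative assumptions; only the formulas come from the text.

```python
import numpy as np

def constructed_known_variance(t, G, s, sigma2):
    """Return (A_hat, theta_hat, statistic) from (61), (62), and (63)."""
    g = G.T @ s                      # G^T s_i, the direction of theta_i in (58)
    C = G.T @ G
    denom = g @ C @ g                # s_i^T G G^T G G^T s_i
    A_hat = (g @ t) / (sigma2 * denom)       # (61)
    theta_hat = A_hat * g                    # (62)
    statistic = (g @ t) ** 2 / denom         # (63), constants removed
    return A_hat, theta_hat, statistic

# Made-up inputs, used only to exercise the formulas.
rng = np.random.default_rng(0)       # arbitrary seed
N, q, sigma2 = 8, 3, 2.0
G = rng.standard_normal((N, q))
s = rng.standard_normal(N)
t = G.T @ (0.7 * s + np.sqrt(sigma2) * rng.standard_normal(N))
A_hat, theta_hat, statistic = constructed_known_variance(t, G, s, sigma2)

# Maximized log-likelihood term theta^T t - K(theta) from (60).
maximized = theta_hat @ t - 0.5 * sigma2 * theta_hat @ (G.T @ G) @ theta_hat
```

Substituting (62) into (60) gives the statistic in (63) divided by $2\sigma^2$, which is why the factor can be dropped when $\sigma^2$ is known and common to all classes.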


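TABLE II

Comparison of our Test Statistic and the Estimated MAP Classifier

| | Our Method | Estimated MAP |
|---|---|---|
| Known $\sigma^2$ | $\dfrac{(\mathbf{s}_i^T\mathbf{G}\mathbf{t})^2}{(\mathbf{G}^T\mathbf{s}_i)^T\mathbf{G}^T\mathbf{G}(\mathbf{G}^T\mathbf{s}_i)}$ | $\dfrac{(\mathbf{s}_i^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t})^2}{(\mathbf{G}^T\mathbf{s}_i)^T(\mathbf{G}^T\mathbf{G})^{-1}(\mathbf{G}^T\mathbf{s}_i)}$ |
| Unknown $\sigma^2$ | $\dfrac{\mathbf{t}^T\mathbf{C}^{-1}\mathbf{h}_i(\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{h}_i)^{-1}\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{t}}{\mathbf{t}^T[\mathbf{C}^{-1}-\mathbf{C}^{-1}\mathbf{h}_i(\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{h}_i)^{-1}\mathbf{h}_i^T\mathbf{C}^{-1}]\mathbf{t}}$ | $\dfrac{\mathbf{t}^T\mathbf{C}^{-1}\mathbf{g}_i(\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{g}_i)^{-1}\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{t}}{\mathbf{t}^T[\mathbf{C}^{-1}-\mathbf{C}^{-1}\mathbf{g}_i(\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{g}_i)^{-1}\mathbf{g}_i^T\mathbf{C}^{-1}]\mathbf{t}}$ |

where $\mathbf{h}_i = \mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i$, $\mathbf{g}_i = \mathbf{G}^T\mathbf{s}_i$, and $\mathbf{C} = \mathbf{G}^T\mathbf{G}$.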

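Next we consider the estimated MAP classifier. In this case we assume that we know the true pdf except for $A_i$:

$$\mathbf{T} \sim \mathcal{N}(A_i\mathbf{G}^T\mathbf{s}_i,\,\sigma^2\mathbf{G}^T\mathbf{G}) \quad \text{under } \mathcal{H}_i. \tag{64}$$

Note that (64) is the true pdf of $\mathbf{T}$ under $\mathcal{H}_i$ and that (59) is the constructed pdf. It can be shown that the MLE of $A_i$ in the true pdf under $\mathcal{H}_i$ is

$$\hat{A}_i = \frac{\mathbf{s}_i^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t}}{\mathbf{s}_i^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{s}_i}. \tag{65}$$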

By removing the constant terms, the test statistic of

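the estimated MAP classifier for $\mathcal{H}_i$ is

$$\frac{(\mathbf{s}_i^T\mathbf{G}(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{t})^2}{(\mathbf{G}^T\mathbf{s}_i)^T(\mathbf{G}^T\mathbf{G})^{-1}(\mathbf{G}^T\mathbf{s}_i)} \tag{66}$$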

according to (50). Note that (61) and (65) are different

because (61) is the MLE of Ai under the constructed

pdf and because (65) is the MLE of Ai under the true

pdf. Also note that if $\mathbf{G}^T\mathbf{G}$ is a scaled identity matrix,

the test statistics in (63) and (66) are equivalent, and

hence, our method coincides with the estimated MAP

classifier.
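The equivalence claim can be checked numerically. The sketch below evaluates (63) and (66) for a $\mathbf{G}$ with orthogonal, equal-norm columns and for a generic $\mathbf{G}$; the function names, dimensions, and random inputs are illustrative assumptions.

```python
import numpy as np

def stat_constructed(t, G, s):
    """Statistic (63) from the constructed pdf."""
    g = G.T @ s
    return (g @ t) ** 2 / (g @ (G.T @ G) @ g)

def stat_estimated_map(t, G, s):
    """Statistic (66) from the true pdf with A_i replaced by its MLE."""
    g = G.T @ s
    Cinv_g = np.linalg.solve(G.T @ G, g)     # (G^T G)^{-1} G^T s_i
    return (Cinv_g @ t) ** 2 / (g @ Cinv_g)

rng = np.random.default_rng(1)               # arbitrary seed
N, q = 12, 3
s = rng.standard_normal(N)
x = 0.8 * s + rng.standard_normal(N)

Q, _ = np.linalg.qr(rng.standard_normal((N, q)))
G_orth = 3.0 * Q                             # G^T G = 9 I (scaled identity)
G_gen = rng.standard_normal((N, q))          # generic full-column-rank G

same = np.isclose(stat_constructed(G_orth.T @ x, G_orth, s),
                  stat_estimated_map(G_orth.T @ x, G_orth, s))
differ = not np.isclose(stat_constructed(G_gen.T @ x, G_gen, s),
                        stat_estimated_map(G_gen.T @ x, G_gen, s))
```

With $\mathbf{G}^T\mathbf{G} = c\mathbf{I}$ both statistics reduce to $(\mathbf{s}_i^T\mathbf{G}\mathbf{t})^2 / (c\,\mathbf{s}_i^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i)$, so `same` holds; for a generic $\mathbf{G}$ the two values generally differ.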

B. Linear Model with Unknown Variance

To extend the above example, we consider the

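above linear model with unknown noise variance $\sigma^2$. As we show in (59), the constructed pdf is still

$$\mathbf{T} \sim \mathcal{N}(\sigma^2 A_i\mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i,\,\sigma^2\mathbf{G}^T\mathbf{G}) \quad \text{under } \mathcal{H}_i \tag{67}$$

except that $\sigma^2$ is unknown. Letting $B_i = \sigma^2 A_i$, we have

$$\mathbf{T} \sim \mathcal{N}(B_i\mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i,\,\sigma^2\mathbf{G}^T\mathbf{G}) \quad \text{under } \mathcal{H}_i. \tag{68}$$

Instead of finding the MLEs of $A_i$ and $\sigma^2$, we can equivalently find the MLEs of $B_i$ and $\sigma^2$. Let $\mathbf{h}_i = \mathbf{G}^T\mathbf{G}\mathbf{G}^T\mathbf{s}_i$, and let $\mathbf{C} = \mathbf{G}^T\mathbf{G}$. It can be shown that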

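$$\hat{B}_i = (\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{h}_i)^{-1}\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{t} \tag{69}$$

and

$$\hat{\sigma}^2 = \frac{1}{q_1+q_2}(\mathbf{t}-\mathbf{h}_i\hat{B}_i)^T\mathbf{C}^{-1}(\mathbf{t}-\mathbf{h}_i\hat{B}_i). \tag{70}$$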

By removing the constant factors, it can also be

shown that the test statistic is equivalent to

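$$\frac{\mathbf{t}^T\mathbf{C}^{-1}\mathbf{h}_i(\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{h}_i)^{-1}\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{t}}{\mathbf{t}^T[\mathbf{C}^{-1}-\mathbf{C}^{-1}\mathbf{h}_i(\mathbf{h}_i^T\mathbf{C}^{-1}\mathbf{h}_i)^{-1}\mathbf{h}_i^T\mathbf{C}^{-1}]\mathbf{t}}. \tag{71}$$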

Next we consider the estimated MAP classifier.

The true pdf is still

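$$\mathbf{T} \sim \mathcal{N}(A_i\mathbf{G}^T\mathbf{s}_i,\,\sigma^2\mathbf{G}^T\mathbf{G}) \quad \text{under } \mathcal{H}_i \tag{72}$$

but with unknown $A_i$ and $\sigma^2$. Let $\mathbf{g}_i = \mathbf{G}^T\mathbf{s}_i$ and let $\mathbf{C} = \mathbf{G}^T\mathbf{G}$. Similar to (69), (70), and (71), it can be shown that

$$\hat{A}_i = (\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{g}_i)^{-1}\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{t} \tag{73}$$

$$\hat{\sigma}^2 = \frac{1}{q_1+q_2}(\mathbf{t}-\mathbf{g}_i\hat{A}_i)^T\mathbf{C}^{-1}(\mathbf{t}-\mathbf{g}_i\hat{A}_i) \tag{74}$$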

and that the test statistic of the estimated MAP

classifier is

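$$\frac{\mathbf{t}^T\mathbf{C}^{-1}\mathbf{g}_i(\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{g}_i)^{-1}\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{t}}{\mathbf{t}^T[\mathbf{C}^{-1}-\mathbf{C}^{-1}\mathbf{g}_i(\mathbf{g}_i^T\mathbf{C}^{-1}\mathbf{g}_i)^{-1}\mathbf{g}_i^T\mathbf{C}^{-1}]\mathbf{t}}. \tag{75}$$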

Note that if $\mathbf{G}^T\mathbf{G}$ is a scaled identity matrix, since $\mathbf{h}_i = \mathbf{G}^T\mathbf{G}\mathbf{g}_i$, the test statistics in (71) and (75) are equivalent. Hence our method is exactly the same as the estimated MAP classifier if $\mathbf{G}^T\mathbf{G}$ is a scaled identity matrix.
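Both (71) and (75) have the same form: the energy of $\mathbf{t}$ explained by one direction, divided by the residual energy, in the metric $\mathbf{C}^{-1}$. A minimal sketch with a shared helper follows; the helper name and the random inputs are illustrative assumptions.

```python
import numpy as np

def ratio_statistic(t, C, v):
    """t^T C^-1 v (v^T C^-1 v)^-1 v^T C^-1 t over the residual quadratic form."""
    Cinv_t = np.linalg.solve(C, t)
    Cinv_v = np.linalg.solve(C, v)
    explained = (v @ Cinv_t) ** 2 / (v @ Cinv_v)
    residual = t @ Cinv_t - explained
    return explained / residual

def stat_constructed_unknown_var(t, G, s):
    C = G.T @ G
    h = C @ (G.T @ s)                # h_i = G^T G G^T s_i
    return ratio_statistic(t, C, h)  # (71)

def stat_map_unknown_var(t, G, s):
    C = G.T @ G
    g = G.T @ s                      # g_i = G^T s_i
    return ratio_statistic(t, C, g)  # (75)

rng = np.random.default_rng(2)       # arbitrary seed
N, q = 10, 3
Q, _ = np.linalg.qr(rng.standard_normal((N, q)))
G = 2.0 * Q                          # G^T G = 4 I
s = rng.standard_normal(N)
t = G.T @ (1.5 * s + rng.standard_normal(N))

ours = stat_constructed_unknown_var(t, G, s)
est_map = stat_map_unknown_var(t, G, s)
```

The ratio is unchanged when the direction vector is rescaled, so when $\mathbf{C} = c\mathbf{I}$ and hence $\mathbf{h}_i = c\,\mathbf{g}_i$, the values `ours` and `est_map` coincide.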

C. Summary

We have considered a linear model with both

known and unknown noise variance. Table II

compares our test statistic with the estimated MAP

classifier. If $\mathbf{G}^T\mathbf{G}$ is a scaled identity matrix, then our

method and the estimated MAP classifier are identical.

Note that this is the case when all the columns in

G are orthogonal and have the same power, such as

the demodulation of the M-ary orthogonal signals in

communication theory.

VII. SIMULATIONS

A. Distributed Detection

Since our test statistic coincides with the clairvoyant GLRT under Gaussian noise for $q \le p$, as shown in Section V-A, we only compare the performances under non-Gaussian noise (both uncorrelated noise as in (33) and correlated noise as in (46)). Consider the model where

$$x[n] = A_1 + A_2 r^n + A_3\cos(2\pi f_0 n + \phi) + w[n] \tag{76}$$


Fig. 2. ROC curves for our method and clairvoyant GLRT with uncorrelated Gaussian mixture noise. (a) False alarm rate between 0

and 1. (b) False alarm rate between 0.0001 and 0.01.

for $n = 0, 1, \ldots, N-1$, with known damping factor $r \in (0,1)$ and frequency $f_0$ but unknown amplitudes $A_1$, $A_2$, $A_3$, and phase $\phi$. This is a linear model as in (23), where

$$\mathbf{H} = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & r & \cos(2\pi f_0) & \sin(2\pi f_0) \\ \vdots & \vdots & \vdots & \vdots \\ 1 & r^{N-1} & \cos(2\pi f_0(N-1)) & \sin(2\pi f_0(N-1)) \end{bmatrix}$$

and $\boldsymbol{\alpha} = [A_1 \;\; A_2 \;\; A_3\cos\phi \;\; -A_3\sin\phi]^T$.

Let $\mathbf{w}$ have an uncorrelated Gaussian mixture distribution as in (33). For the partially observed linear model, we observe two sensor outputs as in (24). We compare the GLRT in (37) with the clairvoyant GLRT in (44). Note that the MLE of $\boldsymbol{\theta}$ in (37) is found numerically and not by the asymptotic approximation in (41). In the simulation we use $N = 20$, $A_1 = 2$, $A_2 = 3$, $A_3 = 4$, $\phi = \pi/4$, $r = 0.95$, $f_0 = 0.34$, $\pi = 0.9$, $\sigma_1^2 = 50$, $\sigma_2^2 = 500$, and $\mathbf{H}_1$ and $\mathbf{H}_2$ are the first and third columns in $\mathbf{H}$, respectively, i.e., $\mathbf{H}_1 = [1 \;\; 1 \;\; \cdots \;\; 1]^T$ and $\mathbf{H}_2 = [1 \;\; \cos(2\pi f_0) \;\; \cdots \;\; \cos(2\pi f_0(N-1))]^T$. Hence, only the dc level is sensed by one sensor, and the in-phase component of the sinusoid is sensed by the other sensor. As shown in Fig. 2, the performances are almost the same (even for very small false alarm rates), which justifies their equivalence when $\|\boldsymbol{\theta}\| \to 0$, as shown in Section V.
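The signal model in (76) and the first simulation setup can be sketched as follows. This is an illustration only: the helper `mixture_noise`, the random seed, and the assumption that the sensor outputs in (24) take the form $\mathbf{T} = \mathbf{G}^T\mathbf{x}$ with $\mathbf{G} = [\mathbf{H}_1 \;\; \mathbf{H}_2]$ (as in (53)) are not taken from the text above.

```python
import numpy as np

N, r, f0 = 20, 0.95, 0.34
A1, A2, A3, phi = 2.0, 3.0, 4.0, np.pi / 4
n = np.arange(N)

# Observation matrix H and parameter vector alpha of the linear model.
H = np.column_stack([np.ones(N), r ** n,
                     np.cos(2 * np.pi * f0 * n), np.sin(2 * np.pi * f0 * n)])
alpha = np.array([A1, A2, A3 * np.cos(phi), -A3 * np.sin(phi)])

# Signal written directly from (76), for comparison with H @ alpha.
signal = A1 + A2 * r ** n + A3 * np.cos(2 * np.pi * f0 * n + phi)

def mixture_noise(rng, size, mix_prob, var1, var2):
    """Independent samples from mix_prob*N(0, var1) + (1 - mix_prob)*N(0, var2)."""
    from_first = rng.random(size) < mix_prob
    std = np.where(from_first, np.sqrt(var1), np.sqrt(var2))
    return std * rng.standard_normal(size)

rng = np.random.default_rng(0)       # arbitrary seed
w = mixture_noise(rng, N, mix_prob=0.9, var1=50.0, var2=500.0)
x = H @ alpha + w

G = H[:, [0, 2]]                     # H1 and H2: first and third columns of H
t = G.T @ x                          # assumed form of the two sensor outputs
```

The identity $A_3\cos(2\pi f_0 n + \phi) = A_3\cos\phi\cos(2\pi f_0 n) - A_3\sin\phi\sin(2\pi f_0 n)$ explains the last two entries of $\boldsymbol{\alpha}$.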

Next, for the same model as in (76), let w have a

correlated Gaussian mixture distribution as in (46).

We compare performances of the GLRT using the

constructed pdf as in (47) and the clairvoyant GLRT

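as in (48). We use $N = 20$, $A_1 = 3$, $A_2 = 4$, $A_3 = 3$, $\phi = \pi/7$, $r = 0.9$, $f_0 = 0.46$, $\pi = 0.7$, $\mathbf{H}_1 = [1, 1, \ldots, 1]^T$, and $\mathbf{H}_2 = [1, \cos(2\pi f_0), \ldots, \cos(2\pi f_0(N-1))]^T$. The covariance matrices $\mathbf{C}_1$, $\mathbf{C}_2$ are generated using $\mathbf{C}_1 = \mathbf{R}_1^T\mathbf{R}_1$, $\mathbf{C}_2 = \mathbf{R}_2^T\mathbf{R}_2$, where $\mathbf{R}_1$, $\mathbf{R}_2$ are full rank $N \times N$ matrices. As shown in Fig. 3, the performances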

are still very similar, even for small false alarm rate.
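One way to draw correlated Gaussian mixture noise as in (46) is sketched below. The text does not say how $\mathbf{R}_1$ and $\mathbf{R}_2$ were chosen, so filling them with i.i.d. Gaussian entries is purely an assumption for illustration, as are the function name and the seed.

```python
import numpy as np

def correlated_mixture_noise(rng, R1, R2, mix_prob):
    """One draw from mix_prob*N(0, R1^T R1) + (1 - mix_prob)*N(0, R2^T R2)."""
    R = R1 if rng.random() < mix_prob else R2   # pick a mixture component
    z = rng.standard_normal(R.shape[0])         # white Gaussian vector
    return R.T @ z                              # covariance is R^T R

rng = np.random.default_rng(0)                  # arbitrary seed
N = 20
R1 = rng.standard_normal((N, N))                # assumed: random full-rank matrices
R2 = rng.standard_normal((N, N))
C1, C2 = R1.T @ R1, R2.T @ R2                   # covariances as in the text
w = correlated_mixture_noise(rng, R1, R2, mix_prob=0.7)
```

Building each covariance as $\mathbf{R}^T\mathbf{R}$ with full-rank $\mathbf{R}$ guarantees a symmetric positive definite matrix, and coloring white noise with $\mathbf{R}^T$ avoids a separate matrix factorization.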

B. Distributed Classification

For the model in (51)

$$\mathcal{H}_i: \mathbf{x} = A_i\mathbf{s}_i + \mathbf{w}$$

we first consider a case when $\mathbf{G}^T\mathbf{G}$ is approximately a scaled identity matrix. Let $A_1 = 0.4$, $A_2 = 1.2$, $A_3 = 0.9$, and

$$s_1(n) = \cos(2\pi f_1 n), \qquad s_2(n) = \cos(2\pi f_2 n), \qquad s_3(n) = \cos(2\pi f_3 n)$$

where $n = 0, 1, \ldots, N-1$ with $N = 25$, and $f_1 = 0.14$, $f_2 = 0.34$, and $f_3 = 0.41$. Let $p(\mathcal{H}_1) = p(\mathcal{H}_2) = p(\mathcal{H}_3) = 1/3$. Assume that there are two sensors with the following observation matrices, respectively:

$$\mathbf{H}_1 = \begin{bmatrix} 1 & \cos(2\pi f_1) & \cdots & \cos(2\pi f_1(N-1)) \\ 1 & \cos(2\pi f_2) & \cdots & \cos(2\pi f_2(N-1)) \end{bmatrix}^T$$

$$\mathbf{H}_2 = [1 \;\; \cos(2\pi f_3) \;\; \cdots \;\; \cos(2\pi f_3(N-1))]^T.$$

We use (63) and (66) as the test statistics for the two methods, respectively, when $\sigma^2$ is known. The test statistics in (71) and (75) are used when $\sigma^2$ is unknown. The probabilities of correct classification are plotted versus $\ln(1/\sigma^2)$ in Fig. 4. We see that our method has the same performance as the estimated MAP classifier with known or unknown $\sigma^2$, and the probabilities of correct classification go to 1 as $\sigma^2 \to 0$.
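The two-sensor experiment with known $\sigma^2$ can be sketched as a small Monte Carlo loop. This is a hedged illustration rather than the authors' simulation code: the function names, the trial count, and the seed are assumptions, and no attempt is made to reproduce the curves in Fig. 4.

```python
import numpy as np

N = 25
n = np.arange(N)
freqs = (0.14, 0.34, 0.41)
amps = (0.4, 1.2, 0.9)
S = np.column_stack([np.cos(2 * np.pi * f * n) for f in freqs])  # s_1, s_2, s_3
G = S.copy()                    # G = [H1 H2] has exactly these three columns
C = G.T @ G

def stat_constructed(t, g):
    """Statistic (63), with g = G^T s_i."""
    return (g @ t) ** 2 / (g @ C @ g)

def stat_estimated_map(t, g):
    """Statistic (66), with g = G^T s_i."""
    Cinv_g = np.linalg.solve(C, g)
    return (Cinv_g @ t) ** 2 / (g @ Cinv_g)

def prob_correct(stat, sigma2, trials, rng):
    """Monte Carlo estimate of the probability of correct classification."""
    hits = 0
    for _ in range(trials):
        i = int(rng.integers(len(amps)))                  # equal priors
        x = amps[i] * S[:, i] + np.sqrt(sigma2) * rng.standard_normal(N)
        t = G.T @ x
        scores = [stat(t, G.T @ S[:, k]) for k in range(len(amps))]
        hits += int(np.argmax(scores) == i)
    return hits / trials

# How far C = G^T G is from a scaled identity matrix.
off_diag = C - np.diag(np.diag(C))
```

Calling `prob_correct(stat_constructed, sigma2, trials, rng)` and the same with `stat_estimated_map` over a grid of `sigma2` values gives the kind of comparison plotted against $\ln(1/\sigma^2)$. The off-diagonal entries of `C` are small relative to its diagonal, which is the sense in which $\mathbf{G}^T\mathbf{G}$ is approximately a scaled identity here.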


Fig. 3. ROC curves for our method and clairvoyant GLRT with correlated Gaussian mixture noise. (a) False alarm rate between 0

and 1. (b) False alarm rate between 0.0001 and 0.01.

Fig. 4. Probability of correct classification for both methods.

Fig. 5. Probability of correct classification for both methods.

Next we consider a case when $\mathbf{G}^T\mathbf{G}$ is not a scaled identity matrix. Let $A_1 = 0.5$, $A_2 = 1$, $A_3 = 1$, and

$$s_1(n) = \cos(2\pi f_1 n) + 1, \qquad s_2(n) = \cos(2\pi f_2 n) + 0.5, \qquad s_3(n) = \cos(2\pi f_3 n)$$

where $n = 0, 1, \ldots, N-1$ with $N = 20$, and $f_1 = 0.17$, $f_2 = 0.28$, and $f_3 = 0.45$. Let $p(\mathcal{H}_1) = p(\mathcal{H}_2) = p(\mathcal{H}_3) = 1/3$. Assume that there are three sensors (this is an extension of the two-sensor assumption) with the following observation matrices, respectively:

$$\mathbf{H}_1 = [1 \;\; 1 \;\; \cdots \;\; 1]^T$$

$$\mathbf{H}_2 = \begin{bmatrix} 1 & \cos(2\pi f_1) & \cdots & \cos(2\pi f_1(N-1)) \\ 1 & \cos(2\pi f_2) & \cdots & \cos(2\pi f_2(N-1)) \end{bmatrix}^T$$

$$\mathbf{H}_3 = [1 \;\; \cos(2\pi(f_3+0.02)) \;\; \cdots \;\; \cos(2\pi(f_3+0.02)(N-1))]^T.$$

Note that in $\mathbf{H}_3$, we set the frequency to $f_3 + 0.02$. This models the case when the knowledge of the frequency is not accurate. We also see in Fig. 5 that the performances of both methods are the same with known or unknown $\sigma^2$, and probabilities of correct classification go to 1 as $\sigma^2 \to 0$.
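A quick numerical look at the three-sensor setup shows why $\mathbf{G}^T\mathbf{G}$ is not a scaled identity matrix in this case. The sketch assumes $\mathbf{G} = [\mathbf{H}_1 \;\; \mathbf{H}_2 \;\; \mathbf{H}_3]$, the natural extension of the two-sensor definition; the variable names are illustrative.

```python
import numpy as np

N = 20
n = np.arange(N)
f1, f2, f3 = 0.17, 0.28, 0.45

H1 = np.ones((N, 1))                                    # dc sensor
H2 = np.column_stack([np.cos(2 * np.pi * f1 * n),
                      np.cos(2 * np.pi * f2 * n)])
H3 = np.cos(2 * np.pi * (f3 + 0.02) * n)[:, None]       # mismatched frequency

G = np.hstack([H1, H2, H3])     # assumed stacking of the three sensors
C = G.T @ G

# Class signals, including the dc offsets that distinguish this example.
S = np.column_stack([np.cos(2 * np.pi * f1 * n) + 1.0,
                     np.cos(2 * np.pi * f2 * n) + 0.5,
                     np.cos(2 * np.pi * f3 * n)])
```

The all-ones column has squared norm $N = 20$, while each cosine column has squared norm near $N/2$, so the diagonal of `C` is clearly unequal and the equivalence argument of Section VI does not apply directly.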

VIII. CONCLUSIONS

A novel method of constructing the joint pdf of the measurements from a distributed multiple-sensor system has been proposed. Only a reference pdf is needed in the construction. It is shown that moment matching and MLE are equivalent. The performance of our method has been shown to be as good as that of the clairvoyant GLRT and the estimated MAP classifier for detection and classification, respectively, while less information is needed for our method.


REFERENCES

[1] Kay, S. Fundamentals of Statistical Signal Processing: Detection Theory. Upper Saddle River, NJ: Prentice-Hall, 1998.

[2] Thomopoulos, S., Viswanathan, R., and Bougoulias, D. Optimal distributed decision fusion. IEEE Transactions on Aerospace and Electronic Systems, 25 (Sept. 1989), 761–765.

[3] Chair, Z. and Varshney, P. Optimal data fusion in multiple sensor detection systems. IEEE Transactions on Aerospace and Electronic Systems, 22 (Jan. 1986), 98–101.

[4] Kittler, J., et al. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 3 (Mar. 1998), 226–239.

[5] Sundaresan, A., Varshney, P., and Rao, N. Distributed detection of a nuclear radioactive source using fusion of correlated decisions. Proceedings of the 2007 10th International Conference on Information Fusion, Quebec, Canada, July 9–12, 2007, pp. 1–7.

[6] Iyengar, S., Varshney, P., and Damarla, T. A parametric copula based framework for multimodal signal processing. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, Apr. 19–24, 2009, pp. 1893–1896.

[7] Kay, S. and Ding, Q. Exponentially embedded families for multimodal sensor processing. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), Dallas, TX, Mar. 14–19, 2010, pp. 3770–3773.

[8] Kay, S. Fundamentals of Statistical Signal Processing: Estimation Theory. Upper Saddle River, NJ: Prentice-Hall, 1993.

[9] Kay, S., Nuttall, A., and Baggenstoss, P. Multidimensional probability density function approximations for detection, classification, and model order selection. IEEE Transactions on Signal Processing, 49 (Oct. 2001), 2240–2252.

[10] Cover, T. and Thomas, J. Elements of Information Theory (2nd ed.). Hoboken, NJ: Wiley, 2006.

[11] Westover, M. Asymptotic geometry of multiple hypothesis testing. IEEE Transactions on Information Theory, 54, 7 (July 2008), 3327–3329.

[12] Kullback, S. Information Theory and Statistics (2nd ed.). New York: Courier Dover Publications, 1997.

[13] Efron, B. Defining the curvature of a statistical problem (with applications to second order efficiency). The Annals of Statistics, 3, 6 (1975), 1189–1242.

[14] Brown, L. Fundamentals of Statistical Exponential Families. Beachwood, OH: Institute of Mathematical Statistics, 1986.

[15] Boyd, S. and Vandenberghe, L. Convex Optimization. Cambridge, UK: Cambridge University Press, 2004.

[16] Luenberger, D. Linear and Nonlinear Programming (2nd ed.). New York: Springer, 2003.

[17] Hall, P. The Bootstrap and Edgeworth Expansion. New York: Springer, 1997.

[18] Kay, S., Ding, Q., and Emge, D. Joint PDF construction for sensor fusion and distributed detection. Proceedings of the 2010 13th International Conference on Information Fusion (FUSION), Edinburgh, UK, July 26–29, 2010, pp. 1–6.

[19] Kassam, S. A. Signal Detection in Non-Gaussian Noise. New York: Springer, 1988.

[20] Kay, S., Ding, Q., and Rangaswamy, M. Sensor integration for classification. Proceedings of the 2010 Conference Record of the Forty-Fourth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Pacific Grove, CA, Nov. 7–10, 2010, pp. 1658–1661.

[21] Kay, S. Exponentially embedded families – New approaches to model order estimation. IEEE Transactions on Aerospace and Electronic Systems, 41, 1 (Jan. 2005), 333–345.


Steven Kay (M’XX–F’XX) was born in Newark, NJ on April 5, 1951. He received his B.E. degree from Stevens Institute of Technology, Hoboken, NJ in

1972, his M.S. degree from Columbia University, NY in 1973, and his Ph.D.

degree from the Georgia Institute of Technology, Atlanta, GA in 1980, all in

electrical engineering.

From 1972 to 1975 he was with Bell Laboratories, Holmdel, NJ, where

he was involved with transmission planning for speech communications and

simulation and subjective testing of speech processing algorithms. From 1975 to

1977 he attended the Georgia Institute of Technology to study communication

theory and digital signal processing. From 1977 to 1980 he was with the

Submarine Signal Division, Portsmouth, RI, where he engaged in research

on autoregressive spectral estimation and the design of sonar systems. He is

presently a Professor of Electrical Engineering at the University of Rhode Island,

Kingston and a consultant to numerous industrial concerns, the Air Force, the

Army, and the Navy. As a leading expert in statistical signal processing, he has

been invited to teach short courses to scientists and engineers at government

laboratories, including NASA and the CIA. His current interests are spectrum

analysis, detection and estimation theory, and statistical signal processing.

Dr. Kay has written numerous journal and conference papers and is a

contributor to several edited books. He is the author of the textbooks Modern

Spectral Estimation (Prentice-Hall, 1988), Fundamentals of Statistical Signal

Processing, Vol. I: Estimation Theory (Prentice-Hall, 1993), Fundamentals of

Statistical Signal Processing, Vol. II: Detection Theory (Prentice-Hall, 1998), and

Intuitive Probability and Random Processes using MATLAB (Springer, 2005). He is

a member of Tau Beta Pi and Sigma Xi. He has been a distinguished lecturer for

the IEEE Signal Processing Society. He has been an associate editor for IEEE

Signal Processing Letters and IEEE Transactions on Signal Processing. He has

received the IEEE Signal Processing Society Education Award “for outstanding

contributions in education and in writing scholarly books and texts...” He has

recently been included on a list of the 250 most cited researchers in the world in

engineering.

Quan Ding received his B.S. degree from Shanghai Jiao Tong University,

Shanghai in 2006 and the M.S. and Ph.D. degrees from the University of Rhode

Island, Kingston, RI in 2008 and 2011, respectively, all in electrical engineering.

Since 2011 he has been a postdoctoral fellow in electrical engineering at the

University of Rhode Island. His primary research interests include statistical

signal processing, detection and estimation, sensor fusion, spectral analysis, and

biomedical signal processing.


Muralidhar Rangaswamy (S’89–M’93–SM’98–F’06) received his B.E. degree in electronics engineering from Bangalore University, Bangalore, India

in 1985 and M.S. and Ph.D. degrees in electrical engineering from Syracuse

University, Syracuse, NY in 1992.

He is presently employed as the technical advisor for the RF Exploitation

Technology Branch within the Sensors Directorate of the Air Force Research

Laboratory (AFRL). Prior to this he held industrial and academic appointments.

His research interests include radar signal processing, spectrum estimation,

modeling non-Gaussian interference phenomena, and statistical communication

theory.

He has coauthored more than 150 refereed journal and conference record

papers in the areas of his research interests. Additionally, he is a contributor to 5

books and is a coinventor on 3 U.S. patents. He is the technical editor (Associate

Editor-in-Chief) for Radar Systems in the IEEE Transactions on Aerospace and

Electronic Systems (IEEE-TAES). He served as the coeditor-in-chief for the

Digital Signal Processing Journal between 2005 and 2011. He serves on the

senior editorial board of the IEEE Journal of Selected Topics in Signal Processing

(January 2012–December 2014). He was a 2-term elected member of the Sensor

Array and Multichannel Processing Technical Committee (SAM-TC) of the

IEEE Signal Processing Society between January 2005 and December 2010 and

serves as a Member of the Radar Systems Panel (RSP) in the IEEE AES Society.

He was the general chairman for the 4th IEEE Workshop on Sensor Array and

Multichannel Processing (SAM-2006), Waltham, MA, July 2006. He has served

on the Technical Committee of the IEEE Radar Conference series in a myriad

of roles (track chair, session chair, special session organizer and chair, paper

selection committee member, and tutorial lecturer). He served as the publicity

chair for the First IEEE International Conference on Waveform Diversity and

Design, Edinburgh, UK, November 2004. He presently serves on the conference

subcommittee of the RSP.

Dr. Rangaswamy received the 2004 (Fred Nathanson Memorial) Outstanding

Young Radar Engineer Award from the IEEE AES Society, the 2006

Distinguished Member Award from the IEEE Boston Section, the 2007 IEEE

Region 1 Award, and the 2005 Charles Ryan Basic Research Award from the

Sensors Directorate of AFRL, in addition to more than 40 AFRL scientific

achievement awards.
