10
Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical applications Xuexing Zeng a , Jinchang Ren b,n , Meijun Sun c , Stephen Marshall b , Tariq Durrani b a School of Information Engineering, China University of Geosciences, Beijing, China b Centre for Excellence in Signal and Image Processing (CeSIP), Department of Electronic and Electrical Engineering, University of Strathclyde, 204 George Street, Glasgow, United Kingdom c School of Computer Science and Technology, Tianjin University, Tianjin, China article info Article history: Received 21 January 2013 Received in revised form 19 May 2013 Accepted 7 July 2013 Available online 17 July 2013 Keywords: Copulas Statistical signal processing Copula random variables generation Optimal copula selection abstract This paper presents algorithms for generating random variables for exponential/Rayleigh/ Weibull, Nakagami-m and Rician copulas with any desired copula parameter(s), using the direct conditional cumulative distribution function method and the complex Gaussian distribution method. Moreover, a novel method for optimal copula selection is also proposed, based on the criterion that for a given series of copulas, the optimal copula will have its copula density based mutual information closest to the corresponding bivariate distribution based mutual information. The corresponding bivariate distribution is the bivariate distribution that is used to derive this copula. Akaike information criterion (AIC) and Bayesinformation criterion (BIC) are compared with the proposed mutual informa- tion based criterion for optimal copula selection. In addition, several case studies are also presented to further validate the effectiveness of the copulas, which include dual branch selection combining diversity using Nakagami-m, exponential/Rayleigh/Weibull and Rician copulas with different marginal distributions as in real applications. & 2013 Elsevier B.V. All rights reserved. 1. Introduction One of the most important advantages of copulas is to generate correlated random variables with arbitrary mar- ginal distributions. However, obviously, this requires the successful generation of random variables for copulas. Copula random variables generation is also called copula simulation, which is widely used for constructing corre- sponding statistical models [1]. Once a suitable copula (an optimal copula) that can accurately model the dependence between marginal distributions is determined, the mar- ginal variables x and y can be generated by their inverse marginal cumulative distribution function (cdf), based on the fact that the variables u and v of copula function are both uniformly distributed ([1, Section 2.9]). Obviously, to utilize a copula function, generating random variables for the copula function is essential. In Part I of this paper, we have derived the copulas for Nakagami-m, exponential/ Rayleigh/Weibull, and Rician distributions. And in this part, we will focus on algorithms for generating random variables for these copulas as no such algorithm have been presented so far. Note that selecting a suitable copula is quite important for modeling dependence correctly between marginal distributions, and optimal copula is only determined by the marginal distributions. The empirical copula, graphic plot and Akaike information criterion (AIC) are popular methods currently for choosing optimal copula [2], and a good survey regarding this topic can be found in [3]. However, the empirical copula based method is too time consuming, thus it is not suitable for large dataset. The Contents lists available at ScienceDirect journal homepage: www.elsevier.com/locate/sigpro Signal Processing 0165-1684/$ -see front matter & 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.sigpro.2013.07.006 n Corresponding author. Tel.: +44 141 5482384. E-mail addresses: [email protected], [email protected] (J. Ren). Signal Processing 94 (2014) 681690

Copulas for statistical signal processing (Part II) Simulation, … · 2017-05-09 · Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Copulas for statistical signal processing (Part II) Simulation, … · 2017-05-09 · Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical

Contents lists available at ScienceDirect

Signal Processing

Signal Processing 94 (2014) 681–690

0165-16http://d

n CorrE-m

npurjc@

journal homepage: www.elsevier.com/locate/sigpro

Copulas for statistical signal processing (Part II): Simulation,optimal selection and practical applications

Xuexing Zeng a, Jinchang Ren b,n, Meijun Sun c, Stephen Marshall b,Tariq Durrani b

a School of Information Engineering, China University of Geosciences, Beijing, Chinab Centre for Excellence in Signal and Image Processing (CeSIP), Department of Electronic and Electrical Engineering, University ofStrathclyde, 204 George Street, Glasgow, United Kingdomc School of Computer Science and Technology, Tianjin University, Tianjin, China

a r t i c l e i n f o

Article history:Received 21 January 2013Received in revised form19 May 2013Accepted 7 July 2013Available online 17 July 2013

Keywords:CopulasStatistical signal processingCopula random variables generationOptimal copula selection

84/$ - see front matter & 2013 Elsevier B.V.x.doi.org/10.1016/j.sigpro.2013.07.006

esponding author. Tel.: +44 141 5482384.ail addresses: [email protected],yahoo.com (J. Ren).

a b s t r a c t

This paper presents algorithms for generating random variables for exponential/Rayleigh/Weibull, Nakagami-m and Rician copulas with any desired copula parameter(s), using thedirect conditional cumulative distribution function method and the complex Gaussiandistribution method. Moreover, a novel method for optimal copula selection is alsoproposed, based on the criterion that for a given series of copulas, the optimal copula willhave its copula density based mutual information closest to the corresponding bivariatedistribution based mutual information. The corresponding bivariate distribution is thebivariate distribution that is used to derive this copula. Akaike information criterion (AIC)and Bayes’ information criterion (BIC) are compared with the proposed mutual informa-tion based criterion for optimal copula selection. In addition, several case studies are alsopresented to further validate the effectiveness of the copulas, which include dual branchselection combining diversity using Nakagami-m, exponential/Rayleigh/Weibull andRician copulas with different marginal distributions as in real applications.

& 2013 Elsevier B.V. All rights reserved.

1. Introduction

One of the most important advantages of copulas is togenerate correlated random variables with arbitrary mar-ginal distributions. However, obviously, this requires thesuccessful generation of random variables for copulas.Copula random variables generation is also called copulasimulation, which is widely used for constructing corre-sponding statistical models [1]. Once a suitable copula (anoptimal copula) that can accurately model the dependencebetween marginal distributions is determined, the mar-ginal variables x and y can be generated by their inversemarginal cumulative distribution function (cdf), based on

All rights reserved.

the fact that the variables u and v of copula function areboth uniformly distributed ([1, Section 2.9]). Obviously, toutilize a copula function, generating random variables forthe copula function is essential. In Part I of this paper, wehave derived the copulas for Nakagami-m, exponential/Rayleigh/Weibull, and Rician distributions. And in thispart, we will focus on algorithms for generating randomvariables for these copulas as no such algorithm have beenpresented so far.

Note that selecting a suitable copula is quite importantfor modeling dependence correctly between marginaldistributions, and optimal copula is only determined bythe marginal distributions. The empirical copula, graphicplot and Akaike information criterion (AIC) are popularmethods currently for choosing optimal copula [2], and agood survey regarding this topic can be found in [3].However, the empirical copula based method is too timeconsuming, thus it is not suitable for large dataset. The

Page 2: Copulas for statistical signal processing (Part II) Simulation, … · 2017-05-09 · Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical

X. Zeng, et al. / Signal Processing 94 (2014) 681–690682

graphic plot based method chooses the optimal copulavisually and thus is not suitable for real-time signalprocessing. Since AIC is a likelihood-based method, usuallyit requires extremely large dataset for model selection.

In this paper, a novel mutual information based methodfor automatic selection of optimal copulas is also pre-sented, which suits for the case when the associatedcopula density functions and their corresponding jointprobability density functions (pdf) are available. Thereare three popular methods for copula parameter estima-tion. The first method is the Kendall's tau or Spearman'srho based method ([4, Section 3.1.2]), ([4, Section 3.1.3]),this method requires an analytical expression linking thecopula parameter and Kendall's tau or Spearman's rho, andsuch expressions are not available for Weibull/exponen-tial/Rayleigh, Nakagami-m and Rician copulas. Besides, themaximum likelihood estimation (MLE) is quite popular forcopula parameter estimation. Let X ¼ fðXt

1;Xt2ÞÞTt ¼ 1 denotes

a observation sample, where Xt1 and Xt

2 are two vectorswith the length T, the expression of log-likelihood of jointprobability density function can be written in terms of thecopula density and marginal probability density functionsas ([4, Section 5.2])

ℓðαÞ ¼ ∑T

t ¼ 1ln cðF1ðxt1; θ1Þ; F2ðxt2; θ2ÞÞ

þ ∑T

t ¼ 1∑2

n ¼ 1lnf nðxtn; θnÞ ð1Þ

where c is the copula density function, F1 and F2 are twomarginal cumulative distribution functions (cdf) withparameters θ1 and θ2 respectively and function f1 and f2are marginal pdfs. There are two popular MLE methods forthis parameter estimations, the first is called Inference forMargins (IFM) which estimates the copula parameter bytwo steps ([4, Section 5.3]):

Step 1: θ̂n ¼ argmaxΣTt ¼ 1Σ

2n ¼ 1lnf nðxtn; θnÞ

Step 2: Estimate the parameter(s) α̂ of copula.

α̂¼ argmax ℓðαÞ ¼ argmax ∑T

t ¼ 1ln cðF1ðxt1; θ1Þ; F2ðxt2; θ2Þ;αÞ

ð2ÞThe second MLE method is called Canonical Maximum

Likelihood method (CML), it estimates the copula para-meter(s) α̂ as follows ([4, Section 5.4]): the data xt1 and xt2are first transformed into the uniform variants by usingempirical distributions, then use the following MLE inEq. (3) to estimate the parameters α̂ of the copula.

α̂¼ argmax ∑2

t ¼ 1ln cðu1;u2; αÞ ð3Þ

The empirical distribution is defined as: FnðxÞ ¼ ð1=nÞΣni ¼ 1IðXi≤xÞ, where IAðxÞ function is the indicator function

defined as

IAðxÞ ¼1 if x∈A0 if x∉A

:

(

The CML method considers the attractive advantage ofcopula that copula separates the dependence structureand marginal distributions since the marginal distribution

are usually unknown, and it is suitable for all the copulasmentioned in this paper. Thus we always adopt CML as auniversal method for copula parameter estimation in thispaper.

The remainder of this paper is organized as follows.Section 2 discusses methods for generating random vari-ables for exponential/Rayleigh/Weibull, Nakagami-m andRician copulas using univariate marginal cdf method. InSection 3, a novel method for optimal copula selectionbased on mutual information is proposed. Section 4 pre-sents case studies to apply these copulas in applications ofdual branch selection combining diversity problems incommunication field. Concluding marks are summarizedin Section 5.

2. Copula random variables generation

Let FX( � ) and FY( � ) be two marginal cumulative dis-tribution functions (cdf) for random variables x and y,respectively, the corresponding bivariate copula function isgiven below ([1, Section 2.3])

Cðu; vÞ ¼ FXY ðx; yÞ ¼ FXY ðF�1X ðuÞ; F�1

Y ðvÞÞ ð4Þwhere the simulation task here is to generate uniformlydistributed random variables (u, v) whose joint distribu-tion function is C.

One of the most commonly used approaches for generat-ing copula randomvariables is the conditional copula method,and the conditional copula is defined as ([1, Section 2.9])

cuðvÞ ¼ PrðV≤vjU ¼ uÞ ¼ limΔu-0

Cðuþ Δu; vÞ�Cðu; vÞΔu

¼ ∂Cðu; vÞ∂u

ð5Þwhere cuðvÞ is the partial derivative of the copula, a non-decreasing function which exists for almost all v∈½0;1�.

The conditional copula based method can be describedin the following three steps:

1)

Generate two independent uniform random variables t1and t2∈½0;1�;

2)

Let u¼ t1, cuðvÞ ¼ t2; 3) Compute v¼ c�1

u ðt2Þ, and the pairs (u, v) are thenregarded as the desired copula random variables, wherec�1u denotes a quasi-inverse of cu. The definition of thequasi-inverse function can be found in ([1, Section 2.3]).

As can be seen, conditional copula based method isquite intuitive and works well for some copulas, especiallyfor Archimedean copulas where the analytical expressionof c�1

u ( ) is resolvable. However, in more complicated cases,it is extremely difficult to find the analytical expression ofc�1u ( ). To this end, an alternative approach using univariatemarginal cdf should be applied as discussed below.

For a specific copula, the univariate marginal cdf basedmethod can also be summarized in the following threesteps ([4, Section 6.2]):

1)

Generate two independent uniform random variables t1and t2∈½0;1�;

2)

Generate random variables pairs (x, y) from t1 and t2according to their bivariate distribution;
Page 3: Copulas for statistical signal processing (Part II) Simulation, … · 2017-05-09 · Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical

X. Zeng, et al. / Signal Processing 94 (2014) 681–690 683

3)

Let u¼ FXðxÞ and v¼ FY ðyÞ, and the pairs (u, v) are thentaken as the desired copula random variables.

Note that FXðÞ and FY ðÞ are the marginal cdfs used toderive the copula. Apparently the univariate marginal cdfbased method requires that the random variables pairs (x, y)themselves can be generated before the generation of thecopula random variables. In this paper, the univariate mar-ginal cdf based method will be employed for copula randomvariables generation since it is too difficult to directly derivethe analytical expression of c�1

u ( � ) for our newly derivedcopulas.

Also note that the log-normal copula has been provento be equivalent to Gaussian copula, and thus the samemethod that is applied for Gaussian copula in ([4, Section6.2]) can be used for simulating of the log-normal copula.Therefore in this paper we will focus on the simulation ofexponential/Rayleigh/Weibull, Nakagami-m and Riciancopulas as these have not been investigated.

2.1. Bivariate exponential/Rayleigh/Weibull copula

In Part I of the paper [20], we have derived the bivariateexponential (also Rayleigh and Weibull) copula function as:

Cðu; vÞ ¼ 1þ eρa2�a2

Z ρa2

0e�sI0ð2

ffiffiffiffiffiffiffia1s

p Þ ds�1� �

�e�a1

Z a2

0e�sI0ð2

ffiffiffiffiffiffiffiffiffiffiρa1s

p Þ ds ð6Þ

where a1 ¼�lnð1�uÞ=ρ′, a2 ¼�lnð1�vÞ=ρ′, and ρ′¼ 1�ρ.Alternatively, it can be also defined by using Marcum's

Q function as [20]:

Cðu; vÞ ¼ 1þ ð1�vÞQ1ðffiffiffiffiffiu1

p;ffiffiffiffiffiffiffiffiρv1

p Þ�ð1�uÞð1�Q1ð

ffiffiffiffiffiffiffiffiρu1

p;ffiffiffiffiffiv1

p ÞÞ ð7Þ

where u1 ¼ 2lnð1�uÞðρ�1Þ�1 and v1 ¼ 2lnð1�vÞðρ�1Þ�1.To generate random variables for exponential, Rayleigh

and Weibull copulas, we need first generate randomvariables (x, y) for bivariate exponential/Rayleigh/Weibulldistribution before the univariate marginal cdf method canbe applied. To this end, two methods, based on inverse cdfand complex Gaussian distribution, respectively, are uti-lized as discussed below.

2.1.1. Inverse cdf based univariate marginal CDF methodThe exponential copula has been proved to be equivalent

to the Rayleigh and Weibull copulas, thus we only takeexponential copula as an example for random variable gen-eration of these three copulas. Note that same results areachieved if we choose bivariate Rayleigh or Weibull distribu-tion to generate random variables for the three copulas.

Firstly, we generate two independent uniform randomvariables t1 and t2∈½0;1�. And then we compute the condi-tional pdf of y given x for bivariate exponential distribu-tions by:

f Y jXðyjxÞ ¼f XY ðx; yÞf XðxÞ

¼ μ

1�ρexp λx� λxþ μy

1�ρ

� �I0

21�ρ

ffiffiffiffiffiffiffiffiffiffiffiffiρλμxy

p� �ð8Þ

The associated conditional cdf can be derived as:

FYjXðyjxÞ ¼Z y

0

μ

1�ρexp � ρλxþ μs

1�ρ

� �I0

21�ρ

ffiffiffiffiffiffiffiffiffiffiffiffiρλμxs

p� �ds

ð9Þ

Note that the Marcum Q function is defined as Qm

ða; bÞ ¼ R1b xðx=aÞm�1e�ððx2þa2Þ=2ÞIm�1ðaxÞ dx. For the special

case with m¼1, the Marcum's Q function becomesQ1ða;bÞ ¼

R1b xe�ððx2þa2Þ=2ÞI0ðaxÞ dx. The Marcum Q function

can also be written as

Qmða; bÞ ¼ 1�a1�mZ b

0xmexp � a2 þ x2

2

� �Im�1ðaxÞ dx ð10Þ

and again for the special case with m¼1, we have 1�Q1

ða; bÞ ¼ R b0 texpð�ðða2 þ t2Þ=2ÞI0ðatÞ dt. After algebraic

manipulation, we can derive FY jXðyjxÞ ¼ 1�Q1ðffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffið2ρλx=ð1�ρÞÞ

p;ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

ð2μy=ð1�ρÞÞp

. Let FYjXðyjxÞ ¼ t2, we can also derive that

y¼ ðð1�ρÞ=2μÞQ�11 ð

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2ρλx=ð1�ρÞ

p;1�t2Þ. Note that if Q1ða;

bÞ ¼ s, we have Q�11 ða; sÞ ¼ b, where Q�1

1 ða; sÞ denotes theinverse Marcum Q function on variable b. Let x¼F�1X ðt1Þ ¼�ð1=λÞlnð1�t1Þ, as a result, (x, y) are desired

random pairs for bivariate exponential distribution. Notethat the pairs (x, y) depends on three parameters, λ; μ andρ. Finally, we have v¼ FY ðyÞ ¼ 1�expð�μyÞ ¼ 1�exp fð�ð1�ρÞ=2Þ½Q�1

1 ðffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffið�2ρlnð1�t1ÞÞ=ð1�ρÞ

p;1�t2Þ�2g.

Let u¼t1, then the pairs (u, v) are the desired randomvariables for bivariate exponential, Rayleigh and Weibullcopulas. It can be found that the correlated exponentialdistributed pairs (x, y) depends on the parameters λ; μ andρ, while the random variables of bivariate exponential,Rayleigh and Weibull copulas are only depends on theparameter ρ. This can be also validated by Eq. (6) or Eq. (7).In other words, this means that copulas have helped tosimplify the dependence structure of these random vari-ables whilst preserving the most significant characteristicsamong them.

2.1.2. Complex Gaussian distribution based univariatemarginal CDF method

Since multivariate Rayleigh distribution can be repre-sented using a set of zero-mean complex Gaussianrandom variables as Gk ¼ skð

ffiffiffiffiffiffiffiffiffi1�ρ

pXk þ ffiffiffi

ρp

X0Þ þ iskðffiffiffiffiffiffiffiffiffi1�ρ

pYk þ ffiffiffi

ρp

Y0Þ, where k¼0, 1, 2…, L, i¼ffiffiffiffiffiffiffi�1

pand

Xk;Yk∼Gð0;1=2Þ are independent [5], complex Gaussianrandom variables can be used to generate random vari-ables for Rayleigh copula as well as its equivalences, i.e.exponential and Weibull copulas. Here, we have GK∼CGð0;1=2Þ, where G( ) and CG( ) represent Gaussiandistribution and complex Gaussian distribution, respec-tively. As a result, jGK j is a set of Rayleigh randomvariables, and we can derive the mean-squared value ofjGkj is EðjGkj2Þ ¼ s2k . Note that EðjGkj2Þ ¼Ωk, thus we have

Gk ¼ffiffiffiffiffiffiΩk

pðffiffiffiffiffiffiffiffiffi1�ρ

pXk þ

ffiffiffiρ

pX0Þ

þiffiffiffiffiffiffiΩk

pðffiffiffiffiffiffiffiffiffi1�ρ

pYk þ

ffiffiffiρ

pY0Þ ð11Þ

where ρ refers to the cross-correlation coefficient betweenany Gk and Gj (k≠j).

Page 4: Copulas for statistical signal processing (Part II) Simulation, … · 2017-05-09 · Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical

X. Zeng, et al. / Signal Processing 94 (2014) 681–690684

Let x¼ jG1j ¼ffiffiffiffiffiffiΩ1

p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðffiffiffiffiffiffiffiffiffi1�ρ

pX1 þ ffiffiffi

ρp

X0Þ2 þ ðffiffiffiffiffiffiffiffiffi1�ρ

pY1

qþ ffiffiffi

ρp

Y0Þ2 and y¼ jG2j ¼ffiffiffiffiffiffiΩ2

p ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðffiffiffiffiffiffiffiffiffi1�ρ

pX2 þ ffiffiffi

ρp

X0Þ2þq

ðffiffiffiffiffiffiffiffiffi1�ρ

pY2 þ ffiffiffi

ρp

Y0Þ2, as a result, (x, y) are desired randompairs for bivariate Rayleigh distribution. Consequently, wecan simulate (u, v) for exponential/Rayleigh/Weibull copu-las by computing

u¼ 1�expf�½ðffiffiffiffiffiffiffiffiffi1�ρ

pX1 þ

ffiffiffiρ

pX0Þ2 þ ð

ffiffiffiffiffiffiffiffiffi1�ρ

pY1 þ

ffiffiffiρ

pY0Þ2�g

v¼ 1�expf�½ðffiffiffiffiffiffiffiffiffi1�ρ

pX2 þ

ffiffiffiρ

pX0Þ2 þ ð

ffiffiffiffiffiffiffiffiffi1�ρ

pY2 þ

ffiffiffiρ

pY0Þ2�g

Note that the generation of correlated Rayleigh distrib-uted pairs (x, y) depends on the parameters ΩX ; ΩY and ρ.However, the generation of random variables for bivariateexponential/Rayleigh/Weibull copulas only depends on theparameter ρ.

2.2. Bivariate Nakagami-m copula

Similar to the bivariate exponential/Rayleigh/Weibullcopulas, both inverse cdf based method and complexGaussian distribution based method are utilized to gen-erate random variables (u, v) for bivariate Nakagami-mcopula as presented below.

2.2.1. Inverse cdf based univariate marginal CDF methodThe Nakagami-m copula has been defined as [6]

Cðu; vÞ ¼ ð1�ρÞmΓðmÞ ∑

1

k ¼ 0

ρkΓðmþ kÞ2k!

� �P mþ k;

P�1ðm;uÞ1�ρ

" #

P mþ k;P�1ðm; vÞ

1�ρ

" #ð12Þ

To simulate random variables for Nakagami-m copula,we firstly generate two independent uniform randomvariables t1 and t2∈½0;1�. According to the definition ofbivariate Nakagami-m pdf and marginal Nakagami-m pdfin [20, Section 2.4], we can derive the conditional pdf of ygiven x for bivariate Nakagami-m distributions as

f YjXðyjxÞ ¼2mx1�mymΩðm�1Þ=2

X

ð1�ρÞρðm�1Þ=2Ωðmþ1Þ=2Y

exp

�mðρx2=ΩX þ y2=ΩY Þ1�ρ

� �Im�1

2mxyffiffiffiρ

p1�ρ

ffiffiffiffiffiffiffiffiffiffiffiffiffiΩXΩY

p� �

ð13Þ

The associated conditional cdf can be derived as byapplying Eq. (10)

FYjXðyjxÞ ¼ 1�Qm x

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2mρ

ð1�ρÞΩX

s; y

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi2m

ð1�ρÞΩY

s !ð14Þ

Let FYjxðyjxÞ ¼ t2, thus we have y¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðð1�ρÞΩY Þ=2m

pQ�1

m

½xffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffið2mρ=ðð1�ρÞΩXÞÞ

p;1�t2�, where m is a positive integer.

However, in some applications, the parameter m of theNakagami distributions can be non-integer. For these cases,we can consider that the generalized Marcum Q function isthe complementary cdf of the noncentral chi-squared dis-tribution with 2m degrees of freedom, where 2m is notnecessarily an integer [18]. The relationship between general-ized Marcum Q function and noncentral chi-squared distribu-

tion can be expressed as 1�Qmða; bÞ ¼ Kðb2;2m; a2Þ, where K( ) denotes the non-central chi-squared distribution, and λ is

called noncentrality parameter. Let Qmða; bÞ ¼ y, we canderive that the inverse Marcum Q function about its second

variable as Q�1m ða; yÞ ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiK�1ð1�y;2m; a2Þ

q. Note that the

inverse noncentral chi-square distribution is supported byMatlab through the build-in function ncx2inv.

Let x¼ F�1X ðt1Þ ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffið�ðΩX=mÞÞP�1ðm; t1Þ

q, then (x, y) are

the desired random pairs for bivariate Nakagami-m dis-tribution. Note that Eq. (14) is different from Eq. (5) in [7]since the Nakagami-m distribution in this paper is definedas f XðxÞ ¼ ðð2mmx2m�1Þ=ðΓðmÞΩm

X ÞÞexpð�ðmx2=ΩXÞÞ, whereit is defined in a different way as f XðxÞ ¼ ðð2x2m�1Þ=ðΓðmÞΩm

X ÞÞexpð�ðx2=ΩXÞÞ in [7]. However, the same expres-sion of Nakagami-m copula can be derived from these twodifferent definitions. As a result, we have v¼ FY ðyÞ ¼Pðm; ðmy2=ΩY ÞÞ ¼ Pðm; ðð1�ρÞ=2Þ½Q�1

m ðffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffið2ρP�1ðm;

qt1ÞÞ=ð1�

ρÞ; 1�t2Þ�2Þ.Finally, let u¼t1, the pair (u, v) becomes the desired

random variables for bivariate Nakagami-m copula. As canbe seen, the generation of random variables only dependson the parameters ρ and m, just as validated in Eq. (12).

2.2.2. Complex Gaussian distribution based univariatemarginal CDF method

Complex Gaussian random variables can also be used togenerate random variables for Nakagami-m copula asmultivariate Nakagami-m distribution can be representedby using a set of zero-mean complex Gaussian randomvariables defined in [5] as

Gkj ¼ skðffiffiffiffiffiffiffiffiffi1�ρ

pXkj þ

ffiffiffiρ

pX0jÞ þ iskð

ffiffiffiffiffiffiffiffiffi1�ρ

pYkj þ

ffiffiffiρ

pY0jÞ

where k¼0, 1, 2…, L; j¼1, 2…, m; i¼ffiffiffiffiffiffiffi�1

p, and

Xkj;Ykj∼Gð0;1=2Þ are independent to each other. Conse-quently, we have Gkj∼CGð0;1=2Þ, where again G( ) andCG( ) represent Gaussian distribution and complex Gaus-sian distribution, respectively. The cross-correlation coeffi-cient between any Gkj and Gln (k≠j and j¼n) is denoted as

ρ. We can easily derive that Nk ¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiΣmj ¼ 1jGkjj2

qis a set of

Nakagami-m random variables, and the mean-squared

value of Nk is EðN2k=mÞ ¼ms2k . Note that the mean-

squared value also equals Ωk. Consequently, we have

Gkj ¼ffiffiffiffiffiffiΩk

m

rðffiffiffiffiffiffiffiffiffi1�ρ

pXkj þ

ffiffiffiρ

pX0jÞ

þi

ffiffiffiffiffiffiΩk

m

rðffiffiffiffiffiffiffiffiffi1�ρ

pYkj þ

ffiffiffiρ

pY0jÞ ð15Þ

Let x¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiΣmj ¼ 1jG1jj2

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiΣmj ¼ 1ðΩ1=mÞ½ð

ffiffiffiffiffiffiffiffiffi1�ρ

pX1j þ ffiffiffi

ρp

X0jÞ2þq

ðffiffiffiffiffiffiffiffiffi1�ρ

pY1j þ ffiffiffi

ρp

Y0jÞ2 and y¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiΣmj ¼ 1jG2jj2

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiΣmj ¼ 1ðΩ2=mÞ

q½ð

ffiffiffiffiffiffiffiffiffi1�ρ

pX2j þ ffiffiffi

ρp

X0jÞ2þ ðffiffiffiffiffiffiffiffiffi1�ρ

pY2j þ ffiffiffi

ρp

Y0jÞ2, then (x, y) aretaken as the desired random pairs for bivariate Nakagami-m distribution. As a result, (u, v) are determined by

u¼ Pðm; ∑m

j ¼ 1½ð

ffiffiffiffiffiffiffiffiffi1�ρ

pX1j þ

ffiffiffiρ

pX0jÞ2 þ ð

ffiffiffiffiffiffiffiffiffi1�ρ

pY1j þ

ffiffiffiρ

pY0jÞ2�Þ

v¼ Pðm; ∑m

j ¼ 1½ð

ffiffiffiffiffiffiffiffiffi1�ρ

pX2j þ

ffiffiffiρ

pX0jÞ2 þ ð

ffiffiffiffiffiffiffiffiffi1�ρ

pY2j þ

ffiffiffiρ

pY0jÞ2�Þ

Page 5: Copulas for statistical signal processing (Part II) Simulation, … · 2017-05-09 · Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical

X. Zeng, et al. / Signal Processing 94 (2014) 681–690 685

It can be found again that the Nakagami-m copula randomvariables only depend on the parameters ρ and m.

2.3. Bivariate Rician copula

The bivariate Rician copula has been derived in Part I ofthe paper [20] as:

Cðu; vÞ ¼Z b1

0

Z b2

0

z1z21�ρ2

exp � z21 þ z22 þ 2ð1�ρÞz22ð1�ρ2Þ

� �

∑þ1

k ¼ 0εkIk

ρz1z21�ρ2

� �Ik

z1z1þ ρ

� �Ik

z2z1þ ρ

� �dz1 dz2 ð16Þ

where b1 ¼Q�11 ðz;1�uÞ and b2 ¼Q�1

1 ðz;1�vÞ.Alternatively, it can be also defined by using infinite

series representation as [20]

Cðu; vÞ ¼ exp�z2

1þ ρ

� �∑1

k ¼ 0εkð�1Þk ∑

1

l;m;n ¼ 0

ð1þ ρÞ1�l0 ð1�ρÞ1þl′ρ2mþkz2l′

2l′l!m!n!ðlþ kÞ!ðmþ kÞ!ðnþ kÞ!γ lþmþ kþ 1;ð

½Q�11 ðz;1�uÞ�22ð1�ρ2Þ

!γ nþmþ kþ 1;

½Q�11 ðz;1�vÞ�22ð1�ρ2Þ

!ð17Þ

For bivariate Rician distribution, it is difficult to find theanalytical expression for conditional cdf of y given x.However, complex Gaussian distribution based methodcan still be applied for Rician copula random variablesgeneration as the multivariate Rician distribution can berepresented by a set of non-zero means complex Gaussianrandom variables given in [4] as

Gk ¼ sðffiffiffiffiffiffiffiffiffi1�ρ

pXk þ

ffiffiffiρ

pX0Þ þ isð

ffiffiffiffiffiffiffiffiffi1�ρ

pYk þ

ffiffiffiρ

pY0Þ ð18Þ

In Eq. (18), Xk∼Gð0; ð1=2ÞÞ and Yk∼Gð0; ð1=2ÞÞ are inde-pendent to each other, where k¼1, 2,…, L; X0∼Gðm1; ð1=2ÞÞand Y0∼Gðm2; ð1=2ÞÞ are also independent; m1 and m2 can

be arbitrary but satisfyingffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffim2

1 þm22

q¼ a. Here, we can

simply let m1 ¼m2 ¼ ðffiffiffi2

pa=2Þ, thus we have GK∼CGð ffiffiffi

ρp

ððffiffiffi2

pa=2Þ þ ið

ffiffiffi2

pa=2ÞÞ; sÞ, where G( ) and CG( ) represent

Gaussian distribution and complex Gaussian distribution,respectively. As a result, jGkj is a set of Rician randomvariables. Again the cross-coefficient between Gk and Gj

(k≠j) is denoted as ρ, and the mean-squared value of jGkj isderived as EðjGkj2Þ ¼ s2ð1þ ρa2Þ. Note that the mean-squared value of jGkj equals to a2 þ 2s2. Let z¼ ða=sÞ, wecan derive s2 ¼ ða2=ðρa2�1ÞÞ ¼ ðz2 þ 1Þ=ðρz2Þ, wherea24ð1=ρÞ.As a result, we have

x¼ jG1j ¼ sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðffiffiffiffiffiffiffiffiffi1�ρ

pX1 þ

ffiffiffiρ

pX0Þ2 þ ð

ffiffiffiffiffiffiffiffiffi1�ρ

pY1 þ

ffiffiffiρ

pY0Þ2

qy¼ jG2j ¼ s

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðffiffiffiffiffiffiffiffiffi1�ρ

pX2 þ

ffiffiffiρ

pX0Þ2 þ ð

ffiffiffiffiffiffiffiffiffi1�ρ

pY2 þ

ffiffiffiρ

pY0Þ2

qwhere X0∼Gð

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiððz2 þ 1Þ=2ρÞ

p; ð1=2ÞÞ and Y0∼Gð

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiððz2 þ 1Þ=

p2ρÞ; ð1=2ÞÞ, and (x, y) are desired Rician random pairs.Consequently, bivariate Rician copula random variables(u, v) can be obtained as

u¼ 1�Q1ðz;ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðffiffiffiffiffiffiffiffiffi1�ρ

pX1 þ

ffiffiffiρ

pX0Þ2 þ ð

ffiffiffiffiffiffiffiffiffi1�ρ

pY1 þ

ffiffiffiρ

pY0Þ2

v¼ 1�Q1ðz;ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðffiffiffiffiffiffiffiffiffi1�ρ

pX2 þ

ffiffiffiρ

pX0Þ2 þ ð

ffiffiffiffiffiffiffiffiffi1�ρ

pY2 þ

ffiffiffiρ

pY0Þ2

Please note the generation of Rician random pairs (x, y)depends on three parameters, i.e. s; z and ρ (or s; a and ρ),while the Rician copula random variables generation onlydepends on two parameters, ρ and z, since X0, Y0, Xk and Ykonly depend on ρ and z. This can be also validated by Eq.(16) or Eq. (17).

2.4. Comparison of the inverse cdf based method andcomplex Gaussian based generation of copula randomvariables

The inverse cdf based method for generating randomcopula variables is an intuitive method, and it suits for theWeibull/exponential/Rayleigh and Nakagami-m copulawith non-integer m, while integer m is required forcomplex Gaussian based method. However, the inversecdf based method usually needs to find an analyticalexpression of conditional pdf of bivariate distribution,which may be very difficult to achieve. For example, theinverse cdf based method is not applied for the Riciancopula, as an analytical expression of conditional asso-ciated pdf cannot be derived. In addition, the inverse cdfbased method is required to calculate the inverse MarcumQ function, and this increases the computational cost. Thecomplex Gaussian distribution based method can beapplied for the Weibull/exponential/Rayleigh and Riciancopulas. However, it is only valid for Nakagami-m copulawhen the integer m is positive.

3. Optimal copula selection

For a given problem, optimal copula selection isalways important for practical applications in accuratelymodeling of the data. Currently, empirical copula ([1,Section 5.6]), [8], graphic plot [9], AIC [3,10,11] are threemost popular used methods for optimal copula selection,and a good survey regarding this topic can be foundin [3].

Let fðxk; ykÞgnk ¼ 1 denotes a sample of size n from acontinuous bi-variate distribution, and the empiricalcopula Cn can be defined as [1]

Cnin;jn

� �

¼ ðnumber of pairs ðx; yÞ in the sample with x≤xðiÞ; y≤yðiÞÞn

ð19Þ

where x≤xðiÞ; y≤yðiÞ; 1≤i; j≤n denote order statistics fromthe sample. The candidate of copula is thought as theoptimal copula if it has the minimal Euclidean distance withthe empirical copula. However, the empirical copula basedmethod is usually time consuming, thus it is not suitablefor large dataset and real-time processing. Graphic plotbased method selects the optimal copula by visuallyinspecting the scatter plots of the observation data andsimulated results, and therefore it is not suitable for real-time processing as well.

AIC [3,11] is defined as

AIC ¼�2log½ckðu1;u2Þjα̂� þ qk ð20Þ

Page 6: Copulas for statistical signal processing (Part II) Simulation, … · 2017-05-09 · Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Pearson Correlation

Mut

ual I

nfor

mat

ion

Fig. 1. Mutual information for bivariate Gaussian distribution.

X. Zeng, et al. / Signal Processing 94 (2014) 681–690686

where k is the copula function index; qk ¼ 2pk and pk is thenumber of parameter of the kth copula; ck is the likelihoodfunction corresponding to the kth copula evaluatedby substituting the CML of Eq. (3) as MLE. Note that usingempirical distribution u1 and u2 has been transformedto uniform distribution from observations Xt

1 and Xt2,

respectively. We may also use IFM in Eq. (2) as MLE forAIC, though it suffers from unknown marginal distribu-tions. To this end, CML takes into consideration one of thesignificant advantage of copula that copula separatesthe dependence structure and marginal distributions.Another solution is Bayes’ information criterion (BIC)with qk ¼ pk logðTÞ, where T is the number of observations[11]. The copula function with the lowest AIC or BIC isthen regarded as the optimal one [3,11]. It has beenpointed out that none of these methods has beenproven to be superior in the survey regarding this topic[3]. Therefore, in this paper, a novel method based onmutual information measurement is presented for optimalcopula selection, and the results are compared with AICand BIC.

Mutual information has been employed for measuringthe dependence between two marginal distributions in awide range of applications such as image registration [12],data fusion [13] and MIMO system [14]. Basically, for tworandom variables X and Y, the mutual informationbetween them is defined as [15]

MIðX;YÞ ¼ ∬X;Y

f XY ðx; yÞlogf XY ðx; yÞf XðxÞf Y ðyÞ

dx dy ð21Þ

where fXY(x, y) is the joint pdf and fX(x) and fY(y) are thecorresponding marginal pdfs for X and Y, respectively. As aspecial example, the mutual information for two Gaussiandistributions with a correlation coefficient ρ has a verysimple form as [15]:

MIGau ¼�12lnð1�ρ2Þ ð22Þ

Actually, mutual information can also be defined byusing a copula density only as [16]

MIðx; yÞ ¼∬½0;1�2cðu; vÞlog cðu; vÞ du dv ð23Þ

where the copula in Eq. (23) should be a valid copula.Next we will examine the effect of mutual information

for optimal copula selection. Firstly, we generate a series of

Table 1Density functions of Clayton, Frank, Gaussian and Student’s t copulas definedcopulas are given in Part I of the paper.

Copulas Copula density functions

Clayton cðu; vÞ ¼ ð1þ θÞu�1�θð�1þ u�θ þ v�θÞ�2�1=θ

Frankcðu; vÞ ¼ θe½θð1þuþvÞ�ð�1þeθ Þ

½eθ�eθð1þuÞ�eθð1þvÞ þ e½θðuþvÞ� �2Gaussian

cðu; vÞ ¼ 1ffiffiffiffiffiffiffiffiffiffiffiffi1�ρ2

p eðð_u2þ_v2 Þ=2Þeðð

_u2þ_v2�2ρ_u_v Þ=ð�2ð1�ρ2 ÞÞÞ;

Student’s tcðu; vÞ ¼ Γððv=2Þ þ 1ÞΓðv=2Þð1þ ð½t�1

v ðuÞ�2 þ ½t�1v ðvÞ�ffiffiffiffiffiffiffiffiffiffiffiffi

1�ρ2p

Γððvþ 1Þ=2Þ2ð1þ ð½t�1v ðuÞ�2Þ=vÞ

Note: Φ( ) denotes the cdf of standard univariate Gaussian distribution, tvð Þ deGamma function. it is proven in this paper that bivariate exponential, Rayleigh

bivariate Gaussian distribution dataset with size 10,000�2 with 10 different Pearson correlations varying from 0 to1, where the mean values are 5 and 6, respectively andboth standard deviations are 1. And then we calculate themutual information for the bivariate Gaussian distributionusing Eq. (22) as the theoretic results. Next, we computethe Clayton, Frank, Gaussian, Rayleigh and Student’s tcopula based mutual information as several candidates,where the density functions for the Clayton, Frank andStudent’s t copulas are given in Table 1 [3]. These copulasdensity based mutual information, as shown in Fig. 1, arecomputed by the numerical integrals [17] since it isextremely difficult to find their analytical expressions.

From Fig. 1, we can easily make the following observa-tions: (i) Both Gaussian copula and Student's t copulahave generated exactly the same mutual information asexpected from the theoretic model, thus either Gaussian orStudent’s t copula can be selected as the optimal copula forthese datasets. Note that Student's t copula with a largedegrees of freedom can be considered as Gaussiancopula [3], and this explains the results above; (ii) Theunsuitable copulas will cause more or less errors to the

in [3], those for exponential/Rayleigh/Weibull, Nakagami-m and Rician

Parameters

θ∈½�1;1Þ θ≠0θ∈ð�1;þ1Þ θ≠0

_u ¼Φ�1ðuÞ_v ¼Φ�1ðvÞ

(ρ∈ð�1;1Þ

2�2ρt�1v ðuÞt�1

v ðvÞÞ=ðvð1�ρ2ÞÞÞ�ðvþ2Þ=2

�ðvþ1Þ=2ð1þ ð½t�1v ðvÞ�2Þ=vÞ�ðvþ1Þ=2

ρ∈ð�1;1Þ

notes the cdf of student t distribution with freedom v, and Γ( ) denotesand Weibull copulas are equivalent.

Page 7: Copulas for statistical signal processing (Part II) Simulation, … · 2017-05-09 · Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical

X. Zeng, et al. / Signal Processing 94 (2014) 681–690 687

due dependence, where Clayton copula yields the worseresults of the maximum errors.

In addition, to further validate these observations, thesimulated datasets (x, y) are obtained by transforming (u, v)according to the Gaussian marginal cdf for these fourcopulas as shown in Fig. 2. Here, we choose the case ofρ¼0.6, the same results can be obtained as above, it is clearthat the Gaussian and Student’s t copula outperformsClayton, Frank and Rayleigh copulas for simulating Gaussiandistributed datasets as the simulated result has a goodmatching to the original dataset.

Given a series of copula candidates, the criterion foroptimal copula selection based on mutual information isdescribed as follows. First of all, we estimate mutualinformation from each copula density and also the corre-sponding bivariate distribution, respectively. Note that thecorresponding bivariate of a copula should be the bivariatedistribution that is used to derive the copula. Then, wecompare how close the two measurements of mutualinformation are. Eventually, the copula whose copuladensity based mutual information is closest to its corre-sponding bivariate distribution based mutual informationis determined as the optimal one to be selected.

For example, the bivariate Gaussian distribution is used toderive the Gaussian copula, and thus the Gaussian copula isable to accurately model the dependence between Gaussiandistributed dataset. As a result, for Gaussian distributeddataset, if we examine the optimal copula between Gaussianand Rayleigh copula, the difference between the Gaussiancopula based mutual information and bivariate Gaussian

0 5 100

2

4

6

8

10

12Gaussian copula

1

1

0 5 100

2

4

6

8

10

12Frank copula

1

1

original

simulated

Fig. 2. For Gaussian distributed datasets, simulated results using Gaussian (top-copulas.

distribution based mutual information should be smallerthan the difference between Rayleigh copula based mutualinformation and bivariate Rayleigh distribution based mutualinformation. This difference can be simply expressed by theabsolute value of their mathematical difference.

As for the computation of mutual information, anexample using Rayleigh copula is presented below. For agiven dataset, we need determine the Rayleigh copulabased mutual information and bivariate Rayleigh distribu-tion based mutual information to calculate their differ-ence. We firstly estimate Rayleigh copula parameter ρC ,and use it to compute the copula based mutual informa-tion. Next, the parameters (ρXY , ΩX and ΩY) for bivariateRayleigh distribution are estimated and used to decide thebivariate Rayleigh distribution based mutual information.All the copula density based mutual information, and mostbivariate distribution based mutual information can becomputed using numerical integrals [17]. This is simplybecause there are no analytical expressions for easilycalculating them, except the bivariate Gaussian distribu-tion based mutual information as it can be determinedusing Eq. (22).

It is worth noting that the proposed method for optimalcopula selection does not require any theoretic results, asthey are usually unavailable in practical applications due tothe fact that their marginal distribution may belongs tothe different distribution families. For a series of copulacandidates, only the bivariate distributions that are used toderive these copulas are required. This criterion canbe validated by the following experiment. We will first

0 5 100

2

4

6

8

0

2Clayton copula

0 5 100

2

4

6

8

0

2Rayleigh copula

left), Clayton (top-right), Frank (bottom-left) and Rayleigh (bottom-right)

Page 8: Copulas for statistical signal processing (Part II) Simulation, … · 2017-05-09 · Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical

X. Zeng, et al. / Signal Processing 94 (2014) 681–690688

generate a series of correlated Rayleigh distributed data-sets with size 30,000�2, and parameters ΩX ¼ΩY ¼ 2under different power correlations varying from 0 to 1,and then we are going to determine that Rayleigh copula isthe optimal copula to validate our criterion introducedabove.

The computed results of optimal copula selection areshown in Table 2. From Table 2, we can find the AIC andBIC based methods consider correctly Rayleigh copula isthe optimal copula, however, they also consider Gaussiancopula is ‘better’ than Nakagami-m and Rician copulas forall the 10 bivariate Rayleigh distributed dataset withdifferent power correlation ρ. As a contrast, the proposedmutual information based method provides more reason-able results, since it considers the Rayleigh, Nakagami-mand Rician copula are ‘better’ than Gaussian copula for allthe 10 bivariate Rayleigh distributed dataset with differentpower correlation ρ. Note that Rayleigh copula can beconsidered as a special case of Nakagami-m copula withm¼1, and also a special case of Rician copula with z¼0,and therefore Rayleigh, Nakagami-m and Rician copulasshould be ‘better’ than Gaussian copula for the Rayleighdistributed datasets. We should also note that optimalcopula selection is important but difficult due to the actualdata generation mechanism is usually unknown for a givendataset. It is possible that several candidate copulas fit thedataset well or that none of the candidate of copula fit thedata well.

4. Case studies and practical applications

The distributions of the derived copulas such as Ray-leigh, Weibull, Nakagami-m and Rician distribution arewidely used for signal processing and communicationsapplications [18]. In this section, we will apply them toselection diversity combining system. Selection diversitycombining is a simple but widely used diversity techniquein communications. The output of a dual branch selectiondiversity combiner is given by w¼maxðx; yÞ. The cdf of wcan be obtained by the joint cdf of ðx; yÞ as [19]:

FðwÞ ¼ FXY ðw;wÞ ð24ÞIn the following, we will generate some correlated

random variables by using exponential/Rayleigh/Weibull,

Table 2Computed results of optimal copula selection for Rayleigh distributed dataset.

ρ 0.100 0.184 0.269 0.353Gaussian difference 0.004 0.0015 0.0037 0.0056Rayleigh difference 0.0001 0.0002 0.0002 0.0030Nakagami-m difference 0.0001 0.0006 0.0009 0.0017Rician difference 0.0002 0.0005 0.0006 0.0013AIC of Gaussian copula �7.6400 �9.7482 �11.313 �12.500 �AIC of Rayleigh copula �8.1032 �10.207 �11.7329 �12.8134 �AIC of Nakagami-m copula �6.1033 �8.2080 �9.7345 �10.814 �AIC of Rician copula �6.1032 �8.2074 �9.7329 �10.8140 �BIC of Gaussian copula 0.6689 �1.4392 �3.0040 �4.1908BIC of Rayleigh copula 0.2057 �1.8984 �3.4240 �4.5044BIC of Nakagami-m copula 10.5146 8.4099 6.8834 5.8044BIC of Rician copula 10.5147 8.4106 6.8850 5.8039

Note: Gaussian, Rayleigh, Nakagami-m and Rician difference denote the differebased mutual information and bivariate Gaussian, Rayleigh, Nakagami-m and R

Nakagami-m and Rician copulas and their correspondingmarginal distributions, and then compute their outageprobability for selection diversity systems. Note that inthe practical cases, the marginal distribution may bearbitrary, thus the optimal copula must be determinedbefore it is applied for the following on analysis tasks.

4.1. Simulation

Basically, Rayleigh copula-alike random variablesmeans the marginal distributions can be correctly mod-eled by Rayleigh copula, where for Nakagami-m copula-alike and Rician copula-alike random variables, the datasimulated can be more accurately modeled using thecorresponding Nakagami-m and Rician copulas, respec-tively. For simulation, we firstly generate 105 pairs ofcorresponding copula variables ðu; vÞ, where the para-meters for Rayleigh, Nakagami-m and Rician copulas are(ρ¼0.6), (ρ¼0.6, m¼2) and (ρ¼0.7, z¼2), respectively.Accordingly, this ensues the dependence betweenthe marginal distributions are Rayleigh copula-alike,Nakagami-m copula-alike and Rician copula-alike, respec-tively, though the actual marginal distribution can bearbitrary. Note that in practical applications, if the copulais found as the optimal one, this dataset will be consideredas corresponding copula-alike dataset.

After generating random variables for each copula, wetransform copula variables (u, v) to random variables (x, y)according to all other different marginal distributionsincluding Rayleigh, Weibull, Nakagami-m and Rician dis-tributions. Let x¼ F�1

X ðuÞ, y¼ F�1Y ðvÞ, w¼max(x, y),

uw ¼ FXðwÞ and vw ¼ FY ðwÞ, where FXð Þ and FY ð Þ representthe cdf of x and y, respectively. Then, the selectiondiversity outage probability for Rayleigh copula-like data-sets are determined by

FXY ðw;wÞ ¼ 1þ eρa2�a2

Z ρa2

0e�sI0ð2

ffiffiffiffiffiffiffia1s

p Þ ds�1� �

�e�a1

Z a2

0e�sI0ð2

ffiffiffiffiffiffiffiffiffiffiρa1s

p Þ ds ð25Þ

where a1 ¼�lnð1�uwÞð1�ρÞ�1 and a2 ¼�lnð1�vwÞð1�ρÞ�1.

0.438 0.522 0.607 0.691 0.776 0.860.0091 0.0127 0.0203 0.0277 0.0383 0.05670.0024 0.0020 0.0025 0.0053 0.0020 0.00160.0011 0.0016 0.0034 0.0046 0.0043 0.00090.0026 0.0024 0.0034 0.0052 0.0041 0.0316

13.515 �14.416 �15.155 �15.886 �16.610 �17.44113.8018 �14.6224 �15.3583 �16.0639 �16.7672 �17.579411.802 �12.623 �13.358 �14.064 �14.767 �15.54111.8018 �12.6224 �13.3583 �14.0639 �14.7672 �15.5794�5.2058 �6.1074 �6.8457 �7.5766 �8.3010 �9.1323�5.4929 �6.3134 �7.0493 �7.7549 �8.4582 �9.27054.8159 3.9945 3.2597 2.5540 1.8507 1.07724.8161 3.9955 3.2597 2.5540 1.8507 1.0385

nce between Gaussian copula, Rayleigh, Nakagami-m and Rician copulaician distribution based mutual information, respectively.

Page 9: Copulas for statistical signal processing (Part II) Simulation, … · 2017-05-09 · Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical

-15 -10 -5 0 5 10 15 2010-6

10-5

10-4

10-3

10-2

10-1

100

Normalized Input dB

Out

age

Pro

babi

lity

Fig. 3. Selection diversity combining outage probability for Rayleighcopula-like datasets with different types of marginal distributions.

-10 -5 0 5 10 15 20 2510-5

10-4

10-3

10-2

10-1

100

Normalilzed Input dB

Out

age

Pro

babi

lity

Fig. 4. Selection diversity combining outage probability for Nakagami-mcopula-like datasets with different types of marginal distributions.

X. Zeng, et al. / Signal Processing 94 (2014) 681–690 689

Alternatively it can also be computed by Marcum’s Qfunction as

FXY ðw;wÞ ¼ 1þ ð1�vwÞQ1ðffiffiffiffiffiu′1

q;ffiffiffiffiffiffiffiffiρv′1

�ð1�uwÞð1�Q1ðffiffiffiffiffiffiffiffiρu′

1

q;ffiffiffiffiffiv′1

qÞÞ ð26Þ

where u′1 ¼ 2lnð1�uwÞðρ�1Þ�1 and v′1 ¼ 2lnð1�vwÞðρ�1Þ�1.

For Nakagami-m copula-like datasets, the selectiondiversity outage probability is defined as

FXY ðw;wÞ ¼ ð1�ρÞmΓðmÞ ∑

1

k ¼ 0

ρkΓðmþ kÞ2k!

� �P mþ k;

P�1ðm;uwÞ1�ρ

" #

�P mþ k;P�1ðm; vwÞ

1�ρ

" #ð27Þ

In addition, the selection diversity outage probabilityfor Rician copula-like datasets is determined by

FXY ðw;wÞ ¼Z b1

0

Z b2

0

z1z21�ρ2

exp � z21 þ z22 þ 2ð1�ρÞz22ð1�ρ2Þ

� �

� ∑þ1

k ¼ 0εkIk

ρz1z21�ρ2

� �Ik

z1z1þ ρ

� �Ik

z2z1þ ρ

� �dz1 dz2

ð28Þwhere b1 ¼Q�1

1 ðz;1�uwÞ and b2 ¼ Q�11 ðz;1�vwÞ.

Alternatively, this can also be computed by infiniteseries as

FXY ðw;wÞ ¼ exp�z2

1þ ρ

� �∑1

k ¼ 0εkð�1Þk

� ∑1

l; m; n ¼ 0

ð1þ ρÞ1�l′ð1�ρÞ1þl′ρ2mþkz2l′

2l0 l!m!n!ðlþ kÞ!ðmþ kÞ!ðnþ kÞ!

�γ lþmþ kþ 1;½Q�1

1 ðz;1�uÞ�22ð1�ρ2Þ

!

�γ nþmþ kþ 1;½Q�1

1 ðz;1�vÞ�22ð1�ρ2Þ

!ð29Þ

where l′¼ lþ nþ k.

4.2. Results and discussions

In the case of Rayleigh copula, the first marginal cdfFXð Þ is always Rayleigh distribution with parameter Ω¼ 2as shown in Fig. 3. For Nakagmai-m copula case, the firstmarginal cdf FXð Þ is always Nakagami-m distribution withparameters Ω¼ 5 and m¼2, and the obtained results aregiven in Fig. 4. For Rician copula application, the firstmarginal cdf FXð Þ is always Rician distribution with para-meters s¼ 2 and a¼4, and the computed results are givenin Fig. 5. In other words, these have demonstrated that theoptimal copulas have been successfully determined usingthe proposed approach.

In Figs. 3–5, the first marginal distribution is fixed asthe Rayleigh distribution, Nakagmai-m distribution andRician distribution, respectively, where the second mar-ginal distribution has a set of four values including theRayleigh, Weibull, Nakagami-m and Rician distributions.Although Ω¼ 2 is used for the Rayleigh distribution of thesecond marginal distribution in all three figures, differentparameters are used for other marginal distributions. For

Weibull, Nakagami-m and Rician distributions, the para-meters used are (Ω¼ 2, β¼ 5), (Ω¼ 5, m¼ 2) and (a¼ 2,s¼ 5) in Fig. 3, (Ω¼ 2, β¼ 5), (Ω¼ 2, m¼3) and (a¼ 2,s¼ 5) in Fig. 4, and (Ω¼ 6, β¼ 3), (Ω¼ 6, m¼2) and (a¼2,s¼ 2) in Fig. 5, respectively.

To the best of our knowledge, without the copula, theoutage probability can be determined only if the marginaldistributions belong to the same family. In other words,take Fig. 3 for example, only Rayleigh–Rayleigh can beobtained, and the results of Rayleigh–Weibull, Rayleigh–Nakagami-m and Rayleigh–Rician cannot be simulatedwhen copulas are absent. Moreover, the datasets have aspecific dependence structure that can be modeled byusing the corresponding Rayleigh copula, and this is whytheir marginal distributions that belong to differentfamilies of probability distributions cannot be generated

Page 10: Copulas for statistical signal processing (Part II) Simulation, … · 2017-05-09 · Copulas for statistical signal processing (Part II): Simulation, optimal selection and practical

-5 0 5 10 15 2010-4

10-3

10-2

10-1

100

Fig. 5. Selection diversity combining outage probability for Riciancopula-like datasets with different types of marginal distributions.

X. Zeng, et al. / Signal Processing 94 (2014) 681–690690

without using the Rayleigh copula. Similarly, this is alsoapplicable for the Nakagmai-m copula in Fig. 4 and theRician copula in Fig. 5.

5. Conclusions

In this paper, three parts of innovative work have beenreported in terms of simulation, optimal selection and casestudies of copula functions. Firstly, algorithms are pro-posed for generating random variables for Rician, Naka-gami-m, and exponential/Rayleigh/Weibull copulas as nosuch algorithms exist so far. For all the three groups ofcopulas, complex Gaussian distribution based method isintroduced to generate associated random variables. Forthe last two groups, how direct conditional cdf method canbe applied in generating associated random variables isalso given. Secondly, a novel method for selecting theoptimal copula by using mutual information is presented.This method has been fully validated by experiments, andis found suitable for all the copulas as long as their copuladensity functions and their corresponding joint pdfs thatare used to derive these copulas are available. Thirdly, casestudies are discussed to apply these copulas for outrageprobability estimation of selection diversity combiningsystems in communication based applications. Withoutusing the corresponding copulas, the simulated datasetsare required to have the same marginal distributions, yetthe copula functions have helped to loosen such con-straints in generating arbitrary marginal distributions.

Acknowledgments

Partial work of this paper is funded by two NationalScience Foundation of China (NSFC) projects (61003201and 61202165) and a joint project funded by Royal Societyof Edinburgh and NSFC (61211130125).

References

[1] R.B. Nelsen, An Introduction to Copulas, 2nd ed. Springer Verlag,New York, 2006.

[2] D. Huard, G. Evin, A. Favre, Bayesian Copula Selection, ElsevierComputational Statistics & Data Analysis 51 (2) (2005) 809–822.

[3] H. Manner, Modeling Asymmetric and Time-Varying Dependence,Maastricht University, 2010 (Ph.D. thesis).

[4] U. Cherubini, E. Luciano, W. Vecchiato, Copula Methods in Finance,John Wiley & Sons, New York, 2004.

[5] Y. Chen, C. Tellambura, Distributions functions of selection combineroutput in equally correlated Rayleigh, Rician, and Nakagami-mfading channels, IEEE Transactions on Communications 52 (11)(2004) 1948–1956.

[6] X. Liu, Copula of bivariate Nakagami-m distribution, ElectronicsLetters 47 (5) (2011) 343–345.

[7] C. Tellambura, A.D.S. Jayalath, Generation of Rayleigh and Nakagami-m fading envelopes, IEEE Communications Letters 4 (5) (2000)170–172.

[8] C. Genest, B. R´emillard, Tests of independence and randomnessbased on the empirical copula process, Test 13 (2) (2004) 335–370.

[9] C. Genest, J.C. Boies, Detecting dependence with Kendall plots,American Statistician 57 (4) (2003) 275–284.

[10] H. Akaike, A new look at the statistical model identification, IEEETransactions on Automatic Control 19 (6) (1974) 716–723.

[11] A. Sundaresan, P.K. Varshney, Location estimation of a random signalsource based on correlated sensor observations, IEEE Transactionson Signal Processing 59 (2) (2011) 787–799.

[12] J.P.W. Pluim, J.B.A. Maintz, M.A. Viergever, Mutual information basedregistration of medical images: a survey, IEEE Transactions onMedical Imaging 22 (8) (2003) 986–1004.

[13] R. Bramon, I. Boada, A. Bardera, J. Rodriguez, M. Feixas, J. Puig,M. Sbert, Multimodal data fusion based on mutual information, IEEETransactions on Visualization and Computer Graphics 18 (9) (2012)1574–1587.

[14] A. Goldsmith, S.A. Jafar, N. Jindal, S. Vishwanath, Capacity limits ofMIMO channels, IEEE Journal on Selected Areas in Communications21 (5) (2003) 684–702.

[15] C.M. Thomas, T.A. Joy, Elements of Information Theory, John Wiley &Sons, Inc., 1991.

[16] X. Zeng, T.S. Durrani, Estimation of mutual information using copuladensity function, Electronics Letters 47 (8) (2011) 493–494.

[17] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, NumericalRecipes in C—The Art of Scientific Computing, 2nd ed. CambridgeUniversity, Cambridge, U.K., 1992.

[18] M.K. Simon, M.S. Alouini, Digital Communication over FadingChannels, 2nd ed. John Wiley & Sons, New York, 2005.

[19] A. Papoulis, Probability, Random Variables and Stochastic Processes,3rd ed. Mcgraw-Hill College, 1991.

[20] X. Zeng, J. Ren, Z. Wang, S. Marshall, T. Durrani, Copulas forstatistical signal processing (Part I): extensions and generalization,Signal Processing, http://dx.doi.org/10.1016/j.sigpro.2013.07.009, inpress.