arXiv:1606.03766v1 [stat.CO] 12 Jun 2016

ContaminatedMixt: An R Package for Fitting

Parsimonious Mixtures of Multivariate

Contaminated Normal Distributions

Antonio Punzo
University of Catania

Angelo Mazza
University of Catania

Paul D. McNicholas
McMaster University

Abstract

We introduce the R package ContaminatedMixt, conceived to disseminate the use of mixtures of multivariate contaminated normal distributions as a tool for robust clustering and classification under the common assumption of elliptically contoured groups. Thirteen variants of the model are also implemented to introduce parsimony. The expectation-conditional maximization algorithm is adopted to obtain maximum likelihood parameter estimates, and likelihood-based model selection criteria are used to select the model and the number of groups. Parallel computation can be used on multicore PCs and computer clusters when several models have to be fitted. Unlike the more popular mixtures of multivariate normal and t distributions, this approach also allows for automatic detection of mild outliers via the maximum a posteriori probabilities procedure. To exemplify the use of the package, applications to artificial and real data are presented.

Keywords: mixture models, EM algorithm, contaminated normal distribution, outlier detection, robust clustering, robust estimates.

1. Introduction

Finite mixtures of distributions are commonly used in statistical modelling as a powerful device for clustering and classification, often by assuming that each mixture component represents a cluster (or group or class) in the original data (see McLachlan and Basford 1988, Fraley and Raftery 1998, and Böhning 2000).

For continuous multivariate random variables, attention is commonly focused on mixtures of multivariate normal distributions because of their computational and theoretical convenience. However, real data are often “contaminated” by outliers (also referred to as bad points herein, in analogy with Aitkin and Wilson 1980) that affect the estimation of the component means and covariance matrices (see, e.g., Barnett and Lewis 1994, Becker and Gather 1999, Bock 2002, and Gallegos and Ritter 2009).

When outliers are mild (see Ritter 2015, for details), they can be dealt with by using heavy-tailed, usually elliptically symmetric, multivariate distributions. Endowed with heavy tails, these distributions offer the flexibility needed to achieve robustness against mild outliers, whereas the multivariate normal distribution, often used as the reference distribution for the good observations, lacks sufficient fit; for a discussion about the concept of reference distribution, see Davies and Gather (1993) and Hennig (2002). In this context, the multivariate t distribution (see, e.g., Lange, Little, and Taylor 1989) and the heavy-tailed versions of the multivariate power exponential distribution (see, e.g., Gómez-Villegas, Gómez-Sánchez-Manzano, Maín, and Navarro 2011) play a special role. When used as mixture components, these distributions respectively yield mixtures of multivariate t distributions (McLachlan and Peel 1998 and Peel and McLachlan 2000) and mixtures of multivariate power exponential distributions (Zhang and Liang 2010). Although these methods robustify the estimation of the component means and covariance matrices with respect to mixtures of multivariate normal distributions, they do not allow for automatic detection of bad points. To overcome this problem, Punzo and McNicholas (2016) introduce mixtures of multivariate contaminated normal distributions. The multivariate contaminated normal distribution, which dates back to the seminal work of Tukey (1960), is a further common and simple elliptically symmetric generalization of the multivariate normal distribution having heavier tails for the occurrence of bad points; it is a two-component normal mixture in which one of the components, with a large prior probability, represents the good observations (reference distribution), and the other, with a small prior probability, the same mean, and an inflated covariance matrix, represents the bad observations (Aitkin and Wilson 1980). For further recent uses of this distribution in model-based clustering, see Punzo and McNicholas (2014a,b), Punzo and Maruotti (2016), and Maruotti and Punzo (2016).

In this paper we present the R (R Core Team 2015) package ContaminatedMixt, available from CRAN at https://cran.r-project.org/web/package=ContaminatedMixt, which allows for model-based clustering and classification by means of a family, proposed by Punzo and McNicholas (2016), of fourteen parsimonious variants of mixtures of multivariate contaminated normal distributions. Parsimony is attained by applying the eigen decomposition of the component scale matrices, in the fashion of Banfield and Raftery (1993). Fitting is performed via the expectation-conditional maximization (ECM) algorithm (Meng and Rubin 1993) and likelihood-based model selection criteria are adopted to select both the number of mixture components and the parsimonious model.

Several CRAN packages are available supporting model-based clustering and classification via mixtures of elliptically contoured distributions. A list of them may be found in the task view “Cluster Analysis & Finite Mixture Models” of Leisch and Grün (2015). One of the most flexible packages for clustering via mixtures of multivariate normal distributions is mclust (Fraley and Raftery 2007 and Fraley, Raftery, Scrucca, Murphy, and Fop 2015); it provides ten of the fourteen parsimonious mixtures of multivariate normal distributions of Celeux and Govaert (1995), obtained via a slightly different eigen-decomposition of the component covariance matrices with respect to Banfield and Raftery (1993), implements an EM algorithm for model fitting, and uses the Bayesian information criterion (BIC, Schwarz 1978) to determine the number of components. The package Rmixmod (Lebret, Iovleff, Langrognet, Biernacki, Celeux, and Govaert 2015) further fits the remaining four parsimonious models of Celeux and Govaert (1995). The package mixture (Browne, ElSherbiny, and McNicholas 2015) fits the family of fourteen parsimonious models of Banfield and Raftery (1993). Mixtures of multivariate normal distributions, with alternative parsimonious covariance structures, are also implemented by the packages bgmm (Biecek, Szczurek, Vingron, and Tiuryn 2012) and pgmm (McNicholas, ElSherbiny, Jampani, McDaid, Murphy, and Banks 2015). The teigen package (Andrews, Wickins, Boers, and McNicholas 2015) fits a family of fourteen parsimonious mixtures of multivariate t-distributions (with eigen-decomposed component scale matrices as in Celeux and Govaert 1995) from a clustering or classification point of view (see Andrews, McNicholas, and Subedi 2011 and Andrews and McNicholas 2012 for details). Finally, although not available on CRAN, the MPE package, available at http://onlinelibrary.wiley.com/doi/10.1111/biom.12351/suppinfo, fits a family, introduced by Dang, Browne, and McNicholas (2015), of eight parsimonious variants of mixtures of multivariate power exponential distributions (with eigen-decomposed component scale matrices as in Celeux and Govaert 1995).

The paper is organized as follows. Section 2 retraces the models implemented in the ContaminatedMixt package, Section 3 outlines the ECM algorithm for maximum likelihood parameter estimation, and Section 4 illustrates some further computational and practical aspects. The relevance of the package is shown, via real and artificial data sets, in Section 5, and conclusions are finally given in Section 6.

2. Methodology

2.1. The general model

For a random vector $X$, taking values in $\mathbb{R}^p$, a finite mixture of multivariate contaminated normal distributions (Punzo and McNicholas 2016) can be written as

$$p(x; \psi) = \sum_{g=1}^{G} \pi_g \left[ \alpha_g \phi(x; \mu_g, \Sigma_g) + (1 - \alpha_g) \phi(x; \mu_g, \eta_g \Sigma_g) \right], \tag{1}$$

where, for the $g$th component, $\pi_g$ is its mixing proportion, with $\pi_g > 0$ and $\sum_{g=1}^{G} \pi_g = 1$, $\alpha_g \in (0, 1)$ is the proportion of good observations, and $\eta_g > 1$ denotes the degree of contamination. In (1), $\psi$ contains all of the parameters of the mixture while $\phi(x; \mu, \Sigma)$ represents the distribution of a $p$-variate normal random vector with mean $\mu$ and covariance matrix $\Sigma$. As a special case, when $\alpha_g \to 1^-$ and $\eta_g \to 1^+$, for each $g = 1, \ldots, G$, we obtain classical mixtures of multivariate normal distributions.
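As a rough illustration of (1), the mixture density can be evaluated directly in base R. This is only a sketch under stated assumptions: the helper names `dmvn_sketch`, `dcn_sketch`, and `dmixcn_sketch` are hypothetical and are not functions of the ContaminatedMixt package, and the parameter values are toy numbers chosen for the example.

```r
# Multivariate normal density phi(x; mu, Sigma), written out in base R.
# `dmvn_sketch` is a hypothetical helper, not a ContaminatedMixt function.
dmvn_sketch <- function(x, mu, Sigma) {
  p <- length(mu)
  x <- matrix(x, ncol = p)                         # one observation per row
  d2 <- mahalanobis(x, center = mu, cov = Sigma)   # squared Mahalanobis distance
  exp(-0.5 * d2) / sqrt((2 * pi)^p * det(Sigma))
}

# Contaminated normal density:
# alpha * phi(x; mu, Sigma) + (1 - alpha) * phi(x; mu, eta * Sigma)
dcn_sketch <- function(x, mu, Sigma, alpha, eta) {
  alpha * dmvn_sketch(x, mu, Sigma) +
    (1 - alpha) * dmvn_sketch(x, mu, eta * Sigma)
}

# Mixture density (1): weighted sum of G contaminated normal densities.
dmixcn_sketch <- function(x, prior, mu, Sigma, alpha, eta) {
  dens <- 0
  for (g in seq_along(prior)) {
    dens <- dens + prior[g] * dcn_sketch(x, mu[[g]], Sigma[[g]], alpha[g], eta[g])
  }
  dens
}

# Toy parameters (illustrative values only).
prior <- c(0.5, 0.5)
mu    <- list(c(2, 2), c(-2, -2))
Sigma <- list(diag(c(5, 0.5)), diag(c(5, 0.5)))
alpha <- c(0.9, 0.9)
eta   <- c(10, 10)

x0 <- rbind(c(2, 2), c(0, 0), c(15, -15))
dens_x0 <- dmixcn_sketch(x0, prior, mu, Sigma, alpha, eta)
```

Setting `eta` to 1 in `dcn_sketch` makes the two normal terms coincide, which mirrors the limiting case described above.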

2.2. Parsimonious variants of the general model

Because there are $p(p+1)/2$ free parameters for each component scale matrix $\Sigma_g$, it is usually necessary to introduce parsimony in model (1). Following Banfield and Raftery (1993), Punzo and McNicholas (2016) consider the eigen decomposition

$$\Sigma_g = \lambda_g \Gamma_g \Delta_g \Gamma_g^\top, \tag{2}$$

where $\lambda_g$ is the first (largest) eigenvalue of $\Sigma_g$, $\Delta_g$ is the diagonal matrix of the scaled (with respect to $\lambda_g$) eigenvalues of $\Sigma_g$ sorted in decreasing order, and $\Gamma_g$ is a $p \times p$ orthogonal matrix whose columns are the normalized eigenvectors of $\Sigma_g$, ordered according to their eigenvalues. Each element in the right-hand side of (2) has a different geometric interpretation: $\lambda_g$ determines the size of the cluster, $\Delta_g$ its shape, and $\Gamma_g$ its orientation.
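The decomposition in (2) can be checked numerically with the base R function eigen(). The sketch below follows the description given in the text (largest eigenvalue as the volume term, remaining eigenvalues scaled by it); the helper name `decompose_scale` and the example matrix are assumptions made for illustration.

```r
# Decompose a scale matrix as Sigma = lambda * Gamma %*% Delta %*% t(Gamma),
# following the description of (2): lambda is the largest eigenvalue and
# Delta holds the eigenvalues scaled by lambda, in decreasing order.
# `decompose_scale` is a hypothetical helper name.
decompose_scale <- function(Sigma) {
  e <- eigen(Sigma, symmetric = TRUE)   # eigenvalues are returned in decreasing order
  lambda <- e$values[1]
  list(lambda = lambda,                                           # size (volume)
       Delta  = diag(e$values / lambda, nrow = length(e$values)), # shape
       Gamma  = e$vectors)                                        # orientation
}

Sigma <- matrix(c(4, 1.5, 1.5, 1), nrow = 2)
dec <- decompose_scale(Sigma)

# Rebuild Sigma from its three geometric ingredients.
Sigma_rebuilt <- dec$lambda * dec$Gamma %*% dec$Delta %*% t(dec$Gamma)
```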

Following Banfield and Raftery (1993), Punzo and McNicholas (2016) impose constraints on the three components of (2) resulting in a family of fourteen parsimonious mixtures of multivariate contaminated normal distributions (Table 1). Sufficient conditions for the identifiability of the models in this family are given in Punzo and McNicholas (2016).


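| Family | Model | Volume | Shape | Orientation | $\Sigma_g$ | # of free parameters in $\Sigma_g$ |
|---|---|---|---|---|---|---|
| Spherical | EII | Equal | Spherical | – | $\lambda I$ | $1$ |
| Spherical | VII | Variable | Spherical | – | $\lambda_g I$ | $G$ |
| Diagonal | EEI | Equal | Equal | Axis-aligned | $\lambda \Delta$ | $p$ |
| Diagonal | VEI | Variable | Equal | Axis-aligned | $\lambda_g \Delta$ | $G + p - 1$ |
| Diagonal | EVI | Equal | Variable | Axis-aligned | $\lambda \Delta_g$ | $1 + G(p - 1)$ |
| Diagonal | VVI | Variable | Variable | Axis-aligned | $\lambda_g \Delta_g$ | $Gp$ |
| General | EEE | Equal | Equal | Equal | $\lambda \Gamma \Delta \Gamma^\top$ | $p(p + 1)/2$ |
| General | VEE | Variable | Equal | Equal | $\lambda_g \Gamma \Delta \Gamma^\top$ | $G + p - 1 + p(p - 1)/2$ |
| General | EVE | Equal | Variable | Equal | $\lambda \Gamma \Delta_g \Gamma^\top$ | $1 + G(p - 1) + p(p - 1)/2$ |
| General | EEV | Equal | Equal | Variable | $\lambda \Gamma_g \Delta \Gamma_g^\top$ | $p + Gp(p - 1)/2$ |
| General | VVE | Variable | Variable | Equal | $\lambda_g \Gamma \Delta_g \Gamma^\top$ | $Gp + p(p - 1)/2$ |
| General | VEV | Variable | Equal | Variable | $\lambda_g \Gamma_g \Delta \Gamma_g^\top$ | $G + p - 1 + Gp(p - 1)/2$ |
| General | EVV | Equal | Variable | Variable | $\lambda \Gamma_g \Delta_g \Gamma_g^\top$ | $1 + G(p - 1) + Gp(p - 1)/2$ |
| General | VVV | Variable | Variable | Variable | $\lambda_g \Gamma_g \Delta_g \Gamma_g^\top$ | $Gp(p + 1)/2$ |

Table 1: Nomenclature, covariance structure, and number of free parameters in $\Sigma_g$ for each member of the family of parsimonious mixtures of multivariate contaminated normal distributions.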
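The last column of Table 1 follows from adding up the free parameters contributed by the volume, shape, and orientation terms. A minimal sketch of that bookkeeping is given below; `n_cov_par` is a hypothetical helper written for this illustration and is not part of the package.

```r
# Number of free covariance parameters for a three-letter model name
# (volume, shape, orientation), reproducing the last column of Table 1.
# `n_cov_par` is a hypothetical helper, not a ContaminatedMixt function.
n_cov_par <- function(model, G, p) {
  vol <- substr(model, 1, 1)
  shp <- substr(model, 2, 2)
  ori <- substr(model, 3, 3)
  n_volume <- if (vol == "E") 1 else G                 # lambda or lambda_1..lambda_G
  n_shape  <- switch(shp, I = 0, E = p - 1, V = G * (p - 1))
  n_orient <- switch(ori, I = 0, E = p * (p - 1) / 2, V = G * p * (p - 1) / 2)
  n_volume + n_shape + n_orient
}

models <- c("EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE",
            "VEE", "EVE", "EEV", "VVE", "VEV", "EVV", "VVV")

# Counts for G = 3 groups in p = 4 dimensions.
counts <- sapply(models, n_cov_par, G = 3, p = 4)
```

For G = 3 and p = 4, the fully unconstrained VVV model uses 30 covariance parameters, against 1 for EII, which is the kind of saving the parsimonious family is designed to offer.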

2.3. Modelling framework: model-based classification

Model-based classification is receiving renewed attention (see, e.g., Dean, Murphy, and Downey 2006, McNicholas 2010, Andrews et al. 2011, Browne and McNicholas 2012, Punzo 2014, and Subedi, Punzo, Ingrassia, and McNicholas 2013, 2015). However, despite being the most general framework within which to present and analyze direct applications of mixture models, it remains the “poor cousin” of model-based clustering within the literature.

Consider the random sample $\{x_i\}_{i=1}^{n}$ from (1). Without loss of generality, suppose that the first $m$ observations are known to belong to one of $G$ groups; these are the so-called labeled observations. Let $z_i = (z_{i1}, \ldots, z_{iG})$ be the $G$-dimensional component-label vector in which the $g$th element is $z_{ig} = 1$ if $x_i$ belongs to component $g$ and $z_{ig} = 0$ otherwise, $g = 1, \ldots, G$. If the $i$th observation is labeled, its component-membership indicator $z_i$ is known. In model-based classification, we use all $n$ observations to estimate the parameters of the mixture; the fitted model is adopted to classify each of the $n - m$ unlabeled observations through the corresponding maximum a posteriori (MAP) probability. Note that

$$\mathrm{MAP}(\hat{z}_{ig}) = \begin{cases} 1 & \text{if } \max_h \{\hat{z}_{ih}\} \text{ occurs in component } g, \\ 0 & \text{otherwise,} \end{cases}$$

where $\hat{z}_{ig}$ denotes the estimated posterior probability that $x_i$ belongs to component $g$. Using this notation, the model-based classification likelihood can be written as

$$L(\psi) = \prod_{i=1}^{m} \prod_{g=1}^{G} \left\{ \pi_g \left[ \alpha_g \phi(x_i; \mu_g, \Sigma_g) + (1 - \alpha_g) \phi(x_i; \mu_g, \eta_g \Sigma_g) \right] \right\}^{z_{ig}} \prod_{i=m+1}^{n} p(x_i; \psi).$$

We obtain the model-based clustering scenario as a special case when $m = 0$; this is the scenario used by Punzo and McNicholas (2016) to introduce the model.


3. Maximum likelihood estimation

3.1. An ECM algorithm

To fit the models in Table 1, Punzo and McNicholas (2016) illustrate the expectation-conditional maximization (ECM) algorithm of Meng and Rubin (1993). The ECM algorithm is a variant of the classical expectation-maximization (EM) algorithm (Dempster, Laird, and Rubin 1977), which is a natural approach for maximum likelihood estimation when data are incomplete. In our case, there are two sources of missing data: one arises from the fact that we do not know the component labels $\{z_i\}_{i=m+1}^{n}$ and the other arises from the fact that we do not know whether an observation in group $g$ is good or bad. To denote this second source of missing data, we use $\{v_i\}_{i=1}^{n}$, with $v_i = (v_{i1}, \ldots, v_{iG})$, where $v_{ig} = 1$ if observation $i$ in group $g$ is good and $v_{ig} = 0$ if observation $i$ in group $g$ is bad. By working on the complete-data likelihood

$$\begin{aligned} L_c(\psi) = {} & \prod_{i=1}^{m} \prod_{g=1}^{G} \left\{ \pi_g \left[ \alpha_g \phi(x_i; \mu_g, \Sigma_g) \right]^{v_{ig}} \left[ (1 - \alpha_g) \phi(x_i; \mu_g, \eta_g \Sigma_g) \right]^{(1 - v_{ig})} \right\}^{z_{ig}} \\ & \times \prod_{i=m+1}^{n} \prod_{g=1}^{G} \left\{ \pi_g \left[ \alpha_g \phi(x_i; \mu_g, \Sigma_g) \right]^{v_{ig}} \left[ (1 - \alpha_g) \phi(x_i; \mu_g, \eta_g \Sigma_g) \right]^{(1 - v_{ig})} \right\}^{z_{ig}}, \end{aligned} \tag{3}$$

the ECM algorithm iterates between three steps, an E-step and two CM-steps, until convergence (which is evaluated via the Aitken acceleration criterion; see Aitken 1926 and Lindsay 1995). The only difference from the EM algorithm is that each M-step is replaced by two simpler CM-steps. They arise from the partition $\psi = \{\psi_1, \psi_2\}$, where $\psi_1 = \{\pi_g, \alpha_g, \mu_g, \Sigma_g\}_{g=1}^{G}$ and $\psi_2 = \{\eta_g\}_{g=1}^{G}$. In particular, for the most general model VVV, the $(r + 1)$th iteration of the ECM algorithm can be summarized as follows (see Punzo and McNicholas 2016, for details on the model-based clustering paradigm):

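E-step: The values of $z_{ig}$ and $v_{ig}$ in (3) are respectively replaced by

$$z_{ig}^{(r)} = \frac{\pi_g^{(r)} \left[ \alpha_g^{(r)} \phi\left(x_i; \mu_g^{(r)}, \Sigma_g^{(r)}\right) + \left(1 - \alpha_g^{(r)}\right) \phi\left(x_i; \mu_g^{(r)}, \eta_g^{(r)} \Sigma_g^{(r)}\right) \right]}{p\left(x_i; \psi^{(r)}\right)}$$

and

$$v_{ig}^{(r)} = \frac{\alpha_g^{(r)} \phi\left(x_i; \mu_g^{(r)}, \Sigma_g^{(r)}\right)}{\alpha_g^{(r)} \phi\left(x_i; \mu_g^{(r)}, \Sigma_g^{(r)}\right) + \left(1 - \alpha_g^{(r)}\right) \phi\left(x_i; \mu_g^{(r)}, \eta_g^{(r)} \Sigma_g^{(r)}\right)};$$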
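The E-step quantities can be sketched in a few lines of base R. The code below assumes the pure clustering case (m = 0, no labeled observations) and uses hypothetical helper names (`dmvn_sketch`, `e_step_sketch`) and toy parameter values; it is meant as an illustration of the two formulas above rather than a reproduction of the package internals.

```r
# Multivariate normal density (hypothetical helper, base R only).
dmvn_sketch <- function(x, mu, Sigma) {
  p <- length(mu)
  x <- matrix(x, ncol = p)
  exp(-0.5 * mahalanobis(x, center = mu, cov = Sigma)) /
    sqrt((2 * pi)^p * det(Sigma))
}

# E-step sketch for m = 0: posterior probabilities of component
# membership (z) and of being a good point within each component (v).
e_step_sketch <- function(X, prior, mu, Sigma, alpha, eta) {
  n <- nrow(X)
  G <- length(prior)
  good <- bad <- matrix(0, n, G)
  for (g in seq_len(G)) {
    good[, g] <- alpha[g] * dmvn_sketch(X, mu[[g]], Sigma[[g]])
    bad[, g]  <- (1 - alpha[g]) * dmvn_sketch(X, mu[[g]], eta[g] * Sigma[[g]])
  }
  comp <- sweep(good + bad, 2, prior, `*`)   # pi_g times contaminated density
  list(z = comp / rowSums(comp),             # z_ig^(r)
       v = good / (good + bad))              # v_ig^(r)
}

# Toy current parameter values (illustrative only).
prior <- c(0.5, 0.5)
mu    <- list(c(2, 2), c(-2, -2))
Sigma <- list(diag(c(5, 0.5)), diag(c(5, 0.5)))
alpha <- c(0.9, 0.9)
eta   <- c(10, 10)

# Two points at the component means and one point far out along the first axis.
X <- rbind(c(2, 2), c(-2, -2), c(12, 2))
es <- e_step_sketch(X, prior, mu, Sigma, alpha, eta)
```

In this toy setting the third point is assigned to the first component with probability close to one, but its probability of being good in that component is small, which is the behaviour the two posterior quantities are meant to separate.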

CM-step 1: Holding $\psi_2 = \psi_2^{(r)}$ fixed, the parameters in $\psi_1$ are updated as

$$\pi_g^{(r+1)} = \frac{n_g^{(r)}}{n}, \qquad \alpha_g^{(r+1)} = \frac{1}{n_g^{(r)}} \left( \sum_{i=1}^{m} z_{ig} v_{ig}^{(r)} + \sum_{i=m+1}^{n} z_{ig}^{(r)} v_{ig}^{(r)} \right), \tag{4}$$

$$\mu_g^{(r+1)} = \frac{1}{s_g^{(r)}} \left[ \sum_{i=1}^{m} z_{ig} \left( v_{ig}^{(r)} + \frac{1 - v_{ig}^{(r)}}{\eta_g^{(r)}} \right) x_i + \sum_{i=m+1}^{n} z_{ig}^{(r)} \left( v_{ig}^{(r)} + \frac{1 - v_{ig}^{(r)}}{\eta_g^{(r)}} \right) x_i \right], \tag{5}$$

and

$$\Sigma_g^{(r+1)} = \frac{1}{n_g^{(r)}} W_g^{(r+1)},$$

where

$$n_g^{(r)} = \sum_{i=1}^{m} z_{ig} + \sum_{i=m+1}^{n} z_{ig}^{(r)},$$

$$s_g^{(r)} = \sum_{i=1}^{m} z_{ig} \left( v_{ig}^{(r)} + \frac{1 - v_{ig}^{(r)}}{\eta_g^{(r)}} \right) + \sum_{i=m+1}^{n} z_{ig}^{(r)} \left( v_{ig}^{(r)} + \frac{1 - v_{ig}^{(r)}}{\eta_g^{(r)}} \right),$$

and

$$\begin{aligned} W_g^{(r+1)} = {} & \sum_{i=1}^{m} z_{ig} \left( v_{ig}^{(r)} + \frac{1 - v_{ig}^{(r)}}{\eta_g^{(r)}} \right) \left( x_i - \mu_g^{(r+1)} \right) \left( x_i - \mu_g^{(r+1)} \right)^\top \\ & + \sum_{i=m+1}^{n} z_{ig}^{(r)} \left( v_{ig}^{(r)} + \frac{1 - v_{ig}^{(r)}}{\eta_g^{(r)}} \right) \left( x_i - \mu_g^{(r+1)} \right) \left( x_i - \mu_g^{(r+1)} \right)^\top. \end{aligned} \tag{6}$$
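The updates (4) to (6) translate almost line by line into base R. The sketch below again assumes the clustering case m = 0 and the VVV model; `cm_step1_sketch` is a hypothetical helper, and the small data set is made up so that the result can be verified by hand.

```r
# CM-step 1 sketch for the VVV model with m = 0 (no labeled observations).
# z and v are n x G matrices from the E-step; eta holds the current eta_g.
# `cm_step1_sketch` is a hypothetical helper, not a ContaminatedMixt function.
cm_step1_sketch <- function(X, z, v, eta) {
  n <- nrow(X)
  G <- ncol(z)
  out <- list(prior = numeric(G), alpha = numeric(G),
              mu = vector("list", G), Sigma = vector("list", G))
  for (g in seq_len(G)) {
    n_g  <- sum(z[, g])
    w    <- z[, g] * (v[, g] + (1 - v[, g]) / eta[g])  # weights that shrink bad points
    s_g  <- sum(w)
    mu_g <- colSums(w * X) / s_g                        # update (5)
    Xc   <- sweep(X, 2, mu_g)                           # centered data
    W_g  <- crossprod(Xc * w, Xc)                       # weighted scatter matrix (6)
    out$prior[g]   <- n_g / n                           # mixing proportion in (4)
    out$alpha[g]   <- sum(z[, g] * v[, g]) / n_g        # proportion of good points in (4)
    out$mu[[g]]    <- mu_g
    out$Sigma[[g]] <- W_g / n_g
  }
  out
}

# Hand-checkable example: hard memberships, every point treated as good.
X <- rbind(c(0, 0), c(2, 0), c(10, 10), c(12, 10))
z <- rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 1))
v <- matrix(1, nrow = 4, ncol = 2)
upd <- cm_step1_sketch(X, z, v, eta = c(2, 2))
```

With all v equal to one the weights reduce to the memberships, so the updates coincide with the usual normal-mixture ones; lowering v for a point shrinks its influence on both the mean and the scatter matrix.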

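CM-step 2: Holding $\psi_1 = \psi_1^{(r+1)}$ fixed, the parameters in $\psi_2$ are updated by maximizing the function

$$\begin{aligned} & -\frac{p}{2} \sum_{i=1}^{m} z_{ig} \left(1 - v_{ig}^{(r)}\right) \ln \eta_g - \frac{p}{2} \sum_{i=m+1}^{n} z_{ig}^{(r)} \left(1 - v_{ig}^{(r)}\right) \ln \eta_g \\ & - \frac{1}{2} \sum_{i=1}^{m} z_{ig} \frac{1 - v_{ig}^{(r)}}{\eta_g} \delta\left(x_i, \mu_g^{(r+1)}; \Sigma_g^{(r+1)}\right) - \frac{1}{2} \sum_{i=m+1}^{n} z_{ig}^{(r)} \frac{1 - v_{ig}^{(r)}}{\eta_g} \delta\left(x_i, \mu_g^{(r+1)}; \Sigma_g^{(r+1)}\right) \end{aligned} \tag{7}$$

with respect to $\eta_g$, under the constraint $\eta_g > 1$, for $g = 1, \ldots, G$. Here $\delta\left(x_i, \mu_g^{(r+1)}; \Sigma_g^{(r+1)}\right)$ denotes the squared Mahalanobis distance between $x_i$ and $\mu_g^{(r+1)}$ (with covariance matrix $\Sigma_g^{(r+1)}$). Operationally, the optimize() function, in the stats package, is used to perform a numerical search of the maximum $\eta_g^{(r+1)}$ of (7) over the interval $(1, \eta_g^*)$, with $\eta_g^* > 1$.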
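A minimal sketch of this numerical search, for a single component and the case m = 0, is shown below. The helper name `cm_step2_sketch`, its argument names, and the toy inputs are assumptions; only the use of optimize() over the interval from 1 to an upper bound comes from the text.

```r
# CM-step 2 sketch for one component g with m = 0: maximize (7) in eta_g
# over (1, eta_max) with optimize(). Inputs are the g-th columns of z and v
# and the squared Mahalanobis distances d2_g of each x_i from the updated mean.
# `cm_step2_sketch` is a hypothetical helper, not a ContaminatedMixt function.
cm_step2_sketch <- function(z_g, v_g, d2_g, p, eta_max = 1000) {
  A <- sum(z_g * (1 - v_g))          # total weight of the bad points
  B <- sum(z_g * (1 - v_g) * d2_g)   # weighted sum of squared distances
  objective <- function(eta) -0.5 * p * A * log(eta) - 0.5 * B / eta
  optimize(objective, interval = c(1, eta_max), maximum = TRUE)$maximum
}

# Toy inputs: three points that look good and two distant, probably bad, points.
z_g  <- rep(1, 5)
v_g  <- c(0.9, 0.9, 0.9, 0.1, 0.1)
d2_g <- c(1, 2, 1, 40, 60)
eta_new <- cm_step2_sketch(z_g, v_g, d2_g, p = 2)

# When the bad points are not far away, the search ends at the lower bound.
eta_low <- cm_step2_sketch(rep(1, 4), rep(0.5, 4), rep(0.5, 4), p = 2)
```

In this sketch, setting the derivative of the objective to zero gives the stationary point B / (p A), so `eta_new` can be compared against 90.4 / 4.2; when that ratio falls below 1, the constrained maximum sits at the lower end of the interval, as `eta_low` illustrates.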

As is well documented in Punzo and McNicholas (2016), the weights

$$\left( v_{ig}^{(r)} + \frac{1 - v_{ig}^{(r)}}{\eta_g^{(r)}} \right)$$

in (5) and (6) reduce the impact of bad points in the estimation of the component means $\mu_g$ and the component scale matrices $\Sigma_g$, thereby providing robust estimates of these parameters. For a discussion on down-weighting for the multivariate contaminated normal distribution, see also Little (1988).

The ECM algorithm for the other parsimonious models changes only with respect to the way the terms of the decomposition of $\Sigma_g$ are obtained in the first CM-step. In particular, these updates are analogous to those given by Celeux and Govaert (1995) for their Gaussian parsimonious clustering (GPC) models (corresponding to mixtures of multivariate normal distributions with eigen decomposed covariance matrices). The only difference is that, on the $(r + 1)$th iteration of the algorithm, $W_g^{(r+1)}$ is used instead of the classical scattering matrix

$$\sum_{i=1}^{m} z_{ig} \left( x_i - \mu_g^{(r+1)} \right) \left( x_i - \mu_g^{(r+1)} \right)^\top + \sum_{i=m+1}^{n} z_{ig}^{(r)} \left( x_i - \mu_g^{(r+1)} \right) \left( x_i - \mu_g^{(r+1)} \right)^\top.$$

4. Further aspects

4.1. Initialization

Many initialization strategies have been proposed for the EM algorithm applied to mixture models (see, e.g., Biernacki, Celeux, and Govaert 2003, Karlis and Xekalaki 2003, and Bagnato and Punzo 2013). The ContaminatedMixt package implements the following initializations, all based on providing the initial quantities $z_i^{(0)}$, $v_i^{(0)}$, and $\eta_g^{(0)} = 1.001$, $i = 1, \ldots, n$ and $g = 1, \ldots, G$, to the first CM-step of the ECM algorithm.

"random.soft": the $G$ values in $z_i^{(0)}$ are randomly generated by a uniform distribution, via the runif() function of the stats package, and then normalized in order to sum to 1. The values $v_{ig}^{(0)}$, $i = 1, \ldots, n$ and $g = 1, \ldots, G$, are, by default, fixed to one, but they can also be provided by the user.

"random.hard": each $z_i^{(0)}$ is substituted by a single observation randomly generated, via the rmultinom() function of the stats package, from a multinomial distribution with probabilities $(1/G, \ldots, 1/G)$. The values $v_{ig}^{(0)}$, $i = 1, \ldots, n$ and $g = 1, \ldots, G$, are, by default, fixed to one, but they can also be provided by the user.

"kmeans": hard values for $z_i^{(0)}$, $i = 1, \ldots, n$, are provided by a preliminary run of the k-means algorithm, as implemented by the kmeans() function of the stats package.

"mixt": for each parsimonious model, the $n$ values $z_i^{(0)}$ are substituted with the posterior probabilities arising from the fitting of the corresponding parsimonious mixture of multivariate normal distributions; the latter is estimated by the gpcm() function of the mixture package. The values $v_{ig}^{(0)}$, $i = 1, \ldots, n$ and $g = 1, \ldots, G$, are fixed to one; from an operational point of view, thanks to the monotonicity property of the ECM algorithm (see, e.g., McLachlan and Krishnan 2007, p. 33), this also guarantees that the final observed-data log-likelihood of the parsimonious mixture of multivariate contaminated normal distributions will always be greater than, or equal to, the observed-data log-likelihood of the corresponding parsimonious mixture of multivariate normal distributions. This is a fundamental consideration for the use of likelihood-based model selection criteria for choosing between these two models.

"manual": the (soft or hard) values of $z_i^{(0)}$, as well as the values of $v_i^{(0)}$, are provided by the user.
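The three automatic ways of generating starting memberships mentioned above (normalized uniform draws, a single multinomial draw per observation, and k-means labels) can be sketched with the stats functions the text names. The helper `init_z_sketch` and its `method` labels are hypothetical and chosen to describe the mechanism; they are not the package's own option names or code.

```r
# Sketch of three ways to build an n x G matrix of starting memberships z^(0).
# `init_z_sketch` and its method labels are hypothetical, for illustration only.
init_z_sketch <- function(X, G, method = c("uniform", "multinomial", "kmeans")) {
  method <- match.arg(method)
  n <- nrow(X)
  if (method == "uniform") {
    z <- matrix(runif(n * G), nrow = n, ncol = G)
    z / rowSums(z)                                  # soft memberships, rows sum to 1
  } else if (method == "multinomial") {
    # one draw per observation with equal probabilities: hard 0/1 memberships
    t(rmultinom(n, size = 1, prob = rep(1 / G, G)))
  } else {
    cl <- kmeans(X, centers = G)$cluster            # hard labels from k-means
    z <- matrix(0, nrow = n, ncol = G)
    z[cbind(seq_len(n), cl)] <- 1
    z
  }
}

set.seed(1)
X <- matrix(rnorm(40), nrow = 20, ncol = 2)
z_unif  <- init_z_sketch(X, G = 2, method = "uniform")
z_multi <- init_z_sketch(X, G = 2, method = "multinomial")
z_km    <- init_z_sketch(X, G = 2, method = "kmeans")
```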

4.2. Automatic detection of bad points

For a mixture of multivariate contaminated normal distributions, the classification of an observation $x_i$ means:

Step 1. to determine its cluster of membership;

Step 2. to establish if it is either a good or a bad observation in that cluster.

Let $\hat{z}_i$ and $\hat{v}_i$ denote, respectively, the expected values of $z_i$ and $v_i$ arising from the ECM algorithm, i.e., $\hat{z}_{ig}$ is the value of $z_{ig}^{(r)}$ at convergence and $\hat{v}_{ig}$ is the value of $v_{ig}^{(r)}$ at convergence. To determine the cluster of membership of $x_i$, we use the MAP classification, i.e., $\mathrm{MAP}(\hat{z}_{ig})$. We then consider $\hat{v}_{ih}$, where $h$ is selected such that $\mathrm{MAP}(\hat{z}_{ih}) = 1$; $x_i$ is considered good if $\hat{v}_{ih} > 0.5$ and bad otherwise. The resulting information can be used to eliminate the bad points, if such an outcome is desired (Berkane and Bentler 1988). The remaining data may then be treated as effectively being distributed according to a mixture of multivariate normal distributions, and the clustering results can be reported as usual.
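The two-step rule can be written compactly in base R once the converged posterior quantities are available as n by G matrices. In the sketch below, `detect_bad_sketch`, `z_hat`, and `v_hat` are hypothetical names and the matrices contain made-up values.

```r
# Two-step classification sketch: MAP cluster first, then good/bad status
# read from the converged v in the MAP cluster (threshold 0.5).
# `detect_bad_sketch` is a hypothetical helper, not a ContaminatedMixt function.
detect_bad_sketch <- function(z_hat, v_hat) {
  group <- max.col(z_hat, ties.method = "first")       # Step 1: MAP cluster
  v_map <- v_hat[cbind(seq_len(nrow(z_hat)), group)]   # v in the MAP cluster
  data.frame(group = group,
             detection = ifelse(v_map > 0.5, "good", "bad"),  # Step 2
             stringsAsFactors = FALSE)
}

# Made-up converged values for three observations and two clusters.
z_hat <- rbind(c(0.95, 0.05),
               c(0.10, 0.90),
               c(0.80, 0.20))
v_hat <- rbind(c(0.99, 0.40),
               c(0.30, 0.97),
               c(0.02, 0.60))
res <- detect_bad_sketch(z_hat, v_hat)
```

Note that the third observation is flagged as bad even though its v in the second cluster exceeds 0.5, because only the value in the MAP cluster is consulted.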

4.3. Constraints for detection of bad points

It may be required that in the $g$th cluster, $g = 1, \ldots, G$, the proportion of good data is at least equal to a pre-determined value $\alpha_g^*$. In this case, the optimize() function is also used for a numerical search of the maximum $\alpha_g^{(r+1)}$, over the interval $(\alpha_g^*, 1)$, of the function

$$\sum_{i=1}^{m} z_{ig} \left[ v_{ig}^{(r)} \ln \alpha_g + \left(1 - v_{ig}^{(r)}\right) \ln (1 - \alpha_g) \right] + \sum_{i=m+1}^{n} z_{ig}^{(r)} \left[ v_{ig}^{(r)} \ln \alpha_g + \left(1 - v_{ig}^{(r)}\right) \ln (1 - \alpha_g) \right]. \tag{8}$$

Note that the ContaminatedMixt package also allows $\alpha_g$ to be fixed a priori. This is somewhat analogous to the trimmed clustering approach implemented by the tclust package (Fritz, García-Escudero, and Mayo-Iscar 2012), where one must specify the proportion of outliers (the so-called trimming proportion) in advance. However, pre-specifying the proportion of bad points may not be realistic in many practical scenarios.
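The constrained search described by (8) can be sketched for one component in the case m = 0 as follows. The helper name `update_alpha_sketch` and the toy inputs are assumptions; the use of optimize() over the interval from the lower bound to 1 is taken from the text.

```r
# Constrained update of alpha_g for one component with m = 0: maximize (8)
# over (alpha_min, 1) with optimize(). z_g and v_g are the g-th columns of
# the E-step matrices. `update_alpha_sketch` is a hypothetical helper.
update_alpha_sketch <- function(z_g, v_g, alpha_min = 0.5) {
  objective <- function(a) sum(z_g * (v_g * log(a) + (1 - v_g) * log(1 - a)))
  optimize(objective, interval = c(alpha_min, 1), maximum = TRUE)$maximum
}

# Toy inputs: eight points that look good and two that look bad.
z_g <- rep(1, 10)
v_g <- c(rep(0.95, 8), 0.1, 0.2)

alpha_free  <- update_alpha_sketch(z_g, v_g, alpha_min = 0.5)  # bound not active
alpha_bound <- update_alpha_sketch(z_g, v_g, alpha_min = 0.9)  # bound active
```

In this example the unconstrained maximizer is the weighted mean of v, namely 0.79, which is what the search returns with a lower bound of 0.5; raising the bound to 0.9 pushes the solution to the edge of the interval.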

4.4. Model selection criteria

Thus far, the number of components $G$ and the covariance structure (cf. Table 1) have been treated as a priori fixed. However, in most practical applications, they are unknown, so it is common practice to select them by evaluating a convenient (likelihood-based) model selection criterion over a reasonable range of possible options (for the alternative use of likelihood-ratio tests to select either the parsimonious model or the number of components for a normal mixture, see Punzo, Browne, and McNicholas 2016). The ContaminatedMixt package supports the information criteria listed in Table 2, where $l(\hat{\psi})$ is the maximized observed-data log-likelihood and $q$ is the number of free parameters.

| Information criterion | Definition | Reference |
|---|---|---|
| AIC | $2 l(\hat{\psi}) - 2q$ | Akaike (1973) |
| AIC3 | $2 l(\hat{\psi}) - 3q$ | Bozdogan (1994) |
| AICc | $\mathrm{AIC} - \dfrac{2q(q + 1)}{n - q - 1}$ | Hurvich and Tsai (1989) |
| AICu | $\mathrm{AICc} - n \ln \dfrac{n}{n - q - 1}$ | McQuarrie, Shumway, and Tsai (1997) |
| AWE | $2 l(\hat{\psi}) - 2q \left( \dfrac{3}{2} + \ln n \right)$ | Banfield and Raftery (1993) |
| BIC | $2 l(\hat{\psi}) - q \ln n$ | Schwarz (1978) |
| CAIC | $2 l(\hat{\psi}) - q (1 + \ln n)$ | Bozdogan (1987) |
| ICL | $\mathrm{BIC} + \sum_{i=m+1}^{n} \sum_{g=1}^{G} \mathrm{MAP}(\hat{z}_{ig}) \ln \hat{z}_{ig}$ | Biernacki, Celeux, and Govaert (2000) |

Table 2: Definition and key reference for the implemented model selection criteria.
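The definitions in Table 2 are simple functions of the log-likelihood, the number of free parameters, and the sample size, so they can be sketched directly. The helpers `info_criteria_sketch` and `icl_sketch` are hypothetical names; the sketch assumes that the AWE penalty is based on the number of free parameters q, that all observations are unlabeled (m = 0) for the ICL term, and that larger values indicate a better model, as the sign convention of Table 2 suggests.

```r
# Information criteria of Table 2 in the "larger is better" form.
# loglik: maximized observed-data log-likelihood; q: number of free
# parameters; n: sample size. Hypothetical helper, for illustration only.
info_criteria_sketch <- function(loglik, q, n) {
  AIC  <- 2 * loglik - 2 * q
  AICc <- AIC - 2 * q * (q + 1) / (n - q - 1)
  c(AIC  = AIC,
    AIC3 = 2 * loglik - 3 * q,
    AICc = AICc,
    AICu = AICc - n * log(n / (n - q - 1)),
    AWE  = 2 * loglik - 2 * q * (3 / 2 + log(n)),  # assumes the penalty uses q
    BIC  = 2 * loglik - q * log(n),
    CAIC = 2 * loglik - q * (1 + log(n)))
}

# ICL for m = 0: BIC plus the log posterior probability of the MAP cluster,
# summed over observations (z_hat is the n x G matrix of posteriors).
icl_sketch <- function(bic, z_hat) {
  bic + sum(log(apply(z_hat, 1, max)))
}

ic <- info_criteria_sketch(loglik = -100, q = 5, n = 50)
icl_hard <- icl_sketch(ic[["BIC"]], rbind(c(1, 0), c(0, 1)))
```

When the posterior probabilities are all 0 or 1 the ICL correction vanishes, so `icl_hard` equals the BIC; fuzzier classifications make the added term negative.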

5. Package description and illustrative example

In this section we provide a description of the main capabilities of the ContaminatedMixt package along with some illustrations.

5.1. Package description

The ContaminatedMixt package is developed in an object-oriented design, using the standard S3 paradigm. Its main function, CNmixt(), fits the model(s) in Table 1 and returns a ContaminatedMixt class object; the arguments of this function, along with their description, are listed in Table 3. In addition, the package contains several methods that allow for data extraction, visualization, and plotting.

Extractors for ContaminatedMixt class objects are illustrated in Table 4. When several models have been fitted, extractor functions consider the best model according to the information criterion in criterion, within the subset of estimated models having a number of components among those in G and a parsimonious model among those in model. Note that getIC() and whichBest() have an argument criteria, in place of criterion, which allows more than one criterion to be selected.

The package also includes some methods for ContaminatedMixt class objects; they are: plot() and pairs(), to display clustering/classification results in terms of scatter plots (in the cases p = 2 and p ≥ 2, respectively), summary(), to visualize the estimated parameters and further inferential/clustering details, print(), to print to the screen the selected model(s) according to the information criteria in Table 2, and agree() to evaluate the agreement of a given partition with respect to the partition arising from a fitted model. As usual, further details can be found in the functions’ help pages.

Finally, for the multivariate contaminated normal distribution, the function dCN() gives the probability density and the function rCN() generates random deviates.


| Argument | Description |
|---|---|
| X | matrix or data frame of dimension $n \times p$, with $p > 1$. |
| G | vector containing the numbers of groups to be tried. |
| model | vector indicating the models ("EII", "VII", "EEI", "VEI", "EVI", "VVI", "EEE", "VEE", "EVE", "EEV", "VVE", "VEV", "EVV", "VVV") to be used. If model = NULL (default), then all 14 models are fitted. |
| initialization | initialization strategy for the ECM algorithm. Possible values are "random.soft", "random.hard", "kmeans", "mixt", and "manual" (see Section 4.1 for details). Default is initialization = "mixt". |
| alphafix | vector, of dimension G, with fixed a priori values for $\alpha_1, \ldots, \alpha_G$. If the length of alphafix is different from G, its first element is replicated G times. If alphafix = NULL (default), $\alpha_1, \ldots, \alpha_G$ are estimated. |
| alphamin | vector with values $\alpha_1^*, \ldots, \alpha_G^*$ (see Section 4.3). If the length of alphamin is different from G, its first element is replicated G times. If alphamin = NULL, $\alpha_1, \ldots, \alpha_G$ are estimated without constraints, as in (4). Default value is 0.5. |
| etafix | vector, of dimension G, with fixed a priori values for $\eta_1, \ldots, \eta_G$. If the length of etafix is different from G, its first element is replicated G times. If etafix = NULL (default), $\eta_1, \ldots, \eta_G$ are estimated. |
| etamax | vector with values $\eta_1^*, \ldots, \eta_G^*$ (see the CM-step 2 of Section 3.1). If the length of etamax is different from G, its first element is replicated G times. Default value is 1000. |
| seed | the seed for the random number generator, when random initializations are used; if NULL (default), current seed is not changed. |
| start.z | when initialization = "manual", it is an $n \times G$ matrix with values $z_{ig}^{(0)}$. |
| start.v | an $n \times G$ matrix with values $v_{ig}^{(0)}$. If start.v = NULL (default), then $v_{ig}^{(0)} = 1$, $i = 1, \ldots, n$ and $g = 1, \ldots, G$. |
| ind.label | vector of positions (rows) of the labeled observations. |
| label | vector, of the same length as ind.label, with the group memberships of the observations indicated by ind.label. |
| iter.max | maximum number of iterations in the ECM algorithm. Default is 1000. |
| threshold | value of the stopping rule for Aitken’s acceleration procedure. Default is 1.0e-03. |
| eps | smallest value for the eigenvalues of $\Sigma_1, \ldots, \Sigma_G$. It is used to prevent the EM algorithm from being affected by local maxima or degeneracy of the likelihood (see Hathaway 1986 and Ingrassia 2004). Default value is 1e-100. |
| parallel | when TRUE, the package parallel is used for parallel computation. The number of cores to use may be set with the global option cl.cores; default value is detected using detectCores(). |

Table 3: List of arguments for the function CNmixt().


extractors description

getBestModel() a ContaminatedMixt class object containing the best model only.

getPosterior() estimated posterior probabilities $z_{ig}$, $i = 1, \ldots, n$ and $g = 1, \ldots, G$.

getSize() estimated group sizes (from the hard classification induced by the MAP operator).

getCluster() classification vector.

getDetection() matrix with two columns: the first gives the MAP group memberships, whereas the second specifies whether the observations are good or bad (see Section 4.2).

getPar() estimated parameters (i.e., $\psi$).

getIC() values for the considered criteria in criteria.

whichBest() position of the model, in the ContaminatedMixt class object, for the criteria specified in criteria.

Table 4: Extractors for ContaminatedMixt class objects.
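To make the link between posterior probabilities, the classification vector, and the group sizes concrete, the following base R sketch applies the MAP operator to a small, made-up matrix of posterior probabilities. The matrix z is illustrative only and the code does not call the package; with a fitted object, the corresponding quantities would come from the extractors in Table 4.

```r
# Toy posterior probabilities z_ig for n = 5 observations and G = 2 groups
# (made-up values; each row sums to 1).
z <- matrix(c(0.90, 0.10,
              0.20, 0.80,
              0.55, 0.45,
              0.05, 0.95,
              0.70, 0.30),
            ncol = 2, byrow = TRUE)

# MAP operator: assign each observation to the group with the largest posterior.
cluster <- apply(z, 1, which.max)

# Group sizes induced by the hard classification.
size <- table(factor(cluster, levels = 1:2))

cluster
size
```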

5.2. Artificial data

To illustrate the use of the package, we begin with an artificial data set from a mixture of G = 2 bivariate normal distributions, of equal size, with an EEI structure for the component covariance matrices. Twenty noise points are also added from a uniform distribution over the range -20 to 20 on each variable. The data are generated by the following commands

R> library("ContaminatedMixt")

R> library("mnormt")

R> p <- 2

R> set.seed(12345)

R> X1 <- rmnorm(n = 200, mean = rep(2, p), varcov = diag(c(5, 0.5)))

R> X2 <- rmnorm(n = 200, mean = rep(-2, p), varcov = diag(c(5, 0.5)))

R> noise <- matrix(runif(n = 40, min = -20, max = 20), nrow = 20, ncol = 2)

R> X <- rbind(X1, X2, noise)

The scatterplot of these data, in Figure 1, is obtained via the commands

R> group <- rep(c(1, 2, 3), times = c(200, 200, 20))

R> plot(X, col = group, pch = c(3, 4, 16)[group], asp = 1,

+ xlab = expression(X[1]), ylab = expression(X[2]))
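For readers without the mnormt package, the same design can be sketched in base R. This is an assumption-based alternative, not the authors' code: because the covariance matrix is diagonal, the two coordinates are independent and can be drawn with rnorm(). The random draws differ from those produced by rmnorm(), so the numerical results reported below would not be reproduced exactly; object names ending in _b are introduced here only to avoid overwriting X and group.

```r
# Base R sketch of the same design (assumption: with a diagonal covariance
# matrix, each coordinate can be drawn independently with rnorm()).
set.seed(12345)
n_g <- 200
sds <- sqrt(c(5, 0.5))                 # standard deviations of X1 and X2

X1_b <- cbind(rnorm(n_g, mean =  2, sd = sds[1]),
              rnorm(n_g, mean =  2, sd = sds[2]))
X2_b <- cbind(rnorm(n_g, mean = -2, sd = sds[1]),
              rnorm(n_g, mean = -2, sd = sds[2]))

# Twenty noise points, uniform on [-20, 20] in each coordinate
noise_b <- matrix(runif(n = 40, min = -20, max = 20), nrow = 20, ncol = 2)

X_b     <- rbind(X1_b, X2_b, noise_b)
group_b <- rep(c(1, 2, 3), times = c(200, 200, 20))

dim(X_b)
table(group_b)
```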

Model-based clustering

We start with a model-based clustering analysis by considering all fourteen models in Table 1 and a number G of clusters ranging from 1 to 4, resulting in 56 different models. The following command

R> res1 <- CNmixt(X, model = NULL, G = 1:4, parallel = TRUE)

Figure 1: Scatterplot of the artificial data of Section 5.2 (axes: X1 and X2). Noise is represented by green bullets.

With G = 1, some models are equivalent, so only one model from each set of

equivalent models will be run.

Using 8 cores

Best model according to AIC has G = 3 group(s) and parsimonious structure VVI

Best model according to AICc, AICu, AIC3, AWE, BIC, CAIC, ICL has G = 2

group(s) and parsimonious structure EEI

performs the ECM-fitting of the models and returns an object of class ContaminatedMixt. Because several models have to be fitted, parallel computation is convenient; it is enabled with the argument parallel = TRUE. The number of CPU cores used is printed on screen and is followed, after a few seconds, by a description of the best model according to each of the 8 criteria in Table 2. Here, all the criteria, apart from the AIC, agree in suggesting a model with G = 2 clusters and the true, but in practice unknown, parsimonious structure EEI. To find out more about the model selected by the BIC, we run the command

R> summary(res1)

----------------------------------

Best fitted model according to BIC

----------------------------------

log.likelihood n par BIC

-1835.8 420 11 -3738

Clustering table:

1 2

209 211

Prior: group 1 = 0.4965, group 2 = 0.5035

Model: EEI (diagonal, equal volume and shape) with 2 components

Variables

Means:

group 1 group 2

X.1 -1.8564 2.3207

X.2 -1.9783 2.0697

Variance-covariance matrices:

Component 1

X.1 X.2

X.1 5.0324 0.00000

X.2 0.0000 0.51525

Component 2

X.1 X.2

X.1 5.0324 0.00000

X.2 0.0000 0.51525

Alpha

0.9506963 0.9485337

Eta

86.44625 99.15542
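As a quick consistency check on the summary above, the reported BIC can be reproduced by hand. The sketch below assumes the convention BIC = 2 log L - m log n (larger is better), which is consistent with the negative value printed by summary(); the decomposition of the m = 11 free parameters for the EEI model with G = 2 and p = 2 is given for illustration.

```r
# Values printed by summary(res1)
loglik <- -1835.8
n      <- 420
G <- 2; p <- 2

# Free parameters of the EEI model (illustrative decomposition):
n_par <- (G - 1) +   # mixing proportions
         G * p   +   # component means
         p       +   # common diagonal covariance matrix (EEI)
         G       +   # alpha_1, ..., alpha_G
         G           # eta_1, ..., eta_G

# Assumed convention: BIC = 2 * logLik - n_par * log(n)
bic <- 2 * loglik - n_par * log(n)

n_par
round(bic)
```

Both quantities agree with the printed values (11 parameters and a BIC of about -3738), up to the rounding of the log-likelihood.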

As the estimates $\eta_1 = 86.44625$ and $\eta_2 = 99.15542$ show, there is a large degree of contamination in the two clusters, which together capture the underlying noise (see also the estimates $\alpha_1 = 0.9506963$ and $\alpha_2 = 0.9485337$). To evaluate the agreement of the obtained clustering with the true one, we can use the agree() function, included in the package, as follows

R> agree(res1, givgroup = group)


groups

givgroup 1 2 bad points

1 0 200 0

2 200 0 0

3 0 2 18
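The agreement table can also be summarised numerically. The base R sketch below re-enters the counts printed above by hand (it does not call agree()) and counts how many of the 20 noise points were flagged as bad, how many were treated as good, and how many good points were flagged as bad.

```r
# Counts copied from the agree() output above
# (rows: true groups; columns: fitted groups and detected bad points).
tab <- matrix(c(  0, 200,  0,
                200,   0,  0,
                  0,   2, 18),
              nrow = 3, byrow = TRUE,
              dimnames = list(givgroup = c("1", "2", "3"),
                              groups   = c("1", "2", "bad points")))

noise_detected <- tab["3", "bad points"]               # noise flagged as bad
noise_missed   <- sum(tab["3", c("1", "2")])           # noise treated as good
good_flagged   <- sum(tab[c("1", "2"), "bad points"])  # good points flagged as bad

c(detected = noise_detected, missed = noise_missed, false_alarms = good_flagged)
```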

Apart from two noise points that are assigned to a cluster of good points (fitted cluster 2, which corresponds to true group 1), the obtained classification agrees with the true one. Note that these misclassifications are not necessarily errors: because of the way the noise points are generated, some of them may lie in the same range as the good points and, as such, are detected as good points by the model. A plot of the clustering results for the best BIC model is displayed with the command (Figure 2)

R> plot(res1, contours = TRUE, asp = 1, xlab = expression(X[1]),

+ ylab = expression(X[2]))

Figure 2: Clustering results from the model selected by the BIC on the artificial data of Section 5.2 (axes: X1 and X2; detected bad points are labelled "bad"). Isodensities of the model are superimposed on the plot.

Isodensities are also displayed (contours = TRUE). Results in Figure 2 look similar to those in Figure 1.


Model-based classification

On the same data, we can also suppose that the cluster membership of some of the available observations is known and evaluate the classification of the remaining ones. Via the commands

R> indlab <- sample(1:400, 20)

R> lab <- group[indlab]

R> res2 <- CNmixt(X, G = 2, model = "EEI", ind.label = indlab, label = lab)

Estimating model EEI with G = 2:*********************************************

**********************************************

Estimated one model with G = 2 group(s) and parsimonious structure EEI

we first randomly select twenty good observations to be treated as labeled, and then we fit the EEI model (model = "EEI"), with G = 2 clusters, taking the group membership of these observations as known in advance. The positions of the labeled observations are contained in the object indlab, while their group memberships are given in the object lab. The agreement between the obtained classification and the true classification of the unlabeled observations only can be evaluated via the command

R> agree(res2, givgroup = group)

groups

1 2 bad points

1 0 193 1

2 186 0 0

3 0 0 20

Naturally, the comparison is automatically restricted to the unlabeled observations. As we can see, the results change slightly with respect to the clustering analysis: all twenty noise points are now detected as bad, and only one good observation from the first group is erroneously detected as bad.
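The bookkeeping behind the labeled and unlabeled observations can be sketched in base R. The seed below is arbitrary and is used only to make this sketch reproducible, so the selected positions differ from those used in the analysis above; the point is that sampling from 1:400 can only pick good observations, and that 400 observations remain unlabeled (380 good plus 20 noise points), which matches the total of the agreement table.

```r
set.seed(1)  # arbitrary seed, for this sketch only

# True grouping of the artificial data: 200 + 200 good points, 20 noise points
group <- rep(c(1, 2, 3), times = c(200, 200, 20))

# Positions 1..400 are the good observations, so noise is never labeled
indlab <- sample(1:400, 20)
lab    <- group[indlab]

table(lab)                                   # labeled observations per group

# Observations left for evaluation: everything that is not labeled
unlabeled <- setdiff(seq_along(group), indlab)
length(unlabeled)
table(group[unlabeled])
```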

5.3. The wine dataset

This second tutorial uses the wine data set included in the ContaminatedMixt package and available at the UCI machine learning repository http://archive.ics.uci.edu/ml/datasets/Wine. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars (Barbera, Barolo, and Grignolino). The analysis determined the quantities of p = 13 constituents (continuous variables) found in each of the three types of wine. Data are loaded with

R> data("wine")

This command loads a data frame with the first column being a factor indicating the type of wine and the others containing the measurements of the 13 constituents. The plot of these data, displayed in Figure 3, is obtained by


R> group <- wine[, 1]

R> pairs(wine[, -1], cex = 0.6, pch = c(2, 3, 1)[group],

+ col = c(3, 4, 2)[group], gap = 0, cex.labels = 0.6)

Figure 3: Wine data: Scatterplot matrix with clustering induced by the three cultivars. Panels (diagonal labels): Alcohol, Malic, Ash, Alcalinity, Magnesium, Phenols, Flavanoids, Nonflavanoid, Proanthocyanins, Color, Hue, Dilution, Proline.

The command

R> res3 <- CNmixt(wine[, -1], G = 1:4, initialization = "random.soft",

+ seed = 5, parallel = TRUE)

With G = 1, some models are equivalent, so only one model from each set of

equivalent models will be run.


Using 8 cores

Best model according to AIC has G = 3 group(s) and parsimonious structure EVV

Best model according to AICc has G = 4 group(s) and parsimonious structure

EVE

Best model according to AICu, CAIC has G = 3 group(s) and parsimonious

structure EVI

Best model according to AIC3 has G = 2 group(s) and parsimonious structure

EVV

Best model according to AWE has G = 3 group(s) and parsimonious structure EEI

Best model according to BIC, ICL has G = 3 group(s) and parsimonious

structure EEE

fits all fourteen models for G ∈ {1, 2, 3, 4}, for a total of 56 models. In this case, a random soft initialization is used (initialization = "random.soft") with a pre-specified seed for random number generation (seed = 5). The best model, according to the most commonly used criteria BIC and ICL, is EEE with G = 3 groups. The classification performance of this model can be seen via the command

R> agree(res3, givgroup = group)

groups

givgroup 1 2 3 bad points

Barbera 44 0 0 4

Barolo 0 0 59 0

Grignolino 0 49 0 22

As we can note, there are no misclassified wines; however, 26 wines are recognized as bad, 22 of which come from the Grignolino cultivar. Normal and t mixtures do not allow this detection; moreover, their classification, as estimated via the packages mixture and teigen, is not as good as that provided above (cf. Punzo and McNicholas 2016). The graphical representation of the classification from the selected model can be obtained via the command (see Figure 4)

R> pairs(res3, cex = 0.6, gap = 0, cex.labels = 0.6)
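The counts quoted for the wine data can be checked directly from the agreement table. The base R sketch below re-enters the printed counts by hand (it does not call agree()) and computes, for each cultivar, the number of wines and the share detected as bad.

```r
# Counts copied from the agree() output for the wine data
# (rows: cultivars; columns: fitted groups and detected bad points).
tab_wine <- matrix(c(44,  0,  0,  4,
                      0,  0, 59,  0,
                      0, 49,  0, 22),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(givgroup = c("Barbera", "Barolo", "Grignolino"),
                                   groups   = c("1", "2", "3", "bad points")))

n_cultivar <- rowSums(tab_wine)          # wines per cultivar
n_bad      <- tab_wine[, "bad points"]   # detected bad points per cultivar

sum(n_bad)                               # total number of bad points
round(n_bad / n_cultivar, 2)             # share of bad points per cultivar
```

Each cultivar places all of its good wines in a single fitted group, which is why no wine is misclassified even though the group labels are permuted with respect to the cultivar order.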

6. Conclusions

In this paper, we have introduced ContaminatedMixt, a package for the R software environment, specifically conceived for fitting and disseminating parsimonious mixtures of multivariate contaminated normal distributions. Although these models have been originally proposed



●● ●

●●

●●

●●

● ●

● ●

●●

●●

● ●●

●●

●●

●●

●●

●●

●●

●●●

●●● ●

●●

●● ●

●●

● ●

●●

● ●

● ●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●●

●●●●

●●

●●●

●●

●●

●●

Nonflavanoid●

●●

●●

●●

●●

●●●●

●●

●●●●

●●

●●

● ●

●●●

●●●●

●●

●● ●

●●

●●

●●

●●

●●

●●

●●

● ●●●

●●

●●

●●

●●

●●

●●

● ●●

●●●●

● ●

●●●

●●

● ●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●● ●

●●● ●

●●

●●●

●●

●●

●●

● ●

● ●

●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●●●

●●

●●●

●●

●●

●●

● ●

● ●

●●

●●

● ●●●

●●

●●●●

●●

●●

●●

●●●

●●● ●

●●

●●●

●●

●●

●●

●●●

●●●

● ● ●

●●●●

●●●●

●●● ●●

●●●

●●

●● ●●●●

●● ●

●●

●● ●●●

● ●

●●

●●● ●

●●●

●●●

● ● ●

● ●● ●

●● ●

●●●●● ●

●●●

●●

● ●●●●

●●●

●●●●● ●

●●

●●

●●● ●

● ●●

●●●

● ●●

●●●●

●● ●

●● ●●● ●

●●●

●●

●● ●●●

●●●

●●●●●

●●

●●

●●●●● ●

● ●●●

● ●●

● ●●

● ● ●●

●● ●

●● ●●● ●

●●●

●●

●● ●●●

●●●

●●

●● ●●●

●●

●●

●●● ●

● ●●

● ●●

● ●●

●●●●

●●●●

● ●●●●

● ●●

●●

● ● ●●●

●●●

●●●●●●

●●

●●

● ●●●

●●●

● ●●

● ●●

●●● ●

●●●

●● ●●●●

●●●

●●

●●●●●

●●●

●●

●●●●

●●

●●●●● ●

● ●●●

●●●

● ●●

●●●●

●●●●●●●●●

●●●

●●

● ●●●●

●●●

●●●●●

●●

●●

●●●●●●

● ●●●

●●●

●●●

●●●●

●●●●

●●●● ●

●●●

●●

●●●●●

●●●

●●

●●●●

● ●

●●

●● ●●

Proanthocyanins

● ●●

●●●

● ●●

● ●●●

●●●

●●●●●●

● ●●

●●

●●● ●●

●● ●

●●●● ●●

● ●

●●

● ●●●

● ●●●

●●●

●●●

●●● ●

●●●

●●●●● ●

●●●

●●

●●●●●

●●●

●●

●●●●

●●

●●●●●●

●●●

● ●●

● ●●

●●●●

●●●●● ●●●●

●●●

●●

●●●●●

●●●

●●

●●●●

●●

●●

●●●●

0.5

2.5

● ●●

● ●●

●●●

●●●●

●●●

●●●● ●●

● ●●

●●

●●●●●

●●●

●●

●●●●●

●●

●●●●●●

28

●●

●●● ●●● ●● ●● ●●●●●

●●●●●●●

●●

●●●

●●●

●●

●●

●●

● ●●

●●

●●●

●●

●●

● ●

●●

●● ●

●●

●●●●●●●● ● ● ●●● ●●

●●●

●●●●

●●

●●●

●● ●

●●

●●

●●

● ●●

●●

●●●

●●

● ●

●●

●●

●● ●

●●

●● ● ●●●● ●● ●●● ●●●

● ●●●●●●

●●

●●●

●●●

●●

●●●

● ●●

●●

●●●

●●

● ●

● ●

●●

●● ●

●●

●● ● ● ●●● ●● ●● ●●●●

●●●

● ●●●

●●

●●●

●●●

●●

●●●

● ●●

●●

● ●●

●●

●●

●●

● ●

●● ●

●●

● ●●● ●● ●● ●●●●● ●●

●●●

●●●●

●●

●●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●● ● ●●● ● ●● ●● ● ●●

●●●

●●● ●

●●

●●●

●● ●

●●

●●

●●

●●●

●●

●●●

●●

● ●

● ●

●●

●● ●

●●

●●● ●●●●●●● ●● ●●●

●●●●●●●

●●●●●●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●● ●●●● ●● ●● ● ●●●●

●●●

●●● ●

●●

●●●

●●●

●●

●●

●●

● ●●

●●

● ●●

●●

●●

●●

●●

●●●

●●

● ●● ●●● ●● ●● ●● ● ●●

● ●●

●●● ●

●●

●●●

●● ●

●●

●●

●●

●●●

●●

●●●

●●

●●

● ●

● ●

●●●

Color

●●

● ●● ●●●●● ●● ●●●●●

●●●●●●●

●●●

●●●

●●

●●

●●●●

●●●

●●

●●●

●●

●●

●●

●●

●●●

●●

● ●● ● ●●● ●●● ● ●●●●

● ●●●●●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

● ●●

●●

●●

●●

●●

●●●

●●

●● ●● ●●●● ●●●● ●●●

●●●

●●●●

●●●●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

●●

● ●

●●

●●●

● ●

●●

●●

●●

●● ●●●

●●●●● ●

●●

●●

● ● ●● ●● ●●

●●● ● ●●

●●●

●●

● ●●

●● ●

●●● ●●

●●● ●

●●

●●

●●

●●

●●● ●●

●● ● ●●●

●●

● ●

●● ●●● ●●●

● ●● ●●●

●● ●

●●

●●●

●●●

●●●●

●●

●● ●

● ●

●●

●●

●●

● ●●●●●●● ●

●●●●

●●

●●● ●●● ●●

●●● ●●●●●●

●●

●●●

●● ●

●●● ●●

●●● ●

●●

● ●

●●

●●

●●● ●●

●●● ●●●

●●

●●

● ●●●●● ●●

●●● ●●●

●●●

●●

● ●●●

● ●●●

●●●

●●● ●

● ●

● ●

●●

●●

●● ●●●

●● ●●●●

●●

●●

●●●●● ● ●●●●●●●

●●●●

●●

●●●

●● ●

●●●●●

●●●●

● ●

● ●

●●

●●

●●●●●

● ●●●●●

●●

● ●

●●● ●●●●●

●●●●●●

●●●

●●

●●●

●●● ●●

● ●●●

●● ●

●●

●●

●●

●●

●●●●●●●●●

●●●●●

●●

●●●●● ●●●●●●●●

●●●●

●●

●●●

●●●●●

●●●●●●●

●●

●●

●●

●●

●●●●●

● ●●●●●

●●

●●

● ●● ●●●●●● ●● ●●

●●●●

●●

●●●

●●●● ●

●● ●●● ●●

●●

●●

●●

●●

● ●●●●● ●●●

●●●●

● ●

●●●● ●●●●

● ●●●●●

●●●

●●

●●●

●● ●●●

● ●●●●●●

●●

●●

●●

●●

●● ● ●●

●● ●●●●●

●●

●●

●●● ●●●●●

● ●●● ●●

●●●

●●

● ●●

●●●

●●●●

●●

●●●

Hue ●

● ●

● ●

●●

●●

● ●●●●●●●●

●●●

●●

●●

●●● ●●●●●

●●●●●●

●●●

●●

●●●

●●●●●

●●●

●●●● 0.

61.

4

● ●

● ●

●●

●●

●● ●●●●●●●

●●●●

●●

●●● ●●●●●

● ●●●●●

●●●

●●

●●●

●●●

●●● ●●

●●●●

1.5

3.5

●●●

●●

●●

●●●●

●●●

●●

●●

●●●

●●

● ●●

●●● ●

●●●

●●

●●

● ●●●● ●●●● ●●

●●● ●●●●

●●

●●

●●● ●

● ●●

●●

●●

● ●●

●●

●●●

●●● ●

●●●●

●●●

●● ●●

●● ●●●● ●

●●● ●●

●●

● ●

●●

●●●●

●●●

●●

●●

●●●

●●● ●

●●

●● ●● ●

●●

●●●

●●● ●● ● ●●● ●●

●●● ●●

●●

●●

●●

●● ●●

●●●

●●

●●

●●●

●●● ●●

●●● ●● ●

●●

●●

● ●●●● ●●●●●● ●●● ●

●●●

●●

●●

●●●●

● ●●

●●

●●

●● ●

●●

● ●●●

●●●●●

●●●

●●

●●● ●● ●●●●●●

● ●●●●●●

● ●

●●

●●● ●

●●●

●●

●●

● ●●

●●

●●●

●●●●●●

●●

●●

●●● ●●●

●●● ●●

●●● ●●

●●

● ●

●●

●●●●

●●●

●●

●●●

●●●

●●

●●●●

●●●●●

●●

●●●

●●●●

●●●●

●●●●●●●●

●●

●●

●●

●●●●

●●●

●●

●●

●●●

●●

●●●●

●● ●● ●

●●

●●

●● ●●

●●● ●

●● ●●● ●●

●●●

● ●

●●

●●●●

●●●

●●

●●

● ●●

●●

●●●

●●●●

●●●

●●

●●

●●● ●● ●●●

● ●●●●●●

●● ●

●●

●●

●●●●● ●

●●

●●

●● ●

●●

●●●

●●●●

●●●

●●

●●

● ●●●

●●●●●●●

● ●●●●

●●

●●

●●

●●● ●●●●

●●

●●

●●●

●●●●

●●

●●●●●●●●

●●

●●●●

●●●●

●●●●●●●

Dilution

●●●

● ●

●●

●●●●

●●●

●●

●●

●● ●

●●

●●●

●●●●

● ●●

●●

●●

●●●●●●●●

● ●●●●●●

11 13

●●

●●●

●●

●●

●●●

● ●●●

●●●

●●●●

● ●●●

●●

● ●

● ●● ●●●

●●●

●●●

●●● ●

●● ●●

● ●

●●●

●●

● ●●

●●●

●●●

●●

●●●

● ●●●

●●● ●● ● ●

●●●●

● ●

●●

●●●●●●

●●●

●●●

●●●●

●●●●

●●

●●●

●●

1.5 3.0

●●

●●●

●●

●●

●● ●

● ●● ●

●●●●●●

●● ●●

●●

●●

●●● ●● ●

●●●

●●●

●●

●●●

●● ●●

●●

●●●

●●

● ●●

●●●

●●

●●

●●●

● ●●●

●● ●●●● ●

●● ●●

●●

● ●

●●● ●● ●

●●●

●●●

●●

● ●●

●● ●●

●●

●● ●●●

80 140

●●

● ●●

●●

●●

●●●

● ●●●

●●●●● ●●

●● ●●

●●

●●

●●● ●●●

●●●

●●●

●●●●

●● ●●

●●

●●●

●●

● ●●

●●●

●●

●●

●●●

● ●●●

●●● ● ●●●

●● ●●

● ●

●●

●●●●● ●

●●●

●●●

●●

●●●

●●● ●●●

●●●

●●

1 3 5

●●

●●●

●●

●●

●●●

● ●●●

●●●●●●

●●●●

●●

●●

●●●●●●

●●●

●●●

●●●●

●●●●●●

●●●●●

● ●●

●●●

●●

●●

●● ●

●●●●

●●●● ●●●

●●●●

●●

● ●

●●●●●●

●●●

●●●●

●●●

●●●●● ●

●●●

● ●

0.5 2.5

●●

● ●●

●●

●●

●●●

●●● ●

●●●● ●●●

●●●●

● ●

●●

● ●●●● ●

●●●

●●●

●●●●

●● ●●

●●

●● ●

●●

● ●●

●●●

●●●

●●

●●●

● ●●●

●●●

●●●●

●●●●

●●

●●

●●●● ●●

●●●

●●●

●●● ●

●●●●

● ●

●●●

●●

0.6 1.4

●●

● ●●

●●

●●

●● ●

●●●●

●●● ●●●●

●●●●

●●

●●

●●●●● ●

●●●

●●●

●●

●●●

●●●●●●

●●●

●●

● ●●

● ●●

●●

●●

●●●

● ●● ●

●●●

●●●●

●● ●●

●●

●●

●●●●●●

●●●

●●●

●●

●●●

●●●●●●

●●●

●●

400 1600

400

1600

Proline

Figure 4: Wine data: Classification results from the model selected by BIC and ICL. Badpoints are represented by black bullets.

for clustering applications (Punzo and McNicholas 2016), their use has been extended here to model-based classification, where information about the group membership of some of the observations is available. The package is also meant to be a user-friendly tool for the automatic detection of mild outliers (also referred to as bad points herein). Computation can exploit parallelization on multicore PCs and computer clusters when several models need to be compared. This is handy when, as is often the case in practical applications, the number of clusters and/or the covariance structure of the model is not known a priori. We believe our package may be a practical tool supporting academics and practitioners who are involved in robust cluster/classification analysis applications.
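The automatic detection of bad points can be illustrated with a minimal base-R sketch. It is not the package's implementation: it assumes a single contaminated normal component with known parameters, and the helper names `dmvn` and `prob_good`, as well as all parameter values, are hypothetical and chosen only for illustration. A point is flagged as bad when its posterior probability of being good falls below 0.5 (the maximum a posteriori rule).

```r
# Density of a d-variate normal N(mu, Sigma) at the rows of X (base R only).
dmvn <- function(X, mu, Sigma) {
  d <- length(mu)
  md2 <- mahalanobis(X, center = mu, cov = Sigma)  # squared Mahalanobis distances
  exp(-0.5 * md2) / sqrt((2 * pi)^d * det(Sigma))
}

# Posterior probability that each row of X is a "good" point under one
# contaminated normal component:
#   alpha * N(mu, Sigma) + (1 - alpha) * N(mu, eta * Sigma)
prob_good <- function(X, mu, Sigma, alpha, eta) {
  good <- alpha * dmvn(X, mu, Sigma)
  bad  <- (1 - alpha) * dmvn(X, mu, eta * Sigma)
  good / (good + bad)
}

# Hypothetical parameter values (assumptions, not estimates from any data set).
mu    <- c(0, 0)
Sigma <- diag(2)
alpha <- 0.9   # proportion of good points
eta   <- 10    # inflation factor of the bad-point covariance (eta > 1)

X <- rbind(c(0.1, -0.2),   # close to the centre
           c(6.0,  6.0))   # far from the centre

p <- prob_good(X, mu, Sigma, alpha, eta)
is_bad <- p < 0.5          # MAP rule: bad when "good" is the less likely label
print(data.frame(p_good = signif(p, 3), is_bad = is_bad))
```

With these values the first point is kept as good and the second is flagged as bad. In a fitted mixture the same rule would be applied within the component to which each observation is assigned, using the estimated parameters of that component.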


References

Aitken AC (1926). “On Bernoulli’s Numerical Solution of Algebraic Equations.” In Proceedings of the Royal Society of Edinburgh, volume 46, pp. 289–305.

Aitkin M, Wilson GT (1980). “Mixture Models, Outliers, and the EM Algorithm.” Technometrics, 22(3), 325–331.

Akaike H (1973). “Information Theory and an Extension of Maximum Likelihood Principle.” In BN Petrov, F Csaki (eds.), Second International Symposium on Information Theory, pp. 267–281. Akademiai Kiado, Budapest.

Andrews JL, McNicholas PD (2012). “Model-Based Clustering, Classification, and Discriminant Analysis via Mixtures of Multivariate t-Distributions.” Statistics and Computing, 22(5), 1021–1029.

Andrews JL, McNicholas PD, Subedi S (2011). “Model-Based Classification via Mixtures of Multivariate t-Distributions.” Computational Statistics and Data Analysis, 55(1), 520–529.

Andrews JL, Wickins JR, Boers NM, McNicholas PD (2015). teigen: Model-Based Clustering and Classification with the Multivariate t Distribution. Version 2.1.0 (2015-11-20), URL http://CRAN.R-project.org/package=teigen.

Bagnato L, Punzo A (2013). “Finite Mixtures of Unimodal Beta and Gamma Densities and the k-Bumps Algorithm.” Computational Statistics, 28(4), 1571–1597.

Banfield JD, Raftery AE (1993). “Model-Based Gaussian and Non-Gaussian Clustering.” Biometrics, 49(3), 803–821.

Barnett V, Lewis T (1994). Outliers in Statistical Data. Wiley Series in Probability & Statistics. Wiley. URL https://books.google.it/books?id=B44QAQAAIAAJ.

Becker C, Gather U (1999). “The Masking Breakdown Point of Multivariate Outlier Identification Rules.” Journal of the American Statistical Association, 94(447), 947–955.

Berkane M, Bentler PM (1988). “Estimation of Contamination Parameters and Identification of Outliers in Multivariate Data.” Sociological Methods & Research, 17(1), 55–64.

Biecek P, Szczurek E, Vingron M, Tiuryn J (2012). “The R Package bgmm: Mixture Modeling with Uncertain Knowledge.” Journal of Statistical Software, 47(3), 1–31. ISSN 1548-7660. URL http://www.jstatsoft.org/v47/i03.

Biernacki C, Celeux G, Govaert G (2000). “Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood.” Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(7), 719–725.

Biernacki C, Celeux G, Govaert G (2003). “Choosing Starting Values for the EM Algorithm for Getting the Highest Likelihood in Multivariate Gaussian Mixture Models.” Computational Statistics & Data Analysis, 41(3–4), 561–575.

Bock HH (2002). “Clustering Methods: From Classical Models to New Approaches.” Statistics in Transition, 5(5), 725–758.


Böhning D (2000). Computer-Assisted Analysis of Mixtures and Applications: Meta-analysis, Disease Mapping and Others, volume 81 of Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, London.

Bozdogan H (1987). “Model Selection and Akaike’s Information Criterion (AIC): The General Theory and Its Analytical Extensions.” Psychometrika, 52, 345–370.

Bozdogan H (1994). “Theory & Methodology of Time Series Analysis.” In Proceedings of the First US/Japan Conference on the Frontiers of Statistical Modeling: An Informational Approach, volume 1. Kluwer Academic Publishers, Dordrecht.

Browne RP, ElSherbiny A, McNicholas PD (2015). mixture: Mixture Models for Clustering and Classification. Version 1.4 (2015-03-10), URL http://CRAN.R-project.org/package=mixture.

Browne RP, McNicholas PD (2012). “Model-based clustering, classification, and discriminant analysis of data with mixed type.” Journal of Statistical Planning and Inference, 142(11), 2976–2984.

Celeux G, Govaert G (1995). “Gaussian Parsimonious Clustering Models.” Pattern Recognition, 28(5), 781–793.

Dang UJ, Browne RP, McNicholas PD (2015). “Mixtures of Multivariate Power Exponential Distributions.” Biometrics.

Davies L, Gather U (1993). “The Identification of Multiple Outliers.” Journal of the American Statistical Association, 88(423), 782–792.

Dean N, Murphy TB, Downey G (2006). “Using Unlabelled Data to Update Classification Rules with Applications in Food Authenticity Studies.” Journal of the Royal Statistical Society: Series C (Applied Statistics), 55(1), 1–14.

Dempster AP, Laird NM, Rubin DB (1977). “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38.

Fraley C, Raftery AE (1998). “How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis.” Computer Journal, 41(8), 578–588.

Fraley C, Raftery AE (2007). “Model-based Methods of Classification: Using the mclust Software in Chemometrics.” Journal of Statistical Software, 18(6), 1–13. URL http://www.jstatsoft.org/v18/i06.

Fraley C, Raftery AE, Scrucca L, Murphy TB, Fop M (2015). mclust: Normal Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation. Version 5.1 (2015-10-27), URL http://cran.r-project.org/web/packages=mclust.

Fritz H, García-Escudero LA, Mayo-Iscar A (2012). “tclust: An R Package for a Trimming Approach to Cluster Analysis.” Journal of Statistical Software, 47(12), 1–26.

Gallegos MT, Ritter G (2009). “Trimmed ML Estimation of Contaminated Mixtures.” Sankhya: The Indian Journal of Statistics, Series A, 71(2), 164–220.


Gómez-Villegas MA, Gómez-Sánchez-Manzano E, Maín P, Navarro H (2011). “The Effect of Non-Normality in the Power Exponential Distributions.” In L Pardo, N Balakrishnan, MA Gil (eds.), Modern Mathematical Tools and Techniques in Capturing Complexity, Understanding Complex Systems, pp. 119–129. Springer, Berlin Heidelberg.

Hathaway RJ (1986). “A Constrained EM Algorithm for Univariate Normal Mixtures.” Journal of Statistical Computation and Simulation, 23(3), 211–230.

Hennig C (2002). “Fixed Point Clusters for Linear Regression: Computation and Comparison.” Journal of Classification, 19(2), 249–276.

Hurvich CM, Tsai CL (1989). “Regression and Time Series Model Selection in Small Samples.” Biometrika, 76(2), 297–307.

Ingrassia S (2004). “A Likelihood-Based Constrained Algorithm for Multivariate Normal Mixture Models.” Statistical Methods and Applications, 13(2), 151–166.

Karlis D, Xekalaki E (2003). “Choosing Initial Values for the EM Algorithm for Finite Mixtures.” Computational Statistics & Data Analysis, 41(3–4), 577–590.

Lange KL, Little RJA, Taylor JMG (1989). “Robust Statistical Modeling using the t Distribution.” Journal of the American Statistical Association, 84(408), 881–896.

Lebret R, Iovleff S, Langrognet F, Biernacki C, Celeux G, Govaert G (2015). Rmixmod: An interface of MIXMOD. Version 2.0.3 (2015-10-06), URL http://cran.r-project.org/web/package=Rmixmod.

Leisch F, Grün B (2015). CRAN Task View: Cluster Analysis & Finite Mixture Models. Version 2015-07-24, URL https://cran.r-project.org/web/views/Cluster.html.

Lindsay BG (1995). Mixture Models: Theory, Geometry and Applications, volume 5. NSF-CBMS Regional Conference Series in Probability and Statistics, Institute of Mathematical Statistics, Hayward, California.

Little RJA (1988). “Robust Estimation of the Mean and Covariance Matrix from Data with Missing Values.” Applied Statistics, 37(1), 23–38.

Maruotti A, Punzo A (2016). “Model-Based Time-Varying Clustering of Multivariate Longitudinal Data with Covariates and Outliers.” Computational Statistics & Data Analysis. doi:10.1016/j.csda.2016.05.024.

McLachlan GJ, Basford KE (1988). Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York.

McLachlan GJ, Krishnan T (2007). The EM Algorithm and Extensions. John Wiley & Sons, New York.

McLachlan GJ, Peel D (1998). “Robust Cluster Analysis via Mixtures of Multivariate t-Distributions.” In A Amin, D Dori, P Pudil, H Freeman (eds.), Advances in Pattern Recognition, volume 1451 of Lecture Notes in Computer Science, pp. 658–666. Springer, Berlin - Heidelberg.


McNicholas PD (2010). “Model-Based Classification using Latent Gaussian Mixture Models.” Journal of Statistical Planning and Inference, 140(5), 1175–1181.

McNicholas PD, ElSherbiny A, Jampani R, McDaid A, Murphy B, Banks L (2015). pgmm: Parsimonious Gaussian Mixture Models. Version 1.2 (2015-04-28), URL http://cran.r-project.org/web/package=pgmm.

McQuarrie A, Shumway R, Tsai CL (1997). “The Model Selection Criterion AICu.” Statistics & Probability Letters, 34(3), 285–292.

Meng XL, Rubin DB (1993). “Maximum Likelihood Estimation via the ECM Algorithm: A General Framework.” Biometrika, 80(2), 267–278.

Peel D, McLachlan GJ (2000). “Robust Mixture Modelling using the t Distribution.” Statistics and Computing, 10(4), 339–348.

Punzo A (2014). “Flexible Mixture Modeling with the Polynomial Gaussian Cluster-Weighted Model.” Statistical Modelling, 14(3), 257–291.

Punzo A, Browne RP, McNicholas PD (2016). “Hypothesis Testing for Mixture Model Selection.” Journal of Statistical Computation and Simulation. doi:10.1080/00949655.2015.1131282.

Punzo A, Maruotti A (2016). “Clustering Multivariate Longitudinal Observations: The Contaminated Gaussian Hidden Markov Model.” Journal of Computational and Graphical Statistics, pp. 1–32. doi:10.1080/10618600.2015.1089776.

Punzo A, McNicholas PD (2014a). “Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model.” arXiv.org e-print 1409.6019, available at: http://arxiv.org/abs/1409.6019.

Punzo A, McNicholas PD (2014b). “Robust High-Dimensional Modeling with the Contaminated Gaussian Distribution.” arXiv.org e-print 1408.2128, available at: http://arxiv.org/abs/1408.2128.

Punzo A, McNicholas PD (2016). “Parsimonious Mixtures of Multivariate Contaminated Normal Distributions.” Biometrical Journal.

R Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Ritter G (2015). Robust Cluster Analysis and Variable Selection, volume 137 of Chapman & Hall/CRC Monographs on Statistics & Applied Probability. CRC Press.

Schwarz G (1978). “Estimating the Dimension of a Model.” The Annals of Statistics, 6(2), 461–464.

Subedi S, Punzo A, Ingrassia S, McNicholas PD (2013). “Clustering and Classification via Cluster-Weighted Factor Analyzers.” Advances in Data Analysis and Classification, 7(1), 5–40.


Subedi S, Punzo A, Ingrassia S, McNicholas PD (2015). “Cluster-Weighted t-Factor Analyzers for Robust Model-Based Clustering and Dimension Reduction.” Statistical Methods and Applications, 24(4), 623–649.

Tukey JW (1960). “A Survey of Sampling from Contaminated Distributions.” In I Olkin (ed.), Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, Stanford Studies in Mathematics and Statistics, chapter 39, pp. 448–485. Stanford University Press, California.

Zhang J, Liang F (2010). “Robust Clustering using Exponential Power Mixtures.” Biometrics, 66(4), 1078–1086.

Affiliation:

Antonio Punzo
Department of Economics and Business
University of Catania
Corso Italia, 55, 95129 Catania, Italy
Telephone: +39/095/7537640
E-mail: [email protected]
URL: http://www.economia.unict.it/punzo

Angelo Mazza
Department of Economics and Business
University of Catania
Corso Italia, 55, 95129 Catania, Italy
Telephone: +39/095/7537736
E-mail: [email protected]
URL: http://docenti.unict.it/a.mazza

Paul D. McNicholas
Department of Mathematics & Statistics
McMaster University
Hamilton, Ontario, Canada, L8S 4L8
Telephone: +1-905-525-9140, ext. 23419
E-mail: [email protected]
URL: http://ms.mcmaster.ca/~paul/