Computerized Medical Imaging and Graphics 42 (2015) 25–37

Contents lists available at ScienceDirect

Computerized Medical Imaging and Graphics

journal homepage: www.elsevier.com/locate/compmedimag

Frequential versus spatial colour textons for breast TMA classification

M. Milagro Fernández-Carrobles a, Gloria Bueno a,*, Oscar Déniz a, Jesús Salido a, Marcial García-Rojo b, Lucía González-López c

a VISILAB, E.T.S. Ingenieros Industriales, Universidad de Castilla-La Mancha, Ciudad Real, Spain
b Department of Pathology, Hospital de Jerez de la Frontera, Spain
c Department of Pathology, Hospital General Universitario de Ciudad Real, Spain

Article history: Received 12 April 2014; Received in revised form 30 June 2014; Accepted 10 November 2014

Keywords: TMA (tissue microarray); Texton maps; Colour models; Digital pathology; Feature selection; Automatic classification; Image texture analysis

Abstract

Advances in digital pathology are generating huge volumes of whole slide images (WSI) and tissue microarray (TMA) images, which are providing new insights into the causes of cancer. The challenge is to extract and process all the information effectively in order to characterize all the heterogeneous tissue-derived data. This study aims to identify an optimal set of features that best separates different classes in breast TMA. These classes are: stroma, adipose tissue, benign and benign anomalous structures, and ductal and lobular carcinomas. To this end, we propose an exhaustive assessment of the utility of textons and colour for automatic classification of breast TMA. Frequential and spatial texton maps from eight different colour models were extracted and compared. Then, in a novel way, the TMA is characterized by the 1st and 2nd order Haralick statistical descriptors obtained from the texton maps, with a total of 241 × 8 features for each original RGB image.
Subsequently, a feature selection process is performed to remove redundant information and thus reduce the dimensionality of the feature vector. Three methods were evaluated: linear discriminant analysis, correlation and sequential forward search. Finally, an extended bank of classifiers composed of six techniques was compared, but only three of them could significantly improve accuracy rates: Fisher, Bagging Trees and AdaBoost. Our results reveal that the combination of different colour models applied to spatial texton maps provides the most efficient representation of the breast TMA. Specifically, we found that the best colour model combination is Hb, Luv and SCT for all classifiers, and the classifier that performs best for all colour model combinations is AdaBoost. On a database comprising 628 TMA images, classification yields an accuracy of 98.1% and a precision of 96.2% with a total of 316 features on spatial texton maps.

© 2014 Elsevier Ltd. All rights reserved.
1. Introduction

Automated image analysis of WSI and TMA can extract specific diagnostic features of diseases and quantify individual components of these features to support diagnosis and provide informative clinical measures of disease, such as tumour grade in cancers. Although automatic image analysis has many potential advantages, including reducing inter-observer discrepancy, increasing consistency and improving efficiency, image analysis algorithms are still under research and there are many challenging problems. One of these challenges is to extract and process all the information effectively in order to characterize and model all the heterogeneous TMA data.

This study aims to identify an optimal set of features that best separate different classes within breast TMA and therefore classify the TMA data by automated methods in order to achieve an objective diagnosis in digital pathology. To this end, colour and conspicuous textural properties are analyzed.

There are many texture definitions in the computer vision literature. In a general way, texture may be defined as the variation of data at scales smaller than the scales of interest [1]. Texture is an important cue in object recognition, as it tells us something about the structure from which the objects are made. There are different theoretical representations of texture models. Each of them extracts information in a different way and different descriptors are used. Approaches to texture analysis are broadly divided into four categories: statistical, geometrical, model-based and signal processing (space-frequential) [2]. Due to the extensive research on texture analysis over the past years it is impossible to list all published methods; for surveys see [3,4,2,1]. Table 1 shows a synopsis of the most common texture models and descriptors.

Corresponding author. Tel.: +34 926 295 300. E-mail addresses: [email protected] (M.M. Fernández-Carrobles), [email protected] (G. Bueno). URL: http://visilab.etsii.uclm.es (G. Bueno).

http://dx.doi.org/10.1016/j.compmedimag.2014.11.009
0895-6111/© 2014 Elsevier Ltd. All rights reserved.



Table 1
Types of texture models and descriptors.

Model          Type of descriptor
Statistical    Co-occurrence matrix (GLCM) – 1st, 2nd order
               Higher order – Moments
               Autocorrelation features
Geometrical    Voronoi diagrams
               Structural
Model-based    Markov random fields
               Fractals
Transformed    Space-frequential: Fourier; Wavelets, Gabor
               Space: Textons; LBP (Local Binary Patterns);
               SIFT (Scale-invariant feature transform);
               HOG (Histogram of oriented gradients)

While there are a large number of texture descriptors, textons have been selected in our study due to their capability to represent texture, overcoming problems caused by different levels of illumination, distortions or rotations [5]. This capability is very important when working with WSI obtained from a microscope or scanner. Also, textons are composed of those features used by the texture discrimination system to distinguish between similar patterns. In this way, different textural components in the same texture are represented by textons. Then, these textural components or textons can be used to classify benignant or malignant breast tissue in TMA. This article considers the use of textons and incorporates them into an automatic TMA classification system.

There are several studies about using textons as texture representation. Textons have been used in numerous fields including fabrics inspection, remote sensing and medical image analysis [6–8]. Leung and Malik [6] used textures as different as felt, plush, leather, artificial grass, cork or sandpaper. Their study focused on the concept of 3D textons, which introduces the idea that pixels with the same label (same texton) will look different under different lighting and viewing conditions. Results are based on two experiments, using in both cases the Leung-Malik (LM) filter bank to extract the texton vocabulary. Recognition achieved up to a 97% detection rate, using 5 textons per class and histogram distance comparison with χ2.

In 2002 and 2005 Varma et al. [9,10] used the same textures as in the Leung and Malik study. Four filter banks were compared: LM, Schmid, Maximum Response MR8 and MR4. The study carried out different experiments based on variation of the filters, the texton vocabulary size and the number of training images. Classification was performed by histogram comparison with χ2 over 61 textures with 40 textons per class. The best classification accuracy was 98.61%. However, in 2003 Varma et al. [11] suggested a new type of texton without using filter banks. In this case, textons are based on the pixel intensities of an N × N square neighbourhood around each pixel, that is, textons represented in the spatial domain. They proved in 2009 [7] that textons in the spatial domain were superior to the MR8 filter bank for material texture classification. The classification accuracy was up to 1.3% higher and the neighbourhood used was as small as a 3 × 3 pixel square.

Another type of texton, based on the LBP methodology, has recently been described for fabrics inspection. These textons reduce computational time, though they are only suitable when the image size is not large and the image quality is not poor [8].

Textons have also been used in biomedical images for different classification problems, including: breast parenchyma density in mammograms [12–14], hematologic malignancies [15], prostate cancer in histopathological images [16], cancer lesions in gastroenterology imaging scenarios [17], lesion patterns in dermoscopic images [18] and breast TMA [19–22]. The success varies with the domain of the problem, the choice of texton representation, the number of textons and the classification method. Table 2 summarizes different state-of-the-art texton methods in biomedical applications and their classification accuracy.

On the other hand, biomedical specimens do not only exhibit textural information but also colour. Research on the human visual system suggests that the image signal is composed of a luminance and a chrominance component. In the human eye, chrominance is processed at a lower spatial frequency than luminance. Furthermore, it has been shown that much of the discriminative texture information is contained in high spatial frequencies. It seems that texture information is associated with the luminance component, whereas chrominance is associated with homogeneous regions [23]. Therefore, colour is also an important feature and its interrelationship with texture should be considered when possible.

The research literature using a combination of colour and texture for feature description is limited and has not been significantly explored in the medical field. Harms et al. [24] use the CIE-XYZ colour model and statistical measurements on the grey-level co-occurrence matrix (GLCM) for blood cell classification. Tabesh et al. [25] use the HSV, Lab and RGB colour models with wavelets for tumour classification in prostate biopsies. DiFranco et al. [26] use Lab with GLCM and Gaussian filtering to segment structures in prostate biopsies. The Luv colour model has also been used with SIFT features for breast biopsy diagnosis [27].

These research works show that no standard colour model is used. Moreover, there is also a lack of comprehensive studies of the most suitable colour models for the different tissue types in histopathology [28,29].

This study presents an exhaustive assessment of the utility of textons and colour for automatic classification of breast TMA into four classes or structures. To this end, frequential and spatial texton maps from eight different colour models were extracted. Then, in a novel way, the TMA is characterized by the 1st and 2nd order Haralick statistical descriptors obtained from the texton maps and analyzed with different classifiers. Results are shown for breast TMA at 10×, but the method is invariant to scale and can be applied to pathologies other than breast.

The remainder of the paper is organized as follows: Section 2 presents the material used, including the colour images obtained with eight colour models. Section 3 describes the generation of texton maps. In Section 4, the feature extraction based on statistical measurements on the GLCM and the dimensionality reduction are described. Section 5 describes the classification methods and Section 6 the experimental results. Four different experiments were carried out for textural features obtained from textons represented both in the frequency and spatial domains. The results reveal that the combination of different colour models applied to spatial texton maps provides the most efficient representation of the breast TMA. Section 7 concludes the paper.

2. Materials

A data set comprising 40 breast whole-slide TMA images stained with hematoxylin and eosin (H&E) was acquired by a motorized microscope (ALIAS II) and a scanner (Aperio ScanScope T2) at 10×. Then the breast TMA cores were segmented [30] and 628 representative regions of the four tissue classes were manually selected from each TMA core and reviewed by 3 expert pathologists. The number of samples for each class was: (i) class 1: 170 images, (ii) class 2: 103 images, (iii) class 3: 163 images and (iv) class 4: 192 images.

The size of these regions of interest (ROI) was 200 × 200 pixels, since the resolution of the devices is 0.74 µm/pixel and


Table 2
Classification performance of different state-of-the-art texton methods in biomedical applications.

Application           Texton type       Classifier   Num. textons   Num. classes   Accuracy
Mammograms [12]       Spatial – 3 × 3   χ2           40             4              75.00%
Mammograms [13]       Spatial – 3 × 3   χ2           40             3              84.05%
Mammograms [14]       MR8 filters       χ2           20             3              84.50%
Hematology [15]       MR8 filters       SVM          5              5              89.00%
Prostate biopsy [16]  MR8 filters       SVM          8              2              93.70%
Endoscopy [17]        Gabor filters     SVM          40             3              82.30%
Dermoscopy [18]       Spatial – 3 × 3   χ2           3              5              86.80%
Breast TMA [19]       Gaussian filters  NN           40             4              75.00%
Breast TMA [20]       Spatial – 3 × 3   AdaBoost     40             2              89.00%
Breast TMA [21]       Spatial – 3 × 3   AdaBoost     40             4              88.00%
Breast biopsy [22]    MR8 filters       SVM          11             2              87.00%

0.94 µm/pixel, respectively, for the motorized microscope and the scanner. Therefore, the total area of the ROIs is about 5.4 and 6 mm2.

The TMA tissue classes were: (i) stromal tissue with low and medium cellularity, (ii) adipose tissue, (iii) benign structures and benign anomalous structures and (iv) different kinds of malignity, that is, ductal and lobular in-situ and invasive carcinomas. The types of anomalous benignity represented in class 3 are: sclerosing and adenosis lesions, fibroadenomas, tubular adenomas, phyllodes tumors, columnar cell lesions and ductectasia. Fig. 1 shows different breast TMA samples for each class.

2.1. Colour model analysis

Colour is not interpreted in the same way by human eyes as by a computer. In order to better represent the H&E images and the colour patterns that the human visual system perceives, we used six colour models: RGB, CMYK, HSV, Lab, Luv and SCT, and two combinations of them, Lb and Hb.

The eight colour representations have been compared individually and jointly. In this way we address two issues: (i) characterization of the H&E images with a limited colour spectrum, i.e., only with blue and pink hues, as shown in the sample images in Fig. 1; (ii) representation of the three components that the human visual system perceives, i.e., luminance, chrominance and an achromatic pattern component. Hence the importance of analysing colour models in histopathological analysis. The eight colour spaces are described as follows:

• RGB is the colour model commonly used in image processing. The RGB model is based on a three-dimensional Cartesian coordinate system and each of its channels represents a primary spectral component (Red, Green, Blue).
• The CMYK colour model is a subtractive colour model which refers to the predominant colours in printing: cyan, magenta, yellow and black. This colour model has been used in studies about tissue biomarkers by immunohistochemistry (IHC) for cervical cancer [31] and intraepithelial lesions [32].
• The HSV colour model comes from a nonlinear transformation of the RGB colour space. Its channels consist of a description of the colour in terms of its shade (Hue), the percentage of white light added to a pure colour (Saturation) and brightness, which refers to the intensity of perceived light (Value). Thus, the HSV space can separate the chromatic and achromatic components.
• The Lab colour model (or CIE L*a*b* colour model) was designed to approximate human vision. In Lab, L corresponds to illumination (colour luminosity), and the a (colour position between magenta and green) and b (colour position between yellow and blue) channels correspond to colour opponents. Thus, features extracted from the Lab space characterize the intensity and colour information of images separately. This colour model has been


used in histological images due to their capability to segment the different tissue structures, including: stroma, cells and lumen [25,33,26].

• Luv (or CIE L*u*v* colour model) is a modification of the U*V*W* and Lab colour models. The Luv model is especially useful when working with a single illuminant and uniform chrominance. The L channel represents colour luminosity, and u and v correspond to chromaticity components. These chromaticity coordinates serve to define the neutral colour in an image. Therefore, the Luv colour model has been used for illumination normalization [34,27].

• The SCT (spherical coordinate transform) is not a colour model per se. Nevertheless, it is possible to make a conversion to SCT based on the RGB colour model. The SCT representation is decomposed into three components: M, θ and φ. M reproduces the colour intensity and is the length of the RGB vector, θ is the angle from the blue axis to the RG plane and φ corresponds to the angle between the G and R axes [35]. Conversion from RGB to SCT is shown in Eqs. (1)–(3). This colour model is useful when there are small changes in illumination.

M = √(R² + G² + B²)    (1)

θ = cos⁻¹(B / M)    (2)

φ = tan⁻¹(G / R)    (3)

• The Lb and Hb colour spaces consist of the L, H and b colour channels. The emphasis on the L and b channels is due to the ability of these channels to produce a good segmentation of breast TMA cells. This is essential to improve classification results. Images representing Lb and Hb are visualized by duplicating the b channel, that is, the three colour components are Lbb and Hbb.

TMA image samples in the eight colour models are shown in Fig. 2. Once the data set is built we adopt a multistep process that involves four steps: feature description by texton histograms and maps, feature extraction by 1st and 2nd order statistics from the GLCM, feature selection by dimensionality reduction, and classification. These steps are discussed in the following sections.

3. Feature description: textons

The tissue contains different types of texture that can be discriminant for detecting cancer. These textural components or textons can then be used to distinguish benignant from malignant breast tissue.

Textures are represented by maps which have been generated from the textons obtained. Textons allow performing an a posteriori classification or segmentation of the textures analysed. In this study two types of textons were used, in the frequency and spatial domains. The main difference between the frequential and spatial textons is in their responses. In frequential textons these responses




Fig. 1. Illustration of breast TMA samples for each class. TMA images were stained with H&E. Class 1: stromal tissue with cellularity, Class 2: adipose tissue, Class 3: benign structures and anomalous and Class 4: different kinds of malignity.

are extracted by a filter bank, whilst in spatial textons the responses are computed from an N × N square neighbourhood around each pixel of the original image.

3.1. Textons using filter banks: frequential texton

Firstly, the images are filtered by a filter bank. A filter bank is a filter collection that detects different types of structures in the same digital sample, such as level differences, lines or edges.

We compared the performance of classifiers based on different filter banks, namely LM, Schmid and Maximum Response. Although there were no significant differences among some of the filter banks, we found that the MR8 filter bank is the most suitable for our breast TMA representation.

Therefore, the filter bank selected was the Maximum Response 8 (MR8). This filter bank attempts to resolve the problem of rotation invariant filters using edge and bar filters (anisotropic filtering). Rotation invariant filters are not sensitive to directionality changes, and these changes could occur in textures with stripes. The MR8 filter bank is shown in Fig. 3 and is composed of:


• A Gaussian filter (Eq. (4)) and a Laplacian of Gaussian filter (isotropic filtering) (Eq. (5)), both with a standard deviation of σ = 10 pixels:

G(x, y) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²))    (4)

LoG(x, y) = (1 / (πσ⁴)) [1 − (x² + y²) / (2σ²)] e^(−(x² + y²) / (2σ²))    (5)

• 18 edge filters and 18 bar filters extracted from the first and second Gaussian derivatives (Eqs. (6) and (7)) with three basic scales (σx, σy) = (1, 3), (2, 6), (4, 12) and six orientations θ = 0, π/6, π/3, π/2, 2π/3, 5π/6:

G′(x, y) = −((x² + y²) / (2πσ⁴)) e^(−(x² + y²) / (2σ²))    (6)

G″(x, y) = −(1 / (2πσ⁴)) e^(−(x² + y²) / (2σ²)) [1 − (x² + y²) / σ²]    (7)
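As a sketch of how the isotropic part of the bank can be built, the following NumPy code samples Eqs. (4) and (5) on a 49 × 49 grid with σ = 10 (the kernel size matches the filter size quoted in Section 3.1; the function names are ours):

```python
import numpy as np

def gaussian_kernel(size=49, sigma=10.0):
    """Sample the Gaussian of Eq. (4) on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def log_kernel(size=49, sigma=10.0):
    """Sample the Laplacian of Gaussian of Eq. (5) on the same grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    return (1 - r2 / (2 * sigma**2)) * np.exp(-r2 / (2 * sigma**2)) / (np.pi * sigma**4)
```

Because the 49 × 49 window truncates the Gaussian at about 2.4σ, the sampled kernel sums to slightly less than 1; in practice the responses are normalized before clustering.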


Fig. 2. TMA samples in different colour spaces: (a) RGB, (b) CMYK, (c) HSV, (d) Lab, (e) Luv, (f) SCT, (g) Lbb and (h) Hbb.

Fig. 3. MR8 filter bank. The filter bank comprises 38 filters for isotropic and anisotropic filtering.

Fig. 4. Extracting the 38 dimensional vector of pixel 1 from an RGB image. The pixel vector consists of the first pixel of each of the 38 filtered images.


Fig. 5. Texton vocabulary generation. Pixel vectors are used by the k-means clustering algorithm to select the textons which form the texton vocabulary. In this study 60 textons per class have been used, that is, the vocabulary comprises 240 textons of dimension 1 × 38.

Fig. 7. Texton histogram generation. The k-nearest neighbours algorithm is used to compare each 38 dimensional vector with the texton vocabulary and choose the nearest texton. The histogram is the number of times each texton represents a vector pixel.

In order to make an adequate study of the texture images, a useful set of features must be obtained. The algorithm used to extract the frequential textons proceeds as follows:

1. The filter bank is applied over the tissue images so that 38 filter responses are extracted. Each pixel of the original image is now represented by a 38 dimensional vector, see Fig. 4.

2. A k-means clustering algorithm is applied over all the pixel vectors, see Fig. 5. The algorithm creates as many clusters as we select. Each new cluster is characterized by a representative vector called a texton. Thus, for the four classes we selected 60 textons per class. Therefore the texton vocabulary was composed of 240 textons.
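Step 2 can be sketched with SciPy's k-means. The paper does not publish code, so `build_texton_vocabulary` is our own name and this only illustrates the clustering step:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def build_texton_vocabulary(pixel_vectors, textons_per_class=60, seed=0):
    """Cluster the 38-D filter responses of one tissue class into
    textons_per_class cluster centres (the textons of that class).

    pixel_vectors: (n_pixels, 38) array of MR8 responses from step 1.
    Returns a (textons_per_class, 38) array of textons."""
    textons, _labels = kmeans2(pixel_vectors.astype(float),
                               textons_per_class, minit='++', seed=seed)
    return textons

# Concatenating the per-class textons of the 4 classes yields the
# 240 x 38 vocabulary described above.
```

Running this once per class and stacking the results reproduces the 240-texton vocabulary of Fig. 5.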

Textons can be visualized as local filters. This visualization helps us to know what information is carried by the filter response and therefore by the textons. For the reconstruction as a local filter, each texton is multiplied by the MR8 filter bank. Firstly, each filter belonging to the filter bank is concatenated as a row in a matrix called FB (in this study the size of FB is 38 × (49 × 49), since the filter size is 49 × 49). Then, each texton (a 38 dimensional vector) is multiplied by the pseudo-inverse of FB [6,36]. The result of visualizing the textons as local filters is shown in Fig. 6, where only 36 out of the 240 possible reconstructed textons have been displayed.

Fig. 6. Visualization of the first 36 reconstructed textons (out of the 240) as local filters from the RGB images.

A texton histogram is generated from each tissue image and the texton vocabulary. These texton histograms will later be used as models for classification. Histograms are generated using the k-nearest neighbours algorithm (kNN) with k = 1. The kNN algorithm uses all textons in the vocabulary as training samples and classifies each image pixel vector. The texton selected is the closest vocabulary texton to the given vector. Thus each image pixel is still represented by a 38 dimensional vector, but now these vectors are textons belonging to the texton vocabulary. The histogram is then calculated as the number of times each texton appears as a vector pixel in the image, see Fig. 7.

Finally, texton histograms are used to form the texton maps. Once the texton histogram of an image has been calculated, the image is represented by its textons. The texton map is a representation of the original image in which each pixel has been assigned the corresponding index or texton number (a new grey level from 1 to 240 in this study), see Fig. 8.

3.2. Textons using an N × N square neighbourhood: spatial texton

In addition, this study also uses spatial textons. This kind of texton is based on Varma's study of 2003 [11]. The most important idea is that textons are not filtered by a filter bank. In this case,
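The 1-NN labelling, the histogram and the texton map can be sketched together (function and argument names are ours, and the brute-force distance computation is only for illustration):

```python
import numpy as np

def texton_histogram_and_map(pixel_vectors, vocabulary, shape):
    """Assign each pixel vector its nearest texton (kNN with k = 1),
    then build the texton histogram and the texton map.

    pixel_vectors: (n_pixels, d) responses in row-major pixel order.
    vocabulary:    (n_textons, d) texton vectors.
    shape:         (rows, cols) of the original image."""
    # Squared Euclidean distance of every pixel vector to every texton
    d2 = ((pixel_vectors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)                         # closest texton per pixel
    hist = np.bincount(labels, minlength=len(vocabulary))
    texton_map = labels.reshape(shape) + 1             # grey levels 1..n_textons
    return hist, texton_map
```

With a 240-texton vocabulary the returned map contains grey levels 1 to 240, matching the description of Fig. 8.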


Fig. 8. Texton maps. Each original image pixel is now represented by its corresponding texton number.

Fig. 9. Representative vector of pixel i composed of N × N coordinates (N = 3).

each pixel, I(x, y) = i, is represented by the intensity values of an N × N square neighbourhood. Thus, each pixel of the original image is now represented by an N² dimensional vector; in this study N = 3 has been used. Once the feature vector of pixel i is composed, we can create nine new images where the pixel in the i position (of each new image) is represented by the corresponding value of the vector i, see Fig. 9.

As in frequential textons, a k-means clustering algorithm is applied over all the pixel vectors (60 textons per class were extracted). Histogram generation and the texton map are computed in the same way as for the textons using a filter bank.
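A minimal sketch of the 3 × 3 neighbourhood representation follows; border pixels are handled here by replicating the edge, a detail the paper does not specify:

```python
import numpy as np

def spatial_texton_vectors(image, N=3):
    """Represent each pixel of a grey-level image by the raw intensities
    of its N x N neighbourhood, giving an (n_pixels, N*N) array that can
    be clustered with k-means exactly like the filter responses."""
    half = N // 2
    padded = np.pad(image.astype(float), half, mode='edge')  # replicate border
    rows, cols = image.shape
    vectors = np.empty((rows * cols, N * N))
    k = 0
    for dy in range(N):            # one column per neighbourhood offset
        for dx in range(N):
            vectors[:, k] = padded[dy:dy + rows, dx:dx + cols].ravel()
            k += 1
    return vectors
```

Each row of the result is the flattened N × N patch centred on one pixel, in row-major pixel order.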

4. Feature extraction

The power of textons is well known from different studies about segmentation [37], pattern search or classification [6,9,10]. In this study we propose an improvement in breast tissue classification using textons in both the frequency and spatial domains, as well as colour models. Features are extracted in two different ways. First, using the texton histograms, as is commonly done. And second,

Fig. 10. Stages for feature set extraction.

calculating the 1st and 2nd order texture statistics (or Haralick coefficients) on the texton maps. In our case, images are represented as grayscale images since there are 240 textons. A grayscale image is represented by 256 pixel values and each texton represents a pixel value. The image would become a colour image in the case of having more than 256 textons. The statistical descriptors are extracted from the grayscale images. Thus, the information provided by each texton is not lost. This process is shown in Fig. 10.

Previously, we also mentioned the influence of colour on the TMA images. For that reason, each feature set has been extracted in eight different colour spaces, that is, RGB, CMYK, HSV, Lab, Luv, SCT, Hb and Lb. Finally, 32 feature sets were obtained (8 colour models × 2 types of textons, in the spatial and frequency domains × 2 representations, histograms and maps).

4.1. Statistical descriptors

Haralick introduced a general procedure for extracting 2ndorder texture features on an image [3]. 1st and 2nd order statistical



descriptors are a quantification of the spatial variation in the spatial image shade. The 1st order statistical descriptors are 13 features based on the image histogram. The histograms can extract statistical values of the gray level image distribution like the mean, the variance or the standard deviation.

The 2nd order statistical descriptors consider the relationships of the image pixels. They are 19 features based on the GLCM of the image. GLCMs are 2nd order histograms that represent the spatial dependence of the image pixels. These spatial relationships are calculated with the neighbouring pixels in a mobile window. The pixels involved in this calculation are called the reference and the neighbour pixel; each pixel becomes successively the reference pixel. The process begins at the top left of the window and ends at the bottom right. Further, these relationships can be defined by indicating the distance and the angle between the reference pixel and its neighbour. Distances were taken at 1, 3 and 5 pixel-wide neighbourhoods and at a direction parameter equal to 0°, 45°, 90° and 135° to cover different directions. Thus, 228 features of 2nd order, that is, 19 features × 3 distances × 4 orientations, have been calculated. Therefore, a total of 241 statistical features are obtained. The features are listed in Tables 3 and 4 in the Appendix.
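A sketch of how one such co-occurrence matrix is built for a single distance/orientation pair (numpy only; the offset convention used for the four angles is a common one and may differ from the authors' exact implementation, and the 4-level test image is ours):

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Grey-level co-occurrence matrix for one pixel offset (dx, dy):
    entry (i, j) counts reference/neighbour pairs with grey levels i and j,
    normalised so the entries sum to 1."""
    h, w = img.shape
    m = np.zeros((levels, levels))
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                m[img[y, x], img[ny, nx]] += 1
    return m / m.sum()

def offsets(d):
    """Pixel offsets for distance d at the four orientations of the paper."""
    return {0: (d, 0), 45: (d, -d), 90: (0, -d), 135: (-d, -d)}

# Two of the 2nd order descriptors on a tiny 4-level image at d = 1, 0 deg.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
dx, dy = offsets(1)[0]
p = glcm(img, dx, dy, levels=4)
energy = (p ** 2).sum()
contrast = sum((i - j) ** 2 * p[i, j] for i in range(4) for j in range(4))
```

Repeating this for d ∈ {1, 3, 5} and the four orientations, then evaluating the 19 descriptors of Table 4 on each matrix, gives the 228 second-order features.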

4.2. Dimensionality reduction

The combination of colour models can increase the size of the feature set by up to eight times. Thus, we are dealing with 1928 features. A dimensionality reduction of the feature vector is needed. In this study, three methods were analysed: linear discriminant analysis, correlation and sequential forward search.

4.2.1. Linear discriminant analysis
Linear discriminant analysis (LDA) [38] applies a reduction of the feature set before classification. For this purpose, a linear combination of the features is carried out. LDA transforms the original space into an orthogonal and linear space where feature vectors are prioritized in order of importance and some are rejected. An overall 70% dimensionality reduction is achieved in our feature dataset. LDA assumes homogeneous covariance matrices. Nevertheless, when covariance matrices are heterogeneous it is necessary to use Quadratic Discriminant Analysis (QDA), where the covariance matrix is estimated for each class. LDA is used in this study assuming that the covariance matrix is similar in each class.

4.2.2. Correlation
Correlation refers to the linear dependence between two or more variables, in our case between two or more features. This measure of similarity is estimated by the correlation coefficient [39], see Eq. (8), where cov is the covariance and var the variance of features x and y.

ρ(x, y) = cov(x, y) / √(var(x) · var(y))    (8)

Thus, 1 − |ρ(x, y)| is used as a measure of dissimilarity between features. If an exact linear dependency exists between x and y, 1 − |ρ(x, y)| is 0, that is, the variables are correlated. Otherwise, if 1 − |ρ(x, y)| is 1, x and y are completely uncorrelated.

This method allows redundant information to be removed from the feature dataset. A threshold value must be defined to determine the correlation value above which features are considered redundant. In our study, two threshold values of 97% and 99% were adopted. This method achieved an overall 75.4% dimensionality reduction in our dataset while maintaining or improving the classification results obtained with the original features.
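This redundancy filter can be sketched as below (numpy; the greedy keep-first strategy and the toy data are our illustration, with the 97% threshold expressed as the fraction 0.97):

```python
import numpy as np

def drop_correlated(X, threshold=0.97):
    """Keep a feature only if its |correlation| with every already-kept
    feature stays at or below the threshold, i.e. 1 - |rho| stays above
    1 - threshold; otherwise it is redundant and dropped."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
# Feature 1 is almost an exact copy of feature 0 and should be dropped.
X = np.column_stack([a, 0.999 * a + 0.001 * rng.normal(size=200), b])
kept = drop_correlated(X, threshold=0.97)
```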

dical Imaging and Graphics 42 (2015) 25–37

4.2.3. Sequential forward search
There are mainly three methods based on sequential search strategies to reduce the space of features: forward, backward and floating [40]. Sequential forward search (SFS) is a method to reduce the feature dataset which allows redundant and irrelevant features to be removed. An overall 72.56% dimensionality reduction is achieved in our feature dataset. In SFS, features are selected consecutively. The algorithm begins with zero features. First, the algorithm evaluates all the features and selects the one that has the best performance. In successive stages, the algorithm does the same but the evaluation and selection process are performed over the unselected features. The SFS algorithm ends when no improvement is obtained in the feature evaluation. There are several selection criteria: inter-intra group distance, nearest neighbour method, sum and minimum of estimated Mahalanobis distance, sum and minimum of squared Euclidean distances, etc. In our study, a nearest neighbour method was chosen.
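The forward search loop can be sketched as follows, with a leave-one-out 1-nearest-neighbour accuracy standing in for the paper's nearest neighbour criterion (the exact criterion and stopping details of the authors' implementation are not specified; the toy data are ours):

```python
import numpy as np

def nn_score(X, y):
    """Leave-one-out 1-NN accuracy, used here as the selection criterion."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)       # a sample may not be its own neighbour
    return np.mean(y[d.argmin(axis=1)] == y)

def sfs(X, y, score=nn_score):
    """Start from no features; repeatedly add the single unselected feature
    that most improves the criterion; stop when nothing improves it."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        s, j = max((score(X[:, selected + [j]], y), j) for j in remaining)
        if s <= best:
            break
        best = s
        selected.append(j)
        remaining.remove(j)
    return selected

rng = np.random.default_rng(1)
y = np.array([0] * 20 + [1] * 20)
X = np.column_stack([y * 10.0 + rng.normal(size=40),   # informative feature
                     rng.normal(size=40)])             # pure noise
selected = sfs(X, y)
```

On this toy problem only the informative feature is selected; the noise feature cannot improve the criterion, so the loop stops.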

In sequential backward search (SBS) the algorithm starts with all the features. In successive stages, the algorithm deletes the worst feature from the feature set; in other words, it deletes the feature that produces the smallest decrease in the value of the objective function.

Finally, the sequential floating search method has two variants: sequential forward floating search (SFFS) and sequential backward floating search (SBFS). Both are improvements over the original methods by conditional inclusion (or removal) of features: after each stage, the method goes back to check whether there is some better combination. Sequential floating search (forward and backward) is too demanding in terms of computational time. For this reason a SFS is adopted in this study.

5. Classification

Training and classification have been carried out with a 10-fold cross-validation (10-fold cv) method. 10-fold cv randomly divides the dataset into 10 disjoint subsets of approximately equal size. Then, it performs 10 iterations and in each iteration 9 of the disjoint subsets are used for training and the remaining subset as the test set. In this way, the validation results are an average of the iteration results.
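The procedure can be sketched as below (plain numpy; the toy threshold rule is only a stand-in classifier to exercise the loop, not anything from the paper):

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Randomly split n sample indices into k disjoint, near-equal folds."""
    return np.array_split(np.random.default_rng(seed).permutation(n), k)

def cross_validate(X, y, fit_predict, k=10, seed=0):
    """In each of the k iterations one fold is the test set and the
    remaining k - 1 folds form the training set; return the mean accuracy."""
    folds = kfold_indices(len(y), k, seed)
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = fit_predict(X[train], y[train], X[test])
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))

def toy_rule(X_train, y_train, X_test):
    # hypothetical stand-in classifier: predict 1 when feature 0 is positive
    return (X_test[:, 0] > 0).astype(int)

X = np.array([[-1.0], [-2.0], [1.0], [2.0]] * 5)
y = np.array([0, 0, 1, 1] * 5)
acc = cross_validate(X, y, toy_rule, k=10)
```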

An extensive bank of classifiers can be used to perform classification, although only some of them could improve the accuracy rates over our feature dataset [41]. Six techniques were compared: Fisher, support vector machine (SVM), Bagging Trees, LDA (linear discriminant analysis), Random Forest and AdaBoost. These were selected as representative techniques, being among the best classifiers tested.

5.1. Histogram comparison based on χ²

Texton histograms can be classified by histogram comparison based on χ². This is a numerical parameter that represents how well two histograms match, see Eq. (9) for histograms H1(I) and H2(I) and the distance between them, d(H1, H2).

d(H1, H2) = Σ_I (H1(I) − H2(I))² / (H1(I) + H2(I))    (9)

Each histogram to be classified (test histogram) is comparedwith all histograms of the training set. Finally, the test histogrambelongs to the class of the histogram with minimum distance.
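This nearest-histogram rule can be sketched as follows (the eps guard against empty bins, the toy histograms and the class names are our illustration):

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance of Eq. (9); eps avoids division by empty bins."""
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def classify_histogram(test_hist, train_hists, train_labels):
    """Assign the test histogram the class of the training histogram
    at minimum chi-squared distance."""
    dists = [chi2_distance(test_hist, h) for h in train_hists]
    return train_labels[int(np.argmin(dists))]

train_hists = [np.array([8.0, 1.0, 1.0]), np.array([1.0, 1.0, 8.0])]
train_labels = ["class A", "class B"]
pred = classify_histogram(np.array([7.0, 2.0, 1.0]), train_hists, train_labels)
```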

5.2. Fisher classifier

Fisher’s linear classifier finds a linear discriminant function byminimizing the errors in the least square sense [42]. This lineardiscriminant is based on finding a direction in the feature space


such that the projection of the data maximizes Fisher's criterion, i.e., the ratio of the squared distance between the class means to the averaged class variances. The linear classifier is then perpendicular to this projection.

5.3. Support vector machine

Support vector machines (SVM) find a discriminant function by maximizing the geometrical margin between positive and negative samples [43]. Thus, the space is mapped so that examples from different classes are separated by a gap as wide as possible. Besides linear classification, SVMs can function as a non-linear classifier by using the so-called kernel trick. This trick can be considered a mapping of the inputs onto a high-dimensional feature space in which classes become linearly separable. SVMs jointly minimize the training error and maximize the geometrical margin; the latter accounts for the generalization abilities of the resulting classifier. SVMs are one of the best classifiers available and have been applied to many real-world problems.

5.4. Bagging trees

The Bagging (Bootstrap Aggregating) algorithm is a classification method which generates weak individual classifiers using bootstrap. Each classifier is trained on a random redistribution of the training set, so many of the original examples may be repeated in each classifier's training set [44,45]. Generally, the error of combining several types of classifiers is explained by the bias-variance decomposition. The bias of each classifier is given by its intrinsic error and measures how well a classifier explains the problem. The variance is given by the training set used to create the classifier model. The total classification error is given by the sum of bias and variance. In this paper the Bagging method was applied to classification trees and an ensemble of 50 trees was arbitrarily taken.

5.5. AdaBoost

AdaBoost was introduced by Freund and Schapire [46] and is based on training different classifiers with different training sets, as in Bagging. The main idea of the algorithm [45] is to assign weights over the training set. Initially, all the weights are equal, but in each round the weights of misclassified examples are increased. Thus, in subsequent rounds the weak classifiers will be more focused on these examples.
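One round of this reweighting can be sketched as below (the discrete AdaBoost update; the eps guard and the four-sample example are our illustration):

```python
import numpy as np

def adaboost_reweight(w, mistakes, eps=1e-10):
    """Increase the weights of misclassified examples (mistakes is a boolean
    mask) and renormalise; alpha is the weak learner's vote weight."""
    err = float(np.sum(w[mistakes]))             # weighted training error
    alpha = 0.5 * np.log((1 - err + eps) / (err + eps))
    w = w * np.exp(np.where(mistakes, alpha, -alpha))
    return w / w.sum(), alpha

# Four equally weighted samples, one misclassified: after the update the
# wrong example carries half of the total weight.
w0 = np.full(4, 0.25)
w1, alpha = adaboost_reweight(w0, np.array([True, False, False, False]))
```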

6. Results

Obtaining valuable classification results depends on two factors: (1) a valuable feature set and (2) the classifier. In this study, features were extracted in two different ways, that is, using the texton histograms and calculating the 1st and 2nd order statistics (or Haralick coefficients) on the texton maps (for frequential and spatial textons). The second feature set was proposed as an optimization of the first feature set.

Besides, each feature set was extracted for each colour space. The colour models used were: RGB, CMYK, HSV, Lab, Luv, SCT and the channel combinations Lb and Hb. However, we know that the colour model affects the breast tissue images. Then, why not perform classification with a combination of several colour models? This is another improvement that this study proposes. One drawback of using several colour models is that their combination can substantially increase the size of the feature set. Therefore, a method to reduce the feature dimensionality is also needed.

Accordingly, four types of classification experiments were performed to find the best result for the breast TMA classification. Experiments 1 and 2 carried out the classification with each colour

Fig. 11. Accuracy performance in experiment 1 with frequential textons.

model individually. On the one hand, in Experiment 1 feature setsextracted by the texton histograms were used. On the other hand,in Experiment 2, statistical features extracted by the texton mapswere used. In Experiment 3 we showed how classification improvedwith colour model combination. The experiment only used thestatistical features extracted by the texton maps. Finally, in Exper-iment 4 we proposed a feature dimensionality reduction.

In this paper the reduction of features has been performed withthree different methodologies: LDA, a correlation threshold and aSFS. In the reduction of features by correlation two threshold valueswere selected: 97% and 99%. In addition, the confusion matrix of thebest results was calculated together with quantitative evaluationbased on positive predictive value or precision (Prec) and accuracy(Acc), see Eqs. (10) and (11)

Prec = TruePositives / (TruePositives + FalsePositives)    (10)

Acc = (TruePositives + TrueNegatives) / (TruePositives + FalsePositives + FalseNegatives + TrueNegatives)    (11)
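Eqs. (10) and (11) translate directly into code (the confusion counts in the example are invented purely for illustration):

```python
def precision_accuracy(tp, fp, fn, tn):
    """Precision (Eq. (10)) and accuracy (Eq. (11)) from confusion counts."""
    prec = tp / (tp + fp)
    acc = (tp + tn) / (tp + fp + fn + tn)
    return prec, acc

prec, acc = precision_accuracy(tp=50, fp=2, fn=3, tn=45)
```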

6.1. Experiment 1: Tissue classification by histograms

Tissue classifications are commonly made using texton histograms extracted from RGB images. As discussed above, there are other colour models that could improve the classification capability. Experiment 1 verifies this hypothesis. A total of 240 features (240 textons) were used for each colour model, that is, 240 (number of textons) × 8 (colour models) × 2 (texton types: frequential and spatial). Note that this experiment is completely independent of the statistical descriptors extracted from the texton maps.

The best results by colour model were obtained by the Hb channel combination with an average error of 0.29 for frequential textons and an average error of 0.08 for spatial textons. The best individual result in this experiment was obtained using the Hb channel combination and a Bagging Tree classifier with 10-fold cv on spatial textons.

For frequential textons, the classification reaches an averageerror of 0.248 and its confusion matrix shows an average of 74.43%precision and 87.43% accuracy. For spatial textons, the classification

reaches an average error of 0.068 and an average of 93.5% precisionand 96.55% accuracy, see Figs. 11 and 12.


Fig. 12. Accuracy performance in experiment 1 with spatial textons.


Fig. 14. Accuracy performance in experiment 2 with spatial textons.

Fig. 15. Accuracy performance in experiment 3 with frequential textons.

Fig. 13. Accuracy performance in experiment 2 with frequential textons.

6.2. Experiment 2: Tissue classification by Haralick descriptors applied to texton maps

Experiment 2 is an improvement of Experiment 1. In Experiment 1 there are 240 features (the number of textons). Now, the features are based on the Haralick descriptors obtained from the texton maps, for textons in both the frequency and spatial domains. A total of 241 features (Haralick) × 8 (colour models) × 2 (texton types: frequential and spatial) were extracted.

The colour model that works best for all classifiers is RGB, using frequential textons, with an average error of 0.25. Fig. 13 shows these results. On the other hand, the best individual result was obtained with the RGB colour model and a LDA classifier with an average error of 0.194, 80.7% precision and 90.2% accuracy, see Fig. 13.

Using spatial textons, the colour space that works best with all classifiers was the Hb model with an average error of 0.175. The best individual result was also obtained with the Hb colour model and a LDA classifier with an average error of 0.11, 89.25% precision and 94.45% accuracy, see Fig. 14.

The results of these experiments showed that, with frequential textons, the statistical features extracted from the texton maps are better for all colour models than the features extracted from the texton histograms. In the case of spatial textons there are cases in which the use of texton maps improves outcomes, but in other cases the results using histograms are better.

Fig. 16. Accuracy performance in experiment 3 with spatial textons.

6.3. Experiment 3: Tissue classification by Haralick descriptors applied to texton maps and colour model combination

This experiment is based on the previous experiment using texton maps with Haralick features but adding the combination of different colour models. The results of Experiment 3 improve with respect to the results of Experiment 2, see Figs. 15 and 16. The best


Fig. 17. Accuracy performance in experiment 4 with frequential textons.



Fig. 18. Accuracy performance in experiment 4 with spatial textons.

result with frequential textons was given by the Bagging classifier using all colour models with 87.9% precision and 91.5% accuracy. The best result with spatial textons was obtained with a Fisher classifier combining the {CMYK, Hb, Lb, HSV, Luv, SCT} colour models with 94.7% precision and 97.15% accuracy.

6.4. Experiment 4: Tissue classification by Haralick descriptors applied to texton maps and colour model combination together with dimensionality reduction by correlation and SFS

Colour model combinations improve the classification results but also increase the size of the feature set. The computational time depends on the size of the feature set and the classifier used. Moreover, a reduction of the feature dataset is needed in order to avoid the effects of the curse of dimensionality, that is, the reduction of predictive power as the dimensionality increases, known as the Hughes effect.

Therefore, one of the main issues to be solved is to decrease the feature set dimension.

A correlation threshold and sequential forward search methods were selected to reduce the number of features under consideration. In the correlation method two threshold values were selected: 97% and 99%. Correlation with these threshold values provided a reduction of 79.2% and 71.63% of the initial features with improvement of the results. Using a SFS method a reduction of 72.56% of the initial features was obtained. On average, the classifier that performs best for all colour model combinations and dimensionality reduction is AdaBoost, see Figs. 17 and 18. This classifier


obtained an average error of 0.11 and 0.04 for frequential and spa-tial textons, respectively. The best colour model combination for allclassifiers is {Hb, Luv, SCT}.

Finally, an individual analysis reveals that the best result for frequential textons was obtained with the AdaBoost classifier, a combination of the eight colour models and dimensionality reduction using a correlation threshold of 97%. The classification error was 0.108, with 89.4% precision and 94.57% accuracy. On the other hand, the best results for spatial textons were obtained by means of both the AdaBoost and Fisher classifiers, with a combination of the six colour models {RGB, Hb, Lb, HSV, Luv, SCT} and SFS. The classification error was 0.04 for AdaBoost and 0.038 for the Fisher classifier, with an average of 96.2% precision and 98.1% accuracy. However, the computational time using Fisher was lower than with the AdaBoost classifier.

The computational time for the Fisher classifier and the application of a SFS method was 5.36 s. The SFS method reduced the feature set from 1446 to 316 features. Table 5 in the Appendix shows a sorted list of the best features kept for the different colour models after the SFS feature selection. The computational time for the same classifier and colour model combination without dimensionality reduction was 32.5 s. The largest computational time obtained was 114.84 s using all features, the combination of all colour models and the Bagging Tree classifier. This time was reduced to 57.36 s for the same classifier and colour models but with SFS. AdaBoost took 62.9 s with a combination of the six colour models and SFS feature reduction. All tests have been performed with the 628 images of the dataset and a 10-fold cv.



Table 3 (Continued)

3rd Quartile: μq3 = Σ_{i=⌊H/4⌋}^{2⌊H/4⌋} i · h(i)
Interquartile range: μq3 − μq1
Minimum: min(h(i))
Maximum: max(h(i))
Range: max(h(i)) − min(h(i))
Entropy: −Σ_{i=0}^{H−1} h(i) · log(h(i))
Asymmetry: (1/σ³) Σ_{i=0}^{H−1} (i − μ)³ · h(i)
Kurtosis: (1/σ⁴) Σ_{i=0}^{H−1} (i − μ)⁴ · h(i)

where h(i) is the histogram, H the number of bins and ⌊ ⌋ the floor operator.

Table 4
Second order statistical descriptors (Haralick).

Energy: Σ_{i=0}^{H−1} Σ_{j=0}^{H−1} p(i, j)²
Variance: Σ_i Σ_j (i − μ)² p(i, j)
Contrast: Σ_{n=0}^{H−1} n² (Σ_i Σ_j p(i, j)), |i − j| = n
Dissimilarity: Σ_i Σ_j |i − j| · p(i, j)
Correlation: (1/(σx σy)) [Σ_i Σ_j i · j · p(i, j) − μx μy]
Autocorrelation: Σ_i Σ_j i · j · p(i, j)
Entropy: T = −Σ_i Σ_j p(i, j) · log(p(i, j))
Measure of correlation 1: (T − HXY1) / max(HX, HY)
Measure of correlation 2: (1 − exp[−2.0 · (HXY2 − T)])^{1/2}
Cluster shade: Σ_i Σ_j (i + j − μx − μy)³ · p(i, j)
Cluster prominence: Σ_i Σ_j (i + j − μx − μy)⁴ · p(i, j)
Maximum probability: max(p(i, j)), i = 0, …, H − 1, j = 0, …, H − 1
Sum average: Σ_{i=0}^{2(H−1)} i · px+y(i)
Sum entropy: −Σ_{i=0}^{2(H−1)} px+y(i) · log(px+y(i))
Sum variance: Σ_{i=0}^{2(H−1)} (i − SumEntropy)² · px+y(i)
Difference entropy: −Σ_{i=0}^{H−1} px−y(i) · log(px−y(i))
Difference variance: Σ_{i=0}^{H−1} i² · px−y(i)
Homogeneity 1: Σ_i Σ_j p(i, j) / (1 + |i − j|)
Homogeneity 2: Σ_i Σ_j p(i, j) / (1 + (i − j)²)

where p(i, j) is the (i, j)th entry in a normalized gray-level co-occurrence matrix, H the number of bins, and HX and HY the entropies of px and py; μx = Σ_i Σ_j i · p(i, j); μy = Σ_i Σ_j j · p(i, j); px(i) = Σ_j p(i, j); py(j) = Σ_i p(i, j); σx = √(Σ_i px(i)(i − μx)²); σy = √(Σ_j py(j)(j − μy)²); px+y(k) = Σ_i Σ_j p(i, j), i + j = k, k = 0, …, 2(H − 1); px−y(k) = Σ_i Σ_j p(i, j), |i − j| = k, k = 0, …, H − 1; T = HXY = −Σ_i Σ_j p(i, j) · log(p(i, j)); HXY1 = −Σ_i Σ_j p(i, j) · log(px(i) · py(j)); HXY2 = −Σ_i Σ_j px(i) · py(j) · log(px(i) · py(j)).

Table 5
Sorted list of the best features kept after the SFS feature selection.


7. Conclusions

This paper describes a complete study on breast TMA classification based on the use of textons and colours. Two types of textons have been analysed: frequential textons (textons using a filter bank) and spatial textons (textons using a N × N square neighbourhood). The aim is to identify an optimal set of features that best separates different classes in breast TMA. To this end, a dataset composed of 628 TMA images is divided into four classes: (i) stroma with cellularity, (ii) adipose tissue, (iii) benign and benign anomalous structures and (iv) ductal and lobular carcinomas. Two novel improvements have been presented in this study: (a) the introduction of a suitable combination of colour models and (b) the use of 1st and 2nd order Haralick statistical descriptors to characterize the texton maps.

An extended bank of classifiers composed of six techniques was compared, but only three of them could significantly improve accuracy rates: Fisher, Bagging Trees and AdaBoost. Our results reveal that the combination of different colour models applied to spatial texton maps provides the most efficient representation of the breast TMA. We found that, on average, the best colour model combination is {Hb, Luv, SCT} for all classifiers and the classifier that performs best for all colour model combinations is AdaBoost. An individual analysis showed that the best results were obtained by spatial textons using the combination of six colour models, i.e., {RGB, Hb, Lb, HSV, Luv, SCT}, and applying the AdaBoost or Fisher classifier, after a sequential forward search method was applied to reduce the number of features. Although the number of features was large, computational times in the classification were not excessive due to the use of different methods to reduce the feature set; the computational time was between 5.36 and 62.9 s. Classification yields an accuracy of 98.1% and 96.2% precision, thus making this study truly valuable in breast TMA classification.

Conflict of interest statement

The authors declare that there is no conflict of interest associated with the tools and datasets used in this paper.

Acknowledgements

The authors acknowledge partial financial support from the Spanish Research Ministry Project TIN2011-24367 and from the EC Marie Curie Actions, AIDPATH project (num. 612471).

Appendix

The 1st and 2nd order statistical features are listed in Tables 3 and 4, respectively. The best features kept for the different colour models after the SFS feature selection are shown in Table 5. The table includes only the six colour models that provide the best combination, i.e., {RGB, Hb, Lb, HSV, Luv, SCT}, and indicates with 1 or 0 whether each feature has been kept or not.

Table 3
First order statistical descriptors.

Mean: μ = Σ_{i=0}^{H−1} i · h(i)
Mode: i = arg max(h(i))
Variance: σ² = Σ_{i=0}^{H−1} (i − μ)² · h(i)
1st Quartile: μq1 = Σ_{i=3⌊H/4⌋}^{H} i · h(i)
2nd Quartile: μq2 = Σ_{i=2⌊H/4⌋}^{3⌊H/4⌋} i · h(i)

Feature                                         RGB  HSV  Luv  SCT  Hb  Lb
Measure of correlation 2 (∀d, ∀°)                1    1    1    1    1   1
Energy (∀d, ∀°)                                  1    1    1    1    1   1
Maximum probability (∀d, ∀°)                     1    1    1    1    0   1
Homogeneity 1 (∀d, ∀°)                           1    1    1    1    0   1
Entropy (1st-order)                              0    1    1    1    0   0
Measure of correlation 1 (d = 1, ∀°)             0    0    1    1    1   0
Correlation (∀d, ∀°)                             0    0    1    1    0   0
Homogeneity 2 (∀d, ∀°)                           0    0    1    1    0   1
Difference entropy (d = 1, ∀°)                   0    0    1    1    0   0
Entropy (d = 1, 3, ∀°)                           0    0    1    1    0   0
Sum entropy (d = 1, 3; ° = 0°, 45°, 90°)         0    0    0    1    0   0
Dissimilarity (d = 1, 5; ° = 45°, 90°, 135°)     0    0    0    1    0   0

where d distance = 1, 3, 5; ° orientation = 0°, 45°, 90° and 135°.



References

[1] Petrou M, García-Sevilla P. Image processing: dealing with texture. Wiley; 2006.
[2] Tuceryan M, Jain AK. Texture analysis. In: Chen CH, Pau LF, Wang PSP, editors. Handbook of pattern recognition and computer vision. 2nd ed. World Scientific Publishing Co; 1998. p. 207–48 (chapter 2.1).
[3] Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Trans Syst Man Cybern 1973;3(6):610–21.
[4] Haralick R. Statistical and structural approaches to texture. Proc IEEE 1979;67:786–804.
[5] Julesz B. Textons, the elements of texture perception, and their interactions. Nature 1981;290:91–7.
[6] Leung T, Malik J. Representing and recognizing the visual appearance of materials using three-dimensional textons. Int J Comput Vis 2001;43(1):29–44.
[7] Varma M, Zisserman A. A statistical approach to material classification using image patch exemplars. IEEE Trans Pattern Anal Mach Intell 2009;31(11):2032–47.
[8] Guo Z, Zhang Z, Li X, Li Q, You J. Texture classification by texton: statistical versus binary. PLoS ONE 2014;9(2):e88073.
[9] Varma M, Zisserman A. Classifying images of materials: achieving viewpoint and illumination independence. In: Proceedings of the 7th European conference on computer vision – Part III. 2002. p. 255–71.
[10] Varma M, Zisserman A. A statistical approach to texture classification from single images. Int J Comput Vis 2005;62(1–2):61–81.
[11] Varma M, Zisserman A. Texture classification: are filter banks necessary? In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. 2003. p. 691–8.
[12] Chen Z, Denton E, Zwiggelaar R. Local feature based mammographic tissue pattern modelling and breast density classification. In: 4th international conference on biomedical engineering and informatics (BMEI). 2011. p. 351–5.
[13] Bosch A, Muñoz X, Oliver A, Marti J. Modeling and classifying breast tissue density in mammograms. IEEE Comput Soc Conf Comput Vis Pattern Recognit 2006;2:1552–8.
[14] Petroudi S, Kadir T, Brady M. Automatic classification of mammographic parenchymal patterns: a statistical approach. In: Proceedings of the 25th annual international conference of the IEEE engineering in medicine and biology society, vol. 1; 2003. p. 798–801.
[15] Tuzel O, Yang L, Meer P, Foran DJ. Classification of hematologic malignancies using texton signatures. Pattern Anal Appl 2007;10(4):277–90.
[16] Khurd P, Bahlmann C, Maday P, Kamen A, Gibbs-Strauss S, Genega E, et al. Computer-aided Gleason grading of prostate cancer histopathological images using texton forests. IEEE Int Symp Biomed Imag 2010:636–9.
[17] Riaz F, Areia M, Silva FB, Dinis-Ribeiro M, Pimentel-Nunes P, Coimbra MT. Gabor textons for classification of gastroenterology images. In: Proceedings of the IEEE international symposium on biomedical imaging. 2011. p. 117–20.
[18] Sadeghi M, Lee TK, McLean D, Lui H, Atkins MS. Global pattern analysis and classification of dermoscopic images using textons. In: Haynor DR, Ourselin S, editors. Proceedings of SPIE, vol. 8314. 2012. p. 83144X.
[19] Amaral T, McKenna S, Robertson K, Thompson A. Classification of breast tissue microarray spots using texton histograms. Med Image Underst Anal 2008:144–8.
[20] Yang L, Chen W, Meer P, Salaru G, Goodell LA, Berstis V, et al. Virtual microscopy and grid-enabled decision support for large-scale analysis of imaged pathology specimens. Trans Inf Tech Biomed 2009;13:636–44.
[21] Xing F, Liu B, Qi X, Foran D, Yang L. Digital tissue microarray classification using sparse reconstruction. In: Proceedings of SPIE, Medical Imaging 2012: Image Processing, vol. 8314; 2012. p. 1–8.
[22] Chekkoury A, Khurd P, Ni J, Bahlmann C, Kamen A, Patel A, et al. Automated malignancy detection in breast histopathological images. In: Proceedings of SPIE, Medical Imaging 2012: Computer-Aided Diagnosis, vol. 8315; 2012. p. 1–13.
[23] Mäenpää T, Pietikäinen M. Classification with color and texture: jointly or separately? Pattern Recognit 2004;37(8):1629–40.
[24] Harms H, Gunzer U, Aus HM. Combined local color and texture analysis of stained cells. Comput Vis Graph Image Process 1986;33:364–76.
[25] Tabesh A, Teverovskiy M. Tumor classification in histological images of prostate using color texture. In: Fortieth Asilomar conference on signals, systems and computers (ACSSC'06). 2006. p. 841–5.
[26] DiFranco M, O'Hurley G, Kay E, Watson R, Cunningham P. Ensemble based system for whole-slide prostate cancer probability mapping using color texture features. Comput Med Imaging Graph 2011;35:629–45.
[27] Zhang G, Yin J, Li Z, Su X, Li G, Zhang H. Automated skin biopsy histopathological image annotation using multi-instance representation and learning. BMC Med Genomics 2013;6(Suppl. 3):S10.
[28] Meas-Yedid V, Glory E, Morelon E, Pinset C, Stamon G, Olivo-Marin J-C. Automatic color space selection for biological image segmentation. In: Proceedings of the 17th international conference on pattern recognition (ICPR), vol. 3. IEEE; 2004. p. 514–7.
[29] Bueno G, Déniz O, Salido J, Milagro Fernández M, Vállez N, García-Rojo M. Colour model analysis for histopathology image processing. In: Celebi ME, Schaefer G, editors. Color medical image analysis, vol. 6 of lecture notes in computational vision and biomechanics. Netherlands: Springer; 2013. p. 165–80.
[30] Fernández-Carrobles MM, Bueno G, Déniz O, Salido J, García-Rojo M. Automatic handling of tissue microarray cores in high-dimensional microscopy images. IEEE J Biomed Health Inf 2013;99:1–8, http://dx.doi.org/10.1109/JBHI.2013.2282816.
[31] Hammes LS, Korte JE, Tekmal RR, Naud P, Edelweiss MI, Valente PT, et al. Computer-assisted immunohistochemical analysis of cervical cancer biomarkers using low-cost and simple software. Appl Immunohistochem Mol Morphol 2007;15(4):456–62.
[32] Pham N, Morrison A, Schwock J, Aviel-Ronen S, Iakovlev V, Tsao M-S, et al. Quantitative image analysis of immunohistochemical stains using a CMYK color model. Diagn Pathol 2007;2(8):1–10.
[33] Wang D, Shi L, Wang Y-XJ, Man GC-W, Heng PA, Griffith JF, et al. Color quantification for evaluation of stained tissues. Cytometry 2011;79(4):311–6.
[34] Gurcan MN, Boucheron LE, Can A, Madabhushi A, Rajpoot NM, Yener B. Histopathological image analysis: a review. IEEE Rev Biomed Eng 2009;2:147–71.
[35] Umbaugh SE. Computer imaging: digital image analysis and processing. Boca Raton, FL: CRC Press; 2005.
[36] Jones DG, Malik J. Computational framework for determining stereo correspondence from a set of linear spatial filters. Image Vis Comput 1992;10(10):699–708.
[37] Malik J, Belongie S, Leung T, Shi J. Contour and texture analysis for image segmentation. Int J Comput Vis 2001;43(1):7–27.
[38] McLachlan GJ. Discriminant analysis and statistical pattern recognition. Wiley series in probability and mathematical statistics. John Wiley & Sons; 1992.
[39] Mitra P, Murthy CA, Pal SK. Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 2002;24(3):301–12.
[40] Pudil P, Novovičová J, Kittler J. Floating search methods in feature selection. Pattern Recognit Lett 1994;15(11):1119–25.
[41] Russell SJ, Norvig P. Artificial intelligence: a modern approach. Pearson Education; 2003.
[42] Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. Wiley-Interscience; 2000.
[43] Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20(3):273–97.
[44] Breiman L. Bagging predictors. Mach Learn 1996;24(2):123–40.
[45] Kuncheva LI. Combining pattern classifiers: methods and algorithms. Wiley-Interscience; 2004.
[46] Freund Y, Schapire RE. A decision-theoretic generalization of on-line learning and an application to boosting. In: Proceedings of the second European conference on computational learning theory. 1995. p. 23–37.