FINAL THESIS
Bachelor's degree in Biomedical Engineering
COMPARISON BETWEEN MACHINE LEARNING AND DEEP
LEARNING FOR THE CLASSIFICATION OF MAMMOGRAMS
IN BI-RADS
Report and Annexes
Author: Ignacio Moragues Rodríguez
Director: Christian Mata Miquel
Co-Director: Raul Benítez Iglesias
Call: June 2021
Resum
As the statistics show, breast cancer is a serious health problem that entails a considerable economic burden when it comes to its treatment, which undoubtedly justifies the need for screening for this disease. However, the way the diagnosis is currently made in clinical practice is prone to errors. Hence the need arises for a tool that helps professionals classify mammograms into the four BI-RADS categories.

This project presents two approaches: one based on machine learning and one based on deep learning. Beyond comparing their results, the main aim is to analyze in depth the process followed to develop and subsequently implement each of them. The difficulties and drawbacks encountered while evaluating and comparing the two models are thus presented. To this end, three mammography databases are used that experts have already classified following the BI-RADS guidelines.

For the machine learning model, algorithms that extract texture features from the mammograms are developed and used. The dense area of the breast is segmented using the texture information obtained and Fuzzy C-means (an unsupervised soft clustering technique). The segmented dense areas of the breast are then classified, using the previously extracted and selected features, with the help of the k-nearest neighbors (k-NN) algorithm. This study also details the development strategies around possibilities that were not fully implemented, explaining the reasons that determined their (partial) exclusion. For the deep learning model, since the mammogram database was insufficient to train the model adequately, data augmentation techniques are used. Different convolutional neural network (CNN) architectures are thus trained and evaluated.

Finally, the results obtained are presented and discussed exhaustively, demonstrating that the machine learning model requires great effort and expertise to obtain acceptable results, while the deep learning model shows much higher accuracy and, given its ease of implementation, can be considered a key tool for future work or research in this field.
Report
Resumen
As the statistics show, breast cancer is a serious health problem that entails a considerable economic burden when it comes to its treatment, which undoubtedly justifies the need for screening for this disease. However, the way the diagnosis is currently made in clinical practice is prone to errors. Hence the need arises for a tool that helps professionals classify mammograms into the four BI-RADS categories.

This project presents two approaches: one based on machine learning and the other on deep learning. Beyond comparing their results, the main aim is to analyze in depth the process followed to develop and subsequently implement each of them. The difficulties and drawbacks encountered when evaluating and comparing the two models are thus presented. To this end, three mammography databases are used that experts have already classified following the BI-RADS guidelines.

In the case of the machine learning model, algorithms that extract texture features from the mammograms are developed and used. The dense area of the breast is segmented using the texture information obtained and Fuzzy C-means (an unsupervised soft clustering technique). The segmented dense areas of the breast are then classified, using the previously extracted and selected features, with the help of the k-nearest neighbors (k-NN) algorithm. This study also details the development strategies around possibilities that were not fully implemented, explaining the reasons that determined their (partial) exclusion. In the case of the deep learning model, since the mammogram database was insufficient to train the model adequately, data augmentation techniques are used. Different convolutional neural network (CNN) architectures are thus trained and evaluated.

Finally, the results obtained are presented and discussed exhaustively, demonstrating that the machine learning model requires great effort and expertise to obtain acceptable results, while the deep learning model shows much higher accuracy and, due to its ease of implementation, can be considered a key tool for future work or research in this field.
Abstract
Epidemiological statistics show that breast cancer is a significant health concern and
economic burden, undoubtedly justifying the need for breast cancer screening. Nevertheless, how the
current diagnosis is made in clinical practice is prone to errors. Hence, there is a need for a tool to assist physicians in classifying mammograms into the four BI-RADS categories.
In this project, two approaches are presented: one based on machine learning and the other based on deep learning. Beyond comparing the results, the main aim is to analyze and discuss in depth the process followed to achieve their respective developments and subsequent implementation. Thus, the difficulties and drawbacks found when evaluating and comparing the two models are shown. To this end, three mammography databases that experts have already classified following the BI-RADS guidelines are used.
In the case of the machine learning model, algorithms that extract texture features from mammograms are developed and used. The dense area of the breast is segmented, using the texture information obtained, with Fuzzy C-means (an unsupervised soft clustering technique). Subsequently, a feature selection process is carried out, and the dense areas are classified using the k-nearest neighbors algorithm (k-NN). The development strategy around other possibilities that were not fully implemented is also explained, with reference to the motives behind these decisions. In the deep learning model, the mammogram database was insufficient for adequate training; hence, data augmentation techniques are used, and different convolutional neural network (CNN) architectures are trained and assessed.
Finally, the results obtained are presented and discussed exhaustively, demonstrating that the machine learning model requires great effort and expertise to obtain acceptable results. In contrast, the deep learning model shows much higher accuracy and can be considered key for future work or research in this area.
Acknowledgments

I would first like to thank my supervisor, Christian Mata Miquel, for his guidance and advice. Your
insightful feedback pushed me to sharpen my thinking. Thank you for allowing me to work on this
project.
I would like to acknowledge the support given in the deep learning implementation by the Grupo de Investigación de Modelos de Aprendizaje Computacional from the Tecnológico de Monterrey.
In addition, I would like to thank my family and workmates for always being there for me, even though
they don't understand anything. I am also grateful for the support of my roommates. Finally, I want to
thank my life partner for his daily love and support in accompanying me on this journey.
Glossary
CNN: Convolutional neural network.
DL: Deep learning.
FC: Fully connected neural network.
FCM: Fuzzy C-Means.
GLCM: Grey level co-occurrence matrix.
k-NN: k-nearest neighbors algorithm, also known as KNN.
LAWS: Laws' texture energy measures, based on convolution masks.
LBP: Local binary patterns.
ML: Machine learning.
ReLU: Rectified linear unit.
ROI: Region of interest.
SGD: Stochastic gradient descent.
SVM: Support vector machines.
Table of contents
RESUM
RESUMEN
ABSTRACT
ACKNOWLEDGMENTS
GLOSSARY
1. INTRODUCTION
1.1. Cancer today
1.2. Breast cancer
1.3. The mammography
1.4. Breast Imaging Reporting and Data System (BI-RADS®)
1.5. Origin of this project and motivation
1.6. Objectives
2. STATE OF THE ART
3. PROJECT FRAMEWORK
3.1. Texture
3.1.1. Grey level co-occurrence matrix (GLCM)
3.1.2. Laws' masks (LAWS)
3.1.3. Local binary patterns (LBP)
3.2. Machine learning approach
3.3. Deep learning approach
4. METHODOLOGY AND IMPLEMENTATION
4.1. Materials and preprocessing
4.2. Machine learning implementation
4.2.1. Extraction of GLCM features
4.2.2. Extraction of LAWS features
4.2.3. Extraction of LBP features
4.2.4. Creating the feature dataset
4.2.5. Dense tissue segmentation
4.2.6. Classification
4.3. Deep learning implementation
4.3.1. VGG-16
4.3.2. Data augmentation
4.3.3. Training and learning curves
4.3.4. Interpreting the model performance
5. DISCUSSION
6. ENVIRONMENTAL IMPACT
CONCLUSIONS
BUDGET
Personnel cost
Materials cost
BIBLIOGRAPHY
ANNEX A
ANNEX B
List of Figures
Figure 1.1. Human breast anatomy [6].
Figure 1.2. Representation of a mammography [3].
Figure 1.3. Tasks planning.
Figure 3.1. Mathematical representation of a digital image.
Figure 3.2. Example of computing the grey level co-occurrence matrix.
Figure 3.3. Possible combinations for the outer product.
Figure 3.4. Sample picture of the EEBE's Building C.
Figure 3.5. Representation of a convolution [45].
Figure 3.6. 𝐼𝐸5𝐿5 and 𝐼𝐿5𝐸5 with gray colormap.
Figure 3.7. 𝐼𝐸5𝐿5 and 𝐼𝐿5𝐸5 with multicolor colormap.
Figure 3.8. Average of 𝐼𝐸5𝐿5 and 𝐼𝐿5𝐸5.
Figure 3.9. Local variance and mean from the image in Figure 3.8.
Figure 3.10. Local absolute mean extracted from the image in Figure 3.8.
Figure 3.11. An example of how LBP works.
Figure 3.12. LBP examples using different radii and numbers of neighbors [50].
Figure 3.13. LBP using a radius of 1 and 8 neighbors.
Figure 3.14. A visual example of unsupervised (left) and supervised (right) machine learning.
Figure 3.15. Predicting a new point.
Figure 3.16. Example of infinite clusterization.
Figure 3.17. Machine learning vs. deep learning.
Figure 3.18. Diagram of a neuron.
Figure 3.19. Deep neural network [55].
Figure 3.20. The rectifier function.
Figure 3.21. A rectified linear unit.
Figure 3.22. Fully connected neural network.
Figure 3.23. Learning curves.
Figure 4.1. Raw mammogram.
Figure 4.2. Breast profile segmentation of two mammograms using the algorithm of [67].
Figure 4.3. Steps followed by GLCM_extractor.py.
Figure 4.4. Extraction of the statistical features of the first pixel from the GLCM.
Figure 4.5. Last step of GLCM_extractor.py.
Figure 4.6. GLCM feature reduction (homogeneity).
Figure 4.7. Diagram of the feature extraction using LAWS_extractor.py (Part 1).
Figure 4.8. Texture image from 𝑅5𝑅5 and its histogram.
Figure 4.9. Use of a colormap to improve visualization of 𝐼𝑅5𝑅5.
Figure 4.10. Last step of LAWS_extractor.py.
Figure 4.11. Extraction of the features using LAWS_extractor.py (Part 2).
Figure 4.12. Texture images obtained using a 15x15 window.
Figure 4.13. Steps followed by GLCM_extractor.py.
Figure 4.14. Combination of the LBP features with different parameters.
Figure 4.15. Binning process of the texture images.
Figure 4.16. Conceptual dataset.
Figure 4.17. Example of a selection of an ROI [29].
Figure 4.18. Segmentation examples through FCM.
Figure 4.19. Result of the segmentation test with all the features.
Figure 4.20. Examples of discarded features.
Figure 4.21. Segmented artifacts.
Figure 4.22. Data division sizes.
Figure 4.23. Heatmap of the correlation.
Figure 4.24. Classification process.
Figure 4.25. The architecture of VGG-16 [83].
Figure 4.26. Example of data augmentation.
Figure 4.27. AUG1 Loss vs. Epoch.
Figure 4.28. AUG2 Loss vs. Epoch.
Figure 4.29. AUG3 Loss vs. Epoch.
Figure 4.30. Confusion matrix for training dataset (test 1, 25 epochs). Image and Grad-CAM.
Figure 5.1. Confusion matrix of the ML approach.
Figure 5.2. Binned confusion matrix of the ML approach.
Figure 5.3. Confusion matrix of the DL approach (AUG3).
Figure 5.4. Binned confusion matrix of the DL approach (AUG3).
Figure 0.1. Confusion matrix of the DL approach (AUG1).
Figure 0.2. Binned confusion matrix of the DL approach (AUG1).
Figure 0.3. Confusion matrix of the DL approach (AUG2).
Figure 0.4. Binned confusion matrix of the DL approach (AUG2).
List of tables
Table 1.1. BI-RADS categories [13].
Table 2.1. Overview of the machine learning literature.
Table 2.2. Overview of the deep learning literature.
Table 4.1. Dataset composition.
Table 4.2. Parameters used to extract the features on each image.
Table 4.3. Breakdown of features extracted.
Table 4.4. Breakdown of features extracted after combination.
Table 4.5. Main data frame.
Table 4.6. Features with high correlation.
Table 4.7. Classified pixels of each image.
Table 4.8. Final classification.
Table 4.9. Data augmentation methods for each test.
Table 0.1. Cost for the personnel work.
Table 0.2. Cost for the materials used.
List of Equations
Eq. 3.1 – Eq. 3.10
Eq. 4.1 – Eq. 4.5
Eq. 6.1 – Eq. 6.2
1. Introduction
1.1. Cancer today
Worldwide, more than 19 million new cancer cases and almost 10 million cancer deaths occurred in
2020. Female breast cancer has surpassed lung cancer as the most frequently diagnosed cancer, with
2.3 million new cases (11.7%), followed by lung (11.4%), colorectal (10.0%), prostate (7.3%), and
stomach (5.6%) cancers. In women, breast cancer is the most diagnosed cancer and the leading cause
of cancer death [1].
Breast cancer screening as performed in current clinical practice has been shown to significantly decrease mortality from this disease when carried out in women over 50, the age group with the highest incidence [2]. In medicine, screening means looking for signs of a disease, such as breast cancer, before a person has symptoms. The goal of screening tests is to find cancer at an early stage, when it can be treated and might be cured. Occasionally, a screening test finds cancer that is very small or very slow-growing [3]. These cancers are unlikely to cause illness or death during a person's lifetime. Therefore, screening often leads to overdiagnosis: detecting cancers that would otherwise never have become clinically apparent, effectively turning healthy women into patients.
Overdiagnosed cancers remain asymptomatic throughout a woman's life [4]. However, screening is
needed and essential since breast cancer is a significant public health concern with considerable
medical and economic burden.
In Spain, it is estimated that early detection of cancer could reduce the total costs by around 9,000
million euros. Furthermore, on average, metastatic breast cancer costs almost 4 times more than
cancer detected in an early stage. The expenses of metastatic breast cancer can exceed 200,000 euros
per patient [2]. In metastasis, cancer cells break away from where they first formed (primary cancer),
travel through the blood or lymph system, and develop new tumors (metastatic tumors) in other parts
of the body. Many cancer deaths begin when cancer moves from the original tumor and spreads to
other tissues and organs, colonizing them [5].
1.2. Breast cancer
Breast cancer is a common disease in which cells in the breast begin to multiply and grow
uncontrollably. A breast has three main parts: lobules, ducts, and connective tissue. Ducts have the
function of collecting and transporting milk, which is produced by the lobules. These structures are surrounded and held together by connective tissue, made primarily of fibrous and fatty tissue:
Figure 1.1. Human breast anatomy [6].
Different kinds of breast cancer exist, depending on which cells in the breast have become cancerous.
In most cases, cancer begins in the lobules or ducts. These cancerous cells can spread through blood
and lymph vessels to other parts of the body (metastasis) [6].
As already mentioned, screening mammography is necessary because it has been demonstrated to significantly reduce breast cancer mortality [7] and to lower healthcare costs.
Nevertheless, after 30 years of mammography screening, advanced and metastatic breast cancer
incidence rates have remained stable [8].
1.3. The mammography
During a radiographic test, an X-ray beam passes through a body part. The internal structures absorb
these X-rays at different rates, and the remaining X-ray pattern hits a detector. The recording of this
radiation can be done using film that reacts and is sensitive to X-rays or using electronic sensors.
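For a monoenergetic beam, this differential absorption is commonly modeled by the Beer–Lambert law (a standard result added here for illustration; it is not derived in this report):

```latex
I = I_0 \, e^{-\mu x}
```

where $I_0$ is the incident X-ray intensity, $\mu$ the linear attenuation coefficient of the traversed tissue, and $x$ its thickness. Dense glandular tissue has a higher $\mu$ than fatty tissue, so fewer X-rays reach the detector behind it and it appears brighter on the resulting image.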
A mammogram is essentially an X-ray image of the breast. It is an advantageous technique for detecting early signs of breast cancer, up to three years before a lump can be felt [9]. Hence, it
is a crucial test in breast cancer screening.
During a mammogram, the breast is compressed between two plates. Then, X-rays are used to take pictures of
breast tissue, as can be seen in Figure 1.2:
Figure 1.2. Representation of a mammography [3].
Several factors contribute to the variability of the resulting mammographic image. For example, there is much variation in breast glandularity, which affects the radiographic density and appearance of the mammogram. In general, breast glandularity decreases with increasing breast size, but again there
can be significant differences. The breast has to be compressed during mammography (Figure 1.2),
and the compressed thickness may vary from 20 mm to more than 110 mm. This variation in breast
composition and thickness poses a significant challenge to the X-ray imaging system, which must
achieve adequate quality at a low dose for a wide range of conditions. Breast abnormalities may appear
on the mammogram as a soft tissue lesion that may be rounded or spiculated. However, sometimes
the only sign of an anomaly is one or more calcifications or distortion in the breast architecture.
Calcifications are deposits of calcium hydroxyapatite or phosphate, ranging from extremely small to
several millimeters. It is considered desirable to detect calcifications as small as 100 µm, which presents
a significant challenge to the imaging system [10].
Human readers evaluate screening mammograms. The reading process is monotonous, tiring, lengthy,
costly, and, most importantly, prone to errors. Multiple studies have shown that up to 30% of
diagnosed cancers could be found retrospectively on the previous negative screening exam by blinded
reviewers [11].
1.4. Breast Imaging Reporting and Data System (BI-RADS®)
Aiming to reduce the discordance in interpreting mammographic findings and homogenizing the terms
for characterization and reporting in a standardized way, the American College of Radiology published,
in 1993, the Breast Imaging Reporting and Data System (BI-RADS®) [12].
This structured system aims to achieve consistency and reliability between different reports and
facilitates clear communication between the radiologist and other medical professionals by providing
a lexicon of descriptors. It is a reporting structure that relates assessment categories to management
recommendations and a framework for data collection and auditing. The BI-RADS lexicon classifies
breast imaging findings into different types:
Table 1.1. BI-RADS categories [13].
BI-RADS 1: No finding is present in the imaging modality (not even a benign finding). The breast is symmetrical, with no masses, architectural distortion, or suspicious calcifications.
BI-RADS 2: A finding in this category has a 100% chance of being benign. Although BI-RADS 1 and BI-RADS 2 both represent an essentially zero probability of malignancy, BI-RADS 1 is used when the breast is unremarkable, whereas BI-RADS 2 is used when the radiologist wants to highlight a benign finding.
BI-RADS 3: A finding that is probably benign, with a low risk of malignancy between 0% and 2%. The density of the breast is higher than in the previous categories.
BI-RADS 4: Suspicious abnormality. Lesions may not have the typical morphology of breast cancer, but there is a high chance of malignancy; in these cases, a biopsy is recommended. The breast is very dense.
There are more categories apart from the ones in Table 1.1. For instance, BI-RADS 0 indicates that additional mammograms should be taken, since no conclusions can be drawn (e.g., due to motion or an incorrectly acquired image). BI-RADS
category 5 indicates a higher chance of malignancy, and BI-RADS category 6 represents a biopsy-proven
malignancy. Hence, only the four levels in the table above are considered in this project.
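As a minimal illustration of how these four categories can serve as classification labels, Table 1.1 can be encoded as a simple lookup. The dictionary and function names below are hypothetical, not taken from this project's code:

```python
# Hypothetical encoding of the four BI-RADS categories considered in this project.
# Higher categories correspond to denser breasts and higher suspicion of malignancy.
BIRADS_CATEGORIES = {
    1: "No findings; symmetrical, no masses, distortion, or suspicious calcifications",
    2: "Benign finding (essentially zero probability of malignancy)",
    3: "Probably benign (0-2% risk of malignancy); denser breast",
    4: "Suspicious abnormality; biopsy recommended; very dense breast",
}

def describe_birads(category: int) -> str:
    """Return the description of a BI-RADS category handled in this project (1-4)."""
    if category not in BIRADS_CATEGORIES:
        raise ValueError(f"Only BI-RADS 1-4 are considered here, got {category}")
    return BIRADS_CATEGORIES[category]
```

Categories 0, 5, and 6 are deliberately rejected by the lookup, mirroring the scope chosen for this project.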
1.5. Origin of this project and motivation
The origin of this project can be traced back to previous work from the MSc thesis of Christian Mata
[14], the project's supervisor. Hence, this study aims to be a continuation and improvement of his work, with an updated literature review and an autonomous learning journey on my side. It should be
mentioned that only one general subject in the Bachelor's degree in Biomedical Engineering introduces
the Python language [14], and only one specific subject presents image processing.
The previous project was developed in Matlab [15] and only used machine learning techniques to classify mammograms into the BI-RADS categories using texture descriptors. According to the future work outlined in that version, the development requires implementing new strategies, improving the performance, and optimizing the computational time. For this reason, an exhaustive study of previously published works is presented in section 2. It is a crucial step to construct and justify the purpose and methodology developed in this project.
My motivation to continue this project grew after reading the previous works and considering their suggestions, and I agreed to do my final bachelor's project in this research line. It is important to remark that, even though I had very little knowledge and experience in this topic before starting the project, I wanted to delve deeper into it, as I consider it key to advancing healthcare. The main objectives and planning schedule are detailed in the following section.
1.6. Objectives
The main objective of this BSc thesis is to assess which approach, machine learning or deep learning, is more suitable for classifying digital mammograms into the four BI-RADS categories shown in Table 1.1. This objective is motivated by the following facts, already introduced in section 1.1:
• It is the most diagnosed cancer worldwide.
• It is the leading cause of cancer death in women.
• Its early and accurate diagnosis would save many lives and reduce healthcare expenses.
• Up to 30% of diagnosed breast cancers could be found retrospectively on the previous
negative screening exam by blinded reviewers.
Therefore, in summary, breast cancer is a substantial public health concern with a significant medical
and economic burden. Furthermore, as already stated, the current diagnosis is error-prone. Thus, a
justified need exists to develop a computer-aided diagnosis (CAD) system capable of assisting physicians in classifying mammograms. This project aims to find the best pipeline for developing this CAD tool. Therefore, the steps and aims of this project are itemized as follows:
• Review the state of the art on mammogram classification.
• Conduct a literature review to build the fundamentals of the topic and study the classification methods proposed in the literature.
• Implement all codes in Python.
• Compare the machine learning and deep learning approaches.
• Discuss and find further improvements connected to this research work.
The following figure aims to draw a roadmap of the project to organize the time and tasks to be addressed. It is the final version, as the planning was modified during the project. The Gantt chart spans weeks 01 to 19, from February to June, and covers the following tasks:
• First meetings with the tutor
• Planning
• Introduction to the topic
• Delving deeper into Python
• Bibliographic research
• Implementation of the texture extractors
• Feature extraction
• Storing information in the dataset
• Dense segmentation
• Model selection and classification
• Meeting: revision and improvements
• Writing
• First contact with deep learning
• Literature review
• Meeting with ITM
• Implementation of DL on the dataset
• Obtaining results
• Discussion
• Writing
• Final review
Figure 1.3. Task planning.
2. State of the art
In recent years, different approaches have been proposed to deal with the classification of
radiological breast images. These works exploit both machine learning and deep learning techniques.
However, not many of them attempt to classify in the BI-RADS scale. In fact, the vast majority of works
are focused on finding regions of interest, the likelihood of being cancerous, or simply whether
mammograms are malignant or not. Both machine learning and deep learning are extensively
described in sections 3 and 4.
Many projects based on machine learning use texture (further explained in section 3.1) as a feature
extraction method. Having features that describe the data is crucial for classification since it is the input
the different machine learning models require. Table 2.1 gathers the previous works that have
addressed classifying mammograms using texture and machine learning. The most employed texture
algorithms are LBP, GLCM, and LAWS:
Table 2.1. Overview of the machine learning literature.
Reference LBP GLCM LAWS
Pereira et al. 2014 [16] ✔
Mata et al. 2008 [17] ✔ ✔ ✔
Rabidas et al. 2016 [18] ✔
Sonar et al. 2018 [19] ✔
Mohanty et al. 2011 [20] ✔
Sadad et al. 2018 [21] ✔ ✔
Gardezi et al. 2015 [22] ✔
Phadke et al. 2016 [23] ✔ ✔
Wang et al. 2017 [24] ✔
Manduca et al. 2009 [25] ✔
Nithya et al. 2017 [26] ✔ ✔
Kriti et al. 2015 [27] ✔
Farhan et al. 2020 [28] ✔ ✔
On the other hand, even though it is not mandatory to manually extract features prior to classification
in deep learning, it has to be mentioned that texture extraction steps are sometimes included in the
pipeline of some studies (Setiawan et al. 2015 [29] and Gastounioti et al. 2018 [30]). These examples
and the ones depicted in Table 2.1 establish that texture is useful and currently employed in deep
learning and, especially, in machine learning. Nevertheless, in projects where deep learning is
implemented, the images are directly used to train the models in most cases. Table 2.2 gathers the
previous works that have addressed classifying mammograms using deep learning:
Table 2.2. Overview of the deep learning literature.
Setiawan et al. 2015 [29]: Mammogram classification using Laws' texture energy measures.
Gastounioti et al. 2018 [30]: Finder of breast patterns associated with breast cancer risk.
Jadoon et al. 2017 [31]: Three-class mammogram classification based on descriptive CNN features.
Arora et al. 2020 [32]: Benign and malignant classification.
Altan et al. 2020 [33]: Three-class mammogram classification.
Suh et al. 2020 [34]: Cancer detection in mammograms of various densities.
Shen et al. 2019 [35]: Classification of patches into benign or malignant calcifications or masses.
Mohamed et al. 2018 [36]: Three-class breast-density mammogram classifier.
Wang et al. 2016 [37]: Identifying metastatic breast cancer.
Adedigba et al. 2019 [38]: Deep learning-based classifier for a small dataset.
As seen in Table 2.1 and Table 2.2, there is a wide variety of studies in the field of mammogram classification. However, they rely on either machine learning or deep learning. Hence, even though some utilize deep learning for feature extraction with a final classification done through machine learning [31], no comparative reviews of both for this specific topic were found.
Therefore, the novelty and uniqueness of this project lie in the direct comparison of two different approaches, machine learning and deep learning, for mammogram classification in BI-RADS. After this review, texture techniques such as GLCM, Laws' masks, and LBP will be implemented in Python, following and improving on the steps of previous work [39]. Moreover, the use of deep learning algorithms could improve on the traditional machine learning approaches. It is important to remark that different deep learning approaches exist; some of them will be explored in order to choose the one that will eventually be implemented on our mammography database. The strategies and the methodology used in this project are explained in the following sections.
3. Project framework
Different methodologies have been explored and addressed. Some of them are mentioned and
justified in this section, especially in section 4. However, for convenience, only the finalized versions of
the two developed approaches are fully described. The development strategy around other
possibilities that were not fully implemented is also explained, with reference to the motives behind
these decisions. This section gives an overview of concepts that will be utilized in the methodology.
3.1. Texture
As seen in the state of the art and the literature reviewed, texture analysis and extraction are of significant importance in computer vision and especially in machine learning. Therefore, texture feature extraction is considered a fundamental step for classifying mammograms in the machine learning approach, continuing along the same line as the previous work introduced in section 1.5.
The sense of touch allows living organisms to perceive qualities of objects such as pressure, temperature, texture, and hardness. The skin has different receptors that transform the stimuli into information that the brain can interpret [40]. Tactile texture thus refers to the tangible feel of a surface; however, humans can also identify textures visually. Visual texture can be defined as seeing shapes or contents in an object and associating them with a tactile texture. In the computer vision domain, however, identifying textures can be complex and challenging. As can be seen in the figure below (Figure 3.1), the mathematical representation of a digital 2D image consists of a matrix array in which each position represents the value of the pixel intensity:
Figure 3.1. Mathematical representation of a digital image.
In an 8-bit-grayscale standard image, the maximum value (white) is 255, while the minimum (black) is
0. In between, the remaining integers correspond to the shades of grey between black and white. The
concept is the same for color images except that each pixel has three components corresponding to
the red, green, and blue intensity.
Therefore, starting from the basis that digital images are essentially matrix arrays, texture can be defined, in image processing, as the spatial variation of the pixels' intensity. Texture analysis plays an important role in computer vision tasks such as object recognition, surface defect detection, pattern recognition, and medical image analysis. Evaluating the distribution of pixel intensities and dispersion characteristics such as smoothness, coarseness, and regularity in multiple directions can help in the diagnosis of certain diseases [41].
Texture detection methods are usually classified into four types: statistical, structural, model-based, and transform-based methods. However, many of the methods that have been developed cannot be classified into only one class, since they are considered combinational methods [41].
The statistical methods perform a series of calculations on the lightness intensity distribution functions of pixels. Two levels of statistical characteristics can be identified:
• First level: single-pixel statistics are calculated without taking into account the interaction with the other pixels of the image.
• Second and higher levels: the statistics of a particular pixel are calculated considering the dependence of two or more pixels.
When classifying anything, two things must be known beforehand: the classes and the features that
will be extracted. For example, if we wanted to classify humans into two classes (healthy and
unhealthy), we could ask them details such as age, weight, height, and the number of times they have
been hospitalized in the last years. These are called features and are characteristics that describe each
human. These descriptors must be wisely chosen so that they can discriminate the data into the classes.
3.1.1. Grey-level co-occurrence matrix (GLCM)
One of the oldest statistical methods for extracting texture features is the co-occurrence matrix introduced by Haralick in 1973 [42]. The grey-level co-occurrence matrix (GLCM) of an image is created based on the correlations between image pixels. Therefore, it is considered a second-level statistical characteristic.
The GLCM is defined over an image as the distribution of co-occurring pixel values at a given offset.
The offset is a position operator that indicates the directions when computing the co-occurrence
matrix. For instance, an offset [2, 1] means looking at one pixel down and two pixels right on each step.
Moreover, if an image has p different pixel values, its co-occurrence matrix will be p x p for the given offset [43]. For a standard 8-bit image, a 256 x 256 co-occurrence matrix is obtained.
The $C_{\Delta x,\Delta y}(i,j)$ value of a GLCM gives the number of times that the pixel values $i$ and $j$ occur in the image in the relation conveyed by the offset. Therefore, the matrix will change depending on the offset.
To sum up, the co-occurrence matrix can be parameterized as:

$$C_{\Delta x,\Delta y}(i,j) = \sum_{x=1}^{n}\sum_{y=1}^{m}\begin{cases}1, & \text{if } I(x,y)=i \text{ and } I(x+\Delta x,\, y+\Delta y)=j\\ 0, & \text{otherwise}\end{cases} \qquad \text{Eq. 3.1}$$

where $(\Delta x, \Delta y)$ is the offset and $I(x,y)$ is the pixel value at position $(x,y)$.
The co-occurrence matrix can also be parameterized in terms of distance ($d$) and angle ($\theta$) instead of an offset, as seen in the example below (Figure 3.2), where the co-occurrence matrix is computed with an angle of 0 degrees and a distance of 1:
Figure 3.2. Example of computing the grey-level co-occurrence matrix.
In the example above (Figure 3.2), an image with only 4 levels of intensity is presented. Hence, the resulting co-occurrence matrix is a 4x4 matrix. As previously mentioned, a 256x256 matrix is obtained for an 8-bit image.
How many times a pair of pixels repeats could seem to be unusable information. Nevertheless, it is possible to extract valuable data through some statistical operations, which yield the so-called Haralick features [42]. These features and the co-occurrence matrix can easily be obtained using the scikit-image open-source image processing library for Python [44]. The "greycoprops" module of this library is based on the Haralick features: it extracts features from a given grey-level co-occurrence matrix to serve as a compact summary of the matrix. The properties are computed as follows, where $P_{i,j}$ is the normalized grey-level co-occurrence matrix:
1) $\text{Contrast} = \displaystyle\sum_{i,j=0}^{levels-1} P_{i,j}\,(i-j)^{2}$  (Eq. 3.2)

2) $\text{Dissimilarity} = \displaystyle\sum_{i,j=0}^{levels-1} P_{i,j}\,|i-j|$  (Eq. 3.3)

3) $\text{Homogeneity} = \displaystyle\sum_{i,j=0}^{levels-1} \frac{P_{i,j}}{1+(i-j)^{2}}$  (Eq. 3.4)

4) $\text{ASM} = \displaystyle\sum_{i,j=0}^{levels-1} P_{i,j}^{2}$  (Eq. 3.5)

5) $\text{Energy} = \sqrt{\text{ASM}}$  (Eq. 3.6)

6) $\text{Correlation} = \displaystyle\sum_{i,j=0}^{levels-1} P_{i,j}\left[\frac{(i-\mu_i)(j-\mu_j)}{\sqrt{\sigma_i^{2}\,\sigma_j^{2}}}\right]$  (Eq. 3.7)

where $\mu_i$, $\mu_j$, $\sigma_i$, and $\sigma_j$ are the means and standard deviations of $P_i$ and $P_j$.
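As an illustrative sketch of this step (the 4-level toy image below is our own, and scikit-image renamed `greycomatrix`/`greycoprops` to `graycomatrix`/`graycoprops` in recent versions, so both spellings are tried):

```python
import numpy as np

# scikit-image renamed these functions; try the new spelling first.
try:
    from skimage.feature import graycomatrix, graycoprops
except ImportError:  # older scikit-image versions
    from skimage.feature import greycomatrix as graycomatrix, greycoprops as graycoprops

# A toy image with only 4 grey levels (0..3), as in the 4-level example above
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]], dtype=np.uint8)

# Offset given as distance d=1 and angle 0 rad (one pixel to the right)
glcm = graycomatrix(img, distances=[1], angles=[0], levels=4)

print(glcm[:, :, 0, 0])                  # 4x4 table of co-occurrence counts
print(graycoprops(glcm, 'contrast'))     # Haralick-style feature (Eq. 3.2)
print(graycoprops(glcm, 'homogeneity'))  # Eq. 3.4
```

Note that `graycoprops` normalizes each GLCM to sum to 1 before computing the properties, matching the normalized $P_{i,j}$ used in the equations above.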
3.1.2. Laws' masks (LAWS)
One of the most relevant techniques for extracting information from the textures of an image is Laws' masks. These masks are a group of predefined kernels proven to extract relevant texture features effectively and without a high computational cost, since applying them is a simple convolution between the image and the mask. The different types of Laws' masks are used for level detection (L), edge detection (E), spot detection (S), ripple detection (R), and wave detection (W). Each mask extracts different image information, making some more suitable than others depending on the application. The Laws' masks used in this project are the following:
L5 = [ 1  4  6  4  1]
E5 = [−1 −2  0  2  1]
S5 = [−1  0  2  0 −1]
R5 = [ 1 −4  6 −4  1]
If the outer product is computed between any pair of these vectors, 5x5 masks are obtained. For instance:

$$L5E5 = L5 \otimes E5 = L5^{T} E5 = \begin{bmatrix}1\\4\\6\\4\\1\end{bmatrix}\begin{bmatrix}-1 & -2 & 0 & 2 & 1\end{bmatrix} = \begin{bmatrix}-1 & -2 & 0 & 2 & 1\\ -4 & -8 & 0 & 8 & 4\\ -6 & -12 & 0 & 12 & 6\\ -4 & -8 & 0 & 8 & 4\\ -1 & -2 & 0 & 2 & 1\end{bmatrix}$$
The possible pairwise combinations of these four vectors give a total of 16 masks:

      L5     E5     S5     R5
L5   L5L5   E5L5   S5L5   R5L5
E5   L5E5   E5E5   S5E5   R5E5
S5   L5S5   E5S5   S5S5   R5S5
R5   L5R5   E5R5   S5R5   R5R5

Figure 3.3. Possible combinations for the outer product.
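These combinations can be generated programmatically. As a small sketch (variable names are our own), the 16 masks follow directly from `numpy.outer`:

```python
import numpy as np

# The four 1-D Laws vectors used in this project
vectors = {
    "L5": np.array([1, 4, 6, 4, 1]),    # level
    "E5": np.array([-1, -2, 0, 2, 1]),  # edge
    "S5": np.array([-1, 0, 2, 0, -1]),  # spot
    "R5": np.array([1, -4, 6, -4, 1]),  # ripple
}

# Each 2-D mask is the outer product (column vector times row vector)
masks = {a + b: np.outer(va, vb)
         for a, va in vectors.items()
         for b, vb in vectors.items()}

print(len(masks))       # 16 masks
print(masks["L5E5"])    # matches the 5x5 example above
```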
$$L5L5=\begin{bmatrix}1&4&6&4&1\\4&16&24&16&4\\6&24&36&24&6\\4&16&24&16&4\\1&4&6&4&1\end{bmatrix}\qquad L5E5=\begin{bmatrix}-1&-2&0&2&1\\-4&-8&0&8&4\\-6&-12&0&12&6\\-4&-8&0&8&4\\-1&-2&0&2&1\end{bmatrix}$$

$$L5S5=\begin{bmatrix}-1&0&2&0&-1\\-4&0&8&0&-4\\-6&0&12&0&-6\\-4&0&8&0&-4\\-1&0&2&0&-1\end{bmatrix}\qquad L5R5=\begin{bmatrix}1&-4&6&-4&1\\4&-16&24&-16&4\\6&-24&36&-24&6\\4&-16&24&-16&4\\1&-4&6&-4&1\end{bmatrix}$$

$$E5L5=\begin{bmatrix}-1&-4&-6&-4&-1\\-2&-8&-12&-8&-2\\0&0&0&0&0\\2&8&12&8&2\\1&4&6&4&1\end{bmatrix}\qquad E5E5=\begin{bmatrix}1&2&0&-2&-1\\2&4&0&-4&-2\\0&0&0&0&0\\-2&-4&0&4&2\\-1&-2&0&2&1\end{bmatrix}$$

$$E5S5=\begin{bmatrix}1&0&-2&0&1\\2&0&-4&0&2\\0&0&0&0&0\\-2&0&4&0&-2\\-1&0&2&0&-1\end{bmatrix}\qquad E5R5=\begin{bmatrix}-1&4&-6&4&-1\\-2&8&-12&8&-2\\0&0&0&0&0\\2&-8&12&-8&2\\1&-4&6&-4&1\end{bmatrix}$$

$$S5L5=\begin{bmatrix}-1&-4&-6&-4&-1\\0&0&0&0&0\\2&8&12&8&2\\0&0&0&0&0\\-1&-4&-6&-4&-1\end{bmatrix}\qquad S5E5=\begin{bmatrix}1&2&0&-2&-1\\0&0&0&0&0\\-2&-4&0&4&2\\0&0&0&0&0\\1&2&0&-2&-1\end{bmatrix}$$

$$S5S5=\begin{bmatrix}1&0&-2&0&1\\0&0&0&0&0\\-2&0&4&0&-2\\0&0&0&0&0\\1&0&-2&0&1\end{bmatrix}\qquad S5R5=\begin{bmatrix}-1&4&-6&4&-1\\0&0&0&0&0\\2&-8&12&-8&2\\0&0&0&0&0\\-1&4&-6&4&-1\end{bmatrix}$$

$$R5L5=\begin{bmatrix}1&4&6&4&1\\-4&-16&-24&-16&-4\\6&24&36&24&6\\-4&-16&-24&-16&-4\\1&4&6&4&1\end{bmatrix}\qquad R5E5=\begin{bmatrix}-1&-2&0&2&1\\4&8&0&-8&-4\\-6&-12&0&12&6\\4&8&0&-8&-4\\-1&-2&0&2&1\end{bmatrix}$$

$$R5S5=\begin{bmatrix}-1&0&2&0&-1\\4&0&-8&0&4\\-6&0&12&0&-6\\4&0&-8&0&4\\-1&0&2&0&-1\end{bmatrix}\qquad R5R5=\begin{bmatrix}1&-4&6&-4&1\\-4&16&-24&16&-4\\6&-24&36&-24&6\\-4&16&-24&16&-4\\1&-4&6&-4&1\end{bmatrix}$$
Except for $L5L5$, the sum of all the values within each 2D mask equals zero. Hence, $L5L5$ is sometimes excluded and not used when extracting texture information [29].
Another interesting property of these masks is that, as commented before, each of them reacts to a particular pixel distribution in the image. For instance, $E5L5$ measures horizontal edge content, while $L5E5$ measures vertical edge content. This can be intuited from the structure of the masks:

$$L5E5=\begin{bmatrix}-1&-2&0&2&1\\-4&-8&0&8&4\\-6&-12&0&12&6\\-4&-8&0&8&4\\-1&-2&0&2&1\end{bmatrix}\qquad E5L5=\begin{bmatrix}-1&-4&-6&-4&-1\\-2&-8&-12&-8&-2\\0&0&0&0&0\\2&8&12&8&2\\1&4&6&4&1\end{bmatrix}$$
As a visual example, an image with very marked horizontal and vertical lines, such as Building C of the Barcelona East School of Engineering (EEBE), is convolved with the two kernels above.
Figure 3.4. Sample picture of the EEBE's Building C.
A convolution is simply the process of taking a small matrix (kernel or mask) and sliding it over all the image's pixels. For each position, thus each pixel, the products of the mutually overlapping pixels are computed and summed; the result is the value of the output pixel at that particular location, as seen in the conceptual representation in Figure 3.5. Frequently, padding is added to the original image to avoid ending up with a smaller output image.
Figure 3.5. Representation of a convolution [45].
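As a minimal sketch of this sliding-window operation (a naive zero-padded implementation for illustration only; strictly speaking, convolution also flips the kernel, which this cross-correlation-style sketch omits):

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 2-D convolution with zero padding, so the output
    keeps the same size as the input image."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Multiply the overlapping pixels and sum them
            window = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0   # simple 3x3 averaging mask
result = convolve2d(img, kernel)
print(result)
```

Thanks to the padding, `result` has the same 4x4 shape as the input, as described above.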
The convolutions of Figure 3.4 with these two masks are $I_{E5L5}$ and $I_{L5E5}$:
Figure 3.6. $I_{E5L5}$ and $I_{L5E5}$ with a grey colormap.
In Figure 3.6, it can be seen that, as expected, the horizontal and vertical parts of the image present higher values when filtered by these masks.
Since the human eye can perceive no more than about 900 levels of grey [46], a colormap can be applied to better distinguish the values:
Figure 3.7. $I_{E5L5}$ and $I_{L5E5}$ with a multicolor colormap.
The average of the two resulting images becomes the total edge content:
Figure 3.8. Average of $I_{E5L5}$ and $I_{L5E5}$.
The same idea is applied to the remaining masks. Hence, starting from one input image, 9 images are obtained:

$$\frac{I_{L5E5}+I_{E5L5}}{2},\quad \frac{I_{L5S5}+I_{S5L5}}{2},\quad \frac{I_{L5R5}+I_{R5L5}}{2},\quad \frac{I_{E5S5}+I_{S5E5}}{2},\quad \frac{I_{E5R5}+I_{R5E5}}{2},\quad \frac{I_{R5S5}+I_{S5R5}}{2},\quad I_{S5S5},\quad I_{E5E5},\quad I_{R5R5}$$

As done with 3-channel color images (RGB), the 9 resulting images can be considered a single image with 9 texture features for each pixel.
Similar to the GLCM feature extraction method, a window of size $W_{size}$ is taken around each pixel. Statistical features such as the mean, absolute mean, and variance are extracted from the pixels in that window. These are the features used for classification, and their computation is computationally heavy. The resulting features are shown below:
Figure 3.9. Local variance and mean from the image in Figure 3.8.
Figure 3.10. Local absolute mean extracted from the image in Figure 3.8.
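Putting the pieces together, the Laws pipeline described above can be sketched as follows (a simplified illustration; the window size `wsize` and the random test image are our own choices, and `scipy.ndimage` is used for the convolutions and the local windowed statistics):

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

# 1-D Laws vectors and their 2-D outer-product masks
L5 = np.array([1, 4, 6, 4, 1]); E5 = np.array([-1, -2, 0, 2, 1])
S5 = np.array([-1, 0, 2, 0, -1]); R5 = np.array([1, -4, 6, -4, 1])
v = {"L5": L5, "E5": E5, "S5": S5, "R5": R5}
masks = {a + b: np.outer(v[a], v[b]) for a in v for b in v}

def laws_features(image, wsize=15):
    """Return the 9 Laws texture maps, summarized by a local absolute mean."""
    filtered = {name: convolve(image.astype(float), m) for name, m in masks.items()}
    # Average the transposed mask pairs into single maps
    pairs = [("L5E5", "E5L5"), ("L5S5", "S5L5"), ("L5R5", "R5L5"),
             ("E5S5", "S5E5"), ("E5R5", "R5E5"), ("R5S5", "S5R5")]
    maps = [(filtered[a] + filtered[b]) / 2 for a, b in pairs]
    maps += [filtered[n] for n in ("S5S5", "E5E5", "R5R5")]  # symmetric masks
    # Local texture energy: mean of absolute values in a wsize x wsize window
    return np.stack([uniform_filter(np.abs(m), size=wsize) for m in maps], axis=-1)

img = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(float)
feats = laws_features(img)
print(feats.shape)  # (64, 64, 9): 9 texture features per pixel
```

The local mean and variance maps shown in Figure 3.9 would be obtained analogously, replacing the absolute-mean summary with the corresponding windowed statistic.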
3.1.3. Local binary patterns (LBP)
Local binary patterns (LBP), first described in 1994 [4], are a visual descriptor used for classification in computer vision. The method describes the grayscale local texture of the image with low computational complexity by detecting local patterns between adjacent pixels [47]. Its capability to ignore uniform variations in the images (e.g., the lighting) and its low computational cost make this technique very advantageous in many applications, for example, face detection [48] and medical imaging [47].
The algorithm is essentially based on a sliding window that compares its central value with the neighbors' values at a specific radius or distance. The number of neighbors can also be chosen, interpolating pixel values between them when necessary [49]. For each comparison, if the neighbor's value is greater than or equal to the central value, a 1 is assigned at that position; otherwise, a 0 is allocated. The following expressions describe the LBP, where $s(x)$ is the threshold function, and $g_c$ and $g_p$ represent the grey-scale values of the center pixel and the $p$-th neighbor, respectively:
$$s(x)=\begin{cases}1, & \text{if } x \ge 0\\ 0, & \text{otherwise}\end{cases} \qquad \text{Eq. 3.8}$$

$$LBP=\sum_{p=0}^{P-1} s(g_p-g_c)\cdot 2^{p} \qquad \text{Eq. 3.9}$$
Afterward, all the ones and zeros are concatenated, creating a binary number that becomes the LBP value once converted into a decimal number. An example is shown in the following illustration:
Figure 3.11. An example of how LBP works.
As already mentioned, the radius and the number of neighbors can be chosen. In the diagram below, the red dots represent the central pixel ($g_c$) and the green dots represent the neighboring pixels ($g_p$):
Figure 3.12. LBP examples using different radii and numbers of neighbors [50].
Once the LBP value is calculated, the window slides to the next position until the image is fully processed. Sometimes it is useful to add padding to the picture before applying the LBP so that the resulting image is not smaller than the original one.
There are two types of patterns, depending on the number of transitions between 0 and 1 and vice versa. If there are two or fewer transitions, it is called a uniform pattern; otherwise, it is a non-uniform pattern. For example:
11111011 → Uniform (2 transitions)
10100011 → Non-uniform (4 transitions)
In practice, non-uniform patterns are very infrequent. For that reason, 58 possible LBP values are defined for uniform patterns, while all non-uniform patterns share a single LBP value. Hence, using this technique, the possible combinations in an 8-bit grayscale image can be reduced from 256 to 59 characteristics.
Furthermore, it is possible to extract rotation-invariant local binary patterns. This method exploits the fact that the same pattern should not be affected by orientation, i.e., it should be invariant under rotations. This is achieved by circularly shifting the pattern until the minimum value is found, which becomes the LBP. With this approach, there are only 36 features in an 8-bit grayscale image. In addition, grayscale invariance is achievable if the center grey value is subtracted from the neighbors [51], reducing the number of possible local binary patterns by looking only at the intensity variation and considering the central pixel as the offset. This idea is represented in the figure and expression below:
Figure 3.13. LBP using a radius of 1 and 8 neighbors.
If grayscale invariance is sought for the LBP in Figure 3.13, the expression used is:

$$LBP = [\,g_0-g_C,\; g_1-g_C,\; g_2-g_C,\; g_3-g_C,\; g_4-g_C,\; g_5-g_C,\; g_6-g_C,\; g_7-g_C\,] \qquad \text{Eq. 3.10}$$
Therefore, LBP is a powerful extraction method that is not as dependent as others on pixel intensity or
rotation.
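As a hedged sketch with scikit-image's `local_binary_pattern` (the `nri_uniform` method corresponds to the 59-value uniform coding described above; the test image is a random array of our own):

```python
import numpy as np
from skimage.feature import local_binary_pattern

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 32)).astype(np.uint8)

P, R = 8, 1  # 8 neighbors at radius 1, as in Figure 3.13
# "nri_uniform": uniform patterns keep distinct codes (58 of them),
# all non-uniform patterns collapse into one code -> 59 values total
lbp = local_binary_pattern(img, P, R, method="nri_uniform")

# Histogram of the 59 possible codes, usable as a texture feature vector
hist, _ = np.histogram(lbp, bins=59, range=(0, 59))
print(lbp.shape, int(lbp.max()))
```

Using `method="ror"` instead would yield the rotation-invariant variant with 36 possible values mentioned above.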
3.2. Machine learning approach
Machine learning is one of the applications of artificial intelligence (AI) that provides systems with the
ability, without being explicitly programmed, to improve and learn from experience. The learning
process begins by fitting a model with training data. Then, the model will learn from it and make
decisions or predictions of new data in the future based on the data provided. Machine learning is
widely used in various computing tasks where designing and programming explicit algorithms with
good performance is complex or unfeasible. For example, some of its applications include email
filtering, network intrusion detection, product recommendations, speech recognition, optical
character recognition (OCR), and, most importantly, computer vision and medical diagnosis [52].
Machine learning algorithms can be classified as supervised or unsupervised:
• Supervised machine learning applies what has been learned from past data to new data in order to predict future events. Thanks to a known, labeled training dataset, the algorithm can produce an inferred function to make predictions; after training, the model can provide targets for any new input.
• On the other hand, unsupervised machine learning is used when the training dataset is neither
labeled nor classified. Unsupervised learning algorithms study how systems can infer a
function to describe a hidden structure from unlabeled data. Clustering is the most commonly
used unsupervised learning technique. Clustering refers to the process of automatically
grouping data points with similar characteristics and assigning them to "clusters".
To better explain these two types of machine learning, two conceptual plots of each class can be found
below. Each axis is any feature that describes a point (energy consumption, height, density, age, etc.):
Figure 3.14. A visual example of unsupervised (left) and supervised (right) machine learning.
In supervised learning, it is possible to check whether data clusters or groups coincide with the actual class of the data. Hence, in the right plot of Figure 3.14, it can be seen that "A" and "B" are discriminable and do not overlap. Therefore, when a new point is given to this model, its class can be predicted by, for instance, checking the class of its closest point in the training dataset, as depicted in Figure 3.15.
Figure 3.15. Predicting a new point.
In Figure 3.15, using the closest-neighbor criterion, the new point would be classified as "A". On the contrary, in the left plot of Figure 3.14 (unsupervised learning), it is clear that there are two clusters; nevertheless, some concerns might arise. For example, points assigned to Cluster 1 may actually belong to Cluster 2 and vice versa. Hence, one handicap of unsupervised learning is knowing whether the selected features can adequately discriminate the data, as obtaining two clusters does not necessarily mean that. Moreover, the number of clusters should be known beforehand; otherwise, criteria to choose the number of clusters must be considered, since data can be grouped in many ways. For example, taking the unsupervised learning example, the second cluster can be divided not only once but many times:
Figure 3.16. Example of infinite clusterization.
Therefore, unsupervised learning is a powerful tool to separate and discriminate data by looking at its features in two dimensions and many more. However, the main challenge is ensuring that it distinguishes the data into the classes of interest for a specific case. Moreover, having to know how many classes an unlabeled dataset contains is relevant and might bring some limitations. As an example, in a dataset of unlabeled pictures of cats and dogs, there are only two classes (cats and dogs), but it is unknown which image is which. In that case, it is possible to extract features from the pictures and try to separate them into two clusters using an unsupervised machine learning algorithm. On the other hand, with a dataset of arbitrary animal pictures, we would need to know how many different animals there are. Otherwise, we would not only face the problem of whether the algorithm can discriminate the different classes with the chosen features, but we would also not know how many classes or data types there are.
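The two settings can be illustrated with scikit-learn on synthetic 2-D feature points (a toy sketch of our own; the project's actual pipeline uses Fuzzy C-means and k-NN on texture features):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Two synthetic, well-separated groups of 2-D feature points
a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
b = rng.normal(loc=[4, 4], scale=0.5, size=(50, 2))
X = np.vstack([a, b])
y = np.array([0] * 50 + [1] * 50)  # class labels, known in the supervised case

# Supervised: k-NN predicts the class of a new point from its closest neighbor
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn.predict([[4.2, 3.8]]))   # falls near group "b"

# Unsupervised: k-means groups the same points without ever seeing the labels
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:5], km.labels_[-5:])
```

Note that k-means still needs `n_clusters=2` to be given beforehand, which is exactly the limitation discussed above.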
3.3. Deep learning approach
Deep learning evolves from machine learning to overcome the fact that the accuracy of most
conventional classification algorithms demands a solid feature engineering to work. Therefore,
requiring previous expert knowledge of the data and a challenging manual process to build descriptive
data features [53]. This fact is depicted in Figure 3.17. It can be seen that, in machine learning, a human
is needed to determine how and which features are going to be extracted as well as how they are going
to be classified. On the other hand, deep learning directly does these steps considerably reducing
human intervention:
Figure 3.17. Machine learning vs. deep learning.
Briefly, deep learning focuses on modeling high-level abstractions of information using computational
architectures that support multiple and iterative nonlinear transformations expressed in matrices or
tensors [54]. Thanks to their potential and scalability, neural networks have become the defining model
of deep learning. The fundamental unit of neural networks is a neuron. Each neuron individually
performs only a simple computation.
Figure 3.18. Diagram of a neuron.
For instance, as exemplified in Figure 3.18, in a linear unit (neuron), the input x is connected to the neuron with a weight w. Every time a value x is driven through the connection, it is multiplied by the weight's value, so what reaches the neuron is w · x. A neural network is able to "learn" by modifying its weights. Additionally, to allow the neuron to alter its output independently of its inputs, a special weight is introduced: the bias, represented by b. It does not have any input associated; instead, a 1 is defined in the diagram. Hence, the value reaching the neuron is just b (1 · b = b).
Neural networks are usually organized into layers of neurons. When layers are located between the input and output layers, they are called hidden layers, since their outputs are never seen directly:
Figure 3.19. Deep neural network [55].
However, the output of two or more stacked layers with nothing in between is equivalent to that of a single layer [56]. Hence, nonlinear operations, called activation functions, are needed. Essentially, they are functions applied to each of a layer's outputs. The most common is the rectifier function $max(0, \text{input})$, whose graph has the negative part rectified to zero:
Figure 3.20. The rectifier function.
When this function (Figure 3.20) is applied to a neuron (linear unit), we get a rectified linear unit, or ReLU; hence, the rectifier function is commonly called the ReLU function [57]. Thus, the neuron's output becomes:
Figure 3.21. A rectified linear unit.
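The computation of a single rectified linear unit can be written in a few lines (a toy illustration; the weight and bias values are arbitrary, not learned):

```python
import numpy as np

def relu(z):
    """Rectifier activation: the negative part is set to zero."""
    return np.maximum(0.0, z)

def linear_unit(x, w, b):
    """A neuron: weighted input plus bias, then the ReLU activation."""
    return relu(w * x + b)

w, b = 2.0, -1.0                 # arbitrary parameters for illustration
print(linear_unit(3.0, w, b))    # relu(2*3 - 1) = 5.0
print(linear_unit(-2.0, w, b))   # relu(-5) = 0.0
```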
Therefore, thanks to the nonlinearity that ReLU allows, neural networks can perform complex data transformations, making regression and classification tasks possible [57]. For instance, a stack of neurons forming a fully connected (FC) layer consists of weights, biases, and activation functions; this is where the classification process begins to take place. These layers typically form the last layers of convolutional neural networks (CNN) before the output layer. An example is shown in Figure 3.22.
Figure 3.22. Fully connected neural network.
Besides fully connected layers, there are other types: convolutional layers and pooling layers. When these three types of layers are stacked, a convolutional neural network (CNN) architecture is formed. The convolutional layer is fundamentally the first layer and is used to extract features from the input (usually images) by performing a convolution operation [58]. Its output, named the feature map, feeds the following layers. Generally, a pooling layer follows a convolutional layer. The principal aim of this layer is to decrease the size of the feature map to reduce computational costs. The feature map can be reduced in several ways: by taking the maximum values of each region (max pooling), averaging them (mean pooling), or summing them (sum pooling) [59].
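For instance, a 2x2 max-pooling step can be sketched as follows (a toy numpy illustration; deep learning frameworks provide optimized pooling layers):

```python
import numpy as np

def max_pool2x2(feature_map):
    """Downsample a feature map by taking the maximum of each 2x2 block."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]  # drop odd edge rows/cols
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 8, 2],
               [3, 6, 4, 7]])
print(max_pool2x2(fm))   # the 4x4 map shrinks to 2x2
```

Replacing `.max(axis=(1, 3))` with `.mean(...)` or `.sum(...)` would give the mean-pooling and sum-pooling variants mentioned above.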
A neural network can be structured in many ways, using different types of neurons and varying the number of layers. Nevertheless, as with machine learning algorithms, a training dataset is indispensable. Each training sample consists of inputs with an expected target (the output). Thus, training a neural network means adjusting the weights to transform the input into the expected output. However, two more things are needed for a neural network to learn and predict new data: a loss function and an optimizer.
During training, the model uses the loss function as a guide for finding appropriate values of its weights. The loss function measures the disparity between the target's actual value and the predicted value. This setting is essentially supervised learning and is the one used in this project. Therefore, the loss function tells the network its objective: in general, the lower the loss, the better the model fits the data.
On the other hand, the optimizer is an algorithm that adjusts the weights to minimize the loss. The optimization algorithms used in deep learning belong to a family called stochastic gradient descent (SGD) [60]. They are iterative algorithms that train a network in steps, repeated until the loss is optimal or does not decrease further. The subset of training samples processed in each iteration is called a minibatch, while the number of epochs indicates how many times the network sees the entire training dataset. The learning rate is an SGD parameter that determines the step size at each iteration.
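The training loop described above can be sketched for a single linear unit (a toy illustration using full-batch gradient descent for simplicity, whereas SGD proper would sample minibatches; the data and learning rate are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 2.0            # ground truth: w = 3, b = 2 (no noise, for clarity)

w, b, lr = 0.0, 0.0, 0.1     # initial weights and learning rate
for epoch in range(200):
    pred = w * x + b
    # Mean squared error loss; its gradients with respect to w and b
    grad_w = np.mean(2 * (pred - y) * x)
    grad_b = np.mean(2 * (pred - y))
    w -= lr * grad_w         # step in the direction that lowers the loss
    b -= lr * grad_b

print(round(w, 3), round(b, 3))  # approaches 3.0 and 2.0
```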
In order to assess a model's performance accurately, the model needs to be evaluated on a new set of
data called the validation dataset. For that, the learning curves can be evaluated:
Figure 3.23. Learning curves.
As depicted in the figure above, underfitting occurs when the loss is not as low as possible because the model has not learned enough. In contrast, performance can also degrade if the model is overfitted: it learns from noise and gives higher weights to nonrelevant details that have an undesirable impact on performance. To overcome the fitting problem, an early stop can be performed by interrupting the training (Figure 3.23). In practice, the weight values are recorded over the epochs and then reset back to where the minimum occurred. Another technique commonly used to prevent overfitting is dropout [57]. The idea is to randomly drop out parts of a layer's input at every training step. Hence, the network ignores the nonrelevant patterns in the training data and is instead forced to search for general patterns, increasing robustness and avoiding overfitting.
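The early-stopping logic described above can be sketched generically (a toy illustration with a synthetic validation-loss curve; the function and variable names are our own):

```python
import numpy as np

def train_with_early_stopping(train_step, val_loss_fn, max_epochs=100, patience=5):
    """Generic early-stopping loop: record the weights with the best
    validation loss and stop after `patience` epochs without improvement."""
    best_loss, best_weights, wait = np.inf, None, 0
    for epoch in range(max_epochs):
        weights = train_step(epoch)
        loss = val_loss_fn(weights)
        if loss < best_loss:
            best_loss, best_weights, wait = loss, weights, 0
        else:
            wait += 1
            if wait >= patience:
                break  # stop training; best_weights holds the minimum
    return best_weights, best_loss

# Toy curve: validation loss falls, then rises again (overfitting sets in)
losses = [5, 4, 3, 2.5, 2.6, 2.8, 3.0, 3.3, 3.5, 4.0]
# Here "weights" is just the epoch index, to keep the sketch minimal
best_w, best_l = train_with_early_stopping(lambda e: e, lambda e: losses[e],
                                           max_epochs=len(losses), patience=3)
print(best_w, best_l)  # epoch 3 had the minimum validation loss (2.5)
```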
The use of large training datasets has shown promising capability in many artificial intelligence applications using deep learning [61] [62] and, more recently, in the biomedical imaging field, with performance comparable to, and in some cases surpassing, that of physicians.
4. Methodology and implementation
In this section, the set of procedures used to achieve the final objective, the classification, is
described. All the code has been developed in Python, and the parts related to the machine learning
approach can be found in the GitHub repository [63]. The code for the deep learning approach is not
included since it was developed by the Grupo de Investigación de Modelos de Aprendizaje
Computacional from the Tecnológico de Monterrey [64].
4.1. Materials and preprocessing
In the present work, digital mammograms already classified and labeled are used to train and assess
the approaches developed. The database consists of the following image types:
Table 4.1. Dataset composition.

Category     Mammograms
BI-RADS 1    360
BI-RADS 2    335
BI-RADS 3    339
BI-RADS 4    241
Total        1275
The databases come from the Hospital Josep Trueta (Girona, Spain) [17] and the Mammographic
Image Analysis Society (MIAS) [65]. A set of experts has manually classified both databases following
the guidelines of the Breast Imaging Reporting and Data System (BI-RADS). Additionally, since the deep
learning approach required a higher number of mammograms, a third database was introduced for
that approach only: the Iberian breast cancer digital repository (BCDR) [66]. Specifically, 20 BI-RADS 2,
20 BI-RADS 3, and 19 BI-RADS 4 mammograms were added. This is further explained in section 4.3.2.
After reviewing the state of the art, the final objective of this project is to discuss, compare, and assess
the implementation of two different approaches for the classification: one based on machine learning
and one based on deep learning.
The mammograms used in this project were already preprocessed [39]. In fact, two independent
steps had been performed: the first segmented the background and annotations from the whole
breast area, while the second separated the pectoral muscle from the rest of the breast. The following
image shows an initial mammogram before the preprocessing:
Figure 4.1. Raw mammogram.
The final result is the mammogram containing only the breast part.
Figure 4.2. Breast profile segmentation of two mammograms using the algorithm of [67].
In this project, the segmented images were also downscaled to a 0.5 factor to reduce the feature
extraction's computational time. It is later discussed in section 4.2.4. The rescaling was performed
using the rescale function provided by the scikit-image open-source image processing library for
Python [68].
4.2. Machine learning implementation
As seen in the literature review, the GLCM, LAWS, and LBP features are widely used. This section
describes the functions developed in Python and the implementation up to the classification step.
4.2.1. Extraction of GLCM features
Features from the GLCM are obtained with the developed function GLCM_extractor.py. The steps
followed for each image are described in the following diagram:
Figure 4.3. Steps followed by GLCM_extractor.py.
It has to be mentioned that step 2 (Figure 4.3) is required to maintain the original image size. Step 3
is repeated so that a GLCM is computed as many times as there are pixels in the original image
(Figure 4.4).
Figure 4.4. Extraction of the statistical features of the first pixel from the GLCM.
Therefore, as a result, 5 feature arrays are created and stored. The diagram below shows the
calculation of the last GLCM, which yields the final pixel of the descriptors. They can be visualized as
images:
Figure 4.5. Last step of the GLCM_extractor.py.
The 5 resulting images can be considered a single image with 5 texture features for each pixel. From
this idea, a vector that defines each pixel i of the original mammogram is obtained:

GLCM[i] = [ Contrast[i]  Dissimilarity[i]  Homogeneity[i]  Energy[i]  Correlation[i] ]    Eq. 4.1
The function GLCM_extractor.py is executed 12 times with the parameters shown in Table 4.2. It has
to be mentioned that GLCMs can be defined in eight offsets (0º, 45º, 90º, 135º, 180º, 225º, 270º, and
315º). Nevertheless, in the original definition, Haralick proposed to use only four directions spaced at
intervals of 45º, as the others are symmetrical [42]. Furthermore, the distance could present a wide
range of values, but the most used are d = 1, 2, 3. In this case, the distance has been chosen according
to the window size.
Table 4.2. Parameters used to extract the features on each image.
Every time the function is executed 5 texture images are obtained per each mammogram. Since
GLCM_extractor.py is run 12 times (Table 4.2), a total of 60 texture images are produced.
Consequently, 60 components describe every pixel.
Run   Window size (pixels)   Angle (rad)   Distance (pixels)   Number of bins   Step (pixels)
1     5x5                    0             3                   32               1
2     5x5                    π/4           3                   32               1
3     5x5                    π/2           3                   32               1
4     5x5                    3π/4          3                   32               1
5     15x15                  0             5                   32               1
6     15x15                  π/4           5                   32               1
7     15x15                  π/2           5                   32               1
8     15x15                  3π/4          5                   32               1
9     25x25                  0             10                  32               1
10    25x25                  π/4           10                  32               1
11    25x25                  π/2           10                  32               1
12    25x25                  3π/4          10                  32               1
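As a minimal illustration of what each run computes, a single GLCM for one small window can be built by hand. The sketch below (illustrative numpy code, not GLCM_extractor.py; the matrix is one-directional for simplicity) counts co-occurrences of gray-level pairs at angle 0 and distance 1, then derives the contrast statistic:

```python
import numpy as np

# Illustrative sketch of a GLCM for a single window: count co-occurrences of
# gray-level pairs at a given offset, then derive the contrast statistic.
# A horizontal offset (dr=0, dc=d) corresponds to angle 0.
def glcm(window, levels, dr, dc):
    g = np.zeros((levels, levels))
    rows, cols = window.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                g[window[r, c], window[r2, c2]] += 1
    return g / g.sum()                      # normalize to joint probabilities

def contrast(g):
    i, j = np.indices(g.shape)
    return float(np.sum(g * (i - j) ** 2))  # Haralick contrast

win = np.array([[0, 0, 1],
                [0, 1, 1],
                [2, 2, 2]])
g = glcm(win, levels=3, dr=0, dc=1)         # angle 0, distance 1
```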
Initially, a first test was done with the parameters selected in Table 4.2 but using 256 bin levels,
meaning that every GLCM computed had a size of 256x256 [69]. However, one execution of the
function on a single image took around 20 minutes. As this has to be done 12 times per image for
1275 images, the total execution time to extract the GLCM features would have been 212.5 days
(20 min · 12 runs · 1275 images).
To overcome this concerning issue and to be able to finalize the project in the timeline estimated
initially, different considerations and actions were taken:
• As mentioned in the preprocessing section (section 4.1), the image resolution was reduced to
half. Therefore, the computation time per image is reduced to half as well.
• The step parameter represents the number of pixels the sliding window moves in each
iteration; it is directly related to the number of GLCMs computed. Nevertheless, for instance,
if the step selected is 2, that would mean that the resulting texture images obtained are half
the original size. It is the reason why modifying this parameter to improve the computation
time was rejected.
• The function was optimized so that it does not compute the GLCMs of windows that contain only
background. Since the background represents around 50% of many images, the computation time is
roughly halved.
• Subsequently, the number of bins chosen was 32. The GLCMs of 256 bins (with a size of
256x256) were mainly empty. Even in the most extensive window (25x25 pixels), the repetition
of pairs within the 256 levels is unlikely. Therefore, and following the initial objective of
reducing the computational time, the statistics were extracted from GLCMs with a size 32x32.
As a result of the previous adjustments, the function's execution took less than a minute per image,
making it viable to run the function 12 times per mammogram.
After more than 4 days of computation, the 60 texture images that characterize each mammogram
were obtained. For the reasons explained in section 4.2.4, these features are combined to reduce
their number when building the feature dataset. This process is done with the developed function
df_mother.py; the procedure followed can be found in Figure 4.6.
Figure 4.6. GLCM feature reduction (homogeneity).
The figure above illustrates how the number of features is reduced: the images with the same window
size and distance (and thus different angles) are averaged. As pointed out in the literature, this is a
fast way to obtain rotationally invariant GLCM features [70] without losing much information. As a
consequence, the final GLCM features are reduced from the original 60 to 15.
4.2.2. Extraction of LAWS features
The LAWS features have been extracted using the LAWS_extractor.py. The main steps of this function
are described in the diagram below:
Figure 4.7. Diagram of the feature extraction using LAWS_extractor.py (Part 1).
Essentially, from steps 1 to 3, the image is convolved with the 15 kernels, and the original size is
maintained thanks to the padding performed. Hence, 15 texture images are obtained since, as
mentioned in section 3.1.2, the L5L5 mask is not used. These 15 images are then combined following
the expressions listed in step 4 of Figure 4.7, so that 9 texture images are eventually obtained.
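The construction of the kernels in steps 1 to 3 can be sketched as follows (illustrative; the 1-D vectors are the standard Laws' level, edge, spot, and ripple filters): the 16 outer products minus L5L5 give the 15 masks, and each mask's transpose is the paired mask that step 4 later averages with it.

```python
import numpy as np

# Illustrative sketch of building Laws' texture kernels: outer products of the
# four standard 1-D vectors give 16 5x5 masks; L5L5 is dropped, leaving 15.
vecs = {
    "L5": np.array([1, 4, 6, 4, 1]),      # level (local average)
    "E5": np.array([-1, -2, 0, 2, 1]),    # edge
    "S5": np.array([-1, 0, 2, 0, -1]),    # spot
    "R5": np.array([1, -4, 6, -4, 1]),    # ripple
}
kernels = {a + b: np.outer(vecs[a], vecs[b]) for a in vecs for b in vecs}
del kernels["L5L5"]                        # the L5L5 mask is not used

# Transpose relation used when averaging pairs such as E5L5 and L5E5 (step 4):
assert np.array_equal(kernels["L5E5"], kernels["E5L5"].T)
```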
The texture image resulting from the convolution with the kernel 𝑅5𝑅5 is displayed as an example:
Figure 4.8. Texture image from 𝑅5𝑅5 and its histogram.
As humans can distinguish only a limited number of gray shades and given the low contrast of the
image (the vast majority of pixels have values between -100 and 100), a color map can be applied for
visualization purposes:
Figure 4.9. Use of a colormap to improve visualization of 𝐼𝑅5𝑅5.
It can be seen that some parts of the inner breast are highlighted (Figure 4.9). Going back to the
feature extraction process, the last step consists of sliding a 5x5 window pixel by pixel to extract the
local mean, absolute mean, and standard deviation. The values obtained from these three statistical
calculations are stored in three new images. The process is graphically described below using
IR5R5 as an example:
Figure 4.10. Last step of the LAWS_extractor.py.
This process is repeated for each of the 9 texture images obtained by combining the 15 convolution
results (step 4 in Figure 4.7). Hence, 27 texture images are obtained for each mammogram. Those
from the mammogram used as the example in this section are displayed in the figure below:
Figure 4.11. Extraction of the features using LAWS_extractor.py (Part 2).
Different window sizes were tested: LAWS features were extracted for the whole dataset using 5x5,
15x15, and 25x25 windows. However, the sizes larger than 5x5 were discarded, mainly because the
bigger windows distorted the images, being too large compared to the resolution of the mammograms
after the preprocessing. These effects can be seen in the images below:
Figure 4.12. Texture images obtained using a 15x15 window.
Therefore, bigger window sizes were not used, as they would have introduced noise and artifacts into
the feature dataset.
Eventually, 27 features that describe each pixel i of every mammogram are obtained:

LAWS[i] = [ LAWS1[i]  LAWS2[i]  LAWS3[i]  …  LAWS26[i]  LAWS27[i] ]    Eq. 4.2
The computation time per image was around 12 seconds with the 5x5 window, so extracting the
features of the whole mammogram dataset took a bit more than 4 hours. The larger the window, the
longer the execution time; this limitation must be considered when choosing the window size.
4.2.3. Extraction of LBP features
Lastly, LBP features were extracted with LBP_extractor.py. They can be easily computed using the
local_binary_pattern module available in the scikit-image open-source image processing library for
Python [51]. The steps followed by the function are the following:

Figure 4.13. Steps followed by LBP_extractor.py.
The computation of the four methods is already implemented in the module. They represent the
following [51]:
• Default: the original local binary pattern, which is grayscale invariant but not rotation invariant.
• Ror: an extension of the default implementation, which is both grayscale and rotation invariant.
• Uniform: improved rotation invariance with uniform patterns and a finer quantization of the
angular space; grayscale and rotation invariant.
• Var: a rotation-invariant variance measure of the contrast of the local image texture; rotation
invariant but not grayscale invariant.
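As an illustration of the "default" method, the LBP code of a single pixel can be computed by hand. This is a simplified 3x3-neighborhood sketch, not the scikit-image implementation used in the project:

```python
import numpy as np

# Illustrative sketch of the "default" 8-neighbor LBP code for one pixel:
# each neighbor greater than or equal to the center contributes one bit.
def lbp_code(patch):
    center = patch[1, 1]
    # clockwise neighbor order starting at the top-left corner
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum(int(n >= center) << bit for bit, n in enumerate(neighbors))

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
code = lbp_code(patch)   # an 8-bit pattern between 0 and 255
```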
Similar to the GLCM process described in section 4.2.1, the number of texture features is reduced by
combining them, as indicated in the following figure. This is done using df_mother.py.
Figure 4.14. Combination of the LBP features with different parameters.
The choices of the number of neighbors and the radius vary in the literature. However, the values
selected for the current project (listed in Figure 4.14) are similar to those used in other studies on
mammogram classification, especially those using the BI-RADS scale [39] [17], and proved good
accuracy.
It is essential to mention that the texture images obtained from the LBP (Figure 4.14) are not 8-bit
images, since the local binary patterns grow with the number of neighbors. To display them here,
they were converted to 8 bits; therefore, some of the images above seem to have no contrast, and the
averaged result appears inconsistent. For instance, the "default" ones result in an image with a black
background because the colormap is adjusted to the intensity range.
The computation time for the LBP_extractor.py takes less than a second per image, making it the
fastest among the feature extraction functions developed in this project.
Eventually, 4 values that describe each pixel are obtained:

LBP[i] = [ LBP1[i]  LBP2[i]  LBP3[i]  LBP4[i] ]    Eq. 4.3
4.2.4. Creating the feature dataset
Thanks to the three functions developed (GLCM_extractor.py, LAWS_extractor.py, and
LBP_extractor.py), 99 texture images were extracted for every mammogram with the different
parameters chosen. Hence, each pixel of any mammogram has 99 values that describe it and its local
neighborhood (texture), which is essential for classification in machine learning. Specifically, the 99
features came from:
Table 4.3. Breakdown of features extracted.

Extractor   Runs with different parameters   Features per run   Features
GLCM        12                               5                  60
LAWS        1                                27                 27
LBP         3                                4                  12
                                             Total:             99
After the feature combinations mentioned, the total number of features ends being:
Table 4.4. Breakdown of features extracted after combination.

Extractor   Runs with different parameters   Features per run   Features   After combination
GLCM        12                               5                  60         15
LAWS        1                                27                 27         27
LBP         3                                4                  12         4
                                                                Total:     46
Besides that, the dataset was still heavy, so actions were taken to prevent the subsequent steps from
becoming too time-consuming. Averaging blocks of 4 adjacent pixels into one super-pixel
(pixel binning) [71] was performed on each texture image. The idea is represented in Figure 4.15.
Figure 4.15. Binning process of the texture images.
Consequently, the dataset size was reduced by a factor of 4 without a relevant loss of information,
since each super-pixel is essentially an average of 4 neighboring pixels.
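The binning step can be sketched with numpy alone (illustrative code; the thesis applies it to every texture image):

```python
import numpy as np

# Illustrative sketch of 2x2 pixel binning: each non-overlapping block of
# 4 neighboring pixels is averaged into one super-pixel, reducing the pixel
# count by a factor of 4.
def bin2x2(img):
    h, w = img.shape
    # trim odd borders, then reshape so each 2x2 block can be averaged
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
binned = bin2x2(img)
```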
Afterward, all mammograms and texture images are stored in the data frame shown below. This data
reorganization is done via the developed function df_mother.py. In the example below, only some
rows are shown for viewing purposes, as the full table has thousands:
Table 4.5. Main data frame.
… …
(…)
image_name label pixel_number image LBP0 LBP1 LBP2 LBP3 GLCM0 GLCM1 GLCM2 GLCM14 LAWS0 LAWS1 LAWS2 Column1 LAWS25 LAWS26
LOW_A_0357_1.LEFT_MLO_b4.png 4 122 0,20 7437,72 3,59 1,96 1289,77 59,31 0,39 5,92 -991,16 707376,79 991,16 1,6E-14 3,0E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 123 0,31 2765,65 1,72 1,36 1212,20 55,58 0,45 5,41 -1435,34 686458,71 1435,34 2,0E-14 3,7E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 124 0,29 9033,28 9032,21 4,00 1568,86 46,37 0,53 4,21 -523,69 1593386,62 1124,85 1,1E-14 2,7E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 155 0,05 19655,77 3876,17 3,92 806,76 60,43 0,33 4,77 -596,61 377708,89 596,61 3,3E-15 1,2E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 156 0,22 8236,32 3,77 2,14 1015,58 66,04 0,34 6,09 -1157,46 423724,79 1157,46 1,3E-14 2,9E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 157 0,32 2774,41 1,24 1,12 817,41 63,31 0,39 5,85 -1076,37 529554,77 1076,37 1,3E-14 2,9E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 158 0,43 164,08 0,62 0,43 1524,39 54,00 0,47 4,74 19,90 1366446,43 1004,14 2,7E-14 3,4E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 189 0,10 19648,95 3876,96 4,43 826,11 66,02 0,29 4,40 -557,67 229991,28 557,67 2,5E-15 9,8E-18
LOW_A_0357_1.LEFT_MLO_b4.png 4 190 0,18 8239,04 6,50 2,71 864,12 74,19 0,29 5,81 -757,24 110297,22 757,24 3,4E-15 1,4E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 191 0,31 3417,35 2,40 1,51 671,20 72,89 0,34 6,04 -518,36 22802,04 518,36 1,7E-15 1,1E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 192 0,38 1845,68 1,88 1,22 1192,86 63,64 0,41 5,14 76,30 553078,64 627,79 5,7E-14 4,6E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 193 0,03 20644,71 20644,71 8,78 1914,43 51,54 0,53 4,10 784,98 563471,80 864,14 1,2E-13 8,6E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 223 0,02 19653,15 3902,86 5,95 1004,26 70,11 0,28 4,02 -354,64 99404,46 354,64 2,3E-15 1,0E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 224 0,18 15307,21 17,57 5,00 956,25 81,09 0,27 5,42 -570,15 27832,92 570,15 4,5E-15 1,8E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 225 0,27 9084,50 7,28 3,30 968,15 82,07 0,30 6,15 -473,32 11070,82 473,32 3,1E-15 1,5E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 226 0,36 3486,49 3,64 2,82 1023,74 73,25 0,36 5,49 -155,15 249495,19 453,51 5,5E-14 5,0E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 227 0,32 6246,52 6133,61 2,64 1841,25 60,11 0,48 4,50 716,69 732987,62 909,97 1,7E-13 1,2E-16
LOW_A_0357_1.LEFT_MLO_b4.png 4 258 0,17 19165,89 50,13 7,67 1226,43 85,71 0,25 5,08 -484,64 41337,71 484,64 4,0E-15 1,6E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 259 0,25 9611,63 95,98 7,98 1160,94 89,96 0,27 6,14 -513,24 14499,90 513,24 2,8E-15 1,3E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 260 0,36 3677,49 53,78 6,12 1112,38 82,24 0,32 5,80 -354,64 69668,76 425,59 1,9E-14 2,8E-17
LOW_A_0357_1.LEFT_MLO_b4.png 4 261 0,41 2061,61 35,10 6,60 1909,59 68,87 0,43 4,94 448,03 744566,17 789,66 1,4E-13 1,0E-16
LOW_A_0357_1.LEFT_MLO_b4.png 4 292 0,07 20109,58 132,26 7,67 1298,55 88,83 0,23 4,83 -457,71 102240,29 457,71 1,2E-15 7,5E-18
LOW_A_0357_1.LEFT_MLO_b4.png 4 293 0,24 10506,96 101,86 7,61 1243,44 96,21 0,24 6,03 -620,80 27483,55 620,80 1,6E-15 1,0E-17
… …
The reorganization of the images is done through a flattening process, which converts a 2D array into
a 1D vector:

[ a0 a1 a2
  a3 a4 a5
  a6 a7 a8 ]  →  [ a0 a1 a2 a3 a4 a5 a6 a7 a8 ]ᵀ    Eq. 4.4
Therefore, every mammogram and its texture images are flattened, so each row of the table
represents one pixel and its 46 features. This is done for every mammogram, and all of them are
concatenated into one single table, conceptually similar to the following figure:
Figure 4.16. Conceptual dataset.
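The flattening and stacking described above can be sketched as follows (illustrative numpy code, not df_mother.py):

```python
import numpy as np

# Illustrative sketch of the flattening step (Eq. 4.4): a 2D texture image
# becomes a 1D vector, so each table row holds one pixel.
img = np.array([[0, 1, 2],
                [3, 4, 5],
                [6, 7, 8]])
flat = img.flatten()                  # row-major 2D -> 1D

# Stacking several flattened texture images column-wise gives the
# pixel-by-feature layout conceptually shown in the figure above.
features = np.column_stack([flat, flat * 2.0])
```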
The data that contains each column (Table 4.5) is described below:
• First column: mammogram file name.
• Second column: actual label assigned following the BI-RADS scale.
• Third column: mammogram pixel values.
• Fourth and onward: texture image pixel values.
It needs to be highlighted that the background has been removed. This is why the column
"pixel_number" begins at 122 instead of 0: 122 is the first pixel that contains breast tissue in this
specific example. This data frame is the one used in the following steps.
4.2.5. Dense tissue segmentation
In some studies, manually selected ROIs (regions of interest) are widely used [29][18]. Essentially, a
section of the dense part of the mammogram (the part that varies among the different BI-RADS
categories) is cropped, and features are extracted from it to classify the mammogram.
Figure 4.17. Example of a selection of an ROI [29].
However, to avoid manual manipulation of the images and to make the process as automated as
possible, a method to segment the dense area was implemented. Different segmentation methods
were tested (k-means, SVM, thresholding, and region growing). The one that showed promising
results, with a good trade-off between performance and computational time besides being easy to
implement, was the Fuzzy C-Means algorithm. It has already been used in previous projects with good
results [17][72][18]. Some segmentation outputs are shown in Figure 4.18.
Figure 4.18. Segmentation examples through FCM.
The features used for this segmentation were selected by discarding those in which the dense area
was neither clearly contrasted nor easily identifiable from the rest of the breast. The results obtained
using all features were not appropriate:
Figure 4.19. Result of the segmentation test with all the features.
After removing some features that were mainly black, such as the ones below, the breast was
successfully divided into two separate categories (Figure 4.18, number of clusters = 2): fatty tissue
and dense tissue.
Figure 4.20. Examples of features discarded.
Nevertheless, in some images, artifacts related to the preprocessing (next to where the pectoral
muscle was) are present:
Figure 4.21. Segmented artifacts.
The case above occurred in only a few images, and it is a minor error compared to the size of the
dense area. It needs to be mentioned that this is an unsupervised segmentation method intended to
sample the relevant parts (the dense area) of the mammogram for the final classification. The Fuzzy
C-Means (FCM) algorithm [14] is an extension of the well-known k-means algorithm (Figure 3.15). Its
main difference is that each image pattern is associated with every cluster through a fuzzy
membership function [73]. In non-fuzzy clustering (also known as hard clustering), the data is divided
into distinct clusters, where each data point can belong to exactly one group. In fuzzy clustering, data
points can belong to multiple clusters: for example, an apple can be red or green (hard clustering),
but it can also be red and green (fuzzy clustering) [74]. Hence, FCM is a practical and concise
segmentation algorithm [40] that allows checking the membership percentage of each cluster. This
was valuable since the method is unsupervised: the percentage of belonging to a cluster was used to
check whether the selected features were significantly discriminating the fatty and dense tissue. The
algorithm used is the one provided by the official third-party software repository for Python [74].
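A minimal one-dimensional Fuzzy C-Means can be sketched to show the membership update (illustrative numpy code, not the library implementation used in the project):

```python
import numpy as np

# Illustrative 1-D Fuzzy C-Means sketch: u[k, i] is the fuzzy membership of
# point i in cluster k, and memberships over the clusters sum to 1.
def fcm(x, c=2, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(x)))
    u /= u.sum(axis=0)                              # normalize memberships
    for _ in range(iters):
        um = u ** m
        centers = um @ x / um.sum(axis=1)           # membership-weighted centers
        d = np.abs(x[None, :] - centers[:, None]) + 1e-12
        u = 1.0 / (d ** (2 / (m - 1)))              # standard FCM update
        u /= u.sum(axis=0)
    return centers, u

x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])        # two well-separated groups
centers, u = fcm(x)
```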
4.2.6. Classification
Right after the segmentation step, the pixels that do not belong to the dense-tissue cluster are
removed from the dataset (Table 4.5). The remaining dense pixels can then be classified according to
the 46 features. The dataset was split into three parts (training, validation, and test) with the
following percentages:
Figure 4.22. Data division sizes.
The accuracy using all features was not ideal (<0.5), even though different classifiers with a good
trade-off between computation and performance were tested. Therefore, a feature selection method
was implemented: the 46 features of the training dataset were correlated with the actual labels, and
only the highly correlated ones were selected. The following heatmap (Figure 4.23) shows the
correlation between the labels and the features.
Figure 4.23. Heatmap of the correlation.
This feature selection process is done using the feature_selection.py code. The features of the
training dataset that showed the highest correlation were selected; they are the following:
Table 4.6. Features with high correlation.

LBP2     0.078577
LBP3     0.058869
GLCM0    0.053648
GLCM2    0.086261
GLCM3    0.081330
GLCM4    0.093957
GLCM5    0.058481
GLCM7    0.109125
GLCM8    0.101990
GLCM9    0.106501
GLCM10   0.050399
GLCM12   0.106888
GLCM13   0.087769
GLCM14   0.097127
LAWS2    0.084407
LAWS5    0.054620
LAWS26   0.051825
It has to be taken into account that what is being classified are not images but pixels. Hence, the
k-NN algorithm, already introduced in section 3.2, is used with the selected features (Table 4.6). The
k-NN was chosen for its low computational demand and feasible training and prediction times.
Furthermore, it is recommended by scikit-learn [75] for this type of dataset, and it is one of the most
used methods for texture-based image classification, not only in medical imaging [19] [76] but also in
other fields that use texture features [77].
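The k-NN decision rule can be sketched in a few lines (an illustrative numpy version, not the scikit-learn classifier used in the project):

```python
import numpy as np

# Illustrative k-NN sketch with k = 3: a new point takes the majority label
# among its three closest training points in feature space.
def knn_predict(X_train, y_train, x, k=3):
    d = np.linalg.norm(X_train - x, axis=1)         # Euclidean distances
    nearest = y_train[np.argsort(d)[:k]]            # labels of the k nearest
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]                # majority vote

# Toy usage with BI-RADS-style labels.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])
y = np.array([1, 1, 4, 4, 4])
pred = knn_predict(X, y, np.array([0.9, 0.9]))
```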
All previous steps and the following ones are done with classifier.py. The k-NN algorithm was set
with k = 3, meaning that the prediction for a new point is made according to its three closest
neighbors (Figure 3.15). Usually, an odd number is chosen if the number of classes is even [78], which
is the case here. Moreover, in studies with similar datasets and texture features, there was no
significant difference in accuracy for k between 1 and 9 [79]. Since higher values mean higher
computational demand, k = 3 was chosen. The predictions obtained with the trained k-NN can be
seen in Table 4.7:
Table 4.7. Classified pixels of each image.
image name actual label dense 1 2 3 4
LOW_D_4576_1.RIGHT_MLO_b1.png 1 811 71,76 18,99 8,01 1,23
LOW_pdb059ls_b1.png 1 5200 63,31 23,15 12,77 0,77
LOW_pdb070rl_b1.png 1 8014 66,47 17,43 14,99 1,11
LOW_pdb267ll_b1.png 1 7455 63,30 21,07 14,77 0,86
LOW_pdb301lm_b1.png 1 6352 66,33 23,76 9,01 0,91
LOW_pdb301lm_b1.png 1 6352 66,33 23,76 9,01 0,91
LOW_pdb305lm_b1.png 1 5656 68,00 21,53 9,95 0,51
LOW_pdb306rm_b1.png 1 5477 61,71 24,23 13,75 0,31
LOW_tdb052mlol_b1.png 1 7653 71,25 19,50 8,55 0,71
LOW_tdb052mlol_b1.png 1 7653 71,25 19,50 8,55 0,71
LOW_tdb087mlor_b1.png 1 8232 67,59 20,23 11,31 0,87
LOW_tdb087mlor_b1.png 1 8232 67,59 20,23 11,31 0,87
LOW_pdb128rm_b2.png 2 4077 48,15 30,88 18,91 2,06
LOW_pdb192rs_b2.png 2 4122 67,25 20,35 11,64 0,75
LOW_pdb202rl_b2.png 2 9450 51,21 24,85 22,77 1,17
LOW_pdb207lm_b2.png 2 3346 59,59 22,30 16,38 1,73
LOW_tdb020mlol_b3.png 3 8095 60,73 15,86 21,89 1,52
LOW_tdb050mlor_b3.png 3 3519 52,57 19,10 26,83 1,51
LOW_tdb050mlor_b3.png 3 3519 52,57 19,10 26,83 1,51
LOW_tdb020mlol_b3.png 3 8095 60,73 15,86 21,89 1,52
LOW_tdb038mlor_b3.png 3 5678 57,68 20,36 20,38 1,59
LOW_A_0254_1.RIGHT_MLO_b4.png 4 239 42,26 22,59 25,52 9,62
LOW_A_0261_1.LEFT_MLO_b4.png 4 653 43,64 17,76 29,10 9,49
LOW_B_3606_1.LEFT_MLO_b4.png 4 1176 47,36 19,13 25,77 7,74
LOW_D_4506_1.RIGHT_MLO_b4.png 4 294 48,98 25,17 18,37 7,48
LOW_pdb002rl_b4.png 4 4136 45,50 19,17 25,75 9,57
LOW_pdb172rl_b4.png 4 4045 29,62 21,11 34,78 14,49
In Table 4.7, the first column is the picture's file name, the second is the actual label (in BI-RADS), and
the third is the number of dense pixels. This number depends on the size of the breast and the image
resolution, which are not related to the BI-RADS scale. Columns four to seven represent the
percentage of the dense pixels that ended up classified as BI-RADS 1, 2, 3, or 4, respectively.
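As an illustration of how the k = 3 majority vote labels each dense pixel, a minimal k-NN classifier can be written directly in NumPy. The 2-D "texture features" and labels below are made up for the example; the project used its own extracted texture features:

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=3):
    """Classify each row of X_new by majority vote among its
    k nearest training samples (Euclidean distance)."""
    preds = []
    for x in X_new:
        d = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(d)[:k]]
        # Majority vote among the k nearest labels.
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Made-up 2-D "texture features" for pixels of three BI-RADS classes.
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8],
                    [0.8, 0.9], [0.5, 0.5], [0.4, 0.6]])
y_train = np.array([1, 1, 4, 4, 2, 2])
print(knn_predict(X_train, y_train, np.array([[0.15, 0.15]])))
```

With k = 3, the query point near (0.15, 0.15) is labeled by its two class-1 neighbors outvoting the class-2 one.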
Intuitively, it might seem that each image should simply be assigned the class of the column with the
highest percentage. However, if this is done here, every image ends up as BI-RADS 1. Hence, to perform
the final classification, two different methods were tested. The first was to classify the images by
manually choosing thresholds on the four percentages for each class. Nevertheless, since the validation
set was only 10% of the whole dataset, there was not enough data to find a reliable tendency. This idea
was therefore discarded after the initial tests were unsuccessful; moreover, it was a manual and arbitrary process.
On the other hand, optimization.py was developed. Essentially, it uses the scipy optimize module [80]
to find four parameters that multiply each of the four percentage columns (Table 4.7). The objective is
to find the parameters that maximize the accuracy on the classified validation data. Fundamentally,
this weights the columns, giving more importance to some of them. The resulting parameters are the
following:
𝑤 ≈ [0.80, 2.32, 2.32, 10.91] Eq. 4.5
Therefore, the final classification is obtained by multiplying each of the four columns by the
corresponding parameter above. The resulting label ("Result" column in Table 4.8) is the class with the
maximum among the weighted columns. The correctly classified mammograms are highlighted in green.
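The weight search described above can be sketched with the scipy optimize module. The array P below holds made-up validation percentages (one row per image, one column per BI-RADS class) and y the true labels; the derivative-free Nelder-Mead method is an assumption about the solver, chosen because accuracy is piecewise constant and has no useful gradient:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical validation data: per-image percentages of dense pixels
# classified as BI-RADS 1-4 (columns) and the true labels (1-4).
P = np.array([[71.8, 19.0, 8.0, 1.2],
              [48.2, 30.9, 18.9, 2.1],
              [52.6, 19.1, 26.8, 1.5],
              [42.3, 22.6, 25.5, 9.6]])
y = np.array([1, 2, 3, 4])

def neg_accuracy(w):
    # Weight the columns and label each image by the maximum.
    pred = np.argmax(P * w, axis=1) + 1
    return -np.mean(pred == y)

# Accuracy is a step function, so a derivative-free method is used.
res = minimize(neg_accuracy, x0=np.ones(4), method="Nelder-Mead")
print(res.x, -res.fun)
```

With the unweighted columns (w = 1) every toy image above would be labeled BI-RADS 1, while the weights of Eq. 4.5 classify all four correctly, which is exactly the behavior described in the text.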
Table 4.8. Final classification.
image name actual label dense W1 W2 W3 W4 Result
LOW_D_4576_1.RIGHT_MLO_b1.png 1 811 57,4 44,1 18,6 13,4 1
LOW_pdb059ls_b1.png 1 5200 50,6 51,6 29,6 8,4 2
LOW_pdb070rl_b1.png 1 8014 53,2 38,9 34,8 12,1 1
LOW_pdb267ll_b1.png 1 7455 50,6 47,0 34,3 9,4 1
LOW_pdb301lm_b1.png 1 6352 53,1 53,0 20,9 10,0 1
LOW_pdb301lm_b1.png 1 6352 53,1 53,0 20,9 10,0 1
LOW_pdb305lm_b1.png 1 5656 54,4 48,0 23,1 5,6 1
LOW_pdb306rm_b1.png 1 5477 49,4 54,0 31,9 3,4 2
LOW_tdb052mlol_b1.png 1 7653 57,0 43,5 19,8 7,7 1
LOW_tdb052mlol_b1.png 1 7653 57,0 43,5 19,8 7,7 1
LOW_tdb087mlor_b1.png 1 8232 54,1 45,1 26,2 9,5 1
LOW_tdb087mlor_b1.png 1 8232 54,1 45,1 26,2 9,5 1
LOW_pdb128rm_b2.png 2 4077 38,5 68,9 43,9 22,5 2
LOW_pdb192rs_b2.png 2 4122 53,8 45,4 27,0 8,2 1
LOW_pdb202rl_b2.png 2 9450 41,0 55,4 52,8 12,8 2
LOW_pdb207lm_b2.png 2 3346 47,7 49,7 38,0 18,9 2
LOW_D_4514_1.LEFT_MLO_b3.png 3 565 39,4 36,3 62,4 83,0 4
LOW_tdb020mlol_b3.png 3 8095 48,6 35,4 50,8 16,6 3
LOW_tdb050mlor_b3.png 3 3519 42,1 42,6 62,2 16,4 3
LOW_tdb050mlor_b3.png 3 3519 42,1 42,6 62,2 16,4 3
LOW_tdb020mlol_b3.png 3 8095 48,6 35,4 50,8 16,6 3
LOW_tdb038mlor_b3.png 3 5678 46,1 45,4 47,3 17,3 3
LOW_A_0254_1.RIGHT_MLO_b4.png 4 239 33,8 50,4 59,2 104,9 4
LOW_A_0261_1.LEFT_MLO_b4.png 4 653 34,9 39,6 67,5 103,5 4
LOW_B_3606_1.LEFT_MLO_b4.png 4 1176 37,9 42,7 59,8 84,3 4
LOW_D_4506_1.RIGHT_MLO_b4.png 4 294 39,2 56,1 42,6 81,6 4
LOW_pdb002rl_b4.png 4 4136 36,4 42,8 59,7 104,4 4
LOW_pdb172rl_b4.png 4 4045 23,7 47,1 80,7 157,9 4
The diagram below shows the steps followed to obtain the classification:
Figure 4.24. Classification process.
As shown in the diagram above, the predictions of the validation data are used to find the weights'
values. These weights are later used to classify the predictions of the test data. Hence, mammograms
are labeled depending on the maximum value (same idea shown in Table 4.8).
4.3. Deep learning implementation
Different architectures of deep learning models were tested, including AlexNet, Inception, ResNet-50,
and VGG-16. In addition, these models were pre-trained on ImageNet to mitigate overfitting. The
ImageNet project is an enormous database with more than 14 million images, many of which were
hand-annotated. This project has been extremely helpful in advancing computer vision and deep
learning research. In addition, the data is available for free to researchers for non-commercial use [81].
4.3.1. VGG-16
VGG-16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the
Visual Geometry Group (VGG) at the University of Oxford in 2014 [82], achieving 92.7% top-5 test
accuracy on ImageNet. Besides that, this architecture gave the best results among the four tested. The
preliminary performance obtained in the tests with different architectures can be found in Annex A.
The VGG-16 is depicted in the following figure:
Figure 4.25. The architecture of VGG-16 [83].
The architecture of VGG-16 is relatively simple. As represented above, it is composed of 2 contiguous
blocks of 2 convolution layers, each followed by a max-pooling layer. Subsequently, it has 3 contiguous
blocks of 3 convolution layers, each also followed by a max-pooling layer. Finally, 3 fully connected (FC)
dense layers are found before the output.

When a mammogram is given as input, features are extracted at every convolution layer, forming
feature maps. Additionally, every max-pooling layer halves the size of the data by downsampling the
input along its spatial dimensions (height and width), keeping the maximum values. At the end, the
fully connected layers are found, and the very last layer of this network has as many neurons as there
are classes to predict.
Once the model is trained, each input mammogram activates neurons through the network until one
of the four output neurons, corresponding to the four BI-RADS categories (Table 1.1), gives the
strongest response.
4.3.2. Data augmentation
It needs to be mentioned that an extra dataset of mammograms was needed since, typically, deep
learning applications require large training datasets to achieve consistent accuracy. This additional
dataset was only used in the deep learning approach. As already mentioned in section 4.1, the
additional mammograms were obtained from the BCDR repository [84].
Even though more breast images were added to the dataset, it was still insufficient for the correct
training and performance of the model. Therefore, a technique known as data augmentation was
implemented to enlarge the mammogram dataset. It consists of adding slightly modified copies of
already existing mammograms, increasing the number of images in the dataset and reducing overfitting [85].
Figure 4.26. Example of data augmentation.
Different transformations were tested for the data augmentation, such as vertical and horizontal
flipping, perspective distortions, rotations, blurs, or paddings. These data augmentation techniques
can be easily implemented in Python [14] using the open-source transforms module from PyTorch [86].
Three tests were assessed containing 6, 8, and 12 transformations each, increasing the training
dataset by 6, 8, and 12 times, respectively.
The transformations performed in each test are listed in the following table:
Table 4.9. Data augmentation methods for each test.
AUG1
1. Horizontal flip
2. Vertical flip
3. Padding
4. Random perspective distortion
5. Random affine
6. Random rotation

AUG2
1. Horizontal flip
2. Vertical flip
3. Padding
4. Random affine
5. Random perspective distortion
6. Random rotation
7. Color jitter
8. Gaussian blur

AUG3
1. Horizontal flip
2. Vertical flip
3. Padding
4. Random affine
5. Random rotation
6. Color jitter
7. Random perspective distortion
8. Random rotation with expansion
9. Random resized crop
10. Center crop
11. Resize
12. Gaussian blur
4.3.3. Training and Learning curves
The model was evaluated with the three training datasets obtained using different data augmentation
methods (Table 4.9).
The training set was composed of 1200 mammograms (90% of the dataset). Hence, after the data
augmentation step, test 1 had 7200 mammograms, test 2 had 9600, and test 3 had 14400. The
learning curves are depicted in the following three figures (Figure 4.27, Figure 4.28, and Figure 4.29);
the validation was done with the remaining 134 mammograms (10% of the dataset).
Figure 4.27. AUG1 Loss vs. Epoch.
Figure 4.28. AUG2 Loss vs. Epoch.
Figure 4.29. AUG3 Loss vs. Epoch.
In Figure 4.27, Figure 4.28, and Figure 4.29, the red dotted line marks the point where the validation
loss was minimal. This is where the early stop was performed (described in section 3.3), meaning that
the error when classifying the mammograms of the validation dataset was minimal at that instant.
Overall, the loss is similar across all tests. However, it can be seen that the more extensive the dataset,
the earlier (in fewer epochs) the minimum validation loss is reached.
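The early-stop criterion can be sketched independently of the training framework: keep track of the epoch with the lowest validation loss and stop once it has not improved for a given number of epochs (the patience value below is an assumption):

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved
    for `patience` consecutive epochs."""
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0
        self.best_epoch = -1

    def step(self, epoch, val_loss):
        if val_loss < self.best:
            self.best = val_loss      # new minimum: remember it
            self.best_epoch = epoch   # (and save the model weights here)
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop

# Example: validation losses per epoch; the minimum is at epoch 2.
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate([0.9, 0.6, 0.5, 0.55, 0.6, 0.7]):
    if stopper.step(epoch, loss):
        break
print(stopper.best_epoch, stopper.best)  # 2 0.5
```

The model state saved at `best_epoch` is the one marked by the red dotted line in the learning curves.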
4.3.4. Interpreting the model performance
In section 2, it was shown that deep learning has achieved unprecedented accuracy in medical image
classification. However, one of the biggest problems humans encounter with deep learning is model
interpretability: in other words, understanding the model the way it is possible in machine learning,
where every step can be taken apart and comprehended.
Therefore, in this section, a widely used technique that makes CNN-based models more
understandable is presented. This technique is called Gradient-weighted Class Activation
Mapping (Grad-CAM) [87]. It takes the class-specific gradient information flowing into the final
convolutional layer to produce a localization map (an image) highlighting the regions of the input
mammogram most responsible for the classification into one category or another.
The Grad-CAMs obtained after training the VGG-16 for 25 epochs with training dataset test 3 are
shown in Figure 4.30. The heatmap indicates that the dense areas of the breast are the ones to which
the neural network has given more importance, confirming that the model is suitable, since
mammograms are classified depending on the dense area, similarly to the process physicians
follow when classifying in the BI-RADS categories (section 1.4). Actually, for BI-RADS 3 and 4, the
entire dense area is highlighted by the Grad-CAM, comparable to the segmentation done in the
machine learning approach (section 4.2.5):
Figure 4.30. Image and Grad-CAM (training dataset test 3, 25 epochs).
5. Discussion
In this section, the results obtained with the machine learning and deep learning approaches are
compared and discussed. The tool used to assess the results and performance of the two approaches
is the confusion matrix. Each row of the matrix represents the relative occurrences of a real class, while
each column represents the relative occurrences of a predicted class. Therefore, the higher the
diagonal values of the confusion matrix, the more correct the predictions [88].
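Such a row-normalized confusion matrix can be computed in a few lines; the label vectors below are made-up examples, not the project's results:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=4):
    """Row-normalized confusion matrix: rows are real classes,
    columns are predicted classes (labels assumed 1..n_classes)."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t - 1, p - 1] += 1
    # Normalize each row to relative occurrences.
    return cm / cm.sum(axis=1, keepdims=True)

# Made-up true and predicted BI-RADS labels.
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 2, 2, 2, 3, 4, 4, 4]
cm = confusion_matrix(y_true, y_pred)
print(np.diag(cm))  # per-class accuracy on the diagonal
```

Because each row sums to one, the diagonal directly reads as the per-class accuracy shown in the figures.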
Firstly, the machine learning approach is analyzed. The resulting confusion matrix is the following:
Figure 5.1. Confusion matrix of the ML approach.
By looking at Figure 5.1, it can be stated that the approach cannot efficiently differentiate between
BI-RADS 1 and 2, nor between BI-RADS 3 and 4. However, the accuracy for the grouped classes is
noteworthy (Figure 5.2).
Figure 5.2. Binned confusion matrix of the ML approach.
In fact, the ML model can differentiate between benign and malignant mammograms since, by definition,
only BI-RADS 3 and 4 are likely to be cancerous (Table 1.1). In real clinical practice, out of the four BI-
RADS breast density categories, it is relatively easy to discriminate between "entirely fatty" and
"extremely dense" by visual assessment [53]. In those situations, physicians are comfortable and
confident making decisions without needing assistance. However, it is challenging for them to
differentiate between the two intermediate categories. Hence, this is an interesting approach that could
help physicians generate a prediction and improve the determination of a BI-RADS breast density
category. The results indicate that malignant breast tissue could be identified in 91% of the cases.
However, 22% of the BI-RADS 1 and 2 mammograms are false positives, meaning that (without human
validation) the model would classify almost a quarter of these mammograms wrongly. In any case,
having false positives is less concerning than having false negatives, which would erroneously consider
patients as healthy; here, the false negatives amount to only 0.9%.
On the other hand, three different training datasets were utilized for the deep learning approach with
different rates of data augmentation (as explained in section 4.3.2). However, the one with the largest
data augmentation (AUG3) gave the best results. The confusion matrices of the other two can be found
in Annex B.
The resulting confusion matrix for the deep learning approach is the following:
Figure 5.3. Confusion matrix of the DL approach (AUG3).
Comparing Figure 5.1 and Figure 5.3, it can be seen that the deep learning approach obtained far better
results. Actually, by looking at the confusion matrix's diagonal in Figure 5.3, the DL model discriminates
significantly better between the four BI-RADS categories than the ML approach.
In addition, if the confusion matrix is binned (Figure 5.4), the results indicate that malignant breast tissue
could be identified in 96% of the cases and benign tissue in 91% of the cases, considerably reducing
the number of false positives with respect to the machine learning approach.
Figure 5.4. Binned Confusion Matrix of the DL approach (AUG3).
It needs to be mentioned that the deep learning model trained with the other two datasets (AUG1
and AUG2) had a lower global accuracy (Annex B). However, the one trained with AUG1 showed an
outstanding accuracy of 99% for the grouped categories BI-RADS 3 and 4.
Therefore, even with a small mammogram dataset, deep learning has shown significantly better
results than the machine learning approach developed in this project. Besides that, the human
intervention in the deep learning approach has been minimal. In contrast, in ML, every step needed
to be programmed and its correct performance assessed as well. Furthermore, the texture feature
extraction process took 10 times longer than the training of the DL approach, not only with the
VGG-16 but also with the other deep learning models tested. Fundamentally, this is because the
architecture of a CNN allows the parallelization of the computation, performing in parallel simple
calculations that would otherwise need an extensive amount of time.
In summary, the deep learning approach has shown promising potential, demonstrating the best
results and superlative computational performance while, most importantly, requiring less human
intervention, which makes the implementation more straightforward and less error-prone.
6. Environmental impact
The environmental impact could be debated extensively by evaluating all the parts directly or
indirectly involved in the development of this project, resulting in many lines of discussion.
However, this whole project is a set of programs developed entirely in the Spyder [89] and Colab [90]
environments in the programming language Python 3 [14]. Therefore, there is no tangible item
manufactured or produced. Furthermore, even though the radiological images have a substantial
associated energy cost and the acquisition system has a considerable environmental impact, their
usage is ethically and economically justified (section 1). Having said that, the image acquisition
process is out of the scope of this report. Only the CO2 generated by the computer's electricity
consumption is considered for the environmental impact.
Basically, as stated in section 1.6, approximately 18 weeks were employed to develop the project.
Considering an average of 35 hours of work per week using a computer, it can be estimated that
electricity was consumed for a total of 630 hours plus the almost 250 hours required for the extraction
of texture features, ML model training, and other tests carried out. This part was done using a Dell XPS
13 9300 with an average consumption of 33 W, according to the computer's specifications [91].
On the other hand, training the DL model and obtaining its results required an execution time of
30 hours in Colab [90] using an NVIDIA Tesla T4 GPU with a consumption of 70 W [92].
Therefore, the total consumption of the computer for the whole project is 31.1 kWh:
Energy consumption = (630 + 250) h · 33 W + 30 h · 70 W = 31140 Wh = 31.1 kWh Eq. 6.1
In Catalonia, it is estimated that each kWh produced generates 321 g of CO2 [93]. Hence, assuming that
the Colab [90] machine was located in Catalonia, the development of this project has an associated
carbon footprint of 10 kg of CO2:
CO2 produced = 31.1 kWh · 321 g CO2/kWh ≈ 10 kg CO2 Eq. 6.2
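The arithmetic of Eq. 6.1 and Eq. 6.2 can be reproduced in a few lines:

```python
# Energy: laptop hours at 33 W plus GPU hours at 70 W (Eq. 6.1).
laptop_wh = (630 + 250) * 33   # 29040 Wh
gpu_wh = 30 * 70               # 2100 Wh
energy_kwh = (laptop_wh + gpu_wh) / 1000

# Carbon footprint at 321 g CO2 per kWh (Eq. 6.2).
co2_kg = energy_kwh * 321 / 1000
print(energy_kwh, round(co2_kg, 1))  # 31.14 10.0
```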
Conclusions
In this project, the inherent necessity to develop a tool that assists physicians in the BI-RADS
classification task and, consequently, in breast cancer diagnosis has been explained. Briefly, this
necessity results from the current diagnosis and screening method being error-prone, a fact that is
very relevant to today's clinical practice since preventing these errors could save many lives.
Therefore, evaluating and comparing the suitability of deep learning and machine learning for
mammogram classification is this project's main objective. To achieve it, two independent models
(one DL and one ML) were fully implemented. The final models presented were chosen after reviewing
the literature, assessing the different possibilities, and testing them. Eventually, the two presented
were the most suitable and gave the best preliminary results given the available resources (database,
hardware, and time).
However, even though the two approaches were implemented and are functional, the most relevant
thing to examine is not the results obtained with each approach. Instead, the most thought-provoking
points to analyze here are the differences observed while implementing one or the other.
For the ML approach, the features needed to be extracted manually. In other words, the features to
extract (texture) had to be chosen, and the extraction algorithm needed to be programmed and
developed, requiring human effort and knowledge. In addition, traditional techniques rely on many
simple operations computed in series (one after the other), since parallelizing them requires an
additional programming effort. Therefore, the feature extraction process in machine learning is time-
consuming.
Moreover, besides extracting the features, it is also fundamental to check whether they actually
differentiate the categories efficiently. Many features can introduce noise or useless information,
increasing the computational cost and affecting the accuracy. Hence, another step assessing the
feature performance must be included, possibly leading to another manual process of feature
selection. Finally, after choosing the classification algorithm, the feature dataset needs to be arranged
in a form interpretable by that algorithm; then the model is trained and ready.
On the other hand, the main drawback of deep learning models is that they usually need more data
for efficient training. Nevertheless, this was overcome using data augmentation techniques, a quick
and simple step. Afterward, the mammograms were directly used to train the model which, connected
to a GPU, took less than 30 min to train. The short time is due to the fact that the computation is
parallelized, optimizing the time. This allowed the complete assessment and testing of different
architectures and data augmentation methods.
In summary, the results were considerably better for the deep learning approach and outstanding
when the categories were grouped. However, the truly significant added value that DL offers is that
the implementation is immensely straightforward: it is as simple as inputting the mammograms to
train the model. Additionally, the training and automatic feature extraction were done in less than
30 min, while in the machine learning approach the extraction step alone took more than 200 hours.
This is not a direct limitation, since once the model is trained it is ready to be used if it were ever
implemented in an application. However, every time a new mammogram has to be classified, the
texture features would need to be extracted, taking more than 5 minutes; in contrast, the deep
learning approach can classify a new mammogram in under a second, making it more commercially
viable.
In conclusion, with the results obtained, the fast and straightforward implementation, and its viability,
the deep learning approach is a promising candidate to be further improved. Therefore, the main
objective of this project is achieved, concluding and justifying that the path to follow in future works is
deep learning.
Nevertheless, in the approach presented in this project, the accuracy in distinguishing between the
four classes was not optimal; therefore, there is still scope to improve. Data augmentation, dropout,
and varying the learning rate were attempted to increase the accuracy; however, the overfitting was
not reduced. Having said that, in future works, adding more images to the training set could be key.
With a larger dataset, not only would the accuracy improve, but deeper architectures could also be
implemented and tested (deeper architectures overfit easily if the dataset is not large enough).
Finally, an intuitive graphical user interface (GUI) for the deep learning model could be developed,
making the implementation and actual use in clinical practice possible.
Budget
In this section, the hypothetical cost that the development of this project would have is described. The
budget is divided into two primary sources of expenses: the cost of the personnel involved and the cost
of the materials used.
Personnel cost
The costs of hiring a junior engineer to carry out the tasks listed in Figure 1.3, corresponding to the
project's development, are given in Table 0.1. It is estimated that the average salary for that position
is 13 € per hour [90]. Considering a workday of 7 hours and 18 weeks, this results in 630 hours of
work and hence a total cost of 8190 €.
Table 0.1. Cost for the personnel work.
Tasks | Working hours | Cost (€)
First meetings with the tutor | 5 | 65
Planning | 12 | 156
Introduction to the topic | 18 | 234
Delve deeper into Python | 15 | 195
Bibliographic research | 50 | 650
Implementation of the extractors | 40 | 520
Feature extraction | 80 | 1040
Store information in the dataset | 40 | 520
Dense segmentation | 55 | 715
Model selection and classification | 70 | 910
Meeting: revision and improvements | 5 | 65
Writing | 30 | 390
First contact with deep learning | 14 | 182
Literature review | 10 | 130
Meeting with ITM | 1 | 13
Implementation of DL to the dataset | 70 | 910
Obtaining results | 20 | 260
Discussion | 15 | 195
Writing | 45 | 585
Final review | 35 | 455
Total | 630 h | 8190 €
Materials cost
The expenses related to the hardware and software licenses used throughout the development of the
project are estimated in the following table:
Table 0.2. Cost for the materials used.
Hardware | Cost (€)
Laptop | 1,499
External hard drive | 70
Software | Cost (€)
Microsoft Office | 35
Colab Pro | 9
Total | 1613 €
To summarize, the project's final cost amounts to 8190 + 1613 = 9803 €, without considering the
cost of having someone supervise the project or of its actual implementation.
Bibliography
[1] H. Sung et al., “Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA. Cancer J. Clin., p. caac.21660, Feb. 2021, doi: 10.3322/caac.21660.
[2] O. Wyman, “El impacto económico y social del cáncer en España,” 2020. [Online]. Available: https://www.aecc.es/sites/default/files/content-file/Informe-Los-costes-cancer.pdf.
[3] T. Winslow, “National Cancer Institute - Breast Cancer Screening (PDQ®)–Patient Version,” 2013. https://www.cancer.gov/types/breast/patient/breast-screening-pdq (accessed May 03, 2021).
[4] E. Morris, S. A. Feig, M. Drexler, and C. Lehman, “Implications of Overdiagnosis: Impact on Screening Mammography Practices,” Population health management, vol. 18, no. Suppl 1. Mary Ann Liebert, Inc., pp. S3–S11, Sep. 01, 2015, doi: 10.1089/pop.2015.29023.mor.
[5] “Definition of metastatic - NCI Dictionary of Cancer Terms - National Cancer Institute.” https://www.cancer.gov/publications/dictionaries/cancer-terms/def/metastatic (accessed May 01, 2021).
[6] “What Is Breast Cancer? | CDC,” September 14, 2020. https://www.cdc.gov/cancer/breast/basic_info/what-is-breast-cancer.htm (accessed Jan. 23, 2021).
[7] M. Broeders et al., “The impact of mammographic screening on breast cancer mortality in Europe: A review of observational studies,” J. Med. Screen., vol. 19, no. SUPPL. 1, pp. 14–25, Sep. 2012, doi: 10.1258/jms.2012.012078.
[8] P. Autier and M. Boniol, “Mammography screening: A major issue in medicine,” Eur. J. Cancer, vol. 90, pp. 34–62, Feb. 2018, doi: 10.1016/j.ejca.2017.11.002.
[9] “What Is a Mammogram? | CDC,” Centers for Disease Control and Prevention. https://www.cdc.gov/cancer/breast/basic_info/mammograms.htm (accessed Apr. 24, 2021).
[10] D. R. Dance, "Physical principles of mammography," in Physics for Medical Imaging Applications, Springer Netherlands, 2007, pp. 355–365.
[11] D. Ribli, A. Horváth, Z. Unger, P. Pollner, and I. Csabai, “Detecting and classifying lesions in mammograms with Deep Learning OPEN,” doi: 10.1038/s41598-018-22437-z.
[12] F. P. Kestelman et al., "Breast Imaging Reporting and Data System – BI-RADS®: positive predictive value of categories 3, 4 and 5. A systematic literature review," 2007.
[13] K. Pesce, M. B. Orruma, C. Hadad, Y. B. Cano, R. Secco, and A. Cernadas, “BI-RADS terminology for mammography reports: What residents need to know,” Radiographics, vol. 39, no. 2. Radiological Society of North America Inc., pp. 319–320, Mar. 01, 2019, doi: 10.1148/rg.2019180068.
[14] “Welcome to Python.org.” https://www.python.org/ (accessed Jun. 08, 2021).
[15] “MATLAB - El lenguaje del cálculo técnico - MATLAB & Simulink.” https://es.mathworks.com/products/matlab.html (accessed Jun. 08, 2021).
[16] E. T. Pereira, S. P. Eleutério, and J. M. Carvalho, “Local Binary Patterns Applied to Breast Cancer Classification in Mammographies,” Rev. Informática Teórica e Apl., vol. 21, no. 2, p. 32, Nov. 2014, doi: 10.22456/2175-2745.46848.
[17] C. Mata, J. Freixenet, X. Lladó, and A. Oliver, “Texture Descriptors applied to Digital Mammography,” 2008.
[18] R. Rabidas, A. Midya, J. Chakraborty, and W. Arif, “A Study of Different Texture Features Based on Local Operator for Benign-malignant Mass Classification,” Procedia Comput. Sci., vol. 93, no. September, pp. 389–395, 2016, doi: 10.1016/j.procs.2016.07.225.
[19] P. Sonar, U. Bhosle, and C. Choudhury, “Mammography classification using modified hybrid SVM-KNN,” in Proceedings of IEEE International Conference on Signal Processing and Communication, ICSPC 2017, Mar. 2018, vol. 2018-January, pp. 305–311, doi: 10.1109/CSPC.2017.8305858.
[20] A. K. Mohanty, S. Beberta, and S. K. Lenka, “Classifying Benign and Malignant Mass using GLCM and GLRLM based Texture Features from Mammogram,” Int. J. Eng. Res. Appl., vol. 1, no. 3, pp. 687–693, 2011.
[21] T. Sadad, A. Munir, T. Saba, and A. Hussain, “Fuzzy C-means and region growing based classification of tumor from mammograms using hybrid texture feature,” J. Comput. Sci., vol. 29, pp. 34–45, 2018, doi: 10.1016/j.jocs.2018.09.015.
[22] S. J. S. Gardezi and I. Faye, “Fusion of completed local binary pattern features with curvelet features for mammogram classification,” Appl. Math. Inf. Sci., vol. 9, no. 6, pp. 3037–3048, 2015, doi: 10.12785/amis/090633.
[23] A. C. Phadke and P. P. Rege, “Fusion of local and global features for classification of abnormality in mammograms,” Sadhana - Academy Proceedings in Engineering Sciences, vol. 41, no. 4. pp. 385–395, 2016, doi: 10.1007/s12046-016-0482-y.
[24] C. Wang, A. R. Brentnall, J. Cuzick, E. F. Harkness, D. G. Evans, and S. Astley, “A novel and fully automated mammographic texture analysis for risk prediction: Results from two case-control studies,” Breast Cancer Res., vol. 19, no. 1, pp. 1–13, Oct. 2017, doi: 10.1186/s13058-017-0906-6.
[25] A. Manduca et al., “Texture features from mammographic images and risk of breast cancer,” Cancer Epidemiol. Biomarkers Prev., vol. 18, no. 3, pp. 837–845, Mar. 2009, doi: 10.1158/1055-9965.EPI-08-0631.
[26] R. Nithya and B. Santhi, “Application of texture analysis method for mammogram density classification,” J. Instrum., vol. 12, no. 07, pp. P07009--P07009, Jul. 2017, doi: 10.1088/1748-0221/12/07/p07009.
[27] Kriti and J. Virmani, “Breast density classification using Laws’ mask texture features,” Int. J. Biomed. Eng. Technol., vol. 19, no. 3, 2015, doi: 10.1504/IJBET.2015.072999.
[28] A. H. Farhan and M. Y. Kamil, “Texture Analysis of Breast Cancer via LBP, HOG, and GLCM techniques,” IOP Conf. Ser. Mater. Sci. Eng., vol. 928, no. 7, 2020, doi: 10.1088/1757-899X/928/7/072098.
[29] A. S. Setiawan, Elysia, J. Wesley, and Y. Purnama, “Mammogram Classification using Law’s Texture Energy Measure and Neural Networks,” Procedia Comput. Sci., vol. 59, no. Iccsci, pp. 92–97, 2015, doi: 10.1016/j.procs.2015.07.341.
[30] A. Gastounioti, A. Oustimov, M. K. Hsieh, L. Pantalone, E. F. Conant, and D. Kontos, “Using Convolutional Neural Networks for Enhanced Capture of Breast Parenchymal Complexity Patterns Associated with Breast Cancer Risk,” Acad. Radiol., vol. 25, no. 8, pp. 977–984, Aug. 2018, doi: 10.1016/j.acra.2017.12.025.
[31] M. M. Jadoon, Q. Zhang, I. U. Haq, S. Butt, and A. Jadoon, “Three-Class Mammogram Classification Based on Descriptive CNN Features,” 2017, doi: 10.1155/2017/3640901.
[32] R. Arora, P. K. Rai, and B. Raman, “Deep feature–based automatic classification of mammograms,” Med. Biol. Eng. Comput., vol. 58, no. 6, pp. 1199–1211, Jun. 2020, doi: 10.1007/s11517-020-02150-8.
[33] G. Altan, “Deep learning-based mammogram classification for breast cancer,” Int. J. Intell. Syst. Appl. Eng., vol. 8, no. 4, pp. 171–176, Dec. 2020, doi: 10.18201/ijisae.2020466308.
[34] Y. J. Suh, J. Jung, and B. J. Cho, “Automated breast cancer detection in digital mammograms of various densities via deep learning,” J. Pers. Med., vol. 10, no. 4, pp. 1–11, 2020, doi: 10.3390/jpm10040211.
[35] L. Shen, L. R. Margolies, J. H. Rothstein, E. Fluder, R. McBride, and W. Sieh, “Deep Learning to Improve Breast Cancer Detection on Screening Mammography,” Sci. Rep., vol. 9, no. 1, pp. 1–12, Dec. 2019, doi: 10.1038/s41598-019-48995-4.
[36] A. A. Mohamed, W. A. Berg, H. Peng, Y. Luo, R. C. Jankowitz, and S. Wu, “A deep learning method for classifying mammographic breast density categories,” Med. Phys., vol. 45, no. 1, pp. 314–321, Jan. 2018, doi: 10.1002/mp.12683.
[37] D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck, “Deep Learning for Identifying Metastatic Breast Cancer,” Jun. 2016, Accessed: Jun. 08, 2021. [Online]. Available: http://arxiv.org/abs/1606.05718.
[38] A. P. Adedigba, S. A. Adeshinat, and A. M. Aibinu, “Deep learning-based mammogram classification using small dataset,” in 2019 15th International Conference on Electronics, Computer and Computation, ICECCO 2019, Dec. 2019, doi: 10.1109/ICECCO48375.2019.9043186.
[39] C. Mata Miquel, J. Freixenet, and X. Lladó, "MSc. Thesis VIBOT: Texture Descriptors applied to Digital Mammography," 2009.
[40] B. A. Jenkins and E. A. Lumpkin, “Developing a sense of touch,” Dev., vol. 144, no. 22, pp. 4048–4090, Nov. 2017, doi: 10.1242/dev.120402.
[41] L. Armi and S. Fekri-Ershad, “Texture image analysis and texture classification methods - A review,” arXiv, vol. 2, no. 1, pp. 1–29, 2019.
[42] R. M. Haralick, I. Dinstein, and K. Shanmugam, “Textural Features for Image Classification,” IEEE Trans. Syst. Man Cybern., vol. SMC-3, no. 6, pp. 610–621, 1973, doi: 10.1109/TSMC.1973.4309314.
[43] “Co-occurrence matrix - Wikipedia.” https://en.wikipedia.org/wiki/Co-occurrence_matrix (accessed May 16, 2020).
[44] S. Van Der Walt et al., “Scikit-image: Image processing in python,” PeerJ, vol. 2014, no. 1, 2014, doi: 10.7717/peerj.453.
[45] Apple, “Blurring an Image | Apple Developer Documentation.” https://developer.apple.com/documentation/accelerate/blurring_an_image (accessed May 31, 2021).
[46] T. Kimpe and T. Tuytschaever, “Increasing the number of gray shades in medical display systems - How much is enough?,” J. Digit. Imaging, vol. 20, no. 4, pp. 422–432, Dec. 2007, doi: 10.1007/s10278-006-1052-3.
[47] S. H. Kim, J. H. Lee, B. Ko, and J. Y. Nam, “X-ray image classification using Random Forests with Local Binary Patterns,” in 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010, 2010, vol. 6, pp. 3190–3194, doi: 10.1109/ICMLC.2010.5580711.
[48] Z. Jun, H. Jizhao, T. Zhenglan, and W. Feng, “Face detection based on LBP,” in ICEMI 2017 - Proceedings of IEEE 13th International Conference on Electronic Measurement and Instruments, Jul. 2017, vol. 2018-January, pp. 421–425, doi: 10.1109/ICEMI.2017.8265841.
[49] L. Armi and S. Fekri-Ershad, “Texture image analysis and texture classification methods - A review,” no. April, 2019, [Online]. Available: http://arxiv.org/abs/1904.06554.
[50] “Local binary patterns - Wikipedia.” https://en.wikipedia.org/wiki/Local_binary_patterns (accessed May 24, 2021).
[51] T. Ojala, M. Pietikäinen, and T. Mäenpää, “Gray scale and rotation invariant texture classification with local binary patterns,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2000, vol. 1842, pp. 404–420, doi: 10.1007/3-540-45054-8_27.
[52] P. Ongsulee, “Artificial intelligence, machine learning and deep learning,” in International Conference on ICT and Knowledge Engineering, Jan. 2018, pp. 1–6, doi: 10.1109/ICTKE.2017.8259629.
[53] A. A. Mohamed, W. A. Berg, H. Peng, Y. Luo, R. C. Jankowitz, and S. Wu, “A deep learning method for classifying mammographic breast density categories,” Med. Phys., vol. 45, no. 1, pp. 314–321, Jan. 2018, doi: 10.1002/mp.12683.
[54] Y. Bengio, A. Courville, and P. Vincent, “Representation Learning: A Review and New Perspectives,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, Aug. 2013, doi: 10.1109/TPAMI.2013.50.
[55] IBM Cloud Education, “What are Neural Networks? | IBM,” IBM, 2020. https://www.ibm.com/cloud/learn/neural-networks (accessed Jun. 12, 2021).
[56] “Learn Intro to Deep Learning Tutorials | Kaggle.” https://www.kaggle.com/learn/intro-to-deep-learning (accessed Jun. 13, 2021).
[57] S. Hahn and H. Choi, “Understanding dropout as an optimization trick,” Neurocomputing, vol. 398, pp. 64–70, Jul. 2020, doi: 10.1016/j.neucom.2020.02.067.
[58] J. Ren, M. Green, and X. Huang, “From traditional to deep learning: Fault diagnosis for autonomous vehicles,” in Learning Control, Elsevier, 2021, pp. 205–219.
[59] “Basic CNN Architecture: Explaining 5 Layers of Convolutional Neural Network | upGrad blog.” https://www.upgrad.com/blog/basic-cnn-architecture/ (accessed Jun. 13, 2021).
[60] “1.5. Stochastic Gradient Descent — scikit-learn 0.24.2 documentation.” https://scikit-learn.org/stable/modules/sgd.html (accessed Jun. 12, 2021).
[61] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553. Nature Publishing Group, pp. 436–444, May 27, 2015, doi: 10.1038/nature14539.
[62] L. Deng and D. Yu, “Deep learning: Methods and applications,” Foundations and Trends in Signal Processing, vol. 7, no. 3–4. Now Publishers Inc, pp. 197–387, Jun. 30, 2013, doi: 10.1561/2000000039.
[63] “Ignaciomoragues/TFE — Repository.” https://github.com/Ignaciomoragues/TFE (accessed Jun. 16, 2021).
[64] “EIC Faculty | Tecnológico de Monterrey en Guadalajara.” https://gda.itesm.mx/faculty/en/professors/gilberto-ochoa-ruiz (accessed Jun. 13, 2021).
[65] “Mammographic Image Analysis Homepage - Databases.” https://www.mammoimage.org/databases/ (accessed Jun. 08, 2021).
[66] “Breast Cancer Digital Repository.” https://bcdr.eu/ (accessed Jun. 08, 2021).
[67] A. Oliver, “Automatic mass segmentation in mammographic images,” Ph.D. thesis, University of Girona, 2008.
[68] “Module: transform — skimage v0.19.0.dev0 docs.” https://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.rescale (accessed May 26, 2021).
[69] “Module: feature — skimage v0.19.0.dev0 docs.” https://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.graycomatrix (accessed May 26, 2021).
[70] L. Putzu and C. Di Ruberto, “Rotation invariant co-occurrence matrix features,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, vol. 10484 LNCS, pp. 391–401, doi: 10.1007/978-3-319-68560-1_35.
[71] Z. W. Pan, H. L. Shen, C. Li, S. J. Chen, and J. H. Xin, “Fast Multispectral Imaging by Spatial Pixel-Binning and Spectral Unmixing,” IEEE Trans. Image Process., vol. 25, no. 8, pp. 3612–3625, Aug. 2016, doi: 10.1109/TIP.2016.2576401.
[72] A. Torrent et al., “Breast Density Segmentation: A Comparison of Clustering and Region Based Techniques,” in Digital Mammography, 2008, pp. 9–16.
[73] A. Torrent et al., “Breast density segmentation: A comparison of clustering and region based techniques,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2008, vol. 5116 LNCS, pp. 9–16, doi: 10.1007/978-3-540-70538-3_2.
[74] M. Dias, A. Florêncio, and dirk, “omadson/fuzzy-c-means: v1.4.0,” May 2021, doi: 10.5281/ZENODO.4747689.
[75] “Fuzzy clustering - Wikipedia.” https://en.wikipedia.org/wiki/Fuzzy_clustering (accessed Jun. 04, 2021).
[76] V. Wasule and P. Sonar, “Classification of brain MRI using SVM and KNN classifier,” in Proceedings of 2017 3rd IEEE International Conference on Sensing, Signal Processing and Security, ICSSS 2017, Oct. 2017, pp. 218–223, doi: 10.1109/SSPS.2017.8071594.
[77] D. S. Guru, Y. H. Sharath, and S. Manjunath, “Texture Features and KNN in Classification of Flower Images,” IJCA Special Issue on Recent Trends in Image Processing and Pattern Recognition (RTIPPR), pp. 21–29, 2010.
[78] “KNN Classification using Scikit-learn - DataCamp.” https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn (accessed Jun. 05, 2021).
[79] A. C. Nusantara, E. Purwanti, and S. Soelistiono, “Classification of digital mammogram based on nearest-neighbor method for breast cancer detection,” Int. J. Technol., vol. 7, no. 1, pp. 71–77, 2016, doi: 10.14716/ijtech.v7i1.1393.
[80] “scipy.optimize.minimize — SciPy v1.6.3 Reference Guide.” https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html (accessed Jun. 06, 2021).
[81] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” 2012. Accessed: Jun. 13, 2021. [Online]. Available: http://code.google.com/p/cuda-convnet/.
[82] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, Sep. 2015, Accessed: Jun. 13, 2021. [Online]. Available: http://www.robots.ox.ac.uk/.
[83] “The Architecture and Implementation of VGG-16 – Towards AI — The Best of Tech, Science, and Engineering.” https://towardsai.net/p/machine-learning/the-architecture-and-implementation-of-vgg-16 (accessed Jun. 13, 2021).
[84] “Breast Cancer Digital Repository.” https://bcdr.eu/information/about (accessed Jun. 13, 2021).
[85] C. Shorten and T. M. Khoshgoftaar, “A survey on Image Data Augmentation for Deep Learning,” J. Big Data, vol. 6, no. 1, pp. 1–48, Dec. 2019, doi: 10.1186/s40537-019-0197-0.
[86] “torchvision.transforms — Torchvision master documentation.” https://pytorch.org/vision/stable/transforms.html# (accessed Jun. 13, 2021).
[87] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization,” Int. J. Comput. Vis., vol. 128, no. 2, pp. 336–359, Feb. 2020, doi: 10.1007/s11263-019-01228-7.
[88] “Confusion matrix — scikit-learn 0.24.2 documentation.” https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html (accessed Jun. 14, 2021).
[89] “Home — Spyder IDE.” https://www.spyder-ide.org/ (accessed Jun. 10, 2021).
[90] “Sueldo: Ingeniero Junior | Glassdoor.” https://www.glassdoor.es/Sueldos/ingeniero-junior-sueldo-SRCH_KO0,16.htm (accessed Jun. 10, 2021).
[91] “Dell XPS 13 9300 Laptop Review.” https://www.notebookcheck.net/Dell-XPS-13-9300-4K-UHD-Laptop-Review-16-10-is-the-New-16-9.464337.0.html (accessed Jun. 10, 2021).
[92] “GPU NVIDIA Tesla T4 con núcleos Tensor para inferencias de IA | NVIDIA Data Center.” https://www.nvidia.com/es-es/data-center/tesla-t4/ (accessed Jun. 11, 2021).
[93] “Guia pràctica per al càlcul d’emissions de gasos amb efecte d’hivernacle (GEH).”
Annex A
This annex presents the preliminary results obtained with VGG-16 and the other CNN architectures tested. The best result for each architecture is highlighted in yellow.
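The dropout rates listed in the tables below (e.g. 0.5, 0.5, 0.2 across the fully connected layers) refer to the fraction of activations randomly zeroed during training. As a minimal sketch of the mechanism only, with hypothetical activation values and an "inverted dropout" formulation (the variant used by common deep learning frameworks, not necessarily the exact implementation used in the thesis):

```python
import random

def dropout(activations, p, training=True):
    """Inverted dropout: during training, zero each activation with
    probability p and rescale the survivors by 1/(1 - p) so the
    expected activation is unchanged; at inference time it is a no-op."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if random.random() >= p else 0.0 for a in activations]

# Hypothetical activations of one fully connected layer.
acts = [0.5, 1.0, 1.5, 2.0]
dropped = dropout(acts, p=0.5)                    # each unit is 0 or doubled
unchanged = dropout(acts, p=0.5, training=False)  # inference: identity
```

A configuration such as "0.5, 0.5, 0.2" simply applies this operation with a different rate after each of the three fully connected layers.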
AlexNet (dropout on FC layers only)
Configurations tested: No-Dropout; (0.2, 0.2, 0.2); (0.5, 0.5, 0.0); (0.5, 0.5, 0.2); (0.5, 0.5, 0.5); (0.8, 0.8, 0.5); (0.8, 0.8, 0.8)
Weighted Avg Precision: 0.72 / 0.62 / 0.70 / 0.66 / 0.70 / 0.62
Weighted Avg Recall: 0.71 / 0.62 / 0.64 / 0.65 / 0.64 / 0.61
Weighted Avg F1-Score: 0.71 / 0.61 / 0.63 / 0.64 / 0.63 / 0.59
Macro Avg ROC: 0.80 / 0.74 / 0.75 / 0.76 / 0.75 / 0.74
VGG-16
Configurations tested: No-Dropout; dropout on FC layers only (original): (0.2, 0.2, 0.2); (0.0, 0.5, 0.5); (0.5, 0.5, 0.2); (0.5, 0.5, 0.5); (0.8, 0.8, 0.2); (0.8, 0.8, 0.8); dropout on FC layers plus 25 % on convolutional layers; plus 50 % on convolutional layers
Weighted Avg Precision: 0.79 / 0.72 / 0.71 / 0.70 / 0.66
Weighted Avg Recall: 0.74 / 0.69 / 0.65 / 0.69 / 0.64
Weighted Avg F1-Score: 0.73 / 0.68 / 0.64 / 0.69 / 0.61
Macro Avg ROC: 0.81 / 0.78 / 0.75 / 0.79 / 0.74
Inception (dropout on FC layers only)
Configurations tested: No-Dropout; (0.2, 0.2, 0.2); (0.0, 0.5, 0.0); (0.5, 0.5, 0.2); (0.5, 0.5, 0.5); (0.8, 0.8, 0.2); (0.8, 0.8, 0.8)
Weighted Avg Precision: 0.65 / 0.66 / 0.64 / 0.61 / 0.69
Weighted Avg Recall: 0.61 / 0.63 / 0.63 / 0.60 / 0.67
Weighted Avg F1-Score: 0.58 / 0.62 / 0.63 / 0.58 / 0.67
Macro Avg ROC: 0.73 / 0.75 / 0.75 / 0.73 / 0.77
ResNet50 (dropout on FC layers only)
Configurations tested: No-Dropout; (0.2, 0.2, 0.2); (0.0, 0.5, 0.5); (0.5, 0.5, 0.2); (0.5, 0.5, 0.5); (0.8, 0.8, 0.5); (0.8, 0.8, 0.8)
Weighted Avg Precision: 0.73 / 0.64 / 0.62 / 0.78 / 0.66
Weighted Avg Recall: 0.63 / 0.60 / 0.62 / 0.73 / 0.61
Weighted Avg F1-Score: 0.59 / 0.58 / 0.60 / 0.72 / 0.60
Macro Avg ROC: 0.73 / 0.72 / 0.74 / 0.80 / 0.75
DenseNet121 (dropout on FC layers only)
Configurations tested: No-Dropout; (0.2, 0.2, 0.2); (0.5, 0.5, 0.0); (0.5, 0.5, 0.2); (0.5, 0.5, 0.5); (0.8, 0.8, 0.5); (0.8, 0.8, 0.8)
Weighted Avg Precision: 0.69 / 0.70 / 0.67 / 0.71 / 0.59
Weighted Avg Recall: 0.60 / 0.67 / 0.62 / 0.71 / 0.59
Weighted Avg F1-Score: 0.58 / 0.67 / 0.62 / 0.70 / 0.57
Macro Avg ROC: 0.72 / 0.77 / 0.74 / 0.81 / 0.71
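The weighted-average precision, recall, and F1-score reported throughout these tables average the per-class metrics with weights proportional to each class's support. A minimal self-contained sketch of that computation, using hypothetical BI-RADS labels (classes 1–4) rather than the thesis data:

```python
from collections import Counter

def weighted_avg_metrics(y_true, y_pred):
    """Per-class precision/recall/F1, averaged with weights equal to each
    class's share of the true labels (its support)."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    n = len(y_true)
    wp = wr = wf = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        w = support[c] / n
        wp += w * prec
        wr += w * rec
        wf += w * f1
    return wp, wr, wf

# Hypothetical labels, not taken from the thesis experiments.
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 2, 2, 2, 3, 4, 4, 4]
p, r, f = weighted_avg_metrics(y_true, y_pred)
```

This matches what scikit-learn computes with `average='weighted'`; the macro-average ROC in the last row of each table instead averages per-class AUC values with equal weight per class.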
Annex B
This annex contains the global and binned confusion matrices of the deep learning approach trained with the AUG1 and AUG2 augmentation strategies.
Figure 0.1. Confusion matrix of the DL approach (AUG1).
Figure 0.2. Binned Confusion matrix of the DL approach (AUG1).
Figure 0.3. Confusion matrix of the DL approach (AUG2).
Figure 0.4. Binned Confusion matrix of the DL approach (AUG2).
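A global confusion matrix counts how often each true class is predicted as each class; a binned version collapses it by grouping related classes. As a minimal sketch with hypothetical labels, and assuming a grouping of BI-RADS 1–2 versus 3–4 (the actual grouping used in the figures may differ):

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows index the true label, columns the predicted label."""
    idx = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

def bin_matrix(m, groups):
    """Collapse a confusion matrix by summing the rows and columns whose
    label indices fall in the same group."""
    k = len(groups)
    out = [[0] * k for _ in range(k)]
    for gi, rows in enumerate(groups):
        for gj, cols in enumerate(groups):
            out[gi][gj] = sum(m[r][c] for r in rows for c in cols)
    return out

# Hypothetical predictions over the four BI-RADS categories.
y_true = [1, 1, 2, 3, 3, 4]
y_pred = [1, 2, 2, 3, 4, 4]
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4])
binned = bin_matrix(cm, groups=[[0, 1], [2, 3]])  # BI-RADS {1,2} vs {3,4}
```

Binning in this way highlights whether misclassifications stay within clinically similar categories (the off-diagonal mass of the binned matrix) even when the exact category is missed.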