Automatic Road Pavement Crack Detection using SVM
Afonso Guerlixa Carvalhido Salvador Marques
Dissertation to obtain the Master's Degree in
Electrical and Computer Engineering
Jury
President: Prof. Fernando Duarte Nunes
Supervisor: Prof. Paulo Luis Serras Lobato Correia
Member: Prof. João Pedro Afonso Oliveira da Silva
October 2012
Acknowledgment
I would like to thank my supervisor for all of his support, which was fundamental to the
development and quality of this dissertation. I would also like to thank Professor Henrique
Oliveira for his assistance in the many situations that arose during this work. I thank my
father and mother for giving me the opportunity to complete the master's course in electrical
engineering, and my friends as well as my colleagues for always giving me the strength and
courage to proceed.
Abstract
To maintain high road surface quality and road safety, an appropriate maintenance policy needs
to be enforced as soon as cracks start to appear. Since the traditional approach of visual crack
detection by a skilled technician is very time consuming, this dissertation presents an
automatic solution, increasing the speed and efficiency of road surface pavement analysis
while reducing the technician's effort and the subjectivity of the achieved results. The
proposed system starts by pre-processing the database images, smoothing their texture and
enhancing any existing cracks, followed by the extraction of descriptive features: each image
is divided into several non-overlapping blocks and each block originates a feature vector. A
supervised learning algorithm, the support vector machine (SVM), is then used to detect
cracks; for this purpose, the LIBSVM software was used to train and test the system.
After labeling each testing set block as crack or non-crack, a post-processing technique is
applied to remove isolated crack blocks.
Two different databases are used for testing purposes, one containing easier crack examples
and another with more challenging crack images, both acquired with the camera optical axis
orthogonal to the pavement. Several experiments were carried out for each database, comparing
the classifier output with the respective ground-truth segmented by an expert. The results
achieved show high recall values for both databases, as well as a high precision value for the
first database, being capable of competing with the best results reported in the literature.
Keywords: crack detection; support vector machine; features; image processing.
Resumo
Assim que as fendas aparecem, é necessário aplicar uma manutenção apropriada para manter
a qualidade e a segurança das estradas alta. Uma solução automática é proposta nesta
dissertação, pois o modo tradicional de detectar fendas por um técnico é muito demorado,
aumentando assim a rapidez e a eficiência da análise do pavimento rodoviário e reduzindo o
esforço feito pelo técnico tal como a subjectividade dos resultados. O sistema proposto começa
por pré-processar as imagens, suavizando a sua textura e enaltecendo as fendas existentes,
sendo seguido da extração de características descritivas. Nesta dissertação cada imagem é
dividida em blocos e cada bloco origina um vector de características. Depois, um algoritmo de
aprendizagem automática chamado support vector machine (SVM), é usado para detectar
potenciais fendas, através da biblioteca LIBSVM. Após classificar os blocos do conjunto de
teste como contendo ou não fendas, é usado um pós-processamento para remover todos os
blocos que contêm fendas mas que se encontram isolados.
Nesta dissertação usaram-se duas bases de dados diferentes, uma delas contendo exemplos
de fendas mais fáceis e outra mais complicados, sendo ambas adquiridas de tal forma, que o
eixo óptico da câmara fica ortogonal ao pavimento. Várias experiências foram feitas para cada
base de dados, sendo que para cada experiência o resultado do sistema é comparado com o
respectivo ground-truth, fornecido pelo perito. Os resultados atingidos mostram um recall alto
para as duas bases de dados e um precision alto para a primeira, estando ao nível dos melhores
resultados da literatura.
Palavras Chave: detecção de fendas; support vector machine; características; processamento
de imagem.
Contents 1 Introduction ........................................................................................................................... 1
1.1 Motivation ...................................................................................................................... 1
1.2 Purpose ......................................................................................................................... 3
1.3 Contributions ................................................................................................................. 5
1.4 Structure ........................................................................................................................ 5
2 State of the art ...................................................................................................................... 7
2.1 Introduction .................................................................................................................... 7
2.2 Pre-processing .............................................................................................................. 8
2.3 Feature extraction........................................................................................................ 11
2.3.1 Pixel-based .......................................................................................................... 12
2.3.2 Block-based ......................................................................................................... 13
2.4 Crack detection ............................................................................................................ 13
2.4.1 Pixel-based methods ........................................................................................... 14
2.4.2 Block-based methods .......................................................................................... 15
2.5 Crack classification ...................................................................................................... 21
3 Proposed System ............................................................................................................... 25
3.1 System architecture ..................................................................................................... 25
3.2 Databases ................................................................................................................... 25
3.3 Pre-processing ............................................................................................................ 27
3.4 Feature extraction........................................................................................................ 30
3.4.1 Features selected for road crack detection ......................................................... 31
3.4.2 Statistical properties of the selected features ..................................................... 32
3.5 Classifier ...................................................................................................................... 36
3.6 Post-processing ........................................................................................................... 38
4 System evaluation .............................................................................................................. 43
4.1 Test conditions - Road1 ............................................................................................. 43
4.2 Test conditions - Road2 .............................................................................................. 43
4.3 Performance measures ............................................................................................... 45
4.4 Experimental results .................................................................................................... 46
5 Conclusions and Future Work ............................................................................................ 53
6 References ......................................................................................................................... 55
List of figures
Figure 1 – Example of ten different types of pavement surface made of asphalt and concrete
materials ........................................................................................................................................ 1
Figure 2 – Four images representing four different types of crack: (i) longitudinal crack, (ii)
transversal crack, (iii) miscellaneous crack and (iv) alligator crack. ............................................. 2
Figure 3 – Example of an input image (top) and the respective ideal block based output
expected by the system (bottom) .................................................................................................. 4
Figure 4 – LRIS system ................................................................................................................ 7
Figure 5 – General architecture of an automatic crack detection system.................................. 8
Figure 6 – Example of an input image (top) and the pre-processed image with histogram
equalization (bottom) [13] ............................................................................................................ 10
Figure 7 – Thresholding example. The original image is shown on the left and the thresholding
output is represented on the right................................................................................................ 14
Figure 8 – Example of a block crack type. ................................................................................. 17
Figure 9 – SVM feature space example that selects the support vectors to separate the two
pattern classes through a hyperplane taken from [38]. .............................................................. 19
Figure 10 – 2D standard deviation for crack classification with an example of a longitudinal
crack represented by the point L1 taken from [6]. ....................................................................... 22
Figure 11 – Block diagram of the proposed crack detection system. ......................................... 25
Figure 12 – Example of two images of the two different databases considered ........................ 26
Figure 13 – Presentation of the several pre-processing configurations applied to the image
database, namely top-hat, mean filter followed by top-hat and min-filter followed by adaptive
histogram equalization (AHE). ..................................................................................................... 28
Figure 14 – Top-hat filter representation. All the image pixel intensities above api will have a
grey level equal to api ................................................................................................................. 29
Figure 15 – Mean histogram of the cracks (left) and non-cracks (right) of Road1. The vertical
axis represents the mean value and the horizontal axis the pixel value ..................................... 29
Figure 16 – Example of the three pre-processing combinations. Top-left is the original image,
top-right the top-hat image, bottom-left mean filter followed by top-hat and bottom-right min-filter
followed by adaptive histogram equalization ............................................................................... 30
Figure 17 – Histograms of the six features of crack (red) and non-crack blocks (green): a)
minimum value; b) mean intensity; c) variance; central moments of order d) 3, e) 4 and f) 5. The
vertical axis corresponds to the probability of each pattern to occur. The horizontal axis
corresponds to the feature value. Note that these histograms are normalized .......................... 33
Figure 18 – Scatter diagrams of six feature pairs for the first database a) mip and third order, b)
mip and fourth order, c) mip and fifth order, d) third and fourth order, e) third and fifth order, f)
fourth and fifth order. The green corresponds to non-crack blocks while the red represents the
crack blocks. The horizontal axis corresponds to one of the features used (the first in each
pair) and the vertical axis corresponds to the other (the second). ........................................
Figure 19 – Example that shows the improvement of the post-processing technique. The top
image corresponds to the ground-truth, the middle image to the classifier output and the bottom
image to the classifier output after post-processing. ................................................................... 39
Figure 20 – Example that eliminates crack blocks using a post-processing technique, leading to
worse results. The top image corresponds to the ground-truth, the middle image to the classifier
output and the bottom image to the classifier output after post-processing. .............................. 40
Figure 21 – Example of a doubtful case where the classifier detects blocks that have crack
evidence but are not classified as such by the ground-truth. The top image corresponds to the
ground-truth, the middle image to the classifier output and the bottom image to the classifier
output after post-processing ........................................................................................................ 41
Figure 22 – Examples of non-crack images that were rightly classified after post-processing. The
right column images are the ground-truth (in this case it coincides with the original image) and
the left column images the classifier output. After post-processing the block classified as crack
in each image is eliminated. ........................................................................................................ 42
Figure 23 – Performance for different training set sizes (training images are randomly selected).
The horizontal axis represents the number of images of the training set. The vertical axis
represents the evaluation metric (recall for the first graph and f-measure for the second graph).
..................................................................................................................................................... 44
List of tables
Table 1 – Mutual information and correlation coefficient of the proposed features for the first
database ...................................................................................................................................... 34
Table 2 – Precision (top), recall (middle) and f-measure (bottom) of Road1 ............................. 46
Table 3 – Precision (top), recall (middle) and f-measure (bottom) of Road2 ............................. 47
Table 4 – Results of isolated features with and without pre-processing for the first database. . 50
Table 5 – Results of isolated features with pre-processing for the first database ...................... 51
Table 6 – Best joint recall (top) and best joint f-measure (bottom) achieved for the first
database. ..................................................................................................................................... 51
Table 7 – Comparison of the literature results with the results achieved in the developed system
..................................................................................................................................................... 54
List of abbreviations
BPNN – Back propagation neural network
GaMM – Gauss Markovian modeling
JAE – Junta Autónoma das Estradas
KNN – K-nearest neighbor
NN – Neural network
NSCT – Non-subsampled contourlet transform
OAA – One against all
OAO – One against one
PDE – Partial differential equation
RBF – Radial basis function
SVM – Support vector machine
1 Introduction
1.1 Motivation
Roads have an important role in modern societies allowing a comfortable, fast and cheap way
to travel from one place to another. Roads can connect several different places not only inside
the cities, shortening their distance, but also between cities and villages as well as between
different countries, easing the mobility of people. They also have a strong impact on the
economic growth, due to the fact that roads promote tourism and supply a quick way for the
distribution and trading of goods. As a result, many vehicles use them every day, causing a
continuous degradation of the road pavement surface. If an appropriate maintenance policy is
not applied, the quality of the road pavement surface degrades, compromising road security.
Road pavement surfaces are often composed of asphalt, although other types exist, notably
based on concrete materials [1]. Several distinct types of pavement can be identified,
considering their texture composition. Figure 1 (taken from [1]) illustrates ten different types of
pavement, seven of which are composed of asphalt and three of concrete materials.
Figure 1 – Example of ten different types of pavement surface made of asphalt and concrete
materials.
As observed in Figure 1, pavement surface images present different texture characteristics. For
instance, the granulation size and grey level can vary drastically from one pavement type to
another, and the pavement granulation distribution can also change. Pavements can further be
differentiated by their degree of striation [1].
There can be several types of distresses in road pavement surfaces. The first hint of
deterioration and the most common distress found are cracks [2]. A crack is a thin and long
road distress, characterized by its dark visual appearance. There exist several types of cracks,
with different severity levels. Longitudinal, transversal, miscellaneous and alligator are the main
crack types in road pavement surfaces, according to the former “Junta Autónoma das Estradas”
- JAE (see the Portuguese Distress Catalogue [3]). Examples of these types of cracks are
presented in Figure 2.
Figure 2 – Four images representing four different types of crack: (i) longitudinal crack, (ii)
transversal crack, (iii) miscellaneous crack and (iv) alligator crack. The first three images belong
to the database used and the fourth was extracted from Google Images.
Whenever cracks start to show on road pavement surfaces, it is an indicator that the quality of
the pavement is degrading and maintenance is needed. Timely maintenance improves road
quality and saves a considerable amount of money on restoration (compared to a case of further
distress progression). Besides cracks, other pavement distress types exist [4], but this dissertation
focuses on cracks, since they are the most common road pavement distress and the first
type of road degradation to appear.
Several cameras can be placed in critical locations in order to constantly monitor the roads,
mostly in view of traffic surveillance, but that alone is not enough to supervise and keep the
road pavement quality high. Since cameras only cover a small part of the roads and may not
have enough resolution to allow the detection of cracks, an alternative is needed to gather
images and register the conditions of the pavement, typically involving a skilled technician
travelling along the road [5]. In case road surface images are captured, skilled technicians
later have to analyze each of them, determining the existence of pavement distresses and
classifying their type. This process is very time consuming and requires a great effort to
manually analyze the full set of acquired images [2]. Two inspectors may also have different
opinions when classifying the same distress [6].
To increase the speed and efficiency of road surface pavement analysis, several automatic
crack detectors have already been developed [7]. In the literature, most automatic distress
detectors aim to detect cracks and only a few target other types of distress,
since cracks are often the first distress to appear when roads start to degrade. If an
automatic solution can detect pavement distresses such as cracks as soon as they
appear, maintenance measures can be taken immediately, preventing further degradation of
the pavement and keeping road quality higher. Additionally, automatic solutions are less
expensive and more comfortable than traditional road pavement monitoring procedures and
would considerably reduce the effort required for analyzing the images manually.
All existing automatic analysis techniques [8] have limitations: they may fail to detect all cracks
in some images and may occasionally detect a crack where there is none. A general
solution has not yet been found. However, several of the existing techniques perform
well for specific types of road pavement [9]. The main drawbacks are a high
dependence on the road pavement texture and on the image quality; therefore,
determining the pavement type is important to improve automatic crack detection results [1].
The weak representation of the signal (crack) to be detected, the weak contrast between the
pavement and the crack and the possibility that the texture of the road may hide the crack can
also hamper the task of detecting cracks. To minimize these limitations, automatic solutions
typically involve an image pre-processing stage. Pre-processing techniques aim to make the
image more uniform, without affecting the ability to identify the crack areas, and favoring the
contrast between the cracks and the pavement. By doing so, crack detection becomes easier
and faster.
Another critical issue to efficiently detect the presence of cracks is the set of features used to
describe the cracks in images, i.e., the crack properties chosen to help the search and detection
of cracks. The selection of features can be critical for the subsequent classification stage and,
therefore, for the overall system performance.
1.2 Purpose
This dissertation proposes an automatic system capable of detecting cracks in previously
captured road pavement images, with the intention of achieving a performance that can compete
with the best techniques reported in the literature. An objective evaluation methodology is also
adopted, providing quantitative evaluation results, thus easing the comparison against
alternative methods.
The proposed solution builds upon the techniques reported in the literature that produce the
best results. A solution focused on pattern recognition techniques is presented to detect and
classify cracks in images. Pattern recognition techniques typically involve the usage of
classifiers, which can operate in a supervised or unsupervised manner. The approach followed
in this dissertation is based on supervised learning, taking a set of selected road images
containing cracks as a training set, from which the system learns the crack characteristics. The
larger and richer the information contained in the training set, the more likely the system is
to learn the crack characteristics efficiently and identify them correctly later. To verify the
system's crack detection performance after training, a different set of road images, the testing
set, is used. The system analyzes the testing set and marks the regions that correspond to cracks.
The system performance is then determined by comparing these results with the expected
ones, i.e., the so-called ground-truth information, which can be provided by manual image
analysis for a subset of the testing set. In this dissertation, support vector machines (SVM) are
adopted for the classification stage. Figure 3 shows an example of an input image and the
respective ideal block-based output expected from the system (ground-truth).
Figure 3 – Example of an input image (top) and the respective ideal block based output
expected by the system (bottom).
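The supervised train/test procedure described above can be illustrated with a minimal linear SVM trained by stochastic subgradient descent on the hinge loss (a Pegasos-style solver). The dissertation uses LIBSVM, which solves the SVM dual problem and supports kernels; this sketch, with synthetic two-dimensional "block features", only illustrates the workflow of learning from a labeled training set and evaluating on a held-out testing set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for block feature vectors: "crack" blocks tend to be
# darker (lower mean intensity) and have higher variance than "non-crack".
X_crack = rng.normal([60.0, 900.0], [10.0, 150.0], size=(100, 2))
X_clean = rng.normal([120.0, 300.0], [10.0, 150.0], size=(100, 2))
X = np.vstack([X_crack, X_clean])
y = np.hstack([np.ones(100), -np.ones(100)])   # +1 = crack, -1 = non-crack

# Standardize features (SVMs are sensitive to feature scaling).
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Train/test split, mirroring the dissertation's supervised setup.
idx = rng.permutation(len(X))
train, test = idx[:150], idx[150:]

# Minimal linear soft-margin SVM via stochastic subgradient descent on the
# regularized hinge loss (Pegasos-style update); LIBSVM instead solves the
# dual problem, but the resulting decision rule has the same form.
w, b, lam = np.zeros(2), 0.0, 0.01
for t in range(1, 20000):
    i = train[t % len(train)]
    eta = 1.0 / (lam * t)
    margin = y[i] * (X[i] @ w + b)
    w *= (1 - eta * lam)            # weight decay from the regularizer
    if margin < 1:                  # point inside the margin: hinge subgradient
        w += eta * y[i] * X[i]
        b += eta * y[i]

pred = np.sign(X[test] @ w + b)
accuracy = (pred == y[test]).mean()
print(f"test accuracy: {accuracy:.2f}")
```

On this well-separated synthetic data the learned hyperplane classifies essentially all held-out blocks correctly; real pavement features overlap far more, which is why the kernelized LIBSVM classifier is used in practice.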
1.3 Contributions
The major contribution of this dissertation is the development of an automatic crack detection
system for road pavement surface images that achieves good results when compared with the
literature, using feature sets that combine features commonly applied in the literature (mean
and variance) with features seen less frequently (third, fourth and fifth order moments).
Since pre-processing did not always improve the results, another contribution of this
dissertation is the use of specific pre-processing for specific features, i.e., combining several
different pre-processing techniques with different features to train and test the system.
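As a sketch of the block-based feature extraction mentioned above, the function below divides a grey-level image into non-overlapping blocks and computes, per block, the minimum value, mean, variance, and central moments of order 3, 4 and 5 (the feature set this dissertation works with). The block size and the synthetic test image are illustrative assumptions, not the dissertation's actual settings.

```python
import numpy as np

def block_features(image, block=75):
    """Split a grey-level image into non-overlapping blocks and compute one
    feature vector per block: minimum, mean, variance, and central moments
    of order 3, 4 and 5. (The block size here is an illustrative choice.)"""
    h, w = image.shape
    feats = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            blk = image[r:r + block, c:c + block].astype(float).ravel()
            mu = blk.mean()
            d = blk - mu
            feats.append([blk.min(), mu, np.mean(d**2),
                          np.mean(d**3), np.mean(d**4), np.mean(d**5)])
    return np.asarray(feats)

# Synthetic 150x150 pavement-like image: bright noisy texture with a thin
# dark vertical "crack" crossing the left half.
rng = np.random.default_rng(1)
img = rng.normal(120, 8, size=(150, 150))
img[:, 30:33] = 40                       # dark thin structure
F = block_features(img, block=75)
print(F.shape)                           # one 6-D feature vector per block
```

Blocks containing the dark structure show a much lower minimum and a heavier-tailed intensity distribution, which is exactly what the higher-order moments capture.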
The best feature set tested provides a recall of 98.85%, a precision of 89.4% and an f-measure
of 93.09% for the first database considered, Road1, and a recall of 94.29%, a precision of
26.99% and an f-measure of 40.37% for the second database considered, Road2. In addition,
the best Road1 combination presents a recall of 99.04%, a precision of 94.09% and an f-
measure of 94.09%. In the literature, Oliveira and Correia [6] present a recall of 97% and an f-
measure of 94.7%. Lower but similar results are presented in [10], with a recall of 96.3%, a
precision of 86.9% and an f-measure of 93.8%. A recall of 93.96%, a precision of 90.70% and an
f-measure of 91.95% are stated in [11]; a recall of 96.75% is achieved in the same paper. In [5]
the best recall was 95.44% for the first database and 85.44% for the second. Although other
papers address this topic, their results are often only qualitative, making them harder to
compare; the output of such systems is often merely a classification of each whole image as
crack or non-crack. Moreover, other papers distinguish several crack types not explicitly
addressed in this dissertation.
Analyzing the reported results, the best recall values for both databases used in this dissertation
can compete with the best recall results stated in the literature, while the best precision obtained
for the first database can also compete with the best precision results reported in the literature.
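For reference, the metrics quoted above can be computed as follows, assuming the usual definitions over block-level counts of true positives (tp), false positives (fp) and false negatives (fn), with f-measure taken as the harmonic mean of precision and recall. The counts below are hypothetical, chosen only to exercise the formulas.

```python
def precision_recall_f(tp, fp, fn):
    """Block-level detection metrics: precision = tp/(tp+fp),
    recall = tp/(tp+fn), and f-measure as their harmonic mean."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

# Hypothetical counts over a testing set's blocks, for illustration only.
p, r, f = precision_recall_f(tp=180, fp=20, fn=5)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
# → precision=0.900 recall=0.973 f-measure=0.935
```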
1.4 Structure
This dissertation has the following structure. Chapter 1 motivates the problem and describes the
main goal and structure of this dissertation. Chapter 2 presents the state of the art and the most
important techniques in the literature. A block diagram illustrating the main steps of an
automatic solution is presented, to clarify the type of techniques related to each of the automatic
crack detection stages. Chapter 3 describes the proposed system architecture. The pre-
processing techniques and the features selected that led to good results are first presented,
followed by a description of the training and testing stage of the SVM classifier. Finally the post-
processing used to improve the system performance is also described. In Chapter 4, the test
conditions for each database (Road1 and Road2) are presented along with the results
achieved. Finally, Chapter 5 not only compares the system results with similar techniques in
the literature, drawing some conclusions, but also discusses possible improvements (future
work) to the system's overall accuracy.
2 State of the art
2.1 Introduction
Cracks are the first road deterioration sign and most common distress type found in road
pavement surfaces. They appear as thin, continuous and dark structures which can be visually
detected and distinguished from the road texture.
Crack detection plays an important role in the maintenance of road networks. For several
decades, pavement surface distress was monitored by visual inspection, a costly and time
consuming task [1], [6]. To increase the speed and efficiency of road surface pavement
analysis, several automatic crack detectors have already been developed.
Advanced systems can acquire road images more rapidly, more safely and with better quality
than traditional manual annotation [12]. To this end, a camera is typically attached to an
inspection vehicle. Once the database is fully acquired, the images can be analyzed offline by
an automatic crack detector. Figure 4, taken from [9], shows the LRIS system, composed of two
high resolution linescan cameras together with high power lasers.
Figure 4 – LRIS system
The current chapter does not focus on image acquisition but rather addresses previous work
on crack detection, emphasizing the main approaches reported in the literature, their strengths
and weaknesses. Figure 5 shows the general crack detection system architecture: an input
image is pre-processed, a feature vector is extracted from it, the crack detection stage attributes
a label (crack or non-crack) to each pixel or block, and a crack type is assigned to any
detected cracks.
Figure 5 – General architecture of an automatic crack detection system
The architecture presented in Figure 5 includes four main steps: i) pre-processing, ii) feature
extraction, iii) crack detection and iv) crack classification. In the first stage, the input image is
filtered to remove noise and to enhance crack visual features. Then, the selected features are
extracted from the pre-processed images. Based on the computed feature values, each image
pixel (or each image pixel block) is classified by the crack detection algorithm as containing
cracks or not. Finally, the detected cracks can be classified according to their geometric
properties. Some papers address only the detection of cracks in each image, while others are
solely interested in evaluating the crack type. Only a limited number of papers in the literature
implement the whole architecture presented in Figure 5.
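The four stages above can be sketched as a processing skeleton. Every function body here is a toy stand-in (e.g., crack detection by a fixed intensity threshold rather than a trained classifier, and a trivial crack-type rule), meant only to show how the stages chain together.

```python
import numpy as np

def preprocess(image):
    # Stage i: noise removal, texture smoothing, crack enhancement (no-op here).
    return image

def extract_features(image, block=75):
    # Stage ii: one raw-pixel vector per non-overlapping block (toy features).
    h, w = image.shape
    return [image[r:r + block, c:c + block].ravel()
            for r in range(0, h - block + 1, block)
            for c in range(0, w - block + 1, block)]

def detect(features):
    # Stage iii: label per block, 1 = crack, 0 = non-crack.
    # Toy stand-in: a fixed darkness threshold instead of a trained classifier.
    return [int(f.min() < 64) for f in features]

def classify_crack(labels):
    # Stage iv: crack type (longitudinal / transversal / miscellaneous /
    # alligator); trivially returns one type whenever any crack block exists.
    return "longitudinal" if any(labels) else None

img = np.full((150, 150), 128.0)
img[:, 40:42] = 30                        # synthetic dark vertical crack
labels = detect(extract_features(preprocess(img)))
print(sum(labels), classify_crack(labels))  # → 2 longitudinal
```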
In the following sections the most relevant and interesting techniques presented in the literature,
that are involved in each of the considered stages, are described. In this dissertation the papers
that only distinguish the several crack types are discussed in the crack detection section. The
crack classification block is relevant only for solutions that use both crack detection, to detect
cracks, and crack classification, to label the type of each already detected crack.
2.2 Pre-processing
During the acquisition process, using a photographic or video camera, the image often
becomes corrupted by random noise (e.g., camera noise), which hampers the detection of road
distresses. In addition, the illumination conditions may change across different image locations,
resulting in a road image that is not homogeneous. Illumination changes between consecutive
images may also occur. Moreover, a road pavement surface image often shows a noticeable
non-uniform texture, which can increase the difficulty of detecting road distresses. Shadows,
tire marks or oil stains can also interfere with the automatic crack detection procedure.
The role of the pre-processing step is to remove noise and smooth the road texture as much
as possible, while keeping the ability to identify any existing cracks. Depending on the
pre-processing techniques selected, the overall crack detection results can be considerably
improved, also speeding up the image processing.
A wide variety of pre-processing methods have been used in the literature. For instance, Oliveira
and Correia [6] apply a normalization technique to reduce non-uniform background
illumination. For that purpose, a mean value matrix (where each element is the average value of
an image block), along with a preliminary classification of crack pixels based on their grey
level, is used to equalize the average of the regions preliminarily labeled as non-crack, while
maintaining the average intensity of the regions labeled as cracks. A region saturation
algorithm (top-hat) is also proposed in [6] to reduce the influence of white pixels, which can lead
to standard deviation values similar to those observed in blocks of pixels containing cracks,
therefore hampering the system performance when using such a feature.
One of the methods discussed by Chambon and Moliard in [13], called morph, applies pre-
processing techniques which include erosion, a conditional median filter, a conditional mean filter
and histogram equalization to reduce noise, producing a more uniform image and increasing the
contrast between cracks and road pavement. Median filters and grey-scale morphological filters
[14] can also be used to separate cracks from the background.
Other methods take into account the fact that cracks often correspond to abrupt changes of the
image intensity surface. For instance, Gavilán et al. in [1] apply a histogram-based technique,
together with a sliding window of a given size, step and threshold, to smooth the
texture, reducing noise and enhancing the crack features. Anisotropic diffusion filtering to
smooth the image texture variation is used by Oliveira and Correia in [10]. Besides smoothing,
this technique can also be applied for restoration purposes. A partial differential equation (PDE)
approach, to smooth the image texture and enhance the cracks, is presented in [15]. In [4] a PDE technique
is used for image segmentation.
A technique for removing shadows without affecting the crack pixels is presented in [16]. It
consists of four steps. The first is a grey-scale morphological close operation that removes
cracks from the image, easing the identification of the shadow area. Then a 2D Gaussian filter is
applied to smooth the texture, further easing that identification. The third step consists of
creating N geodesic levels, where each level contains all the pixels between two grey-level
values, in such a way that every geodesic level holds a similar number of pixels. The first L low-
intensity levels are assigned to the shadow region, while the remaining levels form the non-
shadow region; the value of L was empirically set by the authors. After distinguishing the
shadow region from the non-shadow region, the last step consists of applying the following
equation to eliminate the shadow and obtain a more uniform image:
I′(i,j) = r · I(i,j) + d,   if (i,j) ∈ S
I′(i,j) = I(i,j),           if (i,j) ∈ B                (1)

where r = DB/DS, i.e., the ratio between the intensity standard deviation of the non-shadowed
region B (DB) and that of the shadow region S (DS), and d = μB − r·μS, where μB is the
average intensity of region B and μS the average intensity of region S.
Figure 6 – Example of an input image (top) and the pre-processed image with histogram
equalization (bottom) [13].
In Figure 6 an example of a pre-processed image with histogram equalization taken from [13] is
presented. The non-uniform illumination is considerably reduced, producing a more uniform
image without greatly affecting the crack pixels.
The road pavement type, the image quality, the camera noise, the illumination and several
artifacts that can hamper crack detection may influence the pre-processing technique choice. In
particular, removing road pavement texture and correcting non-uniform background illumination
are two frequent pre-processing tasks which improve the crack detection performance. From the
techniques briefly presented above, mean and median filtering are two simple and fast
strategies that can be selected to make the image more uniform. However, depending on the
filter size selected, these techniques have the drawback of also erasing crack evidence. A more
complex technique, used in [10], capable of effectively reducing the intensity variance without
degrading the crack features is anisotropic diffusion filtering. However, this
technique has the difficulty of selecting the right conduction coefficient and the number of
iterations. PDE techniques can detect any grey level transition between adjacent pixels, being
capable of detecting cracks efficiently. However this approach is very sensitive to noise and
requires a prior noise-elimination step [15]. Histogram equalization is an interesting choice
since it has the property of removing non-uniform illumination without affecting the crack
features. Top-hat is another interesting operation since it is simple to use and produces a
more uniform image with the advantage of not hampering the crack pixels, while smoothing the
texture of the image. Despite anisotropic diffusion and PDE approaches being effective, their
implementation is somewhat complex and these are computationally time consuming
techniques. Therefore mean and median filters as well as top-hat and histogram equalization
are interesting pre-processing techniques to explore due to their simplicity, speed and efficiency.
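As an illustration of the simplest of these options, a minimal numpy sketch of median filtering follows; the window size k is a tunable assumption, and, as noted above, larger windows risk erasing thin crack evidence:

```python
import numpy as np

def median_smooth(image, k=3):
    """Median-filter an image with a k x k window (edges replicated).
    A simple texture-smoothing step: isolated noise impulses are removed,
    while structures wider than about k/2 pixels survive."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    h, w = image.shape
    # stack all k*k shifted views and take the per-pixel median
    stack = np.stack([padded[i:i + h, j:j + w]
                      for i in range(k) for j in range(k)])
    return np.median(stack, axis=0)
```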
2.3 Feature extraction
After the pre-processing block, the next step in the proposed system architecture (Figure 5) is
feature extraction. Depending on the features quality, i.e. the ability to distinguish crack features
from non-crack features, the overall system performance can change drastically, thus requiring
special attention.
The pre-processing block can contribute to improve the feature quality since it can enhance the
cracks against the background. For instance, if all the crack pixels are represented by low grey
level while the non-crack pixels are represented by a much higher intensity after pre-processing,
the distinction of both classes can be easier, thus increasing the feature quality and affecting
the system performance positively. The capacity of removing noise can also improve the feature
quality. Besides pre-processing, the feature selection itself can also significantly contribute to
the quality of the extracted features.
Just like the pre-processing techniques, there are several features that can be chosen. Some
interesting features and those most commonly reported for crack detection are described in this
section.
Cracks have a number of properties that can be exploited to discriminate them from non-crack
features. Notably, they have photometric characteristics (dark pixels), as well as geometric
(elongated continuous structures) and frequency properties (they correspond to sudden
transitions in the image, thus being associated with high frequencies), that can be explored by
crack detection algorithms.
To exploit the above properties, two main approaches to extract features, and therefore to
detect cracks can be distinguished in the literature: i) pixel-based methods and ii) block-based
methods. The first approach aims to segment the image into two sets: the foreground (cracks)
and the background (non-cracks), by classifying each image pixel based on its properties (e.g.,
intensity). The second approach splits the image into a set of (often non-overlapping) blocks
with the purpose of extracting features from each block. A supervised learning algorithm can
then be trained (e.g., a neural network) to discriminate crack from non-crack blocks. For both
approaches, a description of the most commonly applied types of features is presented in the
sequel.
2.3.1 Pixel-based
The pixel-based methods focus on several road surface crack properties, e.g. photometric and
geometric properties. The technique reported in [7] uses photometric properties such as the
pixel grey level as crack features. Tanaka and Uematsu, in [17], first use the pixel grey level (a
photometric property) and then, over a pixel neighborhood, the mean grey level (photometric)
and the local variance (geometric). Chambon and Moliard use photometric features like the
pixel grey level and geometric characteristics such as the length and width of the crack
in the method morph introduced in [13]. Statistical features like mean and standard deviation
are used by Cheng et al in [18] and by Nguyen et al in [8].
Other papers use a set of crack frequency properties as features. For instance, in [19] a Sobel
edge detector is used. Wavelet coefficients can be computed for a given scale or for several
scales, being subsequently merged. Subirats et al in [20] apply these two procedures for a
continuous 1D wavelet transform. A 2D continuous wavelet transform with several scales is
used in [21]. Chambon and Moliard in [13] present a second technique called GaMM that uses
wavelet coefficients as features. Contourlet coefficients are used in [22]. In [23] a non-
subsampled contourlet transform (NSCT) is used for image decomposition, extracting
coefficients at different scales and directions. A novel segmentation technique that
typically operates at a specific frequency and orientation is presented in [24] based on Gabor
filters.
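As a minimal illustration of such frequency-based features, the sketch below computes the Sobel gradient magnitude mentioned for [19]; it is a generic implementation (replicated borders, correlation with the standard 3x3 kernels), not the one used in that work. Cracks, being abrupt intensity transitions, produce high-magnitude responses:

```python
import numpy as np

def sobel_magnitude(image):
    """Gradient magnitude via the 3x3 Sobel kernels; edges (and thus
    cracks) appear as high-magnitude responses, flat texture as ~0."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(image.astype(float), 1, mode="edge")
    h, w = image.shape
    gx = np.zeros((h, w)); gy = np.zeros((h, w))
    for i in range(3):                  # accumulate the 3x3 correlation
        for j in range(3):
            win = pad[i:i + h, j:j + w]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    return np.hypot(gx, gy)
```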
Generally, photometric and geometric features are capable of detecting part of the crack region
but also a lot of unwanted noise, as observed in [7] and [17]. However, these features can
achieve better results, as presented in [13] using the morph technique. A more complex but
effective feature is the wavelet coefficient. For instance, the techniques GaMM and morph are
compared in [13], where morph achieves more true positives but GaMM achieves fewer false
positives (fewer non-crack pixels detected as crack pixels). Other papers using wavelets, such
as [20] and [21], can detect the crack regions well, with the handicap of also presenting some
unwanted noise. Mean and standard deviation also proved to be promising features. The
method presented in [8] (using mean and standard deviation as features) is compared to the
Subirats et al. method in [25], being capable of detecting only the crack pixels, while the latter
also detects some noise.
2.3.2 Block-based
Block-based methods split the image into squared blocks, extracting a set of features from each
block. Several features have been adopted for this purpose.
Mean and standard deviation of the image intensity in each block are two features that can be
used simultaneously. These two features are applied by Oliveira and Correia in [6] as crack
features, building a two dimensional feature space where each point corresponds to the mean
value and standard deviation of each block. A binary classifier labels each block as containing
cracks or not in [26], using a density feature, followed by a proximity feature and a fractal
dimension feature. A grid cell of 8x8 pixels is used in [27], each cell being labeled as containing
a crack, or not, according to the grey level of the border pixels. Rosa and Correia applied in [5]
the features dynamic range, minimum intensity pixel and standard deviation. Average value and
minimum value intensity are the features used in [28]. In [29] each block is labeled as crack or
non-crack block, depending on the crack pixels percentage.
Mean and standard deviation are two features commonly used in the block-based approach,
leading to good results. In [6] the best results are achieved for a parametric learning algorithm
with an f-measure of 94.7%, while the best f-measure stated in [11] is 91.95%. Some papers
using these features in the pixel-based approach also report good results. For instance, 97.5%
of the images containing cracks are classified as such in [8], and good results are reported in
[18]. In [5] the best recall presented using the minimum intensity pixel is 95.44%, and 95.02%
using standard deviation together with the minimum intensity pixel. These three features (mean,
standard deviation and minimum intensity pixel) yield some of the best results reported in the
literature, being a good starting point when developing a new system.
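The extraction of these three block features can be sketched in a few lines. This is a hypothetical helper, not code from the cited works; the 75x75 default block size follows [6], and any remainder at the image borders is simply ignored:

```python
import numpy as np

def block_features(image, block=75):
    """Split an image into non-overlapping block x block tiles and
    return one feature vector (mean, std, min intensity) per tile."""
    h, w = image.shape
    feats = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            tile = image[r:r + block, c:c + block].astype(float)
            feats.append((tile.mean(), tile.std(), tile.min()))
    return np.array(feats)
```

Each row of the returned array is the feature vector of one block, ready to feed a supervised classifier.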
2.4 Crack detection
The crack detection block uses the extracted features in order to detect cracks in road surface
images. Depending on the quality of those features, different results can be achieved.
The current section, just like section 2.3, is divided into two subsections: one presenting the
approaches typically used in pixel-based methods and another presenting those used in
block-based methods. Both approaches are described in the sequel.
2.4.1 Pixel-based methods
In the pixel-based approach, images are analyzed by methods that make crack detection
decisions for each individual image pixel. Most of the papers that follow this approach are based
on a pixel comparison of the image feature with a threshold, which is typically followed by some
kind of post-processing that enforces space continuity.
The simplest classification method using the pixel-based approach is thresholding. It is often
used for deciding which pixels in an image correspond to cracks [13] using the pixel intensity as
feature. Since crack pixels are often the darkest ones, all image pixels can be compared with,
e.g. a pre-defined threshold. Every image pixel that has its intensity lower than the threshold is
considered a crack pixel and receives the corresponding classification label, e.g., 1 (crack) and
0 (non-crack) otherwise. The thresholding operation can be stated as follows:
L(x) = 1   if I(x) < T
L(x) = 0   otherwise                (2)

where T denotes the threshold, I(x) the feature value at position x, and L(x) the binary label
assigned by the classification algorithm.
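Equation (2) reduces to a one-line vectorized operation, sketched here for illustration:

```python
import numpy as np

def threshold_label(feature_map, T):
    """Binary crack labels per equation (2): 1 where the feature
    (e.g., pixel intensity) is below threshold T, 0 otherwise."""
    return (np.asarray(feature_map) < T).astype(np.uint8)
```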
In Figure 7 an application of this method is illustrated. The original image is shown on the left
and the thresholding output is represented on the right.
Figure 7 – Thresholding example. The original image is shown on the left and the thresholding
output is represented on the right.
As observed from Figure 7, thresholding has two major drawbacks. First, it leads to many false
positives i.e., pixels that are marked as belonging to cracks when they are not. This problem
can be partially alleviated using post-processing techniques, such as morphological operators
[14] capable of eliminating the most obvious errors, e.g., isolated pixels. Morphological tools
and top-hat operations are also used after thresholding the image for the same purpose in [17].
In [20] a noise removal post-processing based on morphological tools is also applied. A more
sophisticated way to enforce space continuity is based on the use of Markov random fields
presented by Chambon and Moliard in [13]. This model tries to connect local crack regions with
their respective neighbors, based on the comparison of their orientation and distance.
The second difficulty concerns the choice of the threshold. Classic automatic threshold
detection strategies, such as the Otsu method [30], do not perform well on road surface images
because the image histogram often has a single mode and it is not possible to find a meaningful
threshold separating the histogram into two modes. The influence of the crack pixels in the
histogram is negligible because they are considerably fewer than the non-crack pixels.
Other algorithms have been proposed with emphasis on the use of adaptive local histograms in
which the crack pixels may have larger influence [31].
Another problem with these thresholding techniques is that the spatial organization of cracks is
often not considered. The inability to distinguish cracks from other artifacts that also have a low
grey level, like shadows, tire marks or oil stains, is another limitation that can be pointed out.
2.4.2 Block-based methods
The block-based approach consists of dividing the image into a set of typically non-overlapping
blocks. Since the classifiers used in this method typically employ supervised learning
techniques, a ground-truth containing a set of images segmented by an expert, where each
block receives a binary label (crack or non-crack) is assumed to be known. Then, the classifier
is trained using part of the features (e.g., statistical properties) extracted from the blocks, and
tested afterwards with the remaining features, predicting the expert labels so that the system
performance can be measured against the ground-truth.
Several classifiers have been reported in the literature to perform such a task, notably: i) neural
networks (NN) [32], ii) K-Nearest Neighbor (KNN) classifiers [6], iii) Adaboost [5] or iv) Support
Vector Machines (SVM) [26]. The following sections will briefly describe each of the four
techniques, with special attention given to SVM.
2.4.2.1 Neural Networks
Neural networks (NN) are one of the machine learning techniques used in the literature. The
relationship between the input and the output is typically non-linear and depends on a large
number of coefficients (weights) which must be learned from the data. Back propagation neural
network (BPNN), a feed-forward multi-layer network [33], is usually composed of three layers.
Each layer can be composed of several nodes. The first layer usually represents the input layer
and has as many nodes as the number of features being used for crack detection. The second
layer is a hidden layer and the third layer is the output layer, typically representing the class
attributed to the input. Another characteristic of the BPNN is that the system output gradually
approaches the desired output through adequate weight adjustments. Despite the ability
to correctly classify noisy data, the major NN advantage [29], this technique also presents
some limitations, namely a slow speed of convergence [26] during the learning phase and the
need for many good samples to train the system properly [29].
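A toy sketch of such a back-propagation network (one hidden layer of sigmoid units, trained by gradient descent on a squared-error loss) is shown below; the layer size, learning rate and epoch count are arbitrary placeholders, not values from the cited works:

```python
import numpy as np

def train_bpnn(X, y, hidden=15, lr=0.5, epochs=3000, seed=0):
    """Minimal one-hidden-layer back-propagation network for binary
    crack / non-crack labels (y in {0, 1}). Returns a predictor."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    y = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        h = sig(X @ W1 + b1)                 # forward pass
        out = sig(h @ W2 + b2)
        d2 = (out - y) * out * (1 - out)     # output-layer delta
        d1 = (d2 @ W2.T) * h * (1 - h)       # hidden-layer delta
        W2 -= lr * h.T @ d2; b2 -= lr * d2.sum(0)   # weight updates
        W1 -= lr * X.T @ d1; b1 -= lr * d1.sum(0)
    return lambda Xq: (sig(sig(Xq @ W1 + b1) @ W2 + b2) > 0.5).ravel().astype(int)
```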
Li et al in [26] compare two learning algorithms, namely BPNN and SVM, to label each image
block (40x40 pixels) with one of five possible types, namely longitudinal crack,
transversal crack, alligator crack, block crack or no crack. The training set is composed of 450
images (90 images of each type) and the testing set of 305 images (90
longitudinal cracks, 55 transversal cracks, 70 alligator cracks, 50 block cracks and 40 with no crack).
The SVM parameters are computed through a genetic algorithm, while the BPNN is composed
of 15 nodes in the hidden layer with a learning rate of 0.01. The final results have shown that
SVM was more accurate than BPNN for all the training sizes considered (100, 200, 300 and 450
images) and was always faster than BPNN as well. The best results of both SVM and BPNN
were obtained for the training set of 450 images, achieving correct crack type classification
rates of 78.4% and 69.6%, respectively.
Three neural network techniques (Image-based Neural Network, Histogram-based Neural
Network, and Proximity-based Neural Network) are used in [29] to detect four crack types,
namely longitudinal, transversal, alligator and block cracks. The best result was produced by the
Proximity-based Neural Network, with a recall of 95.2%. For each of the three techniques a
40x40 block size was used. A database of 450 artificial images (90 images of each crack type
and 90 with no cracks) was generated. All the neural networks were trained with 300 of the
artificial images and tested with 124 actual pavement images and 150 artificial images. Several
numbers of hidden nodes (30, 60, 90, 120 and 150), learning coefficients (0.1, 0.05, 0.01 and
0.001) and training epochs (500, 1000, 1500 and 2500) were explored to find an optimal
architecture. For all three neural networks the best results were achieved with 60 hidden nodes,
a learning coefficient of 0.01 and 1500 epochs.
Block crack is a crack type not considered earlier in this dissertation. Despite being similar to
the alligator crack type, it is presented in [29] as a development of the transversal crack,
showing some rectangular patterns. Figure 8 presents an example retrieved from a Google
image search.
Figure 8 – Example of a block crack type.
2.4.2.2 K-Nearest Neighbor Classifier
The K-nearest neighbor classifier is a non-parametric machine learning algorithm [6] which
labels each test sample taking into account the classes of the closest training samples. The
most voted class among the k nearest neighbors dictates the class attributed to the test
sample [33]. The purpose of the training set is not only to supply labeled samples with their
respective class (e.g., using the ground-truth) but also to find the value of k [34] (typically
small) that yields the best system accuracy rate [33]. To minimize the influence of neighbors
belonging to the most frequent class, one possibility is to weight them according to their
distance: the further they are from the test sample, the less relevant is their contribution to
determining its class. That way, the largest class does not impose itself merely due to its size.
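The distance-weighted voting just described can be sketched as follows; this is a generic illustration, and the small constant added to the distances is an assumption to avoid division by zero for exact matches:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Distance-weighted k-nearest-neighbour vote: closer training
    samples count more, so the majority class cannot win on size alone."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]                   # the k nearest samples
    weights = 1.0 / (d[idx] + 1e-9)           # inverse-distance weighting
    classes = np.unique(y_train)
    scores = [weights[y_train[idx] == c].sum() for c in classes]
    return classes[int(np.argmax(scores))]
```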
Oliveira and Correia apply a 1-KNN (one nearest neighbor) in [6] for crack detection, together
with estimated posterior probability density functions, achieving a recall of 94.6%. The
database used has images with several crack types, namely longitudinal, transversal and
miscellaneous, but also images without cracks. Two different resolutions are referred, namely
2048x1536 and 1858x1384, with a chosen block size of 75x75 pixels. Three pattern
recognition algorithms are compared in [33], namely KNN, artificial neural networks (ANN) and
SVM, for eggshell crack detection. The features used for the three techniques were the same,
selecting for each technique the internal parameter set that leads to the best performance. The
results showed a correct identification rate (attributing the right label to each egg) of 97.1% for
SVM, 92.1% for the neural network and 88.9% for KNN.
2.4.2.3 Adaboost
Adaboost is a learning algorithm based on boosting, capable of building a strong classifier with
a high accuracy rate through the combination of several weak classifiers [35]. The weak
classifiers are iteratively trained and the best weak classifier (the one that produces the best
results) is selected in each iteration. Then, the weight of the mislearned data is increased while
the weight of the well-classified data is decreased [5]. That way, a weak classifier that correctly
classifies the data mislearned in the previous iteration will have a higher weight than the other
classifiers, being chosen to be part of the final classifier and leading to a strong and versatile
final classifier, as stated in [36].
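A minimal illustration of this boosting loop is sketched below, using simple threshold "stumps" as weak classifiers (a common textbook choice, not necessarily the weak learners of [5] or [36]); labels are assumed in {-1, +1}:

```python
import numpy as np

def adaboost_train(X, y, rounds=10):
    """Minimal AdaBoost with threshold stumps as weak classifiers.
    Misclassified samples get their weight raised each round."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                       # sample weights
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):                        # try every feature,
            for t in np.unique(X[:, j]):          # threshold and sign
                for s in (1, -1):
                    pred = np.where(X[:, j] < t, s, -s)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = min(max(err, 1e-10), 1 - 1e-10)     # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)     # weak-classifier weight
        pred = np.where(X[:, j] < t, s, -s)
        w *= np.exp(-alpha * y * pred)            # re-weight the samples
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    def predict(Xq):
        score = sum(a * np.where(Xq[:, j] < t, s, -s)
                    for a, j, t, s in ensemble)
        return np.sign(score)
    return predict
```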
Three kinds of Adaboost classifiers are experimented with in [5], namely Modest Adaboost,
Gentle Adaboost and Real Adaboost, Modest Adaboost being the algorithm chosen since it
converges faster and provides a better overall system performance than the other two. The
number of iterations for each training set was 100, since this suffices to achieve the minimum
error. The two databases used consist of grey-scale images with low texture variation, each
image being divided into blocks of 64x64 pixels. The best result achieved for the first database
was a recall of 95.44% with a 100% correct crack type classification (longitudinal, transversal
and miscellaneous), for a training set of 25% of the respective database images. The best result
for the second database, which has harder images to analyze, was a recall of 85.44%, also with
100% correct crack type classification, for a training set of 25% of the respective database
images.
2.4.2.4 Support Vector Machines
This subsection addresses support vector machines (SVM), the classifier used in this
dissertation for the detection of road surface cracks. An SVM classifier, just like other learning
algorithms, is composed of training and testing stages [2]. In the training stage the selected
features are extracted and typically mapped into a higher dimensional space in order to
efficiently separate crack features from non-crack features. Since the ground-truth of the
training set is supplied, the features that correspond to cracks and to non-cracks can be
determined. Then, as illustrated in Figure 9, SVM selects the set of points in each class (support
vectors) that are the nearest to the other class and through them computes a hyperplane that
separates the two classes, being as far as possible from the support vectors. This hyperplane is
often called maximum-margin hyperplane and makes SVM robust.
Once the system has been trained, the following phase is the testing stage. In this stage each
testing sample is classified as belonging to one of the two pattern classes. For that, the testing
set features are mapped into the same space produced in the training stage and, according to
the side of the hyperplane on which they fall, the corresponding pattern class is attributed.
Finally, the classifier accuracy can be evaluated by comparison against a set of manually
labeled data.
Figure 9 – SVM feature space example showing the support vectors used to separate the two
pattern classes through a hyperplane (taken from [37]).
The example illustrated in Figure 9 is very simple and there is no need to map the extracted
features into a higher dimensional space since they can be easily separated by a hyperplane.
However the typical case is much more complex and the two classes are often mixed, making it
necessary to map the features in order to better separate the two classes.
Note that in Figure 9 several different hyperplanes could separate the two classes. However,
the computed hyperplane is the one lying as far as possible from the support vectors.
The use of an SVM classifier involves solving an optimization problem [26] and optimizing a
regularization parameter C that defines the cost associated with misclassified data (ξ) and
influences the model complexity [33]. Other parameters may need to be optimized, but that
depends on the kernel function selected. The kernel function selection is a critical decision [33],
since it is the function responsible for the features mapping typically to a higher dimensional
space. The reason for mapping the features is due to the difficulty in separating the two pattern
classes in the original space.
The kernel function can be described as K(x,x′) = φ(x)·φ(x′), i.e., the dot product between φ(x)
and φ(x′) [38]. There exist several functions φ that originate different kernels. The most
commonly used kernels are:
Linear: K(xi,xj) = xiᵀxj
Polynomial: K(xi,xj) = (γ xiᵀxj + r)^d, γ > 0
Radial basis function (RBF): K(xi,xj) = exp(−γ ||xi − xj||²), γ > 0
Sigmoid: K(xi,xj) = tanh(γ xiᵀxj + r)

where γ, r and d correspond to kernel parameters that can be defined or estimated.
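For instance, the RBF kernel matrix between two sample sets can be computed with a plain numpy sketch (gamma is the γ parameter above):

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=0.5):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    # pairwise squared Euclidean distances via broadcasting
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)
```

Note that K(x,x) = 1 and the matrix is symmetric, as expected for a valid kernel.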
Among the kernels listed above, RBF is the one typically recommended to start with. First of all
RBF can map the original space into a higher dimensional space in a nonlinear way, as it can
be seen in the kernel expression above. This is an advantage if the relation between the class
label and feature vectors is nonlinear. For this case, the linear kernel would not be a proper
choice due to the nonlinear relation. The linear kernel can only produce a linear feature space
through the feature vectors. Also, depending on the parameters selected, the performance of
both the RBF kernel and the linear kernel could be the same [38]. The sigmoid kernel, on the
one hand, has the handicap of being invalid for some parameter values and, on the other hand,
provides results similar to those of RBF for certain parameter sets. Polynomial kernels, just like
sigmoid kernels, have more parameters to estimate than RBF, which requires a higher
computational effort.
The advantage of RBF thus lies in presenting fewer numerical difficulties when compared to
the other non-linear kernels. However in case of a large number of features and a small number
of training samples the use of the linear kernel may be better than the RBF.
The SVM optimization problem is described as follows:
min over w, b, ξ of:   (1/2) ||w||² + C Σi ξi

subject to:   yi (wᵀ φ(xi) + b) ≥ 1 − ξi,   ξi ≥ 0,   i = 1, …, l                (3)

where w is the normal vector, C is the regularization parameter, ξi is the error of the
misclassified data, yi the label attributed to each pattern (e.g., -1 for non-crack and 1 for crack),
the function φ depends on the kernel function used, xi is the feature point and b the offset of the
hyperplane.
As observed in Figure 9, the distance between the two hyperplanes passing through the support
vectors of the two classes is larger the smaller the norm of the normal vector w is. Since SVM
computes the hyperplane that is as far away as possible from the support vectors, the
minimization of the norm of the normal vector is the key to solving the optimization problem.
The second term of the optimization problem handles the cases where the hyperplane cannot
fully separate the two pattern classes (e.g., when they are too mixed) and also reduces the
influence of potential outliers. Therefore, to separate the two classes as correctly as possible,
the slack terms ξi are introduced to achieve the minimum error. The parameter C is a constant
that establishes the influence of the ξi terms. This approach is also known as the soft margin.
The condition yi (wᵀ φ(xi) + b) ≥ 1 − ξi forces wᵀ φ(xi) + b to be greater than 1 − ξi
when yi is 1 and below −(1 − ξi) when yi is −1, therefore establishing a decision boundary that
separates the two classes.
The SVM classifier performs a binary classification of the data. However, SVM can be extended
into a multiclass classifier. Typically, two kinds of multiclass methods are used for this purpose.
One of them is the one-against-one (OAO) technique. It consists of creating N(N-1)/2
classifiers, where N is the number of pattern classes, each classifier handling a different pair of
classes. Each classifier then votes for one pattern class of the testing sample and the class
with the most votes wins, the sample being labeled accordingly [26]. The OAO multiclass
method is used in [1] to select one of ten different surface classes and in [26] to classify the
several types of cracks.
The other multiclass method is one-against-all (OAA). This technique is not very frequently
used in crack detection problems. Basically, there are as many classifiers as classes, and the
classifier that produces the highest output labels the pattern class of the testing sample.
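The OAO voting scheme can be sketched as follows; the N(N-1)/2 pairwise classifiers are assumed given as callables, each returning one of the two classes of its pair:

```python
from itertools import combinations

def oao_predict(classifiers, classes, x):
    """One-against-one voting: one binary classifier per class pair,
    N(N-1)/2 in total; the class with most votes labels the sample."""
    votes = {c: 0 for c in classes}
    for (a, b), clf in zip(combinations(classes, 2), classifiers):
        winner = clf(x)          # winner is one of the pair (a, b)
        votes[winner] += 1
    return max(votes, key=votes.get)
```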
As stated before, a few papers using SVM to detect pavement distress can be found in the
literature, showing better results than the alternative methods [33] [26] against which they were
compared. For that reason, it is expected that the SVM classifier used in this dissertation will
achieve good results.
The structure of the SVM classifier developed is presented in Chapter 3 and the corresponding
classification results are reported in Chapter 4.
2.5 Crack classification
After performing crack detection, the final stage of the proposed architecture is crack
classification, aiming to assign a crack type to each previously detected crack. Although some
papers address this stage, they do not perform it after detecting the cracks: they simply label
the crack type for each image without having first detected the cracks automatically. Those
papers were therefore described in the previous section. This section addresses the papers
that combine both blocks in their system.
Several types of cracks can be distinguished in road pavement images. According to JAE [3],
cracks can be divided into transversal, longitudinal, miscellaneous and alligator cracks. Other
papers consider a fifth different crack type designated as block crack. Some examples of these
types can be observed in Figure 2 and Figure 8, respectively.
Automatic monitoring systems should be able not only to detect cracks but also to classify them
according to their type. This can be done by extracting a set of features from the blocks labeled
as crack by the classifier.
In [6] three different types of cracks are considered, namely, longitudinal, transversal and
miscellaneous. A 2D feature set, based on the standard deviations of crack pixel coordinates, is
considered, i.e., for each detected crack the standard deviations of the crack block row
coordinates and of the crack block column coordinates are computed. These features can be
visualized using two orthogonal axes: the vertical axis represents the standard deviation of the
crack row coordinates and the horizontal axis the standard deviation of the crack column
coordinates, the crack being represented by a point in this space. In this 2D space a bisectrix
line is also considered. To classify each crack type the following classification rules are used:
If the distance to the nearest axis is less than the distance to the bisectrix line and the
nearest axis is the vertical one, then the crack type is longitudinal;
If the distance to the nearest axis is less than the distance to the bisectrix line and the
nearest axis is the horizontal one, then the crack type is transversal;
If the distance to either axis is greater than the distance to the bisectrix line, then the
crack type is miscellaneous.
Figure 10, extracted from [6], shows the 2D space for crack classification.
Figure 10 – 2D standard deviation space for crack classification, with an example of a
longitudinal crack represented by the point L1 (taken from [6]).
Points with the same row standard deviation and column standard deviation values belong
to the bisectrix, representing perfect miscellaneous cracks. Perfect transversal cracks and
perfect longitudinal cracks are represented by points over the horizontal and vertical axis,
respectively. In the case of the point L1 (Figure 10), the minimum of the distances between L1
and the vertical axis and between L1 and the horizontal axis is computed first (designated by
dA). Then the distance between the point L1 and the bisectrix (designated by dL) is computed.
Finally, the smaller of these two distances (dA and dL) dictates the crack type. In this
particular example, since the point is closer to the vertical axis than to the bisectrix, the crack is
labeled as a longitudinal crack.
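The decision rule for the L1 example can be sketched as follows. This is an illustrative
implementation, not the authors' code; the function name and the mapping from axis proximity
to crack type (vertical axis for longitudinal, as in the L1 example) are assumptions:

```python
import math

def classify_crack(std_row, std_col):
    """Classify a detected crack from the standard deviations of its crack
    block row and column coordinates; the crack is the point
    (std_col, std_row) in the 2D space of Figure 10 (sketch)."""
    d_axis = min(std_col, std_row)                       # distance to the nearest axis (dA)
    d_bisectrix = abs(std_row - std_col) / math.sqrt(2)  # distance to the line y = x (dL)
    if d_axis >= d_bisectrix:
        return "miscellaneous"
    # The nearest axis is the vertical one when std_col < std_row,
    # as for the longitudinal crack L1.
    return "longitudinal" if std_col < std_row else "transversal"
```

For a point with a large row spread and a small column spread (an elongated crack along the
image rows) the rule returns "longitudinal", matching the L1 example.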
Oliveira and Correia in [6] achieved a crack type classification rate of 100%. Rosa and Correia
in [5] also use the same approach presented above after labeling the image blocks, since the
crack types considered were the same (transversal, longitudinal and miscellaneous); the best
classification rate obtained was also 100%, for the two databases used.
3 Proposed System
This Chapter presents the system developed in this dissertation for the detection of cracks in
road images and discusses the techniques involved in each processing step.
3.1 System architecture
The block diagram of the proposed system is shown in Figure 11. The system comprises four
processing blocks. The input image is pre-processed to enhance the contrast between the
crack and the pavement, speeding up the subsequent computations and facilitating crack
detection. Then the pre-processed image, $J$, is split into non-overlapping blocks (75x75 pixels),
$B^{(i)}$. Each block can then be characterized by a vector of features, $x_i$, which describes
the statistical properties of the block. A binary classifier is then used to assign a label, $y_i$,
to each block $B^{(i)}$, depending on whether it contains crack pixels, $y_i = 1$, or not, $y_i = 0$.
Figure 11 – Block diagram of the proposed crack detection system.
After classifying each block according to the presence of cracks, a post-processing operation is
performed to correct unusual label configurations, e.g., isolated blocks classified as crack
surrounded by non-crack blocks. A final module, which would consider each connected crack
region detected and classify it according to the considered crack types (longitudinal, transversal
or miscellaneous), was not included in this architecture due to time restrictions.
The following sections describe the databases used in this dissertation and each of the four
processing stages.
3.2 Databases
The current section aims to present relevant information about the databases used in this
dissertation.
Two databases, with two different road pavement types, have been considered in this
dissertation. Both contain asphalt pavements, but with relatively different textures, as illustrated
in Figure 12; for instance, the grey levels of the two databases differ, corresponding to two
distinct pavement types.
Road pavement types can differ from each other in many ways based on their texture
composition, as presented in Figure 1: not only can the grey level vary from one pavement to
another, but also the granulation size and the granulation distribution.
Figure 12 – Example of two images of the two different databases considered.
Both databases are composed of grey-scale images, one acquired with a digital camera
during a human observation survey (the left image of Figure 12), while the other was acquired
with an LRIS system [9] (the right image of Figure 12). The two databases contain not only
images with several types of cracks, namely longitudinal, transversal and miscellaneous (see
the first three images of Figure 2), but also images with no cracks. In general, all images of the
two databases were taken with the optical axis of the camera orthogonal to the pavement,
with each pixel corresponding approximately to 1 mm². Both image databases are affected by
non-uniform background illumination and by texture noise produced by the road surface.
The first database, Road1, has 49 images with cracks and 7 with no cracks, while the second
database, Road2, has 87 images with cracks and 78 with no cracks. For the two databases
considered, each block has been manually labeled as containing crack pixels, or not, by a
specialist, providing a ground-truth classification that can be used for classifier training as well
as for classifier evaluation purposes. The resolution also differs between the two databases:
Road1 images have a resolution of 2048x1536 pixels, while Road2 images have 2048x4096
pixels. Since both resolutions are high, and computation time and memory storage are typical
concerns in image processing, each image is divided into blocks of size 75x75 pixels, allowing
faster computation [29] and lower memory storage requirements. Bigger blocks would
negatively influence the accuracy of the crack detector, while smaller blocks would increase the
computation time; the selected size is a good compromise between these two constraints.
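The block division step can be sketched as follows; this is an illustrative sketch, and the
handling of incomplete border blocks is an assumption, since the text does not specify it:

```python
import numpy as np

def split_into_blocks(image, block=75):
    """Split a grey-scale image into non-overlapping block x block tiles.
    Incomplete border tiles are discarded (an assumption of this sketch)."""
    rows, cols = image.shape
    return [image[r:r + block, c:c + block]
            for r in range(0, rows - block + 1, block)
            for c in range(0, cols - block + 1, block)]
```

For a 2048x1536 Road1 image this yields 27x20 = 540 complete 75x75 blocks.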
3.3 Pre-processing
This section focuses on the pre-processing techniques used in the proposed crack detection
system, to more easily differentiate crack blocks from non-crack blocks, thus contributing to
improve the overall system accuracy.
Since the database images do not contain shadows [16] or large regions with high pixel
intensity like light halo [13], the pre-processing stage can be simplified. Nevertheless, the
images exhibit different visual properties caused by non-uniform illumination and non-uniform
road surface texture. Removing the influence of such factors, while preserving information about
the presence of cracks, will facilitate crack detection and improve the robustness of the system.
In the previous Chapter, some techniques that could mitigate these kinds of artifacts were
presented. Based on that discussion, and given the observed characteristics of the image
databases used in this dissertation, the pre-processing techniques selected are: i) top-hat
filter; ii) mean-filter; iii) min-filter; and iv) adaptive histogram equalization.
Figure 13 shows the three pre-processing configurations considered in the developed system:
i) top-hat; ii) mean-filter followed by top-hat; and iii) min-filter followed by adaptive histogram
equalization. The top-hat filter eliminates high-intensity noise without affecting the crack
blocks, also producing a more uniform image; for that reason it was one of the configurations
chosen. Mean-filter followed by top-hat is the second configuration selected: the mean-filter
eliminates most of the non-uniform texture and non-uniform illumination, and is applied first to
guarantee a more uniform background without jeopardizing the contrast between the
background and the crack pixels. The last pre-processing configuration seems promising, since
it first reinforces the crack pixels through a min-filter, after which an adaptive histogram
equalization technique is applied to enhance the contrast between the cracks and the
pavement.
Figure 13 – Presentation of the several pre-processing configuration applied to the image
database, namely top-hat, mean filter followed by top-hat and min-filter followed by adaptive
histogram equalization (AHE).
The three configurations use techniques that depend on parameters which must be pre-defined.
The top-hat filter sets a maximum grey value, as can be seen in Figure 14 (extracted from [6]).
This parameter is selected so as not to damage the system overall accuracy (i.e., not to cause
non-crack blocks to be classified as crack blocks) while still producing a more uniform image.
Based on the mean histograms of the non-crack blocks and crack blocks of Road1 (Figure 15),
the grey value satisfying these two conditions was empirically set to 150. Since the non-crack
histogram slope starts to decrease more rapidly than the crack histogram slope beyond 150,
the percentage of crack blocks with a high average value is higher than the percentage of
non-crack blocks; therefore, clipping at 150 has a bigger impact on the average value of the
crack blocks than on that of the non-crack blocks. A lower value would bring no benefit, since
there the slopes of the two histograms are similar and the two block types would become very
hard to distinguish from each other. For higher grey values, the image would be less uniform
and the number of crack blocks with a high average would be considerable.
The mean filter relies on a pre-defined mask size. Since crack pixels are very rare, they are
influenced by the grey level of the surrounding non-crack pixels (typically of much higher
intensity); therefore the filter size must be small enough not to hamper the crack pixels. Based
on this, a mask of 3x3 pixels was adopted.
The min-filter also uses a 3x3 mask. Since the min-filter selects the minimum value within the
mask for each pixel, a bigger mask would darken too much not only the crack area but also
non-crack areas, due to possible isolated pixels with low grey levels. The adopted size is a
good compromise between reinforcing the crack pixels and preventing non-crack areas from
later being misclassified by the classifier, which would compromise the system overall accuracy.
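With the parameter values above (intensity cap at grey level 150 and 3x3 masks), the first two
configurations can be sketched as follows; the border handling and the function names are
assumptions of this sketch, not the authors' code:

```python
import numpy as np

def top_hat(img, cap=150):
    """'Top-hat' as used here (Figure 14): clip intensities above `cap` to `cap`."""
    return np.minimum(img, cap)

def filter3x3(img, reducer):
    """Apply a 3x3 sliding-window reducer (np.mean or np.min) to the image
    interior; border pixels are left unchanged for brevity (an assumption)."""
    out = img.astype(float).copy()
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            out[r, c] = reducer(img[r - 1:r + 2, c - 1:c + 2])
    return out

# Configuration ii): mean filter first, then the top-hat clipping.
def mean_then_tophat(img):
    return top_hat(filter3x3(img, np.mean))

# Configuration iii) would apply filter3x3(img, np.min) followed by adaptive
# histogram equalization (e.g. skimage.exposure.equalize_adapthist, assumed).
```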
Figure 14 – Top-hat filter representation. All the image pixel intensities above api will have a
grey level equal to api.
Figure 15 – Mean histogram of the cracks (left) and non-cracks (right) of Road1.
An example of each configuration with the parameters selected for each technique (the same as
stated before in this section) is shown in Figure 16.
Figure 16 – Example of the three pre-processing combinations. Top-left is the original image,
top-right the top-hat image, bottom-left mean filter followed by top-hat and bottom-right min-filter
followed by adaptive histogram equalization.
3.4 Feature extraction
Based on the discussion made in the previous Chapter, namely on the features that led to
promising results in the block-based sub-section, the set of features selected to characterize
each block is the following:
Minimum value
Mean intensity
Variance
Higher order moments (3rd, 4th and 5th order)
Each of these features will be discussed in the sequel.
3.4.1 Features selected for road crack detection
Since crack pixels correspond to low intensity values, one feature that is capable of
distinguishing crack blocks from non-crack blocks is the minimum intensity pixel (mip) [5] value
within an image block:
$\mathrm{mip}^{(i)} = \min_{x} B^{(i)}(x)$  (4)
where $B^{(i)}(x)$ denotes the image intensity at position $x$ in the $i$-th block.
The mean is the second feature used for crack detection. It describes the average intensity
within a block [6] and differentiates crack from non-crack blocks, since the more crack
evidence a block has, the lower its mean. The mean value is computed as follows:
$\mu^{(i)} = \frac{1}{N^2} \sum_{x} B^{(i)}(x)$  (5)
where $N^2$ is the block size (the number of pixels in an $N \times N$ block).
The third feature considered is the second order statistic (variance), which has also been used
before to detect cracks [6]. Actually, in [6] the feature used is the standard deviation, but since
the standard deviation is the square root of the variance, the information extracted from each
block is essentially the same.
Assuming the mean values of the blocks are similar to each other, and since crack pixels are
much rarer than non-crack pixels but have a much lower grey level, the variance of crack blocks
should be higher than that of non-crack blocks. The expression that describes this feature for
block $B^{(i)}$ is the following:
$\left(\sigma^{(i)}\right)^2 = \frac{1}{N^2} \sum_{x} \left(B^{(i)}(x) - \mu^{(i)}\right)^2$  (6)
This approach can be extended to higher order statistics. Notably, the central moment of order
$k$ is defined as follows:
$\mu_k^{(i)} = \frac{1}{N^2} \sum_{x} \left(B^{(i)}(x) - \mu^{(i)}\right)^k$  (7)
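The six features defined by equations (4)-(7) can be computed per block as in the following
sketch (the function name is an assumption for illustration):

```python
import numpy as np

def block_features(block):
    """Return the feature vector [mip, mean, variance, 3rd, 4th, 5th central
    moment] for one image block (illustrative sketch)."""
    v = block.astype(float).ravel()
    mu = v.mean()                    # eq. (5)
    centered = v - mu
    return np.array([
        v.min(),                     # mip, eq. (4)
        mu,
        (centered ** 2).mean(),      # variance, eq. (6)
        (centered ** 3).mean(),      # eq. (7), k = 3
        (centered ** 4).mean(),      # eq. (7), k = 4
        (centered ** 5).mean(),      # eq. (7), k = 5
    ])
```

A single dark pixel in an otherwise bright block lowers the mip and drives the odd moments
negative, which is exactly the signature exploited below.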
Just like in the case of the variance, the bigger the moment value produced for a block, the
higher the probability of it being a crack block, since in this database a low grey level cannot
be caused by anything other than a crack: it can be a random dark point, but there is no risk of
confusion with a shadow or an oil stain. The odd order moments have the advantage of
producing a negative value when pixel intensities are below the block mean, which is not
possible for the even order moments; in fact, the more negative the feature value, the more
likely the block is a crack block. The advantage of the odd moments over the even moments
comes from the fact that crack pixels lie, in their majority, below the block mean value, while
non-crack pixels lie, in their majority, much closer to the mean, separating the two pattern
classes better.
Mip, mean and variance (standard deviation) are features already seen in the literature, where
they provided good results (see section 2.3.2). The higher order moments defined in this
section do not often appear in the literature; they follow from the frequent use of the first and
second order moments, being mainly experimental features that proved helpful in the
developed classifier, as shown in the following Chapter. Actually, the third order moment is
used in [4] for pothole detection.
3.4.2 Statistical properties of the selected features
This section briefly characterizes the statistical properties of the proposed features. All the
features extracted from a block can be collected in a vector, usually called the feature vector.
The feature vector and the block label, respectively $x_i$ and $y_i$, are random variables
characterized by a joint probability distribution:
$p(x_i, y_i) = p(x_i \mid y_i)\, p(y_i)$  (8)
where $p(x_i \mid y_i)$ is the probability density function of the feature vector conditioned on
the assigned block label, and $p(y_i)$ is the probability of each class (crack or non-crack).
Since $p(x_i \mid y_i)$ is difficult to estimate and to visualize, as it depends on the considered
set of features (six features were listed in the previous subsection), the marginal distribution of
each feature $x_j$ will be considered instead, by computing the histogram associated to both
classes.
Figure 17 shows the normalized histograms of the six considered features, for each of the two
admissible labels, for the first database (Road1), without applying any pre-processing
technique. This way, the response of each feature to the original data can be observed, giving
a hint of the most promising features for separating the two classes efficiently. However, the
reader should bear in mind that after applying pre-processing the ranking of the most promising
features may change.
Figure 17 – Histograms of the six features of crack (red) and non-crack blocks (green): a)
minimum value; b) mean intensity; c) variance; central moments of order d) 3, e) 4 and f) 5. The
vertical axis corresponds to the probability of each pattern to occur. The horizontal axis
corresponds to the feature value. Note that these histograms are normalized.
The information supplied by these graphics shows visually how good each feature is at
separating the two classes; however, it is hard to quantify this just by observing the histograms.
To determine the ability of each feature $x_j$ to predict the corresponding label $y$, the mutual
information (MI) and the correlation coefficient (R), defined in [39], were computed. The
corresponding results are shown in Table 1.
$MI(x_j, y) = \sum_{x_j=1}^{M} \sum_{y=0}^{1} p(x_j, y) \log \frac{p(x_j, y)}{p(x_j)\, p(y)}$  (9)

$R(x_j, y) = \frac{\mathrm{cov}(x_j, y)}{\sqrt{\mathrm{var}(x_j)\, \mathrm{var}(y)}}$  (10)

where $M$ is the number of bins used to discretize the feature $x_j$ and $y \in \{0, 1\}$ is the
block label.
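Equations (9) and (10) can be sketched in code as follows; the 100-bin discretization matches
the text, while the natural-logarithm base, the handling of empty bins and the function names
are assumptions of this sketch:

```python
import numpy as np

def mutual_information(xj, y, bins=100):
    """MI between a discretized feature and the binary block label, eq. (9)."""
    edges = np.linspace(xj.min(), xj.max(), bins + 1)[1:-1]  # internal bin edges
    joint = np.zeros((bins, 2))
    for b, label in zip(np.digitize(xj, edges), y):
        joint[b, int(label)] += 1
    joint /= joint.sum()                    # p(x_j, y)
    px = joint.sum(axis=1, keepdims=True)   # p(x_j)
    py = joint.sum(axis=0, keepdims=True)   # p(y)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log(joint / (px * py))
    return np.nansum(terms)                 # empty bins contribute 0

def correlation(xj, y):
    """Correlation coefficient between a feature and the label, eq. (10)."""
    return np.cov(xj, y, bias=True)[0, 1] / np.sqrt(xj.var() * y.var())
```

A feature that perfectly predicts a balanced binary label gives MI = log 2 and |R| = 1, the
theoretical maxima for this setting.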
The first equation concerns the mutual information and the second the correlation coefficient.
Since these two measures work with discrete values, each histogram is divided into 100 bins,
so that feature values less than 0.2 apart on the horizontal axis fall into the same bin. The
crack label of the minimum value histogram (red line) is the most affected by this discretization
in terms of the number of points, since its values are non-zero only between -3 and 0, leading
to 15 occupied bins. The most affected non-crack labels (green line) correspond to the fourth
and fifth order moments, also with about 15 occupied bins, since the non-crack histogram is
essentially non-zero over a range of 3. In these three graphs the slope of the histogram for the
respective label is much steeper than the slope for the other label; still, the number of bins was
high enough to characterize the slopes well, as observed in Figure 17.
The correlation coefficient can vary between -1 and 1: the bigger its absolute value, the more
correlated the two variables $x_j$ and $y$ are, and the better the feature. The mutual
information is non-negative, being zero when $x_j$ and $y$ are statistically independent; the
bigger the mutual information value, the more information about the label can be extracted
from the feature.
Table 1 – Mutual information and correlation coefficient of the proposed features for the first
database.
Features            MI     R
mip                 0.120  -0.49
mean                0.006  -0.07
variance            0.089   0.52
3rd order moment    0.134   0.63
4th order moment    0.112   0.69
5th order moment    0.146  -0.70
Despite mutual information and correlation coefficient being two quality measures for ranking
features, their outputs can differ, as observed in Table 1. However, both measures ranked
feature number six (the fifth order moment) as the best feature for the original data. Mean and
variance were the worst ranked features in both measures (in absolute terms the variance
value of the correlation coefficient is higher than, but similar to, the mip value, while in the
mutual information the mip value is much higher; therefore the variance was considered the
second worst feature). For the four best ranked features (third, fourth and fifth order moments
and mip), six scatter diagrams, each representing a different feature pair, are shown in
Figure 18. The features in these scatter diagrams are normalized.
Figure 18 – Scatter diagrams of six feature pairs for the first database: a) mip and third order;
b) mip and fourth order; c) mip and fifth order; d) third and fourth order; e) third and fifth order;
f) fourth and fifth order. Green corresponds to non-crack blocks and red to crack blocks. The
horizontal axis corresponds to the first feature of each pair and the vertical axis to the second.
As observed, the features with the highest mutual information and correlation coefficient values
do not automatically form the best pair; in fact, the scatter diagrams containing two order
moments proved not to be as good as expected. The reason is that the information contained
in those features is correlated, i.e., two arbitrary features can jointly carry more information
than the two individually most promising features combined. It is difficult to determine, just by
looking at these graphics, the feature pair that best separates the two pattern classes.
However, some feature pairs (e.g. mip and third order) seem to divide the two classes properly,
leaving a smaller region where the two classes are mixed than other feature pairs (e.g. mip and
fourth order), where the mixed region is larger.
It is possible to use more than two features at the same time; in fact, all features can be used
simultaneously, leading to a higher dimensional feature space, whose histogram would be hard
to visualize. The purpose of these histograms was to gain some intuition about how good the
features are for the data.
Note that Figure 18 illustrates all the blocks of all Road1 images. A supervised classifier
requires a training set and a testing set, so part of these data will compose the training set and
the remaining data the testing set.
3.5 Classifier
The current section starts with the training stage procedure, along with important details
capable of improving the system overall performance. A brief explanation of the testing phase
follows. Finally, the strategy applied to select the training and testing sets is presented,
followed by the SVM parameters selected to train the system.
Since the SVM is a supervised learning algorithm, the first stage of the classifier is a training
one. After choosing and extracting the features from the training set and supplying the
respective ground-truth, it is also necessary to select the kernel type and the kernel parameters
(see section 2.4.2.4 for more detail). The classifier then computes the decision boundary that
best separates the two classes, which is later used to classify the testing set samples.
An important factor to establish in the training stage is the balance between crack and
non-crack samples. If the two classes have a big difference in the number of samples used for
training, the classifier may not achieve optimum performance, instead giving higher priority to
correctly classifying the class with more samples (non-crack). Several strategies can be
considered to handle this problem, known as imbalanced data [40]. For instance, the crack
samples can be repeated until reaching the same number as the non-crack samples. Other
methods are the selection of a random portion of the non-crack features (under-sampling) or
the creation of new data for the smaller class (over-sampling); one issue with this last
technique is that it is data dependent, and may not apply in every case. Another measure is the
adjustment of the regularization parameter C: instead of replicating the crack features to match
the number of non-crack features, which increases the computation, the C value can be
adjusted for each of the two classes so that the sum of the weights of all crack samples equals
the sum of the weights of all non-crack samples. In this dissertation, the strategy selected was
to repeat the existing crack samples until reaching the same number as the non-crack samples.
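The adopted strategy, repeating crack samples until the classes are balanced, can be sketched
as follows; the function name is an assumption, and the labels are taken as 1 for crack and 0
for non-crack:

```python
import numpy as np

def balance_by_repetition(X, y, minority=1):
    """Repeat minority-class rows of X until both classes have the same count."""
    maj, mino = X[y != minority], X[y == minority]
    reps = int(np.ceil(len(maj) / len(mino)))
    mino_rep = np.tile(mino, (reps, 1))[:len(maj)]     # repeat, then trim
    Xb = np.vstack([maj, mino_rep])
    yb = np.concatenate([np.full(len(maj), 1 - minority),  # assumes labels {0, 1}
                         np.full(len(maj), minority)])
    return Xb, yb
```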
Another measure that helps to improve the system performance is the normalization of all the
features, by subtracting the mean and dividing by the standard deviation, i.e.
$\tilde{x}_j^{(i)} = \frac{x_j^{(i)} - \mu_j}{\sigma_j}$  (11)
where $i$ stands for the block and $j$ is the feature number. The mean $\mu_j$ and the
standard deviation $\sigma_j$ of each feature are computed over the feature vectors of the
training set.
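Equation (11) corresponds to the following sketch, where the normalization statistics are fitted
on the training set only and then reused for the testing set (function names are assumptions):

```python
import numpy as np

def fit_normalizer(train_features):
    """Per-feature mean and standard deviation from the training set (eq. 11)."""
    return train_features.mean(axis=0), train_features.std(axis=0)

def normalize(features, mu, sigma):
    """Subtract the training mean and divide by the training std, per feature."""
    return (features - mu) / sigma
```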
As soon as the system is trained and the SVM model is computed, the testing phase takes
place. The purpose of this stage is to test the system accuracy in detecting cracks, as well as
the quality of the extracted crack features. To achieve reliable and accurate results, the testing
and training sets should be disjoint.
Just like the training set images, the images of the testing set are pre-processed in the same
way and divided into blocks of 75x75 pixels. The features of each testing set block are then
extracted and normalized, and the model created in the training stage is applied to them: the
testing features are mapped to the same space as the training features, and the previously
computed hyperplane separating the two classes is used. Depending on the side of the
hyperplane a testing sample falls on, the corresponding pattern class is attributed.
The training and testing sets were selected through a cross-validation technique, which
supplies a good approximation of the average classifier accuracy. Cross-validation divides the
database images equally into several folds: the training set is composed of all folds but one,
and the testing set is composed of the remaining fold. Several different classifiers are then
trained and tested (as many as the number of folds), each with a different testing fold. In the
end, the system accuracy is computed as the average of the accuracies of the several
classifiers.
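For illustration, the leave-one-image-out variant (the limiting case in which each fold contains a
single image) can be sketched as index generation; the function name is an assumption:

```python
def leave_one_out_folds(n_images):
    """Yield (train, test) image-index lists: each fold tests on one image
    and trains on all the others (sketch)."""
    for test in range(n_images):
        yield [i for i in range(n_images) if i != test], [test]
```

With the 56 images of Road1 this produces 56 different train/test splits, one classifier per fold.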
The training and testing stages of the SVM classifier were performed using two functions from
LIBSVM, a library for SVMs. Several parameters had to be selected. The kernel type chosen to
separate the two pattern classes was the RBF kernel, since it is the kernel that typically
supplies better results [38]. The kernel parameter γ was set to the inverse of the number of
features (the default value), while the weight chosen for the regularization parameter C was
one (the default value) for each class, since the numbers of crack and non-crack samples were
the same.
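The same kernel and parameter choices can be sketched with scikit-learn's SVC, which wraps
LIBSVM internally; the toy data and the use of scikit-learn instead of LIBSVM's own
train/predict functions are assumptions of this illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Toy, clearly separable data standing in for the extracted block features.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (50, 6)), rng.normal(4, 1, (50, 6))])
y_train = np.array([0] * 50 + [1] * 50)      # 0 = non-crack, 1 = crack

clf = SVC(kernel="rbf",   # RBF kernel, as selected in the text
          gamma="auto",   # gamma = 1 / n_features (LIBSVM's default)
          C=1.0)          # default regularization weight for both classes
clf.fit(X_train, y_train)
labels = clf.predict(X_train)
```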
Once all testing samples have been assigned a label, a comparison between the classifier
output and the ground-truth dictates the system accuracy: each block of the testing set labeled
by the classifier is evaluated against the respective ground-truth, and based on the correct or
incorrect decision for each block, the system performance is computed as described in
Section 4.3. Actually, the comparison between the system output and the ground-truth is only
made after applying the post-processing; the next section describes the post-processing used
and its contribution to improving the system performance.
Note that the ground-truth is always necessary for training, so that the system knows which
features correspond to crack or non-crack blocks; without this information it would not be
possible to compute the decision boundary that separates the two classes. The testing stage,
however, does not require any ground-truth to label the blocks, since it simply applies the
model created in the previous stage; here, the ground-truth is only used for comparison with
the system output, to measure the system performance.
Note also that the system is only trained once for each feature set, to create the respective
model, but that model can be tested on several different testing sets. So even if the training
stage is computationally heavy due to its large number of samples, this effort is only made
once.
3.6 Post-processing
After classifying all the testing set blocks with one label (0 or 1), a very simple and intuitive
post-processing step is applied: cleaning the isolated blocks classified as containing cracks.
Given the block size chosen (75x75 pixels) and the elongated shape that cracks normally
have, it would be unusual for an isolated block to contain a crack; such a block is more likely to
be noise, and therefore a misclassified block. This procedure proved to be very effective,
erasing misclassified blocks and thereby improving the system performance, as shown in
Figure 19. However, one image of the database is accidentally damaged by this operation, as
shown in Figure 20, where the top image corresponds to the ground-truth, the middle image to
the classification output and the bottom image to the output after post-processing. As
observed, post-processing cleans most of the blocks that were incorrectly classified as
containing crack pixels, but it also erases two blocks marked as crack in the ground-truth
information, because they were isolated.
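A sketch of this cleaning rule follows, assuming a block is "isolated" when none of its 8
neighbours is labeled as crack (the exact neighbourhood is not stated in the text):

```python
import numpy as np

def remove_isolated(labels):
    """Set to 0 every crack block (label 1) whose 3x3 neighbourhood contains
    no other crack block; `labels` is the 2D grid of block labels."""
    padded = np.pad(labels, 1)          # zero border, so edges are handled too
    out = labels.copy()
    rows, cols = labels.shape
    for r in range(rows):
        for c in range(cols):
            if labels[r, c] == 1 and padded[r:r + 3, c:c + 3].sum() == 1:
                out[r, c] = 0           # only the block itself was a crack
    return out
```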
Figure 19 – Example that shows the improvement of the post-processing technique. The top
image corresponds to the ground-truth, the middle image to the classifier output and the bottom
image to the classifier output after post-processing.
Figure 20 – Example where post-processing eliminates true crack blocks, leading to worse
results. The top image corresponds to the ground-truth, the middle image to the classifier
output and the bottom image to the classifier output after post-processing.
Some images, at first sight, present less positive results, as illustrated in Figure 21, since
several blocks were wrongly detected as cracks. However, it can be seen that, for a few of
those blocks, the classifier actually found some crack evidence, despite those blocks not being
marked as such in the ground-truth labeling. This issue concerns the boundary of the crack
definition adopted by the specialist who labels each block, which can differ from one specialist
to another. Although the ground-truth is sometimes subjective, the blocks classified by the
system were always compared with the ground-truth data. Still, the post-processing helped to
clean some of the blocks labeled as non-crack by the ground-truth, improving the classifier
performance.
Figure 21 – Example of a doubtful case where the classifier detects blocks with crack evidence
that are not classified as such by the ground-truth. The top image corresponds to the
ground-truth, the middle image to the classifier output and the bottom image to the classifier
output after post-processing.
Figure 22 shows two examples of non-crack images. This kind of image can have a strong
impact on the overall results, since there are only two possible outcomes: 100% right or 100%
wrong. It only takes one block falsely classified as crack to significantly affect the overall
classification results, unlike what happens for images containing cracks, which can have
several blocks wrongly classified as cracks and still produce satisfactory results. After
post-processing these images, the system achieved a 100% correct classification, significantly
improving the system overall performance.
Figure 22 – Examples of non-crack images that were correctly classified after post-processing.
The right column shows the ground-truth (in this case coinciding with the original image) and
the left column the classifier output. After post-processing, the block classified as crack in each
image is eliminated.
4 System evaluation
The current Chapter is structured as follows. The test conditions of the several simulations
made on each of the two databases are described in the first two sections. The third section
presents the evaluation metrics applied to the developed system, and the last section the
results achieved for each of the two databases.
4.1 Test conditions - Road1
Several experiments were performed on the first database (Road1), involving several different
feature sets.
The first simulations are made without any pre-processing technique, and are then repeated
with the several pre-processing configurations (see section 3.3), to compare the results
obtained without and with pre-processing and to identify the most promising feature sets and
pre-processing techniques.
A cross-validation approach was applied in each test, always leaving one image out to be part
of the testing set. This led to 56 different classifiers trained in each experiment.
4.2 Test conditions - Road2
All the simulations made on the first database were repeated for the second one, with the same
pre-processing configurations and parameters, in order to compare the results of the two
databases.
The leave-one-image-out strategy was not applied to the second database. Since this database
has more images (165) than the first one, putting all images but one in the training set would be
excessive. In addition, the second database has a higher resolution than the first, leading to a
higher number of features to analyze per image and taking considerably more time to train a
model. A simulation test to find a smaller training set that still yields results similar to the
leave-one-image-out technique is presented in Figure 23 for the second database. This test
averages 10 different simulations for each training set size (1 to 164 images, randomly
selected).
The horizontal axis represents the number of images in the training set, while the vertical axis
represents the statistical measure used to evaluate the classifier output (recall for the first
graph and f-measure for the second). This simulation test used mip and third order moments as
features, without pre-processing.
Figure 23 – Performance for different training set sizes (training images are randomly selected).
The horizontal axis represents the number of images of the training set. The vertical axis
represents the evaluation metric (recall for the first graph and f-measure for the second graph).
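The experiment of Figure 23 can be sketched as follows (an illustrative sketch: the per-image feature lists are hypothetical, the SVM uses default parameters, and scikit-learn's recall_score and f1_score implement the metrics of section 4.3):

```python
import numpy as np
from sklearn.metrics import f1_score, recall_score
from sklearn.svm import SVC

def learning_curve_by_images(features_by_image, labels_by_image, sizes,
                             repeats=10, seed=0):
    """Average recall and f-measure over `repeats` random image selections
    for each training set size, as in the Figure 23 experiment."""
    rng = np.random.default_rng(seed)
    n_images = len(features_by_image)
    curve = {}
    for size in sizes:
        recalls, fscores = [], []
        for _ in range(repeats):
            train = sorted(rng.choice(n_images, size=size, replace=False).tolist())
            test = [i for i in range(n_images) if i not in set(train)]
            X_tr = np.vstack([features_by_image[i] for i in train])
            y_tr = np.concatenate([labels_by_image[i] for i in train])
            X_te = np.vstack([features_by_image[i] for i in test])
            y_te = np.concatenate([labels_by_image[i] for i in test])
            pred = SVC().fit(X_tr, y_tr).predict(X_te)
            recalls.append(recall_score(y_te, pred))
            fscores.append(f1_score(y_te, pred))
        curve[size] = (float(np.mean(recalls)), float(np.mean(fscores)))
    return curve
```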
Figure 23 shows a slight improvement of the system performance as the training data increases,
for both graphs. Although the results obtained with the f-measure metric are very low, the
results achieved with recall are much better. This reveals that most of the blocks containing
cracks were classified as such, but so were many non-crack blocks. The fact that this database
contains cracks that are more difficult to detect than those of the first database, and that
nearly 50% of its images contain no cracks, helps to explain the poor f-measure results. Since
both graphs show the highest values between training set sizes of 140 and 160, a ten-fold cross
validation is applied for the Road2 simulations. This way, the model created in the training
stage still has enough information (training samples) to describe the cracks well, with less
computational effort. Moreover, the models computed for Road2 under ten-fold cross validation
differ more from each other, providing a more robust test of the feature sets and determining
more precisely how well these features describe cracks. The same does not happen with the first
database, since only one image is exchanged in each iteration of the cross validation (leave one
image out), making the models much more similar to each other than those computed for the
second database.
4.3 Performance measures
This subsection first presents the statistical measures used to evaluate the system output. The
following subsection then reports the results achieved in each simulation on the two databases
(Road1 and Road2), in Table 2 and Table 3.
Several evaluation metrics were computed to quantitatively assess the results, namely the
confusion matrix, precision, recall and f-measure (a combination of the precision and recall
measures). The confusion matrix is composed of four counts: true positives, false positives,
true negatives and false negatives. A true positive is a crack block that was correctly
classified, while a false positive is a block classified as crack although the ground-truth
labels it as non-crack. A false negative is a crack block misclassified as non-crack, and a true
negative is a non-crack block that was correctly classified.
Precision is the number of crack blocks that were correctly classified (true positives) over the
total number of blocks the classifier labeled as crack (true positives + false positives). Recall
is the number of crack blocks that were correctly classified (true positives) over the total
number of blocks labeled as crack by the ground-truth (true positives + false negatives).
Precision can thus be seen as a quality measure, since it assesses the reliability of the
classifier's crack detections, while recall is closer to a quantity measure, since it computes
how many of the ground-truth crack regions were correctly detected. The f-measure combines these
two metrics and is described by the following expression:
f-measure = (2 × precision × recall) / (precision + recall)    (12)
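Expression (12) and the precision and recall definitions above translate directly into code (a minimal sketch):

```python
def block_metrics(tp, fp, tn, fn):
    """Precision, recall and f-measure from the confusion-matrix counts (eq. 12)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure
```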
4.4 Experimental results
This section presents the crack detection results, without pre-processing as well as with several
pre-processing configurations, measured by the recall, precision and f-measure, for the two
databases considered. These results are shown in Table 2 and Table 3.
Table 2 – Precision (top), recall (middle) and f-measure (bottom) of Road1
(precision)
features: without pre-processing / top-hat / mean & top-hat / min & ahe
Mip3 83.96% 81.17% 53.86% 27.16%
Mip4 66.46% 83.43% 50.83% 53.36%
Mip5 83.92% 84.18% 44.28% 26.98%
MV 62.15% 77.06% 44.93% 59.77%
MV3 88.11% 88.11% 88.9% 71.1%
MipMV 62.57% 71.21% 63.96% 51.55%
MipMV3 87.16% 88.54% 88.99% 69.72%
MipMV34 88.52% 88.28% 89.4% 69.53%
MipMV345 88.27% 87.81% 89.1% 70.26%
(recall)
features: without pre-processing / top-hat / mean & top-hat / min & ahe
Mip3 97.62% 98.06% 94.23% 96.95%
Mip4 97.44% 97.7% 90.5% 94.04%
Mip5 97.71% 98.11% 91.1% 96.72%
MV 77.57% 94.18% 91.49% 86.09%
MV3 98.27% 98.68% 98.04% 95.71%
MipMV 97.39% 96.77% 94.62% 93.9%
MipMV3 98.62% 98.67% 98.23% 97.16%
MipMV34 98.85% 98.49% 98.11% 97.21%
MipMV345 98.85% 98.49% 98.18% 97.02%
(f-measure)
features: without pre-processing / top-hat / mean & top-hat / min & ahe
Mip3 89.8% 87.66% 67.93% 41.99%
Mip4 77.01% 88.91% 64.41% 62.95%
Mip5 89.91% 89.79% 59.24% 41.71%
MV 57.99% 81.81% 59.28% 63%
MV3 92.56% 92.73% 92.84% 79.6%
MipMV 76.63% 80.05% 74.91% 62.25%
MipMV3 92.22% 92.93% 92.88% 79.4%
MipMV34 93.07% 92.67% 93.09% 79.25%
MipMV345 92.94% 92.41% 92.94% 79.77%
Table 3 – Precision (top), recall (middle) and f-measure (bottom) of Road2.
(precision)
features: without pre-processing / top-hat / mean & top-hat / min & ahe
Mip3 11.95% 14.87% 16.75% 5.72%
Mip4 13.7% 15.93% 19.33% 6.76%
Mip5 14.13% 15.35% 17.15% 6.06%
MV 8.44% 16.23% 21.87% 7.35%
MV3 15.44% 19.8% 21.21% 15.57%
MipMV 13.64% 17.25% 17.54% 10.09%
MipMV3 15.87% 22.52% 20.19% 15.21%
MipMV34 19.11% 26.12% 20.12% 16.93%
MipMV345 24.04% 26.99% 20.19% 16.87%
(recall)
features: without pre-processing / top-hat / mean & top-hat / min & ahe
Mip3 91.64% 93.36% 93.85% 78.8%
Mip4 91.5% 91.92% 90.83% 78.32%
Mip5 92.67% 94.19% 93.93% 79.35%
MV 66% 81.98% 82.09% 76.64%
MV3 84.04% 93.02% 92.43% 84.3%
MipMV 90.2% 90.55% 90.86% 81.42%
MipMV3 91.32% 94% 92.53% 85.43%
MipMV34 93.66% 94.29% 92.66% 88.16%
MipMV345 92.59% 94.29% 92.5% 88.36%
(f-measure)
features: without pre-processing / top-hat / mean & top-hat / min & ahe
Mip3 20.30% 24.7% 27.5% 10.2%
Mip4 22.83% 25.62% 30.1% 11.78%
Mip5 23.42% 25.46% 28.04% 10.75%
MV 13.9% 25.13% 31.88% 12.67%
MV3 24.52% 31.18% 33.23% 24.22%
MipMV 22.78% 34.84% 28.34% 16.7%
MipMV3 25.85% 39.24% 31.96% 23.8%
MipMV34 30.37% 40.37% 31.87% 26.44%
MipMV345 36.56% 27.82% 31.98% 26.43%
Since it is not possible to know in advance which feature combinations are the best, several
feature sets were defined and tested. It would be possible to merge all features into a single
feature vector and use it for classification; however, this strategy may not lead to the best
results. Increasing the number of features usually increases the amount of information extracted
from the image, but it also increases the dimension of the feature space. Training a classifier
in a high-dimensional feature space is more difficult, since it requires a very large data set
and makes supervised training much harder. This effort can be unnecessary, since it can lead to
worse results. This problem is known as the curse of dimensionality [39].
As a consequence, 9 different feature sets were tested. Table 2 and Table 3 present the results
achieved by the several feature set configurations considered for the two databases (Road1 and
Road2). "Mip" corresponds to the mip feature, "M" and "V" to the mean and variance features,
respectively, and each number to the respective order moment (3 = third order moment, 4 = fourth
order moment and 5 = fifth order moment). Section 3.4.1 presents and describes these six
features.
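For illustration, the six features can be computed per non-overlapping block as in the following sketch (the 25-pixel block size is an assumed value, not necessarily the one used in the dissertation):

```python
import numpy as np

def block_features(image, block=25):
    """One feature vector (mip, mean, variance, 3rd to 5th order central
    moments) per non-overlapping block of a greyscale image."""
    h, w = image.shape
    vectors = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            px = image[r:r + block, c:c + block].astype(float).ravel()
            mu = px.mean()
            vectors.append([px.min(),                  # mip (minimum intensity pixel)
                            mu,                        # mean
                            px.var(),                  # variance
                            np.mean((px - mu) ** 3),   # third order moment
                            np.mean((px - mu) ** 4),   # fourth order moment
                            np.mean((px - mu) ** 5)])  # fifth order moment
    return np.array(vectors)
```

Dark crack pixels pull the mip of their block down and raise its variance and odd moments, which is what separates the two pattern classes.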
Each column of each table represents a different pre-processing configuration. In total, 4
configurations were used: no pre-processing, top-hat, mean filter followed by top-hat, and
minimum filter followed by adaptive histogram equalization. Section 3.3 presents and discusses
these configurations as well as their parameters.
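The four configurations can be sketched with scipy under stated assumptions: only the 3x3 mean filter is fixed by the text; the other filter sizes, the top-hat variant (read here as a greyscale opening, i.e. the image minus its white top-hat, which removes white-pixel noise) and the adaptive histogram equalization step (only indicated by a comment) are assumptions, not the dissertation's exact parameters.

```python
import numpy as np
from scipy import ndimage

def preprocess(image, mode):
    """Sketch of the four pre-processing configurations of Tables 2 and 3."""
    img = image.astype(float)
    if mode == 'none':
        return img
    if mode == 'top-hat':
        # opening = image minus its white top-hat: removes small bright noise
        return ndimage.grey_opening(img, size=3)
    if mode == 'mean+top-hat':
        smoothed = ndimage.uniform_filter(img, size=3)   # 3x3 mean filter
        return ndimage.grey_opening(smoothed, size=3)
    if mode == 'min+ahe':
        darkened = ndimage.minimum_filter(img, size=3)   # reinforces dark cracks
        # an adaptive histogram equalization step would follow here,
        # e.g. skimage.exposure.equalize_adapthist(darkened / 255.0)
        return darkened
    raise ValueError(mode)
```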
In Table 2, several feature sets achieve an f-measure higher than 90% without any
pre-processing. Some other feature sets are not as good but still present a high f-measure
value, while others present a low value (e.g. the MV feature, as expected from the analysis made
in section 3.4.2). Comparing the results achieved without pre-processing with the pre-processed
ones, the top-hat configuration produced similar results, with the exception of Mip4 and MV,
which improved considerably, while the mean & top-hat configuration produced much lower results
for some feature sets and slightly better ones for others. The last pre-processing configuration
proved to be a bad choice, showing much worse results.
A significant difference between the precision and recall metrics (top and middle parts of
Table 2, respectively) can be observed. The average precision for each feature set is always
lower than 90%, while most of the average recall values are higher than 90%. Since the f-measure
is strongly influenced by the lower of the two values, the precision metric is mainly
responsible for not obtaining better results. In addition, it is stated in [11] that recall is
more important than precision for crack detection, i.e. detecting the crack blocks correctly
matters more than labeling some non-crack blocks as cracks. Moreover, the number of crack blocks
is much smaller than the number of non-crack blocks, making a high precision very difficult to
obtain.
Comparing the recall and f-measure results for all the configuration sets, it can be observed
that the recall values are always much higher than the f-measure values, the best recall being
98.85% and the best f-measure 93.09%.
In general, the tested pre-processing configurations do not significantly improve the system
performance. Considering the mip feature, it is reasonable to assume that the first two
configurations (top-hat filter and mean & top-hat) have very little impact on the minimum pixel
value of the block, since the mean filter is very small (3x3). Both configurations will,
however, affect the several order moment features. The idea was to cluster the features of each
pattern class more tightly, increasing the distance between classes. For the top-hat
configuration the changes were small, since it only removes the noise produced by white pixels,
and most feature sets keep a similar f-measure, or show a small improvement, compared with the
results without pre-processing. However, the top-hat configuration helped the MV feature pair
significantly, since it shifted the mean and variance of each crack block towards the values
expected of a crack block (low mean and high variance). The non-crack blocks also have their
mean lowered, but their variance remains lower than that of the crack blocks. This implies a
better separation of the pattern classes.
The mean & top-hat configuration should improve on the results obtained with the top-hat
configuration, since it should produce a more uniform image with a higher contrast between the
cracks and the background, therefore creating a tighter grouping within each pattern class and a
greater separation between different classes. However, the opposite effect is observed.
Comparing both results, the mean & top-hat recall values were always lower, while only four
feature sets could achieve a higher precision value. This means that the distance between
classes decreased. Observing the similarity of the mean histograms of the crack and non-crack
blocks (Figure 15), it is not surprising that the mean filter brings the two pattern classes
closer together, rather than increasing their distance, leading to worse results.
The last configuration presents much worse results than the others. Despite reinforcing the
crack features, and therefore enhancing the contrast between the cracks and the background, it
does the same for other artifacts, which the classifier can later misclassify as cracks (note
that the recall values are only slightly lower than those of the other configurations, while the
precision values drop much more).
To compare the results on both databases and remain faithful to the methodology followed for the
first database, the same simulations with the same parameters were repeated for the second
database. The f-measures obtained in Table 3 using the top-hat and mean & top-hat pre-processing
configurations were better than the results achieved without pre-processing, while the min & ahe
configuration again proved to be worse.
In quantitative terms, the results achieved on this database are very low. Two factors
contribute to this: the cracks are much harder to detect than those of the first database, and a
ten-fold cross validation was used instead of the leave-one-image-out technique, which can
worsen the model created in the training stage. In addition, the large number of non-crack
images in Road2 (about 50%) can significantly lower the system accuracy, since any block labeled
as crack in these images only jeopardizes the system performance.
The testing set evaluation considered all the images together rather than each image
individually, i.e. the four confusion matrix counts were computed over the entire testing set as
if it were a single image, instead of averaging each metric over the individual images. That
way, classifying a block as crack in a non-crack image does not greatly affect the total true
positive count, but it still degrades the f-measure.
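The pooled evaluation can be illustrated as follows (a sketch; the label arrays are hypothetical):

```python
import numpy as np

def pooled_f_measure(truths, preds):
    """Confusion counts accumulated over all test images, then one f-measure."""
    tp = fp = fn = 0
    for y, p in zip(truths, preds):
        tp += int(np.sum((y == 1) & (p == 1)))
        fp += int(np.sum((y == 0) & (p == 1)))
        fn += int(np.sum((y == 1) & (p == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```

A single false positive in a non-crack image leaves the pooled true positive count untouched but raises the false positive count, lowering precision and hence the f-measure, which is exactly the effect described above.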
Once again, the precision metric is mainly responsible for these results. The main issue is that
these images have a higher resolution and several blocks are classified as crack when they are
not. In addition, the selected feature sets may work well for the first database (since they
were specifically adjusted for Road1) but may not be the best choice for the second one. This
suggests, as discussed previously, that texture noise and the type of road pavement can
significantly influence the classifier's final results.
Another relevant observation is that different pre-processing strategies should be used for
different feature sets, i.e. it may not make sense to use the same pre-processing for all
feature configurations. For instance, Oliveira and Correia used in [6] a normalization
pre-processing followed by a top-hat filter, with mean and standard deviation as features. In
this dissertation these two features, as well as the top-hat filter, are used, showing a
significant improvement over the original data results (without pre-processing) for both
databases. Therefore, a more careful and dedicated pre-processing selection for each feature set
could yield better crack detection results. Table 4 presents the results of isolated features
without pre-processing, while Table 5 presents the best results achieved using a distinct
pre-processing for each feature. The best recall and f-measure of several joint combinations of
features and pre-processing are then presented in Table 6, where each feature within each cell
was pre-processed by the technique on the same line. The results in these three tables concern
the first database.
Table 4 – Results of isolated features without pre-processing for the first database.
features recall precision f-measure
mip 97.26% 67.49% 77.88%
mean 76.18% 7.84% 13.52%
variance 80.34% 46.67% 44.86%
order3 88.28% 89.17% 86.39%
order4 85.38% 59.58% 60.33%
order5 92.48% 95.86% 93.91%
Table 5 – Results of isolated features with pre-processing for the first database.
pre-processing features recall precision f-measure
AHE mean 69.28% 14.07% 22.88%
top-hat variance 90.04% 78.99% 80.90%
top-hat order3 97.88% 82.86% 88.68%
mean order3 92.32% 94% 92.57%
top-hat order4 97.79% 83.54% 89.02%
top-hat order5 98.11% 83.77% 89.54%
mean order5 95.30% 94.09% 94.53%
Table 6 – Best joint recall (top) and best joint f-measure (bottom) achieved for the first
database (each entry is pre-processing/feature).

Best joint recall:
1) none/order5 + top-hat/order3 + top-hat/order5 + mean/order5:
   recall 99.04%, precision 87.59%, f-measure 92.64%
2) none/order5 + top-hat/variance + top-hat/order3 + top-hat/order5 + mean/order5:
   recall 99.04%, precision 88.85%, f-measure 93.31%
3) none/mip + top-hat/variance + mean/order3 + top-hat/order4 + top-hat/order5 + mean/order5:
   recall 98.98%, precision 89.12%, f-measure 93.48%

Best joint f-measure:
1) top-hat/variance + AHE/mean + mean/order5:
   recall 97.86%, precision 91.32%, f-measure 94.18%
2) none/mip + AHE/mean + top-hat/variance + mean/order3 + top-hat/order4 + none/order5 + top-hat/order5:
   recall 98.71%, precision 90.20%, f-measure 93.97%
3) none/mip + AHE/mean + top-hat/variance + mean/order3 + top-hat/order4 + top-hat/order5:
   recall 98.71%, precision 90.16%, f-measure 93.94%
A significant improvement is observed when comparing Table 4 with Table 5: when a specific
pre-processing is applied to a specific feature, the system performance improves compared with
the original data results (no pre-processing). Furthermore, the first part of Table 6 presents
the best recall metrics, which are superior not only to the recall values of Table 4 and
Table 5, but also to the best recall previously achieved for the first database (98.85%). This
comes at the cost of a lower f-measure when compared with some of the feature sets of both
tables. Even the best f-measure metrics, presented in the second part of Table 6, which are
higher than the best f-measure previously achieved for the first database (93.09%), remain below
the f-measure achieved by the order5 feature pre-processed with a mean filter in Table 5.
However, as already stated in this dissertation, this has little impact on crack detection,
since the recall measure is more important than precision, given its ability to detect the
cracks more thoroughly, even if more blocks are sometimes misclassified as crack.
Therefore, by matching the right pre-processing to each feature, the system performance can be
improved beyond the previous results, since the recall achieved is higher than that obtained
with the original data or with a generic pre-processing applied blindly to all feature sets.
5 Conclusions and Future Work
This dissertation proposes a system for the detection of cracks in road images based on the
SVM classifier. Since SVM is a learning technique, the database is divided into two sets, one for
training the system and another for testing the system. Two different databases (Road1 and
Road2) were used to evaluate the system.
It is not always possible to compare the results obtained with other papers, since some of them
do not present quantitative results using the standard evaluation metrics. Often, only
qualitative results or different evaluation metrics are provided, making comparison with the
results produced in this dissertation harder. For instance, the distinction between crack images
and non-crack images is the evaluation metric used in [17] and [12]. In [26], two supervised
techniques are used to classify the crack type, so a comparison makes no sense. Other papers,
such as [20] and [21], only report qualitative results, which makes it very difficult to
establish a comparison.
However, other papers do provide statistical results on crack detection, making it possible to
compare the achieved results. In [10] the best f-measure achieved was 93.8% (with a recall of
96.3% and a precision of 86.9%). A recall of 93.96% and a precision of 90.70%, yielding an
f-measure of 91.95%, is stated in [11] as the best result. However, the same paper also reports
a lower f-measure with a recall of 96.75%, placing more emphasis on the comparison of recall
than of f-measure. A supervised parametric algorithm obtained the best result in [6], with an
f-measure of 94.7% and a recall of 97% (also the highest in that paper). In [5] the best recall
was 95.44% for the first database and 85.44% for the second, using the minimum intensity pixel
(mip) as the feature in both cases.
The best recall achieved in Road1 and Road2 was 98.85% and 94.24%, respectively. The Road1
recall clearly outperforms the recall values reported in the literature; the same does not
happen for the second database, although it still presents a good result. In terms of f-measure,
the Road1 results can compete with those in the literature, while the Road2 results fall far
short due to a very low precision. Therefore, the results achieved on both databases are
competitive with the literature in terms of recall, while only the Road1 f-measure results match
the corresponding literature results.
The best joint results achieved for the first database produce a recall of 99.04%, a precision
of 95.86% and an f-measure of 94.53%, surpassing the previous Road1 results.
The comparison between the literature results and the best system results is shown in Table 7.
Table 7 – Comparison of the literature results with the results achieved in the developed
system.

Evaluation metric: [10] / [11] / [11] / [6] / [5] / [5] / Road1 / Road2 / Joint results
Recall (%): 96.30 / 93.96 / 96.75 / 97 / 95.44 / 85.44 / 98.85 / 94.24 / 99.04
Precision (%): 86.9 / 90.70 / - / - / - / - / 89.40 / 26.99 / 95.86
f-measure (%): 93.8 / 91.95 / - / 94.7 / - / - / 93.09 / 40.37 / 94.53
As future work, different batteries of tests, with configurations of pre-processing specific to
each feature, are proposed as a valid way to improve the system performance on larger and more
challenging databases such as Road2. New pre-processing techniques expected to provide good
results include the median filter or morphological filters [14] to enhance cracks, and
anisotropic diffusion filtering [10] to smooth the image texture variation.
Multi-scale features could also be tested (e.g. wavelet coefficients or Gabor filters), since
cracks have different thicknesses. These features are typically used in pixel based approaches
and are not normally applied in block based ones, so their performance in this approach is
unknown.
The selection of an SVM parameter set that best improves the system, after testing several SVM
parameters by trial and error [38], is also proposed as future work. An important parameter here
is the regularization parameter C, since tuning it can save computational time and improve the
system performance. For instance, properly adjusting the regularization parameter C for the two
classes is computationally cheaper than replicating the crack feature vectors until their number
matches that of the non-crack ones (the approach used in this dissertation). Another measure for
imbalanced data that can benefit the system performance and reduce the computational effort is
the random selection of non-crack feature vectors (under-sampling) so that their number matches
the crack ones, keeping the parameter C equal for both pattern classes.
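Both alternatives can be sketched with scikit-learn's SVC, which wraps LIBSVM (a sketch; the encoding class 1 = crack, class 0 = non-crack is an assumption):

```python
import numpy as np
from sklearn.svm import SVC

def weighted_svm(X, y):
    """Scale C for the crack class by the imbalance ratio instead of
    replicating the crack feature vectors."""
    ratio = np.sum(y == 0) / np.sum(y == 1)
    return SVC(C=1.0, class_weight={0: 1.0, 1: ratio}).fit(X, y)

def undersampled_svm(X, y, seed=0):
    """Randomly keep as many non-crack vectors as there are crack ones,
    with the same C for both classes."""
    rng = np.random.default_rng(seed)
    crack = np.flatnonzero(y == 1)
    non_crack = rng.choice(np.flatnonzero(y == 0), size=crack.size, replace=False)
    keep = np.concatenate([crack, non_crack])
    return SVC(C=1.0).fit(X[keep], y[keep])
```

The weighted variant keeps all training data but makes each crack error cost as much as the accumulated non-crack errors; the under-sampled variant simply trains on a smaller, balanced set.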
6 References
[1] M. Gavilán, D. Balcones, O. Marcos, D. F. Llorca, M. A. Sotelo, I. Parra, M. Ocaña, P.
Aliseda, P. Yarza and A. Amírola, "Adaptive Road Crack Detection System by Pavement
Classification," Sensors, vol. 11, no. 10, pp. 9628-9657, 2011.
[2] G. Moussa and H. Hussain, "A New Technique for Automatic Detection and Parameters
Estimation of Pavement Crack," in Proceedings of the 4th International Multi-Conference on
Engineering and Technological Innovation, Orlando, Florida, USA, 2011.
[3] JAE, Catálogo de Degradações dos Pavimentos Rodoviários Flexíveis, 1997.
[4] J. Lin and Y. Liu, "Potholes Detection Based on SVM in the Pavement Distress Image," in
Proceedings of the 9th International Symposium on Distributed Computing and Applications
to Business, Engineering and Science, Hong Kong, China, 2010.
[5] P. Rosa and P. Correia, "Automatic Road Pavement Crack Detection Using Boosting
Classifiers," in Proceedings of the European Signal Processing Conference - EUSIPCO,
Glasgow, United Kingdom, 2009.
[6] H. Oliveira and P. L. Correia, "Supervised Crack Detection and Classification in Images of
Road Pavement Flexible Surfaces," in Recent Advances in Signal Processing, Austria, In-
Tech, 2009.
[7] L. Jing and Z. Aiqin, "Pavement crack distress detection based on image analysis," in
Proceedings of the 2010 International Conference on Machine Vision and Human-machine
Interface, Kaifeng, China, 2010.
[8] T. Nguyen, M. Avila and S. Begot, "Automatic detection and classification of defect on road
pavement using anisotropy measure," in Proceedings of the 17th European Signal
Processing Conference, Glasgow, Scotland, 2009.
[9] H. Oliveira and P. L. Correia, "Automatic Road Crack Segmentation using entropy and
image dynamic thresholding," in Proceedings of the European Signal Processing
Conference - EUSIPCO, Glasgow, United Kingdom, 2009.
[10] H. Oliveira and P. L. Correia, "Automatic Crack Detection on Road Imagery using
Anisotropic Diffusion and Region Linkage," in Proceedings of the European Signal
Processing Conference - EUSIPCO, Aalborg, Denmark, 2010.
[11] H. Oliveira and P. L. Correia, "Evaluation of Pre-processing in Road Pavement Image
Analysis," in Proceedings of the Conference on Telecommunications - ConfTele, Santa
Maria da Feira, Portugal, 2009.
[12] N. Sy, M. Avila, S. Begot and J. Bardet, "Detection of Defects in Road Surface by a Vision
System," in Proceedings of the 14th IEEE Mediterranean Electrotechnical Conference,
Ajaccio, France, 2008.
[13] S. Chambon and J.-M. Moliard, "Automatic Road Pavement Assessment with Image
Processing: Review and Comparison," International Journal of Geophysics, vol. 2011, p.
20, 2011.
[14] G. Bao, "Road Distress Analysis using 2D and 3D Information," to obtain the master degree
of science in Electrical Engineering, University of Toledo, 2010.
[15] B. Augereau, B. Tremblais, M. Khoudeir and V. Legeay, "A differential approach for fissures
detection on road surface images," in Proceedings of the 5th International Conference on
Quality Control by Artificial Vision, Le Creusot, France , 2001.
[16] Q. Zou, Y. Cao, Q. Li, Q. Mao and S. Wang, "CrackTree: Automatic crack detection from
pavement images," Pattern Recognition Letters, vol. 33, no. 3, pp. 227-238 , 2011.
[17] N. Tanaka and K. Uematsu, "A crack detection method in road surface images using
morphology," in Proceedings of the Workshop on Machine Vision Applications, Makuhari,
Chiba, Japan, 1998.
[18] H. Cheng, J. Wang, Y. Hu, C. Glazier, X. Shi and X. Chen, "Novel approach to pavement
cracking detection based on neural network," Transportation Research Record, vol. 1764,
pp. 119-127, 2001.
[19] A. Ayenu-Prah and N. Attoh-Okine, "Evaluating pavement cracks with bidimensional
empirical mode decomposition," EURASIP Journal on Advances in Signal Processing, vol.
2008, 2008.
[20] P. Subirats, O. Fabre, J. Dumoulin, V. Legeay and D. Barba, "A combined wavelet-based
image processing method for emergent crack detection on pavement surface images," in
Proceedings of the 12th European Signal Processing Conference EUSIPCO, Vienna,
Austria, 2004.
[21] P. Subirats, J. Dumoulin, V. Legeay and D. Barba, "Automation of pavement surface crack
detection with a matched filtering to define the mother wavelet function used," in
Proceedings of the 14th European Signal Processing Conference (EUSIPCO), Florence,
Italy, 2006.
[22] S. Zhibiao and G. Yanqing, "Algorithm on contourlet domain in detection of road cracks for
pavement images," in Proceedings of the 9th International Symposium on Distributed
Computing and Applications to Business, Engineering and Science, Hong Kong, 2010.
[23] C.-X. Ma, C.-X. Zhao and Y.-K. Hou, "Pavement distress detection based on
nonsubsampled contourlet transform," in Proceedings of the International Conference on
Computer Science and Software Engineering,, Wuhan, China, 2008.
[24] R. Medina, J. Gómez-García-Bermejo and E. Zalama, "Automated Visual Inspection of
Road Surface Cracks," in Proceedings of the 27th International Symposium on Automation
and Robotics in Construction, Bratislava, Slovakia, 2010.
[25] P. Subirats, J. Dumoulin, V. Legeay and D. Barba, "Automation of pavement surface crack
detection using the continuous wavelet transform," in Proceedings of the IEEE international
Conference on Image Processing, Atlanta, GA, 2006.
[26] N. Li, X. Hou, X. Yang and Y. Dong, "Automation Recognition of Pavement Surface
Distress Based on Support Vector Machine," in Proceedings of the 2009 Second
International Conference on Intelligent Networks and Intelligent Systems , Tianjian, China ,
2009.
[27] Y. Huang and B. Xu, "Automatic inspection of pavement cracking distress," Journal
Electronic Imaging, vol. 15, p. 013017, 2006.
[28] C. Jiang-wei, C. Xiu-min, W. Rong-ben and S. Suming, "Research on Asphalt Pavement
Surface Distress Image Feature Extraction Method," Journal of image and grahics, Vols.
8(A), No.10, pp. 1211-1217, 2003.
[29] B. J. Lee and H. D. Lee, "A robust position invariant artificial neural network for digital
pavement crack analysis," in Proceedings of the TRB Annual Meeting, Washington, DC,
USA, 2003.
[30] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions
on Systems, Man and Cybernetics, vol. 9, pp. 62--66, 1979.
[31] H. Elbehiery, A. Hefnawy and M. Elewa, "Surface Defects Detection for Ceramic Tiles
Using Image Processing and Morphological Techniques," in Proceedings of the World
Academy of Science, Engineering and Technology, 2005.
[32] J. Chou, W. O'Neill and H. Cheng, "Pavement distress evaluation using fuzzy logic and
moments invariants," Transportation Research Record, pp. 39-46, 1995.
[33] H. Lin, J.-W. Zhao, Q.-S. Chen, J.-R. Cai and P. Zhou, "Eggshell Crack Detection Based on
Acoustic Impulse Response and Supervised Pattern Recognition," Czech Journal of Food
Sciences - UZEI, vol. 27, pp. 393-402, 2009.
[34] M. Nieniewski, L. Chmielewski, A. Józwik and M. Sklodowski, "Morphological Detection and
Feature based Classification of Cracked regions in Ferrites," Machine Graphics and Vision,
vol. 8, pp. 699-712, 1999.
[35] A. Ramdas, "Bootstrapping, Adaboosting, Uncertainty Sampling for Genre Classification of
Fine Art Paintings," 10 December 2011.
[36] H. Furuta and H. Hattori, "Damage Assessment of Reinforced Concrete Bridge Decks using
Adaboost," in 3rd International ASRANet Colloquium, Glasgow, UK, 2006.
[37] "Wikipedia," [Online]. Available: http://en.wikipedia.org/wiki/Support_vector_machine.
[Accessed 21 8 2012].
[38] C.-W. Hsu, C.-C. Chang and C.-J. Lin, "A Practical Guide to Support Vector Classification,"
15 4 2010. [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/. [Accessed 2012].
[39] I. Guyon and A. Elisseeff, "An Introduction to Variable and Feature Selection," The Journal
of Machine Learning Research, vol. 3, pp. 1157-1182 , 2003.
[40] B. X. Wang and N. Japkowicz, "Boosting Support Vector Machines for Imbalanced Data
Sets," in Proceedings of the 17th international conference on Foundations of intelligent
systems , Springer-Verlag Berlin, Heidelberg, 2008.
[41] D.-q. Zhang, S.-r. Qu, W.-b. Li and L. He, "Image Enhancement Algorithm on Ridgelet
Domain in Detection of Road Cracks," China Journal of Highway and Transport, vol. 22, no.
2, pp. 26-31, 2009.
[42] H. N. Koutsopoulos and A. B. Downey, "Primitive-based classification of pavement
cracking images," Journal of Transportation Engineering, vol. 119, no. 3, pp. 402-418,
1993.
[43] L. Gang, H. Yu-yao and Z. Yan, "Automatic Recognition Algorithm of Pavement Defect
Image Based on OTSU and Maximizing Mutual Information," Microelectronics & Computer,
vol. 26, no. 7, pp. 241-247, 2009.
[44] W. Xiao and Zhang Xue, "A New Method for Distress Automation Recognition of Pavement
Surface Based on Density Factor and Image Processing," Journal of Transportation
Engineering and Information, pp. 82-89, 2004.
[45] J. Zhou, P. Huang and F.-P. Chiang, "Wavelet-Based Pavement Distress Classification,"
Transportation Research Record: Journal of the Transportation Research Board,
pp. 89-98, 2005.
[46] N. Nishimura and S. Kobayashi, "A boundary integral equation for an inverse problem
related to crack detection," International Journal for Numerical Methods in Engineering,
vol. 32, no. 7, pp. 1371-1387, 1991.
[47] M. Kaseko, Z. Lo and S. G. Ritchie, "Comparison of Traditional and Neural Classifiers for
Pavement Crack Detection," Journal of Transportation Engineering, vol. 120, pp. 552-569,
1994.
[48] F. M. Nejad and H. Zakeri, "An optimum feature extraction method based on Wavelet-
Radon Transform and Dynamic Neural Network for pavement distress classification,"
Expert Systems with Applications, vol. 38, no. 8, pp. 9442-9460, 2011.
[49] J. Shirataki and T. Tomikawa, "A study of road crack detection by image processing,"
Research Reports of Kanagawa Institute of Technology, vol. 24, pp. 67-71, 2000.
[50] H. Lee and H. Oshima, "New crack-imaging procedure using spatial autocorrelation
function," Journal of Transportation Engineering, vol. 120, no. 2, pp. 206-228, 1994.
[51] E. Teomete, V. R. Amin, H. Ceylan and O. Smadi, "Digital image processing for pavement
distress analyses," in Proceedings of the Mid-Continent Transportation Research
Symposium, Ames, Iowa, 2005.
[52] M. N. Do and M. Vetterli, "Contourlets: A Directional Multiresolution Image
Representation," in Proceedings of the International Conference on Image Processing,
2002.