FAILURE AND MAINTENANCE INFORMATION EXTRACTION
METHODOLOGY USING MULTIPLE DATABASES FROM INDUSTRY: A NEW
DATA FUSION APPROACH
Kazi Arif-Uz-Zaman
Master in Engineering (Research)

Supervisors:
Principal: Professor Lin Ma
Associate: Dr. Michael E. Cholette, A/Prof. Yue Xu, Dr. Azharul Karim
Submitted in fulfilment of the requirements for the degree of
Doctor of Philosophy
School of Chemistry, Physics and Mechanical Engineering
Faculty of Science and Engineering
Queensland University of Technology
2018
Keywords
Text mining, work order (WO) analysis, naïve Bayes, support vector machine,
failure time, text classification, active learning, information requirement
specifications, semi-supervised learning, reliability models, maintenance optimization
models.
Abstract
Maintenance planning, budgeting, and optimisation continue to attract
significant research and practical attention. At the centre of all of these methodologies
are statistical models for the reliability and/or degradation of key assets. Yet, these
statistical models require accurate event times (e.g. failure times) and for many
industrial applications, such information is often scattered in many historical
maintenance databases. Additionally, real world databases have often been set up for
purposes other than statistical modelling and are focused on the process of
maintenance activities (e.g. communicating what needs to be done by the maintenance
crew and when) rather than on detailed cataloguing of downtime causes, degradation,
and failure events. In addition, different aspects of maintenance activities themselves
are often dispersed across different databases. Some databases contain descriptions of the work that needs to be conducted and indications of the priority of maintenance activities, while others may contain detailed information on when the asset was operating and when it stopped, without noting the reason. Thus, the existing data
cannot be interpreted individually, since each database provides an incomplete picture
of the asset performance, condition, and reliability.
This study aims to establish methods for linking relevant data and information
in separate maintenance databases to support reliability and maintenance decision
modelling, in particular, Time to Failure or Failure Time Information. First, the data
requirements for reliability and maintenance optimisation modelling are established
and possible sources of information to satisfy such requirements are investigated. To
link different databases, this thesis proposes an innovative text mining approach. To
establish such links, some organisations may use dedicated data fields or may cross-check dates between databases. However, in many historical databases (especially for a long-lived asset), such links do not exist. Though different databases provide their own sides
of the picture of maintenance, the most commonly available and detailed maintenance
information is often recorded in the free texts of maintenance work descriptions.
Therefore, one may expect that different maintenance databases can be linked by
mining the free text to identify and extract the information necessary for asset
reliability and optimisation modelling.
A text mining approach is employed to extract Failure Time using keywords (present in the free text descriptions of various databases) indicative of the nature and
characteristics of the maintenance events. This study automatically labels the
maintenance data of one database using data fields and links them with another
database through the free texts. The proposed method thus identifies the “failure”
events whose text descriptions are consistent with the definition of failure across
multiple maintenance databases.
An alternative approach to identifying failure times is to use an expert’s
interpretation of the free texts. In this case, the key challenge is the “expense” of
labelling; the expert must assess each text description individually and thus labelling
all of the data is infeasible. To mitigate this, an active learning approach is proposed
to construct a text classifier from a limited number of expert labelled samples.
The applicability of the methodologies is demonstrated on maintenance data sets
from electricity and sugar processing companies. The performance of the text
classifiers is assessed in terms of their accuracy, precision and recall measures.
Analysis of the text of the identified failure events seems to confirm the accurate
identification of failures. The results are expected to be immediately useful in
improving the estimation of failure times (and thus the reliability models) for real
world assets. Furthermore, the findings from the active learning based approach
demonstrated on industrial maintenance data reveal that failure time information can
be identified, allowing minimum maintenance data to be interpreted by the expert.
Active learning can decrease the number of labelled samples by approximately 50%,
while achieving the same classification accuracy. The outcomes of this study can be
used to develop statistical models of failure times from older historical maintenance databases, where the only consistently available data is a free text description.
Table of Contents

Keywords
Abstract
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Statement of Original Authorship
Acknowledgements
Chapter 1: Introduction
1.1 Background and Motivation
1.2 Research Questions and Objectives
1.3 Research Contribution, Innovation and Significance
1.4 Publications
1.5 Thesis Organisation
Chapter 2: Literature Review
2.1 Overview of the Maintenance Process
2.1.1 Maintenance Policies and Strategies
2.2 Maintenance Optimisation
2.3 Failure Time Models
2.3.1 Virtual Age (VA) Model
2.4 Degradation Models
2.5 Maintenance Objectives and Costs
2.6 Advantages and Disadvantages of the Models
2.7 Typically Available Maintenance Databases in Industry
2.8 Knowledge Discovery
2.9 Text Mining
2.10 Text Cleaning and Feature Extraction
2.10.1 Bag of Words
2.10.2 Term Frequency (TF)-Inverse Document Frequency (IDF)
2.10.3 Chi-square (CS) Statistic
2.10.4 Information Gain (IG)
2.10.5 Language Model
2.11 Text Classification Algorithms
2.11.1 Naïve Bayes
2.11.2 Maximum Entropy
2.11.3 Conditional Random Fields
2.11.4 K-Nearest Neighbour
2.11.5 Support Vector Machine
2.12 Performance Evaluation
2.13 Supervised Machine Learning
2.14 Semi-Supervised Machine Learning
2.14.1 Active Learning
2.14.2 Semi-Supervised Self Training
2.15 Summary and Research Gap
Chapter 3: Information Requirement Specifications for Reliability and Maintenance Optimisation Models
3.1 Are Current Maintenance Databases Sufficient for Maintenance Optimisation?
3.1.1 Identifying Failure and Planned Maintenance Times
3.2 Requirement for Information Extraction Methodology
Chapter 4: Failure Time Extraction Methodology Using Text Mining
4.1 Motivation
4.2 Methodology
4.2.1 Definition of Failure
4.2.2 Database A Labelling
4.2.3 Features Extraction and Construction of Keyword Dictionary
4.2.4 Classifier Construction and Failure Time Extraction
4.3 Validation of the Methodology
4.4 Application of the Methodology
4.5 Summary
Chapter 5: Case Studies on Failure Time Extraction
5.1 Case Study 1: Coal Fired Power Generation Company
5.1.1 Overview of a Coal Mill
5.1.2 Data Description and Text Cleaning
5.1.3 Work Order Labelling and Feature Extraction
5.1.4 Training and Testing Text Classifiers
5.1.5 Comparison between Failure and Non-Failure Work Orders
5.1.6 Failure Time Extraction
5.1.7 Validation of the Text Classifier
5.1.8 Application of the Methodology
5.1.9 Comparison between Failure and Non-Failure DD using Text Descriptions
5.1.10 Cumulative Number of Failures before and after Text Mining
5.2 Case Study 2: Boilers in Sugar Processing Industry
5.2.1 Overview of a Boiler System
5.2.2 Data Description and Text Cleaning
5.2.3 Work Order Labelling and Feature Extraction
5.2.4 Training and Testing Text Classifiers
5.2.5 Comparison between Failure and Non-Failure Work Orders
5.2.6 Failure Time Extraction
5.2.7 Validation of the Text Classifier
5.2.8 Application of the Methodology
5.2.9 Comparison between Failure and Non-Failure DD using Work Descriptions
5.2.10 Cumulative Number of Failures before and after Text Mining
5.3 Summary and Discussion
Chapter 6: Advanced Information Extraction Methodology Using Text Mining and Active Learning
6.1 Motivation
6.2 Methodology
6.2.1 Text Cleaning and Initial Training Data Formulation
6.2.2 Active Learning via Uncertainty Sampling
6.3 Case Studies
6.3.1 Classifier Formulation and Benchmark Algorithm
6.3.2 Accuracy of the Text Classifier
6.3.3 Validation of the Classifier
6.3.4 Failure Time Identification Using DD
6.4 Benefits of Including Expert Labelling
6.5 Summary
Chapter 7: Conclusion and Future Research Directions
7.1 Conclusion
7.2 Future Research
Appendices
Bibliography
List of Figures

Figure 1-1. Overview of the research questions
Figure 2-1. Production and maintenance process [11]
Figure 2-2. Maintenance management workflow [13]
Figure 2-3. Evolution of maintenance strategies [14]
Figure 2-4. Minimal, perfect and imperfect repair [35]
Figure 2-5. Deterioration model
Figure 2-6. Process of knowledge discovery in databases [63]
Figure 2-7. Data mining tasks
Figure 2-8. Raw text data with causes of errors and anomalies
Figure 2-9. Features commonly used for text classification
Figure 2-10. Commonly used classification algorithms for text classification
Figure 2-11. A framework for supervised machine learning text classification
Figure 2-12. General schema for passive and active learning [121]
Figure 2-13. Three main active learning query selection strategies [120]
Figure 2-14. Uncertainty-based active learning that queries "b"
Figure 3-1. Overview of information extraction methodology
Figure 4-1. Methodology to extract failure and non-failure maintenance times
Figure 4-2. Data filter and Database "A" labelling
Figure 4-3. Application of the methodology
Figure 5-1. Overview of medium-speed (vertical spindle bowl) mill [140]
Figure 5-2. Recording of two databases (WO and DD) during maintenance process (coal mill)
Figure 5-3. Word cloud representing the keywords appearing in WO
Figure 5-4. Word clouds for (a) failure and (b) non-failure WOs for coal mills
Figure 5-5. Cumulative number of failures for Unit X, Mill A (coal mill)
Figure 5-6. Functional layout of sugar processing system (adapted from [141])
Figure 5-7. Word cloud representing the keywords appearing in WO
Figure 5-8. Word clouds for (a) failure and (b) non-failure WOs for boilers
Figure 5-9. Cumulative number of failures for boilers in sugar processing industry
Figure 6-1. Active learning techniques used in the methodology
Figure 6-2. Uncertainty-based Active Learning text classifier
Figure 6-3. The Uncertainty-based AL algorithm
Figure 6-4. Classification accuracies of each model over the percentage of labelled data increase
Figure 6-5. Classification accuracies of each mixed-classifier over the percentages of automatic and expert labelled data
List of Tables

Table 2-1. Elements of maintenance optimization considering different maintenance strategies [17, 21-24]
Table 2-2. Types of information used for TBM based Failure Time Models
Table 2-3. Data processing steps in KDD
Table 2-4. Text cleaning technique with different transformation processes [67, 69, 70]
Table 2-5. Performance evaluation metrics
Table 3-1. Suggested data recording to support reliability modelling (i.e. failure time modelling)
Table 4-1. Downtime classifications based on planned and unplanned maintenance
Table 4-2. Criteria used for feature extraction and construction of keyword dictionary
Table 5-1. Five randomly selected data from (a) WO and (b) DD during maintenance process (data is slightly edited to protect proprietary information)
Table 5-2. Comparing a few WO data before and after cleaning process
Table 5-3. A portion of keyword dictionary ($tf_1$) for Case Study 1
Table 5-4. A portion of Mixed-Gram keyword dictionary ($NG_{12}$) for Case Study 1
Table 5-5. Performances between SVM and NB classifiers using different keyword dictionaries
Table 5-6. Comparison of Accuracy and F-Measures between SVM and NB using Different Keyword Dictionaries for Case Study 1 (Coal Mills)
Table 5-7. Predicted instances of failure and non-failure downtimes (coal mills)
Table 5-8. Cross tabulation of the DD comparing predicted labels with the estimated ones
Table 5-9. Cross tabulation of testing work orders comparing predicted labels to the actual ones
Table 5-10. Randomly selected predicted downtime data of Unit X Mill A (coal mill)
Table 5-11. Five randomly selected examples from (a) WO and (b) DD (data is slightly edited to protect proprietary information)
Table 5-12. A portion of keyword dictionary ($tf_1$) for Case Study 2
Table 5-13. A portion of keyword dictionary ($\chi^2_{500}$) for Case Study 2
Table 5-14. Performances between SVM and NB using different keyword dictionaries
Table 5-15. Comparison of Accuracy and F-Measures between SVM and NB using Different Keyword Dictionaries for Case Study 2 (Boilers)
Table 5-16. Predicted instances of failure and non-failure downtimes (boilers)
Table 5-17. Estimating actual DD labels using the existing data fields
Table 5-18. Cross tabulation of the DD comparing predicted labels with the estimated ones
Table 5-19. Cross tabulation of testing work orders comparing predicted labels with the actual ones
Table 5-20. Randomly selected predicted downtime data for boilers
Table 6-1. A few randomly selected data entries from WO for (a) coal mills and (b) boilers
Table 6-2. Classification accuracies of different models over two case studies
Table 6-3. Accuracies of the AL-based classifiers (WO trained classifiers) on DD
Table 6-4. Predicted instances of failure and non-failure downtimes (coal mills)
Table 6-5. Predicted instances of failure and non-failure downtimes (boilers)
Table 6-6. Performance of the mixed classifier over percentages of automatic and expert labelled (uncertainty-based) WO
Table 6-7. Performance of the mixed classifier over percentages of automatic and expert labelled (randomly selected) WO
List of Abbreviations
AL: Active learning
ANN: Artificial neural network
ARA: Arithmetic reduction of age
ARI: Arithmetic reduction of intensity
BOW: Bag-of-words
CBM: Condition-based maintenance
CM: Corrective maintenance
CMMS: Computerized maintenance management system
CMS: Condition monitoring system
CS: Chi-square statistic
DCS: Digital control system
DD: Downtime Data
DM: Data mining
ECE: Expected cross-entropy
ERI: Electric research institute
FN: False negative
FP: False positive
FPM: Failure process modelling
FTD: Failure time data
IG: Information gain
KDD: Knowledge discovery in databases
kNN: k-nearest neighbour
LDA: Latent Dirichlet Allocation
LM: Language model
LR: Logistic regression
MTBF: Mean time between failures
MTTR: Mean time to repair
NB: Naïve Bayes
OEE: Overall equipment efficiency
OR: Operations research
PAR: Proportional age reduction
PHM: Proportional Hazard Model
PI: Proportional intensity
PM: Preventive maintenance
QBC: Query-by-committee
RBF: Radial basis function
RCM: Reliability-centered maintenance
SSL: Semi-supervised learning
SSST: Semi-supervised self-training
SVM: Support vector machine
TBM: Time-based maintenance
TC: Text cleaning
TF: Term frequency
TF-IDF: Term frequency-inverse document frequency
TM: Text mining
TN: True negative
TP: True positive
TPM: Total productive maintenance
TSVM: Transductive support vector machine
WO: Work Order/Notifications
Statement of Original Authorship

[Signed: QUT Verified Signature]
Acknowledgements
My sincere thanks go to the Australian Government and the Australian people for providing me with a scholarship, which enabled me to maintain a stable life for myself and my family and to complete the whole PhD program with confidence in a great learning environment.
I would like to express my great admiration to Professor Lin Ma for his supervision and support through my doctoral studies at the Queensland University of Technology. I would like to express my gratitude to Dr. Michael E. Cholette for all the help he provided with the concept, research approach, data analysis and discussion of results, which made working in the area of text mining an enjoyable experience. My thanks also go to Associate Professor Yue Xu for her diligent work and meticulous help. In addition, I appreciate the counselling skills of Dr. Azharul Karim, who helped me to remain patient and not to worry about the obstacles I met during my candidature. Thanks go also to all the other people who have helped me during my PhD journey.
I would like to acknowledge Julie Martyn, a professional editor from the Society of Editors, Queensland, who provided thesis editing and proofreading services according to the guidelines laid out in the University-endorsed national policy guidelines.
I am grateful to my wife, Dr. Asma Akther, who has dedicated her love and selfless help and joined hands with me through so many years of ups and downs.
Chapter 1: Introduction
1.1 BACKGROUND AND MOTIVATION
Industrial organisations are continuously seeking new strategies to improve the
performance of their assets. Reliability analysis and maintenance optimisation play an
important role in the efficiency of industrial assets. Proper planning and timely
maintenance have been proved effective in improving asset reliability and
performance. In most cases, information required to develop such models is often
recorded in extensive asset and maintenance data and organised to support accounting
and basic analysis of maintenance decisions. Most asset owners and their maintenance
departments use a computerised maintenance management system (CMMS) to keep
records of all maintenance activities performed on the asset [1]. Ideally, one may wish
to exploit the vast quantities of such historical asset data to develop more sophisticated
analytics, including asset reliability and maintenance optimisation models.
Yet, asset data in CMMSs are often collected in a manner that is inconsistent
with reliability modelling, focusing on maintenance record keeping and accounting
without the specific intent of identifying asset failure times [2, 3]. To see why data
collection practices are often insufficient to identify failure times, consider a typical
example of a definition of asset failure: the inability of an asset to perform the required
function at a given time. To identify a “failure” event in maintenance data, one needs
to know when the asset was down due to unplanned maintenance. Unfortunately,
organisations often use different databases that only possess part of the information
needed to define a failure event. Databases usually contain the record of every
maintenance activity (i.e., repair, check or routine inspection) on an asset. However,
this does not always tell us which maintenance work is raised to fix a failure event or
whether the maintenance was actually planned. Thus, each database is incomplete and
insufficient to identify a “failure” event. One needs to know both when the asset is
down and if this downtime was unplanned. Existing data collection practices are
unable to provide such complete information.
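To make the problem concrete, the following minimal sketch (purely illustrative; the tables, column names and records are invented and do not come from the thesis data) shows two miniature databases of the kind described above: a downtime log that records stops without reasons, and a work order log that records free-text descriptions without exact stop times. The date-proximity join is shown only as one naive linking option; the free-text descriptions are what this thesis proposes to mine.

```python
import pandas as pd

# Hypothetical miniature of the two-database problem: neither table alone
# identifies a "failure". The downtime log lacks the reason for each stop,
# and the work order log lacks the exact stop times.
downtime = pd.DataFrame({
    "asset": ["mill_A", "mill_A"],
    "start": pd.to_datetime(["2016-03-01 04:10", "2016-03-09 22:05"]),
    "end":   pd.to_datetime(["2016-03-01 09:40", "2016-03-10 06:30"]),
})
work_orders = pd.DataFrame({
    "asset": ["mill_A", "mill_A"],
    "raised": pd.to_datetime(["2016-03-01 04:30", "2016-03-09 12:00"]),
    "text": ["mill tripped, replace sheared drive pin",
             "scheduled liner change during outage"],
})

# One naive link: join each stop to the nearest work order in time. This is
# fragile (several WOs can be raised per stop, and dates are often missing
# or inconsistent), which motivates linking the records via the free text.
linked = pd.merge_asof(downtime.sort_values("start"),
                       work_orders.sort_values("raised"),
                       left_on="start", right_on="raised",
                       by="asset", direction="nearest")
print(linked[["asset", "start", "end", "text"]])
```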
Research literature regarding the theory of reliability and maintenance
optimisation often lacks discussion on how to analyse and identify the required
information from maintenance databases. Past research has rarely discussed what information is needed to satisfy the requirements of such models. Hodkiewicz and Ho [5] used a case study approach to analyse data collection practices and developed a data cleansing method to extract the information required for reliability analysis, exploring in detail the identification and collection of the required maintenance data. However, without an actual definition of the required
information (often referred to as the requirement specification), such information
cannot be identified accurately nor used in the reliability models. Most importantly,
one needs to define “failure” before trying to identify failure time information (for
instance asset downtime due to unplanned maintenance and whether maintenance was
unscheduled) in maintenance databases. Such necessities imply that maintenance data
need to be collected according to these requirement specifications.
Evidence also suggests that very few methodologies exist to extract the required
information from maintenance databases. Various ontology-based learning methods using case-based reasoning (CBR) approaches have been proposed to categorise industry maintenance logs [4]. However, such methods have not been applied in real industrial settings. In recent years, some researchers [5, 6] have tried to identify failure
time information from maintenance data using asset replacement information. Major
reasons behind any asset replacement include end of asset life cycle, non-repairable
asset or failure of an asset. Without specifying the “failure” events in the maintenance
data and separating them from replacement information, actual failure time
information cannot be determined. Even a classification model [7] based on a
clustering technique was unable to provide a clear distinction between unscheduled
(i.e., failure) and scheduled maintenance on maintenance text data.
Using condition data, Moreira and Junior [8] proposed a method of performing
prognostics on aircraft components. Flight data and maintenance logs have been used
to classify the training data into healthy and unhealthy states. The degradation index
was finally created from the classification results to prepare a future schedule of
aircraft maintenance. A machine breakdown prediction model was also developed using condition monitoring and maintenance (e.g. corrective and preventive) data.
1.2 RESEARCH QUESTIONS AND OBJECTIVES
No effective solutions have been found so far to accurately identify the historical
failure time information from industrial maintenance data. Thus, the central question
of this thesis is how to bridge the gap between the information required for common
reliability and maintenance optimisation models and the information commonly
available in real world industrial maintenance databases. The advancement of big data and machine learning tools (e.g. text mining) provides a unique opportunity to develop intelligent linking that makes use of the vast historical databases that many industries maintain.
Based on the above discussion, the following research questions have been posed
and these are also outlined in Figure 1-1:
1. What information regarding the reliability and maintenance optimisation
models needs to be specified for improving the applicability of these
models?
2. How can typically available maintenance data be analysed and
transformed into the information required (as specified in Research
Question 1) for the models?
Figure 1-1. Overview of the research questions
To address the above questions, firstly, a preliminary literature review was conducted to analyse the requirement specifications regarding reliability and maintenance optimisation models and state-of-the-art methodologies for text mining and classification. Different models were investigated to propose a framework for the requirement specifications. Based on the outcomes of these information models, a methodology for analysing the text descriptions across multiple databases was developed. For the construction of the text classifier, both supervised and semi-supervised machine learning methods were tested. Initially, this study labelled one
database automatically using different data fields and linked it to another database
through the free texts. To overcome the shortcomings of automatic labelling due to
unreliable data fields, the proposed method was further modified by including a semi-
supervised text mining method. However, such methods often require expert
judgement to label the data, which is a time-consuming process. This study used
expert-labelled maintenance data and tested the feasibility of such an advanced text
mining method to determine the failure time information in maintenance databases.
Performance measures of both the methods were tested in two real world scenarios.
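As a rough, self-contained illustration of the supervised variant of this idea (not the thesis' actual pipeline, which is developed in Chapters 4 and 5; all texts, labels and parameter choices below are invented), a classifier can be trained on work orders that were labelled automatically from structured data fields and then applied to the free text of a second database:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Work order texts with labels derived automatically from data fields
# (e.g. priority or work type codes); purely illustrative examples.
wo_texts = [
    "replace worn mill bearing after trip",
    "scheduled inspection of feeder belt",
    "pump seized, unplanned outage to repair",
    "routine lubrication and oil change",
]
wo_labels = ["failure", "non-failure", "failure", "non-failure"]

# TF-IDF features over unigrams and bigrams feeding a linear SVM, one of
# the classifier families reviewed in Chapter 2.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(wo_texts, wo_labels)

# Link to the second database: classify its free-text downtime records.
downtime_texts = ["mill tripped on high vibration", "planned boiler washout"]
print(clf.predict(downtime_texts))  # e.g. ['failure' 'non-failure']
```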
The specific objectives of this research are as follows:
1. To systematically identify the information requirement specifications for
improving the applicability of the reliability and maintenance
optimisation models.
2. To develop a novel method to analyse typically available maintenance
data from industry and transform them into the information required for
the reliability and maintenance optimisation models.
1.3 RESEARCH CONTRIBUTION, INNOVATION AND SIGNIFICANCE
The overall contributions of the proposed research are classified into three areas.
Firstly, the information requirement specifications summarise the information
necessary for different reliability and maintenance optimisation models. Such a new
analysis within a requirement specification framework gives a direction to
maintenance practitioners for recording maintenance data in a standard way. Secondly,
a novel methodology has been developed to identify the failure time information using
data mining techniques. This novel method can extract information from multiple
maintenance databases in both numerical and text formats and is expected to lead to
more reliable failure time identification. Thirdly, the information requirement
specification and extracted failure time information can serve as useful modelling tools
in analysing stochastic repairable maintenance problems.
The outcomes of this research are significant in both industrial applications and
research activities. Firstly, the identification of information requirement specifications
has a revolutionary effect not only on accurate reliability and maintenance
optimisation decisions but also on maintenance work itself, in that people can collect
the appropriate required data. Secondly, the methods developed here enable improved
identification of events necessary for maintenance models, and thus improve the
quality of reliability models identified from this information.
1.4 PUBLICATIONS
• Kazi Arif-Uz-Zaman, Michael E. Cholette, Fengfeng Li, Lin Ma,
Azharul Karim. (2015). A data fusion approach of multiple maintenance
data sources for real-world reliability modelling. In Proceeding of 10th
World Congress on Engineering Asset Management (WCEAM-2015),
September 28-30, 2015, Tampere, Finland.
• Kazi Arif-Uz-Zaman, Michael E. Cholette, Lin Ma, Azharul Karim.
(2017). Extracting failure time data from industrial maintenance records
using text mining. Advanced Engineering Informatics, 33, p. 388-396,
Q1 Journal, Impact Factor: 2, SJR: 1.26.
• Kazi Arif-Uz-Zaman, Michael E. Cholette, Yue Xu, Lin Ma, Azharul
Karim. (To be submitted in February, 2018) Failure time identification
from a minimum set of industrial maintenance records using text mining
and active learning, intended for publication in Computers in Industry, Q1 Journal, Impact Factor: 1.685, SJR: 0.93.
1.5 THESIS ORGANISATION
To ensure a complete discussion of the research problems and objectives,
existing literature, proposed methodology and conclusions, the thesis is arranged as
follows:
• Chapter 1 provides a general overview of the thesis including research
motivations, problems, objectives, contributions, innovations and
significance as well as the organisation of the thesis.
• Chapter 2 reviews the related literature regarding maintenance
optimisation models, industrial maintenance data structure and text
mining methods. This chapter also lists the information required for time-
based and condition-based maintenance optimisation models. Research
gaps are identified and discussed.
• Chapter 3 investigates the sufficiency of existing maintenance databases
to satisfy the data requirement of reliability and maintenance
optimisation models. Information requirement specification gives a
direction to maintenance practitioners for recording maintenance data
accurately, which can then be used in reliability and optimisation models.
• Chapter 4 proposes a novel method to extract failure and non-failure
maintenance time information using typically available maintenance
data. Initially, a text classifier is constructed using the free texts of one
maintenance database. The classifier is subsequently applied to another
database to categorise them into two classes: failure and non-failure.
Thus, the method jointly uses multiple databases to determine asset
failure time information. This proposed method can analyse the huge
quantity of historical maintenance data and make it useful for more sophisticated analytics, i.e., reliability and maintenance optimisation.
• Chapter 5 investigates the proposed method in two industrial settings.
Case studies suggest that the method can efficiently bridge the gap
between maintenance optimisation models and available maintenance
data by identifying failure time information. A group of different
validation methods is also tested for both case studies. The method proves effective for identifying failure time information for industrial assets.
• Chapter 6 develops an advanced text mining method, an alternative way
to identify failure time information using expert judgement. In this
regard, active learning based text classifiers are constructed by querying
a minimum number of maintenance data and labelling them by an expert.
This method is tested on two case studies and can accurately identify the
historical asset failure times.
• Chapter 7 summarises the conclusion of the thesis and suggests possible
future research directions.
Chapter 2: Literature Review
This chapter reviews some of the significant literature on maintenance processes,
modelling techniques and optimisation. A critical limitation in the application of maintenance optimisation is the gap between the information required by maintenance optimisation models and the way data on the maintenance process is typically collected in many organisations. As will be seen later in this thesis, a possible remedy to this gap is analysing the free text descriptions in different maintenance databases, and the text mining literature will thus also be reviewed. Three main research areas are discussed here:
• Reliability and Maintenance Optimisation Models: Investigate various models and techniques used for reliability and maintenance optimisation under different maintenance strategies and policies. Particular attention is paid to the data and information needed for these models.
• Data available in typically-collected maintenance databases:
Summarise the available industry databases and asset and maintenance
data collected during the maintenance process.
• Data and Text Mining: Review the text mining literature regarding text cleansing methods, features, text classifiers, and supervised and semi-supervised machine learning methods.
2.1 OVERVIEW OF THE MAINTENANCE PROCESS
Maintenance is the function that monitors and keeps plant, equipment and
facilities working. According to EN 13306: 2001 Maintenance Terminology,
“Maintenance is the combination of all technical, administrative and managerial
actions during the life cycle of an item intended to retain it in or restore it to a state in
which it can perform the required function” [9]. The recognition of the value of
maintenance is a recent development. Waeyenbergh and Pintelon [10] noted that
maintenance became an integrated part of other operating functions and the concept
had shifted from failure-based repair to use-based repair and eventually towards
condition based maintenance. Maintenance is a process whose activities are carried
out simultaneously with the production process [11, 12]. Figure 2-1 shows the
relationship between different objectives relating to production and maintenance
processes.
Figure 2-1. Production and maintenance process [11]
Production systems (which are the focus of most maintenance literature) usually convert inputs (raw materials, energy, workload, etc.) into a product that satisfies
customer needs. The maintenance system, as a mix of when-what actions, labour, and
spare parts, together with other resources, aims to maintain equipment in good working
order, i.e., the equipment is able to provide the appropriate level of production
capacity. In a maintenance system, feedback control, planning, and organisation
activities are very critical and strategic issues. The first of these deals with the
production system and control of maintenance activity (e.g., workload allocations,
spare parts management). The various actions which must be taken to control
production and maintenance activities and to resolve breakdowns must be planned in
advance whenever possible.
Clearly feedback control requires maintenance action in downtime periods or
during an unexpected breakdown, to put the plant back into working order. In
unexpected breakdowns the planning phase is skipped and the maintenance work is
carried out as soon as possible. This is breakdown/corrective maintenance. Definitive
maintenance work is scheduled in a previously planned stop period. Maintenance
activities are so numerous and complex that they require effective management and
well-structured organisation (see Figure 2-2).
Figure 2-2. Maintenance management workflow [13]
2.1.1 Maintenance Policies and Strategies
Maintenance strategies can be divided into two major types: proactive and
reactive (see Figure 2-3). Reactive or corrective maintenance (CM) occurs when the asset breaks down, resulting in unexpected shutdowns and high maintenance costs.
Figure 2-3. Evolution of maintenance strategies [14]
On the other hand, proactive maintenance (PM) is scheduled in order to minimise
the impact of a sudden breakdown, usually consumes fewer resources than CM and
can be accommodated in the production plans. In fact PM can be as simple as cleaning
filters, lubricating and changing oil, thus preventing the failure of a critical component
that is costly and takes time to be delivered. Because the operation schedules and
environment change dynamically in the real world, PM can take place unnecessarily.
To ensure PM occurs only when needed, condition-based maintenance (CBM) has
been introduced [15]. This can either take the form of regular inspections to evaluate
the assets’ wear or of sensors streaming data to diagnostic software. Therefore
maintenance tasks can be triggered only when the wear reaches a certain level. It is
worth mentioning that CBM is included under the general category of proactive
maintenance. CBM is defined by EN 13306:2010 as “preventative maintenance that
includes a combination of condition monitoring and/or inspection and/or testing,
analysis and subsequent maintenance actions” [16].
2.2 MAINTENANCE OPTIMISATION
Maintenance aims to retain assets in their operational states [17] or to improve system availability; however, since maintenance incurs cost, there is a need to optimally balance these objectives. In maintenance theory, this is the critical problem of maintenance optimisation. A recent study on maintenance policy and optimisation models revealed that maintenance cost can reach between 15% and 70% of the total production expenditure, or in many cases might even exceed annual net profit [18].
Thus an appropriate and optimised maintenance policy is essential for the financial
health of asset-intensive businesses and organisations. The main question faced by a
maintenance manager is this: what maintenance actions to take, and when to take them,
to gain an appropriate level of production from an asset.
Optimal maintenance policies provide a deliberate plan of action which guides
maintenance management by seeking the optimal balance between the costs and
benefits of the maintenance, while taking all kinds of constraints into account [19]. In
almost all cases, maintenance benefits consist of saving on costs which would be
incurred otherwise (e.g., lower failure costs). Early maintenance policies were based on the sole aim of reducing the maintenance cost itself, without considering other
factors which were equally important, such as reliability [20]. Much of the time,
minimising total maintenance cost will limit reliability to an unacceptable level in
practice. Therefore, to obtain the best performance and a balance between these aims,
total maintenance costs, reliability estimations, as well as other factors should be
considered simultaneously when devising maintenance policies.
A general optimisation problem comprises two major elements: the objective function and the decision variable. The objective function is a mathematical expression describing a relationship among the optimisation parameters or the result of an operation that uses the optimisation parameters as inputs. A decision variable is a quantity that the decision-maker controls. For example, the number of workers to employ during the morning shift on a production floor may be a decision variable in an optimisation model for labour scheduling. Minimising cost was reported as an objective function in more than 70% of the studies [17]. In addition to minimising cost, optimisation objectives such as maximising availability, maximising production throughput and overall equipment efficiency (OEE) were identified by Horenbeek and colleagues in [21].
and objective functions) necessary for maintenance optimisation.
Table 2-1. Elements of maintenance optimization considering different maintenance strategies [17, 21-24]

Maintenance strategies: Proactive (Predictive, on-condition; Preventive, scheduled) and Reactive (Corrective, run-to-failure).

Possible decision variables:
• Predictive (on-condition): inspection frequency; maintenance threshold on condition
• Preventive (scheduled): PM frequency; maintenance schedule on time
• Both proactive strategies: spare parts (reorder level and maximum stock level); maintenance priorities; production (buffer size)
• Corrective (run-to-failure): N/A

Possible objectives: minimise cost; maximise availability; maximise throughput; maximise profit; OEE
Determining how frequently assets should be maintained to achieve the best possible solution is a continuing concern within the field. In cases where PM is considered, the decision variable is the PM frequency. Depending on the maintenance policy, the PM frequency can be periodic, age-based or constant-interval. However, when the system of interest incorporates CBM or opportunistic
maintenance, the decision variable is the maintenance threshold that triggers
maintenance actions. If information on asset degradation is not streamed by on-line systems, inspections are needed to evaluate this deterioration. Thus, according to this logic, inspection intervals were included as a decision variable in [23]. In addition, the priority of maintenance tasks is also included as a decision variable.
Spare parts management is an important component in the maintenance system
and has a considerable impact on cost and availability. Several attempts have also been
made to investigate the effect of production parameters on maintenance systems in
manufacturing settings [25].
In general, maintenance optimisation covers the following aspects [19, 20]:
• A model description of a technical system and its function: a model of the deterioration of the system over time and the possible consequences for the system.
• A description of the available information about the system and the
action open to the management.
• Objective function and an optimisation technique which helps in finding
the best balance.
According to Chen and Pham [26], a model is a description of a process, system,
or concept, in a simple and systematic way which usually involves an explicit
mathematical formalisation of the process being studied. In maintenance policy
optimisation, a model is a description of a process to analyse and determine the optimal
maintenance policy under pre-determined maintenance objectives and criteria. An
example has been made by Dekker [19] regarding the description of the available
information in an age replacement model. If, upon failure, any component is replaced
by an identical one, the information regarding failure cost, 𝑐𝑐𝑓𝑓, preventive replacement
cost, 𝑐𝑐𝑝𝑝 and hazard rate, 𝑟𝑟(⋅) is required. The key result of optimisation is to identify
values of the decision variables that either maximise or minimise the objective
function. The stochastic behaviour of systems is mainly represented by the system
reliability estimates: availability, mean time between failure (MTBF) and failure
frequency, and the system maintenance cost measures: maintenance cost rate and
Chapter 2: Literature Review 15
discounted cost rate. Generally, an optimal maintenance policy may be the one which
either
• Minimises the system maintenance cost rate
• Maximises the system reliability estimates
• Minimises the system maintenance cost rate while the system reliability
requirements are satisfied, or
• Maximises the system reliability estimates when the requirements for the
system maintenance cost are satisfied.
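To make these elements concrete, the sketch below numerically optimises the age replacement example attributed to Dekker above. It is an illustration under stated assumptions, not a model from the thesis: the reliability function is assumed Weibull and the costs are invented. The decision variable is the replacement age $T$; the objective is the long-run cost rate $C(T) = \left(c_p R(T) + c_f [1 - R(T)]\right) / \int_0^T R(t)\,dt$, which follows from renewal-reward reasoning.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Assumed inputs (invented for illustration): Weibull reliability R(t),
# failure replacement cost c_f, preventive replacement cost c_p.
alpha, beta = 1000.0, 2.5     # Weibull scale [h] and shape
c_f, c_p = 5000.0, 800.0      # costs of failure vs. preventive replacement

def R(t):
    """Reliability (survival) function of the component."""
    return np.exp(-(t / alpha) ** beta)

def cost_rate(T):
    """Long-run cost per hour of replacing at age T or at failure."""
    expected_cycle_cost = c_p * R(T) + c_f * (1.0 - R(T))
    expected_cycle_length, _ = quad(R, 0.0, T)   # E[min(X, T)]
    return expected_cycle_cost / expected_cycle_length

res = minimize_scalar(cost_rate, bounds=(1.0, 5000.0), method="bounded")
print(f"optimal replacement age T* ~ {res.x:.0f} h, "
      f"cost rate ~ {res.fun:.3f} per h")
```

Replacing too early wastes preventive cost, while replacing too late incurs the higher failure cost, so the cost rate has an interior minimum when the hazard rate is increasing (here, Weibull shape greater than one).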
Work on maintenance optimisation was initiated in the early 1960s [27, 28]. In the literature, optimal maintenance models are based on different parameters classified into various categories such as information availability, single-unit or multi-unit systems, and time events and state events, while the model types are classified by optimality criteria, methods of solution and planning time [20].
Maintenance optimisation has been considered to fall into one of two kinds: qualitative and quantitative. The former includes techniques such as total productive maintenance (TPM) [20] and reliability centred maintenance (RCM) [20], while the latter incorporates various deterministic/stochastic models such as Markov decision processes and Bayesian models. Maintenance techniques have undergone a long evolution, from corrective maintenance in the 1940s, through various operations research (OR) models for maintenance, to today's proactive reliability-based approach [14].
Sherif [29] classified the models according to the modelling of the deterioration into 1) deterministic models and 2) stochastic models. In the case of CBM-based stochastic deterioration models, Alaswad and Xiang [22] produced further sub-classifications such as discrete-state deterioration, the proportional hazard model (PHM) and continuous-state deterioration. A unique classification based on certainty theory is mentioned by Ding and Kamaruddin [18], in terms of the degree of certainty: certainty, risk, and uncertainty. To investigate the modelling criteria and information requirements, different types of reliability, time-based and condition-based maintenance models are discussed in the following sections.
2.3 FAILURE TIME MODELS
2.3.1 Virtual Age (VA) Model
Kijima [30] used the idea of the virtual age process of a repairable system to develop
an imperfect repair model. If a system has the virtual age $V_{k-1} = y$ immediately
after the $(k-1)$th maintenance, the $k$th failure time $X_k$ is distributed as

$$\Pr[X_k \le x \mid V_{k-1} = y] = \frac{F(x+y) - F(y)}{1 - F(y)} \tag{2-1}$$

where $F(x)$ is the failure-time distribution of the system. The failure rate $\lambda(t)$ of such
a model can be expressed as [30]:

$$\lambda(t) = \lambda(A_k + t - C_k) \tag{2-2}$$

where $C_k$ are the maintenance times (for any maintenance process, PM or CM), $A_k$ is
the effective age at time $t$, and $V_k = A_k + t - C_k$ represents the virtual age at
time $t$. The effective age is the virtual age of the asset after the last maintenance action.
Using the effective age $A_k$ and virtual age $V_k$, different maintenance effects can be
constructed using virtual age models [31]. In the case of minimal repair (often referred to
as ABAO, as-bad-as-old), the effective age and the last maintenance time are the same, i.e., $A_k = C_k$ for all
$k \ge 1$. The failure rate is then only a function of time, $\lambda(t)$ (i.e. it follows a non-homogeneous
Poisson process) [2, 32]. Perfect repair, or AGAN (as-good-as-new), is modelled as a
complete resetting of the effective age to zero (i.e., $A_k = 0$ for all $k \ge 1$), as shown in
Figure 2-4. In that case, the failure rate is [2, 32]:

$$\lambda(t) = \lambda(t - C_k) \tag{2-3}$$

In between minimal and perfect repair, the maintenance effects can be
modelled using imperfect repair: the maintenance effect is supposed to reduce the
virtual age by an amount proportional to the age accumulated since the
last maintenance. When both maintenance actions have the same effect, the
effective age $A_k$ (i.e., the last maintenance action is presumed to reduce the operating
time [33]) can be written as [31, 32, 34] $A_k = (1 - \rho)C_k$. The failure rate after the
maintenance can then be shown to be [33]:

$$\lambda(t) = \lambda(t - \rho C_k) \tag{2-4}$$
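To make the role of the maintenance effect $\rho$ concrete, the following minimal Python sketch (a hypothetical illustration, not taken from [30-34]) simulates successive failure times under a Kijima-type virtual age update in which each repair retains a fraction $(1-\rho)$ of the accumulated age, consistent with the convention above that $\rho = 1$ corresponds to AGAN and $\rho = 0$ to ABAO:

```python
# A minimal sketch, assuming a Weibull underlying failure-time distribution;
# beta, eta and rho are hypothetical values.
import numpy as np

rng = np.random.default_rng(0)
beta, eta = 2.5, 100.0   # Weibull shape and scale
rho = 0.6                # maintenance effect: rho = 1 -> AGAN, rho = 0 -> ABAO

def next_failure(v):
    """Draw X_k given virtual age V_{k-1} = v by inverse transform on the
    conditional survival S(x + v) / S(v) implied by Eq. 2-1."""
    u = rng.uniform()
    return eta * ((v / eta) ** beta - np.log(u)) ** (1 / beta) - v

v, t, failures = 0.0, 0.0, []
for _ in range(10):
    x = next_failure(v)
    t += x                     # calendar failure time
    v = (1 - rho) * (v + x)    # imperfect repair retains (1 - rho) of the age
    failures.append(t)
print(np.round(failures, 1))
```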
Figure 2-4. Minimal, perfect and imperfect repair [35]
Pulcini [36] proposed a Bayes approach within a proportional age reduction-power law
process (PAR-PLP) framework. With the initial failure rate

$$\lambda_1(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1},$$

the conditional failure rate can be expressed as [33, 36]:

$$\lambda_k(t) = \frac{\beta}{\alpha}\left(\frac{t - \rho C_k}{\alpha}\right)^{\beta-1} \tag{2-5}$$
The types of information used in the failure time models in the selected literature
are summarised in Table 2-2.
Table 2-2. Types of information used for TBM-based failure time models

| Model/Method | Maintainable System/Unit | Failure/Preventive Maintenance (PM) Times | Failure History Completeness | Maintenance Type (PM; CM with Stops; CM with Delay) |
|---|---|---|---|---|
| Arithmetic Reduction of Age (ARA) [31] | Inlet Header of Heat Exchanger | Failure Times, PM Times | Censored | PM, CM with Stops or Delay |
| Arithmetic Reduction of Intensity (ARI) [34] | Water Pump | Failure Times, PM Times | Left Truncated | PM, CM with Delay |
| Change Point Detection [37] | Electronic Board | Failure Times | Complete | CM with Delay |
| Bayes Approach [36] | Cooler System in Power Plant | Failure Times, PM Times | Censored | PM, CM with Stops |
| Proportional Age Reduction (PAR) [33] | Cooler System in Power Plant | Failure Times, PM Times | Censored | PM, CM with Stops |
| Proportional Intensity (PI) [38] | Airplane Air Conditioning | Failure Times | Complete | CM with Stops |
| Reliability Assessment Framework [39] | Centrifugal Pump | Failure Times | Truncated | CM with Stops |
| Failure Process Modelling (FPM) [40] | Light Commercial Vehicle | Failure Times | Censored | CM with Delay |
The above discussion suggests that failure time models require the effective age $A_k$,
the historical maintenance times $C_k$ (i.e., failure/CM times and preventive maintenance/PM
times), the maintenance effect $\rho$ and the current age of the asset.
2.4 DEGRADATION MODELS
When a system condition is directly observable, stochastic deterioration models
are usually considered. Any system subject to a deterioration process has an increasing
failure rate and is usually considered for both corrective and preventative maintenance
(see Figure 2-5) [19].
Figure 2-5. Deterioration model
$X(t)$ represents the deterioration state over time $t$. At the beginning ($t = 0$), the
system is said to be in an as-good-as-new state. Since the system has an increasing failure
rate, it is considered for preventive maintenance until the failure level is reached. It is
important to note that the inspection times can be chosen either arbitrarily or imposed
by the optimal maintenance decision. When the deterioration exceeds a failure
level $L$, the system is said to be in a "failed" state. With a condition-based
maintenance strategy, one has to decide whether to replace the deteriorated system
through preventive or corrective maintenance, and to choose the date of the next
inspection of the system.
Deterioration models can be classified into three types: proportional hazard
model (PHM), discrete state deterioration and continuous state deterioration [22]
models. Models have also been reviewed for both single and multi-unit systems.
Discrete state models are often formulated by a Markov model [41]. To relax the rather
strict conditions of the Markov process or to model partially available system
information, a semi-Markov model or a hidden Markov model may be used. To model
continuous state deterioration, three processes are widely used: Wiener, gamma and
inverse Gaussian processes [22]. In particular, gamma process models are widely
applied as a model of a monotonic degradation process. A homogeneous gamma
process can be formulated by Eq. 2-6 [42]:

$$f_{\alpha,\beta}(t) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, t^{\alpha-1} e^{-t/\beta}, \quad t > 0 \tag{2-6}$$
in which $\Gamma(\alpha) = \int_0^{\infty} z^{\alpha-1} e^{-z}\, dz$ denotes the gamma function, with shape parameter
$\alpha > 0$ and scale parameter $\beta > 0$. To model heterogeneous degradation, an inverse
Gaussian process is usually suitable [22].
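As a concrete illustration of a gamma process degradation model, the following minimal Python sketch (with hypothetical parameters) simulates sample paths with independent gamma-distributed increments and estimates the first passage time of a failure threshold $L$ by Monte Carlo:

```python
# A minimal sketch, assuming a stationary gamma process: over each step of
# length dt the increment is Gamma(alpha * dt, beta). All values are
# hypothetical.
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 0.5, 2.0          # shape rate per unit time, scale
L, dt, t_max = 20.0, 1.0, 200.0 # failure level, step, simulation horizon

def first_passage():
    """Return the first time the cumulative degradation X(t) exceeds L."""
    x, t = 0.0, 0.0
    while t < t_max:
        x += rng.gamma(alpha * dt, beta)
        t += dt
        if x >= L:
            return t
    return np.inf

hitting_times = [first_passage() for _ in range(1000)]
print("mean time to exceed L:", np.mean(hitting_times))
```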
PHM is one of the most popular degradation paradigms. It was first introduced
by Cox in 1972 and since then it has become an important statistical regression model
[43]. PHM is commonly applicable to multivariate failure models while the system
deteriorates by the effects of covariates. This is an approach to model an asset’s hazard
using condition monitoring data. The PHM model can be formulated and is shown by
Eq. 2-7 [44]:
$$h(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \exp[\gamma_1 z_1(t) + \gamma_2 z_2(t) + \cdots + \gamma_m z_m(t)] \tag{2-7}$$

where $h(t)$ is the hazard rate of failure at time $t$, given the $m$ covariate values
$z_1(t), z_2(t), \ldots, z_m(t)$. The baseline hazard $h_0(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}$ is that of the Weibull
model. Each $z_i(t)$ is a covariate or explanatory variable, representing a monitored
condition data item at the time of inspection $t$, for instance voltage, current,
temperature, humidity, or a measure of stress. The product of $z_i$ and $\gamma_i$ determines the
influence of the covariates on the hazard rate of failure. According to the principles of
reliability analysis, the reliability and failure probability density can be estimated as
[43]:

$$R(t) = \exp\left[-\int_0^t h(u, z_i)\,du\right] = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta} \exp\left(\sum_{i=1}^{m}\gamma_i z_i\right)\right] \tag{2-8}$$

$$f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \exp[\gamma_1 z_1(t) + \cdots + \gamma_m z_m(t)]\,\exp\left[-\left(\frac{t}{\eta}\right)^{\beta} \exp\left(\sum_{i=1}^{m}\gamma_i z_i\right)\right] \tag{2-9}$$
According to Ahmad and Kamaruddin [15], condition monitoring data can be
classified into three types: the value type, waveform type, and multi-dimensional type.
Value type data exist in a single value, examples of which include oil analysis data,
temperature, pressure, humidity, and quality scale. Waveform and multi-dimensional
type data can also be referred to as signal and image forms, respectively. Examples of
signal forms include vibration and acoustic data, which are typical of waveform type
data. Image forms such as infrared thermographs, visual images and X-ray images are
examples of multi-dimensional type data. Value type CBM data can be exemplified by
the research of Moreira and Junior [8] who investigated the prognostics of the aircraft
bleed valve using a kind of condition monitoring data called the air management
system (AMS). This included cabin temperature, pressurization and air renewing and
cycling. Another example based on condition monitoring data is that of Bastos and
associates [45], who applied a prediction algorithm to estimate the probability of machine
breakdown, which supported decisions about maintenance interventions.
The data required for degradation modelling is quite different from failure time
models. For the stochastic process models (Markov, Gamma), a direct condition
indicator (e.g. thickness of a protective coating) is used and a threshold on this quantity
is used to define failure. When such a direct observation is not available, hidden
Markov Models can be used to infer the unobservable condition from imperfect
observations (e.g. vibration data). In addition to these condition data (i.e. covariates),
PHM models require failure time information to identify the hazard rate parameters.
Yet, such failure times may also be needed for stochastic process models to define a
failure threshold if one cannot be developed from first principles. Thus, degradation
models likely require the identification of failure times and key condition indicators for
modelling.
2.5 MAINTENANCE OBJECTIVES AND COSTS
Maintenance optimisation couples either a failure time model or a degradation model
with a cost model. Generally, the objective functions of maintenance optimisation are
cost rates. For TBM, the cost rate can be computed as [46, 47]:

$$TC(T) = \frac{C_{cm}\,F(T) + C_{pm}\,R(T)}{\int_0^T R(t)\,dt} \tag{2-10}$$

where $C_{pm}$ is the preventive maintenance unit cost, $C_{cm}$ is the corrective maintenance
unit cost, $T$ is the optimum time of replacement, $F(T)$ is the cumulative distribution
function, $R(t)$ is the reliability function and $TC(T)$ is the total maintenance cost per
unit time. For CBM, the cost rate may be computed as [47-49]:
$$TC(d) = \frac{C_{pm}\,(1 - Q(d)) + (C_{pm} + K)\,Q(d)}{W(d)} = \frac{C_{pm}\,(1 - Q(d)) + C_{cm}\,Q(d)}{W(d)} \tag{2-11}$$

where $d$ is a threshold on the condition indicator, $C_{cm} = C_{pm} + K$ is the failure
replacement cost, $Q(d)$ is the probability that a failure replacement will occur and
$W(d)$ is the expected time until replacement (whether preventive or corrective).
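The quantities $Q(d)$ and $W(d)$ are rarely available in closed form; a common approach is to estimate them by simulating the degradation process. The following minimal Python sketch (hypothetical parameters; periodic inspections every $\tau$ time units) evaluates the cost rate of Eq. 2-11 over a grid of thresholds for a gamma degradation process of the kind discussed in Section 2.4:

```python
# A minimal sketch, assuming a gamma degradation process, a failure level L
# and periodic inspections; all values are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
alpha, scale, L, tau = 0.5, 2.0, 20.0, 5.0   # process, failure level, inspection period
C_pm, K = 1.0, 9.0                            # so C_cm = C_pm + K = 10

def one_cycle(d):
    """Run until an inspection finds X >= d; corrective if X >= L by then."""
    x, t = 0.0, 0.0
    while True:
        x += rng.gamma(alpha * tau, scale)
        t += tau
        if x >= d:
            return t, x >= L          # cycle length, failure indicator

for d in (10.0, 14.0, 18.0):
    cycles = [one_cycle(d) for _ in range(2000)]
    W = np.mean([c[0] for c in cycles])   # expected time until replacement
    Q = np.mean([c[1] for c in cycles])   # probability of failure replacement
    TC = (C_pm * (1 - Q) + (C_pm + K) * Q) / W
    print(f"d = {d}: Q = {Q:.2f}, W = {W:.1f}, cost rate = {TC:.3f}")
```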
For both TBM and CBM, the maintenance cost rate is the total maintenance cost
over the cycle time. All of the models have the total maintenance cost in common, which
is composed of the following costs. In TBM, the total maintenance cost can be
calculated using Eqs. 2-12 and 2-13 [15]:

$$C_{CM} = C_{m,CM} + C_c + C_{d,CM} \tag{2-12}$$

$$C_{PM} = C_{m,PM} + C_{d,PM} \tag{2-13}$$

where $C_{CM}$ is the CM cost (or total failure cost), $C_{PM}$ is the PM cost, $C_{m,CM}$ ($C_{m,PM}$)
are the direct maintenance costs (e.g. labour and spare parts) when corrective
(preventive) maintenance is undertaken, $C_c$ is the product reject cost, or cost of
product loss when the machine fails, and $C_{d,CM}$ ($> C_{d,PM}$) is the downtime cost when a
corrective (preventive) maintenance action is taken.
Direct maintenance costs $C_m$ can mostly be obtained from maintenance logs.
Considering the overhead cost of each maintenance person and the cost of the parts
replaced during maintenance, direct maintenance costs are well captured in
accounting records. However, the product reject and downtime costs are mostly
related to the operational context. The product reject cost $C_c$ can be determined from
the production of the machines, depending on the production process (e.g. sequential or
parallel). The downtime cost $C_d$ can be calculated using the total downtime in
production. So, product reject and downtime costs are usually approximated with an
understanding of the operational context and can be ascertained from the production
data.
For CBM, let $d(t)$ denote the time spent in a failed state in $[0, t]$; then the total
maintenance cost $TC(t)$ can be shown to be [22]:

$$TC(t) = C_{cm} N_{cm}(t) + C_{pm} N_{pm}(t) + C_i N_i(t) + C_d\, d(t) \tag{2-14}$$

where $C_{cm}$ is the corrective replacement cost, $C_{pm}$ denotes the preventive
replacement cost, $C_i$ is the inspection cost, and $N_{cm}$, $N_{pm}$ and $N_i$ represent the (random)
numbers of corrective repairs, preventive repairs and inspections, respectively, in $[0, t]$.
2.6 ADVANTAGES AND DISADVANTAGES OF THE MODELS
The main advantage of TBM is that it requires only maintenance (failure and preventive)
times. Most TBM optimisation approaches thus require historical failure times,
recorded throughout the lifetime of the asset [31, 33, 34, 36-40, 50], and the total
maintenance cost. Although the total maintenance costs are easily approximated, there
are still challenges in obtaining reliable failure time data. Due to incorrect recording,
such data are not always available, or are sometimes unusable [15]. For instance, asset
maintenance may be planned (i.e. not reactive), which must be treated differently from
failure times in estimation. Without careful data recording practices, such data might
be confused with failure time data, which can lead to inaccurate estimation of
reliability models. Another challenge is the connection of maintenance data with actual
asset stoppages. In most cases, the failure time models [33, 34] assume that a
maintenance event leads to an immediate break-down (e.g., stoppage) of the asset.
On the other hand, the main advantage of CBM is that it is economically superior
in terms of total maintenance cost [51]. Unlike TBM, which employs a fixed
maintenance time interval, in CBM maintenance is performed only when it is needed.
CBM helps to reduce maintenance setup costs and unnecessary maintenance actions
[19]. So, the various degradation models and optimisation methods of CBM are preferable
if all the data (i.e., failure time data and condition data) are available. However, there are
challenges and difficulties in obtaining condition and failure time data. The major
challenge is data availability [15]. Condition data can be expensive to collect
and store, requiring specialised sensors (e.g. vibration or acoustic emission sensors)
and data acquisition equipment. Moreover, significant expertise and effort are needed
to develop degradation models. Systems that are subject to multiple degradation
processes, caused by both internal and external failures, can create great mathematical
complexity in degradation modelling [19].
It is therefore essential that such required data are properly recorded in the typically
available maintenance databases. However, in many organisations' current practice,
maintenance data are not properly recorded and linked so as to identify failure events. It is
thus challenging to analyse industrial maintenance databases to identify failure times,
which are needed for both CBM and TBM.
2.7 TYPICALLY AVAILABLE MAINTENANCE DATABASES IN INDUSTRY
A maintenance documentation system for recording and conveying information
is an essential operational requirement for all the elements of the maintenance
management process. Maintenance documentation can be defined as [52]: any record,
catalogue, manual, drawing or computer file containing information that might be
required to facilitate maintenance work. Simultaneously, a maintenance information
system can be defined as [52]: the formal mechanism for collecting, storing, analysing,
interrogating and reporting maintenance information. This information could come
from a variety of data sources.
Nowadays, two main data collection systems are implemented in many
maintenance departments: the computer maintenance management system (CMMS)
and the condition monitoring system (CMS) (or the distributed control system (DCS)
database). The former is the core of traditional maintenance record-keeping practices
and often facilitates the usage of textual descriptions of faults and actions performed
on an asset. On the other hand, CMS/DCS data can be used to directly monitor asset
and component parameters.
Hong [53] mentioned three main sources of reliability data: laboratory life tests,
field tracking studies and warranty databases. Laboratory reliability testing is often
used to inform product design decisions, whereas the "real" reliability data come
from the field, often in the form of warranty returns or specially designed field-tracking
studies [54]. Although warranty data are a very rich source of reliability
information, they have common problems: failure mode information is
sometimes unavailable and the data are censored in nature.
In many industries (e.g. production facilities), data on maintenance occurrence
and workflow are directly recorded. This is the second major source of reliability data,
where companies and organisations keep detailed records of the costs of maintenance
for their assets (e.g., information about the reliability for a fleet of automobiles or
locomotives or transformers).
However, maintenance data generally lack important engineering information
due to their reporting rules. In short, such databases were designed for financial
reporting and maintenance workflow control rather than answering engineering
questions. Additionally, challenges arise from several data collection difficulties
including accuracy, correctness, duplication, consistency, timeliness, validity,
reliability and completeness [55]. Recognising this, Moore and Starr [56] identified
production schedules and financial records as complements to CMMS and CMS/DCS
data to inform maintenance practices.
However, for effective maintenance decision making, it is necessary to
have reliable and consistent data in maintenance databases [57]. To optimise maintenance,
databases should contain data related to equipment functioning, failures and their
consequences, as well as maintenance operations and their costs. Ideally, such
information would be collected from the same equipment (specific
failure data) or from analogous equipment in similar conditions. The analysis and
treatment of the collected data then allow maintenance optimisation models to be
calculated and validated, and production and maintenance operations or actions to be
re-planned.
Most process industry and manufacturing plants use CMMS databases to help
manage maintenance performed on plant assets. CMMS benefits include asset
information, maintenance work planning and scheduling, maintenance history and
maintenance reporting etc. One of the most widely available sources of maintenance
data is the so-called work order (sometimes called work notifications/event). This type
of data documents the history of all maintenance events that occur. The events may
include inspection, repair and replacement, and may be corrective or preventative
actions. Work orders typically include data related to maintenance planning,
scheduling, and execution, together with free text work descriptions. Describing work
order notifications, Sipos and colleagues termed these log data: a collection of
events recorded during various maintenance applications which have been run on the
equipment [6].
Outage/stop data is another source of information that can also be available
in a CMMS and is one of the important sources for identifying failure time information.
Generally, asset stoppage information due to failure or planned outage is recorded here.
For example, Alkali et al. [58] used a plant information (PI) database to extract failure
information for a coal-fired power plant, using the PI records of the motor's current to
indicate the mill's on or off status. To predict vehicle compressor failures,
the service record (SR) database, which contains repair information
including previous failure records, has also been widely used [3].
A limited number of methods have been proposed to bridge the incompleteness
among maintenance databases. The most effective source of reliable information is
often in the free text descriptions (used to describe the repair process) in different
maintenance databases [5], which are difficult to quantitatively analyse. Text
classification methods provide a possible set of tools for analysing such free texts [59].
Text mining methods seek to execute a series of natural language processing (NLP)
steps to extract useful information [60]. Exploring potential linkages through analysing
and extracting numerical and text data from maintenance databases, via knowledge
discovery and machine learning methods, is investigated in the next sections.
2.8 KNOWLEDGE DISCOVERY
Every day, 2.5 quintillion bytes of data are created and 90 percent of the data in
the world today were produced within the past two years [61]. A fundamental
challenge is to explore large volumes of data and extract useful information or
knowledge from that. Data mining (DM) is the analysis of (often large) observational
datasets to find novel relationships and to summarise the data in novel ways that are
both understandable and useful to the data owner [62]. DM, a term often used
interchangeably with knowledge discovery in databases (KDD), is the
automated extraction of patterns representing knowledge implicitly stored in large
databases, data warehouses, and other massive information repositories [63], and it has
become an increasingly important research area. Generally, the entire KDD process
follows the steps shown in Figure 2-6, from data selection through data processing and
data mining to knowledge acquisition [62, 64, 65].
Figure 2-6. Process of knowledge discovery in databases [64]
KDD is an umbrella for all those methods that aim to discover relationships
and regularities among observed data. It includes various stages, from the selection
of the required datasets to the interpretation of the results obtained from the techniques applied.
KDD refers to the overall process of finding and discovering knowledge
from data, of which DM is one step, consisting of applying data analysis
and discovery algorithms. KDD comprises three general processing steps, as shown in
Table 2-3. The first stage is data pre-processing, which entails data collection, data
smoothing, data cleansing, data transformation and data reduction.
Table 2-3. Data processing steps in KDD

| Step 1 | Step 2 | Step 3 |
|---|---|---|
| Data Pre-Processing | Data Mining | Data Post-Processing |
The second step, normally called DM, involves data modelling and prediction.
DM can involve either data classification or prediction. The classification methods
include deviation detection, database segmentation, clustering (and so on); the
predictive methods include: (a) mathematical operation solutions such as linear
scoring, nonlinear scoring (neural nets), and advanced statistical methods like the
multiple adaptive regression by splines; (b) distance solutions, which involve the
nearest-neighbour approach; (c) logic solutions, which involve decision trees and
decision rules. The third step is data post-processing, which is the interpretation,
conclusion, or inferences drawn from the analysis in Step Two. So, KDD is a
multiphase process that includes business understanding, data preparation, modelling,
evaluation and deployment [66, 67]. According to the application and tasks, DM can
be categorised into eight distinctive branches (see Figure 2-7) [66]. One of the
important tasks of DM is text mining which is slightly different from the general KDD
process. Text mining can be defined as discovering useful information and knowledge
from textual databases through the application of data mining techniques.
Figure 2-7. Data mining tasks
2.9 TEXT MINING
Text mining (TM) is a particular type of DM that is focused on handling
unstructured or semi-structured datasets, such as text documents on paper, Excel
reports, web pages, messages, notes etc. So TM can be defined as textual data mining
or knowledge discovery from textual databases. Although the text mining process
relies heavily on applications of data mining techniques for discovering useful
knowledge, it is also focused on handling more unstructured data formats which pose
more challenges for pattern discovery than numerical data formats do.
Text documents usually consist of terms or keywords in sentences. The primary
step in TM is to cleanse the text documents by applying several conversion processes
[68]. To aid the DM methods on text data, keywords/terms present in each text
document are converted to term vectors. A term vector is an algebraic expression that
describes the relationship between text words and documents, and is commonly used
as a dataset for text-mining based analysis. In this method, each dimension of the
vector corresponds to an individual term, which can be a single word or keyword or
sometimes a longer phrase. If a specific document includes a specific term, the vector
value of that term should be more than zero. In using term vectors, which are based on
the keywords selected for text-mining analysis, it is imperative to consider how they
should be standardised. After text cleansing and term vector formulation, the keywords
can be stored in a keyword dictionary. Detailed discussions of text cleansing, text
features and machine learning algorithms and methods are presented in the following
sections.
2.10 TEXT CLEANING AND FEATURE EXTRACTION
Text cleaning (TC) represents the most time-consuming phase in text mining, and its
complexity depends on the data sources used. Text documents usually consist
of words or terms in sentences, containing unwanted sparseness in the text corpus,
spelling variations, extra spaces, numbers, punctuation and, most importantly,
non-discriminating words. Common causes of uncleansed data are mentioned by
Low and associates in [69] and are presented in Figure 2-8.
Figure 2-8. Raw text data with causes of errors and anomalies
TC retains the most frequent words in large text documents by removing and
excluding the less frequent ones; it is also used to remove punctuation, numbers and
other characters within texts that may clutter results. In this section, a series of
transformation functions used to cleanse text data is discussed; these are summarised
in Table 2-4.
Table 2-4. Text cleaning technique with different transformation processes [68, 70, 71]

| Function | Purpose of the Transformation |
|---|---|
| tolower | Transform all upper case letters to lower case |
| removeNumbers | Remove all numbers |
| Stopwords (language='english') | Remove stop words commonly used in the English language |
| myStopwords | Remove non-discriminating words that have negative effects on text classification |
| removeWords, myStopwords | Remove words which are common, non-discriminating and mean little to the model |
| removePunctuation | Remove punctuation symbols |
| stripWhitespace | Remove extra spaces |
| word stemming | Reduce similar words to a single term |
First, all the text documents are transformed into lower case, followed by
removing numbers, punctuation and extra spaces between words. A common
practice when analysing text data is to remove filler words such as "to", "and",
"where", "or" and "when". These are known as stop words. The function Stopwords
(language='english') shown in Table 2-4 covers 174 words, which were
excluded from the text documents. Apart from that, some keywords were considered
common but non-discriminating for both data types (failure and non-failure), and
were also excluded through the cleansing process. Specific examples and a list of those
keywords are discussed in the case studies (Chapter 5).
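For illustration, the transformations of Table 2-4 can be sketched in Python as follows (scikit-learn's built-in English stop word list stands in for the 174-word list; the extra domain stop words are hypothetical examples):

```python
# A minimal sketch of the cleansing steps in Table 2-4 applied to work
# order text; myStopwords contains hypothetical non-discriminating words.
import re
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

my_stopwords = {"please", "asap"}            # hypothetical domain stop words

def clean(text):
    text = text.lower()                      # tolower
    text = re.sub(r"\d+", " ", text)         # removeNumbers
    text = re.sub(r"[^\w\s]", " ", text)     # removePunctuation
    tokens = [w for w in text.split()        # stripWhitespace + tokenise
              if w not in ENGLISH_STOP_WORDS and w not in my_stopwords]
    return " ".join(tokens)                  # stemming (e.g. Porter) could follow

print(clean("Replaced PUMP seal #2, leaking badly. Please re-check ASAP!"))
```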
Text classification is different from other classification methods because text
feature space is often sparse and high-dimensional. For instance, the dimensionality of
a moderate-sized text corpus can reach up to tens or hundreds of thousands. The high
dimensionality of feature space will cause the ‘‘curse of dimensionality’’, increase the
training time and affect the accuracy of classifiers. Therefore, feature extraction is
performed, which aims to reduce the dimensionality under the premise of guaranteeing
the performance of classifiers. The main idea is to select a subset of terms occurring
in the training set and use this subset as features in text classification. Two important
advantages of text features include:
• Reduction of the "feature space" dimensionality by choosing the most
valuable features
• Improvement of the performance of text classifiers
Commonly used text features include bag of words (BOW) [72] and language
model/word order/N-Gram features [73, 74] (see Figure 2-9). The most effective (for text
classification) and widely used features of both types are discussed in the following
sections.
Figure 2-9. Features commonly used for text classification
2.10.1 Bag of Words
Usually, a text is represented as a vector of weighted terms, involving a two
phase conversion process. Firstly, a vector space model [75], namely a bag of words
(BOW), is built, which covers all unique items occurring in the training corpus.
Secondly, the text is mapped into a feature vector based on both the BOW and the
contents of the text. A BOW representation scheme is widely used in text classification
due to its simplicity and efficiency. Under this scheme, documents are represented by
bags of terms, each term being an independent feature of its own.
A document can be represented as a vector. Each item in the vector corresponds
to an individual term and its value can be defined as a binary indicator or the absolute
frequency. Many features have been explored, among which are term frequency (TF),
term frequency-inverse document frequency (TF-IDF), information gain (IG), Chi-
square (CS) statistics, mutual information (MI), Gini-Index (GI) and expected cross-
entropy (ECE). Most of these methods use the frequency of every feature in the BOW
or assign a score based on probability; features are then ranked by their frequency,
probability or score, and the top-ranked features are selected.
According to Yang and Pedersen [76], IG and CS outperform MI and ECE and
achieve better accuracy. Rogati and Yang [77] also conducted a comparative study
of feature selection methods (i.e., IG, CS and TF) for different text classification
algorithms (i.e., NB, SVM, k-NN) on two well-known datasets: Reuters 21578 and
Reuters Corpus version RCV-1. Their experimental results indicated that CS based
feature selection method outperformed other methods for classifiers and both the
datasets.
Comparing the performances of IG, CS and document frequency (DF) features,
Zhang and colleagues [78] conducted experiments on the SpamAssassin, LingSpam
and PU1 spam corpora and on the Chinese corpus ZH1. Their experiments verified
that IG led to the best performance, followed by CS. Liu and
associates in [79] investigated experimental comparisons of four feature selection
methods: DF, CS, IG and gain ratio (GR) over five text classification algorithms: NB,
SVM, k-NN, radial basis function neural network (RBFNN) and decision tree (DT)
for multi-class sentiment classification. In terms of achieving best classification
accuracy within the shortest execution time, IG and GR outperformed other methods.
However, some have argued that the method using TF could achieve a
comparable performance to CS [80]. In response to such heterogeneous findings,
Cheng and colleagues [81] suggested using different methods for different applications
since good features should consider problem domain and algorithm characteristics.
Such key features are discussed below.
2.10.2 Term Frequency (TF)-Inverse Document Frequency (IDF)
Term frequency (TF) is the most common BOW feature; it computes the
frequency, or number of times, a keyword/feature appears in a document. Unlike a
binary term value (0 or 1), this method is more informative since it assigns more weight to
frequent keywords than to rare ones [82]. It considers the repetition of a keyword
$i$ in a document $j$ [83]:

$$tf_{ij} = \text{frequency of keyword } i \text{ in document } j \tag{2-15}$$
However, by counting the frequency of each keyword, TF normally assigns a
large weight to common keywords and less weight to unique ones, which results
in weak text-discriminating power. To avoid this shortcoming, another
factor, the inverse document frequency (IDF), is introduced alongside the term
frequency method. TF-IDF measures the relative frequency of a keyword in a
specific document through an inverse proportion of that keyword over the entire set of text
documents. So for any keyword $t_i$, the tf-idf weight in a document can be calculated as shown
by Eq. 2-16 [82, 84]:

$$w_{ij} = tf_{ij} \times \log\left(\frac{N}{df(t_i)}\right) \tag{2-16}$$

where $tf_{ij}$ is the frequency of the keyword $t_i$ in the document, $df(t_i)$ is the number
of documents containing $t_i$ and $N$ is the total number of documents in the entire text
corpus.
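As an illustrative sketch, scikit-learn's TfidfVectorizer implements a (smoothed) variant of the weighting in Eq. 2-16; the toy work order corpus below is hypothetical:

```python
# A minimal sketch of TF-IDF weighting on hypothetical work order texts.
# Note that scikit-learn uses a smoothed idf, a slight variant of Eq. 2-16.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["repair pump seal leak",
        "routine inspection of pump",
        "replace motor bearing failure"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)                  # documents x terms sparse matrix
print(dict(zip(vec.get_feature_names_out(), X.toarray()[0].round(2))))
```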
2.10.3 Chi-square (CS) Statistic
The CS statistic measures the lack of independence between a feature $t_i$
and a category $c_j$ in the training documents; the critical values of this statistic can be
found using the $\chi^2$ distribution with one degree of freedom to judge extremeness [76].
The statistic is defined as [75]:

$$\chi^2(t_i, c_j) = \frac{N \times (a_{ij} d_{ij} - b_{ij} c_{ij})^2}{(a_{ij} + b_{ij})(a_{ij} + c_{ij})(b_{ij} + d_{ij})(c_{ij} + d_{ij})} \tag{2-17}$$

where $N$ is the total number of training documents (the sum of $a_{ij}$, $b_{ij}$, $c_{ij}$ and $d_{ij}$);
$a_{ij}$ is the frequency with which feature $t_i$ and category $c_j$ co-occur; $b_{ij}$ is the frequency
with which feature $t_i$ occurs without belonging to category $c_j$; $c_{ij}$ is the frequency with
which category $c_j$ occurs without containing feature $t_i$; and $d_{ij}$ is the number of
times neither $c_j$ nor $t_i$ occurs. However, the CS statistic is not reliable for low
frequency terms [85].
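For illustration, term ranking by the CS statistic can be sketched with scikit-learn's chi2 scorer on hypothetical failure/non-failure work orders:

```python
# A minimal sketch: ranking bag-of-words terms by the chi-square statistic
# of Eq. 2-17; the documents and labels are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

docs = ["pump seal leak repaired", "routine pump inspection",
        "motor bearing failure", "scheduled lubrication service"]
labels = [1, 0, 1, 0]                        # 1 = failure-related work order

vec = CountVectorizer()
X = vec.fit_transform(docs)
scores, p_values = chi2(X, labels)
ranking = sorted(zip(vec.get_feature_names_out(), scores),
                 key=lambda kv: -kv[1])
print(ranking[:5])                           # top-ranked discriminating terms
```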
2.10.4 Information Gain (IG)
IG is generally employed to quantify the amount of information that a feature can
offer to the classification algorithm. It measures the amount of information obtained
for category prediction by determining the presence or absence of a term
in a document. Assuming $\{c_k\}_{k=1}^{C}$ is the set of categories in the target space, the
IG of a term $t_i$ is given by Eq. 2-18 [76]:

$$IG(t_i) = -\sum_{k=1}^{C} p(c_k)\log p(c_k) + p(t_i)\sum_{k=1}^{C} p(c_k \mid t_i)\log p(c_k \mid t_i) + p(\bar{t}_i)\sum_{k=1}^{C} p(c_k \mid \bar{t}_i)\log p(c_k \mid \bar{t}_i) \tag{2-18}$$

where $p(c_k) = \frac{a_{ik} + c_{ik}}{N}$; $p(t_i) = \frac{a_{ik} + b_{ik}}{N}$; $p(\bar{t}_i) = \frac{c_{ik} + d_{ik}}{N}$; $p(c_k \mid t_i) = \frac{a_{ik}}{a_{ik} + b_{ik}}$; and $p(c_k \mid \bar{t}_i) = \frac{c_{ik}}{c_{ik} + d_{ik}}$.
The larger the value of the IG, the more informative the feature is. On the other
hand, it has a tendency to select non-discriminating terms which are common over
multiple categories [85].
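As a minimal illustration, the IG of Eq. 2-18 can be computed directly from the $a$, $b$, $c$, $d$ counts of the previous section for the two-class (e.g. failure/non-failure) case; the counts below are hypothetical:

```python
# A minimal sketch of Eq. 2-18 for two classes; the document counts are
# hypothetical. IG(t) = H(class) - H(class | term present/absent).
import numpy as np

def entropy_term(p):
    return 0.0 if p <= 0 else -p * np.log2(p)

def information_gain(a, b, c, d):
    """a: term & class co-occur, b: term without class,
    c: class without term, d: neither."""
    N = a + b + c + d
    prior = entropy_term((a + c) / N) + entropy_term((b + d) / N)
    p_t, p_not = (a + b) / N, (c + d) / N
    cond = (p_t * (entropy_term(a / (a + b)) + entropy_term(b / (a + b)))
            + p_not * (entropy_term(c / (c + d)) + entropy_term(d / (c + d))))
    return prior - cond

print(information_gain(a=40, b=5, c=10, d=45))   # informative term
print(information_gain(a=25, b=25, c=25, d=25))  # uninformative term -> 0.0
```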
2.10.5 Language Model
Language can be viewed as a stream of words. Due to syntactic and semantic
constraints, these words are not independent. Language models (LM) have been proposed to
capture this characteristic of natural language. N-Gram models are a widely used kind of
LM which assume that the probability of a word in a document depends on its
previous $n-1$ words. Given a word sequence $W = w_1, w_2, \ldots, w_U$, the probability of
$W$ can be calculated as:

$$p(W) = \prod_{i=1}^{U} p(w_i \mid w_1, \ldots, w_{i-1}) \tag{2-19}$$
Under the word dependency assumption, the only words relevant to predicting
$p(w_i \mid w_1, \ldots, w_{i-1})$ are the previous $n-1$ words, so Eq. 2-19 can be written as:

$$p(W) = \prod_{i=1}^{U} p(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) \tag{2-20}$$
In particular, an N-Gram model uses the previous $n-1$ words to predict the
next one, following Eq. 2-21:

$$p(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) = \frac{p(w_{i-n+1} \ldots w_i)}{p(w_{i-n+1} \ldots w_{i-1})} \tag{2-21}$$

The probability $p(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$ can be estimated from a text corpus using maximum
likelihood, as shown by Eq. 2-22:

$$p(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) = \frac{c(w_{i-n+1} \ldots w_i)}{c(w_{i-n+1} \ldots w_{i-1})} \tag{2-22}$$
where $c(\cdot)$ denotes the number of occurrences. Generally, N-Gram means a "sequence
of length $n$". In this respect, a one-word sequence is called a Uni-Gram, a two-word
sequence (e.g., "repair leak") a Bi-Gram, a three-word sequence (e.g., "repair
pf leak") a Tri-Gram, and so on. Although the N-Gram was first mentioned by
Shannon in 1948 in the context of communication theory, it has since attracted
significant research attention in many applications, including text classification.
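For illustration, Uni-Gram, Bi-Gram and mixed-gram features can be extracted with scikit-learn's CountVectorizer; the work order text below is hypothetical:

```python
# A minimal sketch of N-Gram feature extraction on a hypothetical work
# order description.
from sklearn.feature_extraction.text import CountVectorizer

doc = ["repair pf leak on boiler"]
for n_range in [(1, 1), (2, 2), (1, 2)]:     # uni-, bi- and mixed grams
    vec = CountVectorizer(ngram_range=n_range)
    vec.fit(doc)
    print(n_range, list(vec.get_feature_names_out()))
```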
Tripathy and associates in [86] attempted to classify movie reviews using the N-Gram
approach and found better accuracy as the N-Gram order increased from Uni-Gram to
Bi-Gram. By analysing the efficiency of N-Grams on a social media dataset, Ogada and
colleagues found that a Bi-Gram approach achieves better accuracy compared to a Uni-
combination of Uni-Gram, Bi-Gram and Tri-Gram) outperformed a classic N-Gram
one (i.e., Uni-Gram, Bi-Gram or Tri-Gram used exclusively) in many cases [86]. Pang
and Lee [88] found both Uni-Gram and a mix of Uni-Gram and Bi-Gram approaches
effective in sentiment analysis.
Saleh and associates in [89] conducted experiments with different N-Gram
features on three different corpora. The results of the N-Gram schemes suggested that
a Tri-Gram model outperformed Uni-Gram and Bi-Gram models in both 3-fold and
10-fold cross validation. However, in N-Gram models the number of parameters grows
exponentially with $n$; patterns thus tend to be more sparsely distributed across
the documents, which ultimately adversely affects classification performance
[74].
2.11 TEXT CLASSIFICATION ALGORITHMS
Text classification (TC) is the task of automatically sorting a set of documents
into categories from a predefined set. Machine learning techniques are commonly
applied to construct a classification model from training documents with known class
labels. The constructed model can then be used to classify new documents. The
booming interest in TC over the last decades is due to [87]:
• The increased availability of documents in digital formats
• Considerable savings in terms of expert labour, since no intervention is
required from either a knowledge engineer or domain experts
The main purpose of TC is to train a classifier which performs the category
assignments automatically. These techniques have been used in many applications
including email filtering [75, 90], topic categorisation [90], document indexing and
clustering [91]. Many classification algorithms are proposed for TC, e.g., Naïve Bayes
(NB) [75, 90, 92], support vector machine (SVM) [85, 91, 93-95], k-nearest neighbour
(k-NN) [85, 94], decision trees and artificial neural network (ANN) [95].
Joachims [94] showed that SVM scales well, has a good performance on large
datasets and outperforms NB and k-NN substantially. Nevertheless, with efficient data
pre-processing, a k-NN algorithm was found to be able to achieve good classification
performance and scaled up well with the number of documents used for the
investigation [96]. In similar research, Basu and associates compared SVM with ANN.
Their results showed better performance of SVM in a reduced feature set [95]. Ozgur
and colleagues [97] investigated spam filtering using ANN and NB on Turkish
messages. For a small feature set, the binary representation produced better
performance for the NB classifier.
Yu and Xu [98] compared the performances of four classification algorithms:
NB, NN, SVM and relevance vector machine (RVM), considering different training
and feature sizes of spam filtering corpora. Experimental results showed that SVM and
RVM outperformed NB, while NN was not suitable for such classification. Applying
k-NN, SVM and NB algorithms to different parts of the email (i.e., header,
subject and body) for classification purposes, Lai [99] concluded that NB and SVM
yielded better performance than k-NN. However, noisy and non-informative body
features caused poorer performance (compared to SVM) for the NB classifier.
Importantly, SVM with a TF-IDF approach outperformed all other techniques and this
was suggested to be the best classifier and feature combination for spam filtering and
email classification.
Webb and colleagues in [100] used a large-scale corpus (assembled by the
authors, with more than 1 million messages) to evaluate four spam filters: SpamProbe,
SVM, regression-based boosting and NB. Their experiment verified that SpamProbe,
SVM and regression-based boosting performed similarly, followed by NB. In another
experiment on spam filtering, Zhang and associates in [78] found that SVM, AdaBoost
and logistic regression (LR) attained the best performance, while NB and the lazy
learning approach (for example, k-NN) were not feasible in a cost-sensitive scenario.
In general, SVM and boosting were found to be slower to train on large datasets but
faster in classifying new instances [101].
Pang and Lee [88] classified a polarity dataset using machine learning
algorithms (i.e., NB, SVM and ME) and N-Gram features (i.e., Uni-Gram, Bi-Gram
and both). Their findings implied that SVM worked well when using Uni-Gram and
both Uni-Gram and Bi-Gram together. Yadav [102] reviewed various classification
techniques and identified that in most cases SVM performed better than NB; however,
with a small feature set, NB might be effective in some applications.
Recently, Tripathy and associates in [86] examined the accuracies of four
machine learning algorithms: SVM, NB, ME and stochastic gradient descent (SGD)
on human sentiment classifications and found SVM to be the most effective.
Conducting experiments on a multi-class sentiment analysis problem on three public
datasets, Liu and colleagues [79] achieved the best classification accuracy using an
SVM classifier. Moreover, several other investigations [103-106] on classification
performances revealed that SVM is the best classification algorithm and performs well
in the high feature space.
Liu and associates in [107] developed a conditional random field (CRF) based
information extraction approach to semantically model the dependency in labelled data.
Their experimental results indicated improved performance over baseline models. For
the statistical machine translation problem, Lavergne and colleagues [108] proposed a new
approach by adapting sequence labelling tasks through CRFs. Overall, the CRF
algorithm is particularly effective for information extraction.
Considering the usability of text classifiers for maintenance data and their
effectiveness in terms of classification performance and computational speed, TC
algorithms can be grouped into two main approaches: discriminative methods (i.e.,
SVM and logistic regression) and probabilistic methods (i.e., NB and maximum entropy)
(see Figure 2-10).
Figure 2-10. Commonly used classification algorithms for text classification
Among these, the five most common classification algorithms, NB, ME, CRF, k-NN
and SVM, are discussed in more detail in the following sections.
2.11.1 Naïve Bayes
A Naïve Bayes (NB) classifier is used to find the joint probabilities of words and
classes within a set of free text. The probability of a class $A$ given a text field $B$
can be calculated using Bayes' law:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{P(A \cap B)}{P(B)} \tag{2-23}$$

Since $P(B)$ is constant for all classes, only the numerator needs to be
maximised. It is assumed that the features are independent of each other given the class
(the naïve assumption). The classification task is done by combining prior probability
information with the likelihood of the incoming information to form a posterior
probability model of classification. The NB model is effectively applied to [90]:
• Text classification such as email filtering, topic categorisation etc.
• Problems in which the information from numerous attributes should be
considered simultaneously in order to estimate the probability of an
outcome
The NB classifier is typically trained on data with categorical features. A sparse
matrix indicates the frequency of appearance of each keyword in the bag of words
for each class in the training data. When new text arrives, the classifier trained
with NB is used to derive the class probability of the new text, given the terms that
appear in the training data.
2.11.2 Maximum Entropy
Maximum entropy (ME) is a general purpose model widely applied to text
processing tasks such as text classification. Given a training sample $\mathcal{T} = \{(d_1, c_1), (d_2, c_2), \ldots, (d_N, c_N)\}$, the maximum entropy model in exponential
form can be written as shown in Eq. 2-24 [78, 86]:

$$P(c \mid d) = \frac{1}{Z(d)} \exp\left(\sum_{i=1}^{n} \lambda_i f_i(d, c)\right) \tag{2-24}$$

where $d_i$ is the feature vector, $c_i$ is the target class, $P(c \mid d)$ is the probability of
document $d$ belonging to class $c$, $f_i(d, c)$ is the feature/class function for feature
$f_i$ and class $c$, $\lambda_i$ is the estimated parameter and $Z(d)$ is the normalising factor.
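In practice, the ME model of Eq. 2-24 with bag-of-words features is equivalent to (multinomial) logistic regression, so a minimal sketch can reuse scikit-learn's LogisticRegression on the same hypothetical data:

```python
# A minimal sketch: a maximum entropy text classifier realised as logistic
# regression over TF-IDF features; data are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train = ["pump seal leak repaired", "routine pump inspection",
         "motor bearing failure", "scheduled lubrication service"]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(train), labels)
print(clf.predict_proba(vec.transform(["bearing failure reported"])))
```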
2.11.3 Conditional Random Fields
The CRF is widely used for sequence labelling and is especially effective in information
extraction applications. The model is generally useful for:
• Representing natural language dependencies
• Preventing the label bias problem
Given an input sequence $x \in \mathcal{X}^n$ and a label sequence $c \in \mathcal{C}^n$, the first-order
conditional probability of the labels can be determined by Eq. 2-25 [108, 109]:

$$p(c \mid x; \theta) = \frac{1}{Z_x} \exp\left(\sum_{i}\sum_{j} \theta_j f_j(c_i, c_{i+1}, x, i)\right) \tag{2-25}$$

where the $f_j$ are feature functions and $Z_x$ is the partition function. The output sequence is
usually determined using the Viterbi algorithm and forward-backward variants [110]. Using the
empirical training data distribution, the log likelihood of the training data can be
determined as [109]:

$$\mathcal{L}(\theta) = E_{\tilde{p}(x,c)}[\log p(c \mid x; \theta)] \tag{2-26}$$
2.11.4 K-Nearest Neighbour
This is an instance-based learning algorithm used to classify data by employing
distance measures [79]. The Euclidean distance is typically used in k-NN to compute the
distance between text data points [83]; the distance function can be formulated by Eq.
2-27 [70]:

$$D(a, b) = \sqrt{\sum_{i=1}^{k} (a_i - b_i)^2} \tag{2-27}$$

where $a_i$ and $b_i$ are two keyword vectors in the Euclidean space of the text data. In the training
set, the k closest samples (among the text data) are considered along with their categories [70].
In the classification phase, the distances between a new text sample and all stored training
samples are measured, and the k closest samples are chosen following Eqs. 2-28 and 2-29 [79,
111]:

$$\arg\max_{j} \sum_{i=1}^{k} \mathrm{sim}(D_i \mid D)\,\delta(C(D_i), j) \tag{2-28}$$

$$\delta(C(D_i), j) = \begin{cases} 1, & \text{if the class of text } D_i \text{ is } j \\ 0, & \text{if the class of text } D_i \text{ is not } j \end{cases} \tag{2-29}$$

where $D_i$ is the $i$th nearest training text ($i = 1, 2, \ldots, k$), $\mathrm{sim}(D_i \mid D)$ is the similarity
between text $D_i$ and the new text $D$, and $\delta(C(D_i), j)$ indicates whether the training text
$D_i$ belongs to class $j$.
The k-NN method is particularly effective for [111]:
• Non-parametric features
• Classification tasks with multi-categorised documents.
However, as the size of the training set increases, k-NN incurs a major
computational cost. Moreover, the performance of a k-NN classifier is
often affected by irrelevant and noisy features present in text documents.
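A minimal sketch of k-NN text classification on TF-IDF vectors with the Euclidean metric of Eq. 2-27 (hypothetical data):

```python
# A minimal sketch: k-NN over TF-IDF vectors; documents and labels are
# hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

train = ["pump seal leak repaired", "routine pump inspection",
         "motor bearing failure", "scheduled lubrication service"]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(vec.fit_transform(train).toarray(), labels)
print(knn.predict(vec.transform(["pump bearing leak"]).toarray()))
```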
2.11.5 Support Vector Machine
SVM classifiers were originally developed by Cortes and Vapnik [93] and
applied to spam filtering. SVMs accommodate some key properties that are prevalent
in text [94]:
• High dimensional feature spaces
• Few irrelevant features (dense concept vector), and
• Sparse instance vectors
An SVM seeks the hyperplane in a dataset that best separates the classes of the
data. The aim of an SVM is to orient this hyperplane so that it lies as far as
possible from the closest members of all classes (i.e. to maximise the margin). Suppose
a training dataset contains $d$-dimensional feature vectors $\mathbf{x}_i \in R^d$ with class labels
$c_i \in \{-1, 1\}$. A binary classifier $f(\mathbf{x})$ can be expressed as in Eq. 2-30:

$$\begin{cases} f(\mathbf{x}_i) \ge 0 & \text{for } c_i = +1 \\ f(\mathbf{x}_i) < 0 & \text{for } c_i = -1 \end{cases} \tag{2-30}$$

In a traditional SVM, this function is parameterised by a hyperplane, $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$, and the class labels can be predicted from the feature vector as shown in
Eqs. 2-31 to 2-33:

$$\mathbf{x}_i \cdot \mathbf{w} + b \ge +1 - \xi_i \quad \text{for } c_i = +1 \tag{2-31}$$

$$\mathbf{x}_i \cdot \mathbf{w} + b < -1 + \xi_i \quad \text{for } c_i = -1 \tag{2-32}$$

$$\xi_i \ge 0 \quad \forall i \tag{2-33}$$

where $\mathbf{w}$ is the weight coefficient vector, $b$ is the bias of the hyperplane and $\xi_i$
is a positive slack variable that allows for (some) misclassifications. The combined form
is given by Eq. 2-34:

$$c_i(\mathbf{x}_i \cdot \mathbf{w} + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \;\;\forall i \tag{2-34}$$
The hyperplane can, however, produce only planar boundaries, which are
often insufficient in non-linear situations. By choosing an appropriate non-linear
transformation (i.e. the "kernel trick"), it is possible to use the same approach to
define more general boundaries. Such a kernel non-linearly maps samples into a higher
dimensional space [112] and, unlike the linear kernel, it can handle cases in which the
relation between class labels and attributes is non-linear:

$$\phi(\mathbf{x}_i) \cdot \mathbf{w} + b \ge 1 - \xi_i \quad \text{for } c_i = +1 \tag{2-35}$$

$$\phi(\mathbf{x}_i) \cdot \mathbf{w} + b < -1 + \xi_i \quad \text{for } c_i = -1 \tag{2-36}$$

The combined form can be expressed as the decision value $f(\mathbf{x})$ in Eq. 2-37:

$$f(\mathbf{x}) = \phi(\mathbf{x}) \cdot \mathbf{w} + b \tag{2-37}$$
According to the definition of the SVM [93, 113], the optimal values of $\mathbf{w}$ and $b$ can
be found by solving the optimisation problem in Eq. 2-38:

$$\min \; \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad c_i(\phi(\mathbf{x}_i) \cdot \mathbf{w} + b) \ge 1 - \xi_i \;\;\forall i \tag{2-38}$$
where $C$ is a regularisation parameter that penalises misclassifications. In this
formulation, a minimal $\|\mathbf{w}\|$ corresponds to the maximum distance between the
boundary and the training points of the two classes. The optimisation problem can be
simplified using the Lagrangian function, and the optimal value of $\mathbf{w}$ can be shown
by Eq. 2-39:

$$\mathbf{w} = \sum_{i=1}^{N} c_i \alpha_i \phi(\mathbf{x}_i) \tag{2-39}$$

where the $\alpha_i$ are the Lagrange multipliers. By combining Eqs. 2-37 and 2-39 and defining
the kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j)$, the kernel-based SVM can be
formulated as shown by Eq. 2-40:

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i c_i K(\mathbf{x}, \mathbf{x}_i) + b \tag{2-40}$$
The most commonly used kernel functions are the linear, polynomial and radial
basis function (RBF) kernels. In this research, the well-known RBF kernel has been
used [8]:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{1}{2\sigma^2}\,\|\mathbf{x}_i - \mathbf{x}_j\|^2\right) \tag{2-41}$$
In general, it is widely agreed in the classification literature that the RBF kernel is a
reasonable first choice. It allows a trade-off between the complexity of the decision
boundaries and the generalisation capability of the classifier [114], which is tuned
through cross-validation of $\gamma$ and $C$.
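A minimal sketch of an RBF-kernel SVM with $C$ and $\gamma$ tuned by cross-validation, as described above, on a hypothetical work order sample:

```python
# A minimal sketch: RBF-kernel SVM (Eqs. 2-40 and 2-41) with C and gamma
# tuned by grid-search cross-validation; data are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

train = ["pump seal leak repaired", "routine pump inspection",
         "motor bearing failure", "scheduled lubrication service",
         "replaced broken impeller", "monthly filter change"]
labels = [1, 0, 1, 0, 1, 0]

X = TfidfVectorizer().fit_transform(train)
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
                    cv=3)
grid.fit(X, labels)
print(grid.best_params_, grid.best_score_)
```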
2.12 PERFORMANCE EVALUATION
To obtain an accurate assessment of the performance of a prediction classifier, it
must be tested on data that were not used for training. Machine learning algorithms are
typically evaluated by accuracy, recall and precision. The different performance
evaluation metrics, and how they can be calculated from the outcomes of
machine learning models, are shown in Table 2-5; TP, FP, FN and TN are
document counts.
Table 2-5. Performance evaluation metrics

| | Actual: In the Class | Actual: Not in the Class |
|---|---|---|
| Predicted: In the Class | True Positive (TP) | False Positive (FP) |
| Predicted: Not in the Class | False Negative (FN) | True Negative (TN) |
According to the definition in [86], “precision” is the ratio of the number of
documents correctly labelled as positive to the total number of positively classified
documents and “recall” is the ratio of the total number of positively labelled documents
to the total number of documents that are truly positives. “Accuracy” can be calculated
as the ratio of correctly classified documents to the total number of documents. F-
Measure is the harmonic mean of precision and recall. Such performance measures
can be formulated and shown by Eqs. 2-42 to 2-45:
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2-42}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2-43}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{2-44}$$

$$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{2-45}$$
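For illustration, these metrics can be computed from held-out predictions as follows (the label vectors are hypothetical):

```python
# A minimal sketch of Eqs. 2-42 to 2-45 on hypothetical held-out labels.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))      # [[TN, FP], [FN, TP]]
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
print("F-measure:", f1_score(y_true, y_pred))
```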
2.13 SUPERVISED MACHINE LEARNING
Supervised machine learning techniques typically label new events based on
given labelled examples. Such learning algorithms deduce event properties and
characteristics from training data, and use these to generalize to unseen situations. To
classify textual data into predefined classes it is necessary to partition a labelled set of
training data into different classes to test the performances of the classifiers. A
categorical attribute is set as a class attribute or target variable. The given data is
therefore first divided into pre-defined classes to interpret the terms defined in the
textual databases. A framework for the supervised machine learning text classification
method is presented in Figure 2-11. It can be seen that the supervised process has two
steps: training and prediction. In the first process, text data is usually pre-processed
through a text cleaning method and then converted into vector representation through
different features. The text cleaning method is usually applied to reduce the number of
features while features extraction methods convert the text data into vectors where
each vector represents a keyword in the text data (details of these processes have been
discussed in Section 2.10).
Figure 2-11. A framework for supervised machine learning text classification
After that, a text classifier is constructed using different text classification
algorithms, which have been elaborated on in Section 2.11. However, the most
important information required to construct a classifier is the “labelled text data”.
Supervised machine learning methods largely depend on such information. Finally, the
constructed classifier can be applied to the unseen new text data to predict their classes
(as shown in Figure 2-11).
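A minimal end-to-end sketch of the two-step framework in Figure 2-11, assembling feature extraction and a classifier into a single train/predict pipeline (hypothetical data; the specific feature and classifier choices are illustrative rather than the thesis's final method):

```python
# A minimal sketch of the training and prediction steps of Figure 2-11;
# work orders and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

train = ["pump seal leak repaired", "routine pump inspection",
         "motor bearing failure", "scheduled lubrication service"]
labels = ["failure", "non-failure", "failure", "non-failure"]

model = make_pipeline(TfidfVectorizer(stop_words="english"),
                      SVC(kernel="rbf"))
model.fit(train, labels)                              # training step
print(model.predict(["compressor bearing seized"]))   # prediction step
```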
Most organisations use maintenance logs or work orders to keep records of all
maintenance and repair activities performed on the machine or asset. Among different
data fields, the most significant and consistently filled in information is maintenance
work descriptions. A few studies have analysed such free text descriptions in industry
maintenance logs and/or work orders. Devaney and associates in [4] proposed
analysing the free texts from maintenance logs using domain ontology. The authors
allowed a system to learn over a large set of unlabelled data using a small subset which
was manually labelled; the learned categories were then used to construct a case
library database of maintenance patterns. These patterns were used by a case-based
reasoning (CBR) engine to predict future failures as well as allowing more efficient troubleshooting and
diagnosis in the event of a failure. While the authors proposed an analysis framework
based on the construction of a case library, no case study was presented on real world
data.
Edwards and colleagues [7] categorised maintenance logs using a clustering
algorithm on a small data subset, manually labelling the data as failure or non-failure,
based on an expert opinion. Sipos and colleagues used operational logs and component
replacement data (and assumed that each replacement constituted a failure) to
construct a classifier that can anticipate the imminent failure of the equipment [6].
Developing a failure and maintenance data extraction methodology, Alkali and
associates in [58] utilised hourly readings of motor current to determine whether the
mills were running and assumed all downtime was related to failure. An
exploratory data analysis was conducted using mill uptime and downtime. However,
their analysis did not distinguish between planned and unplanned downtime,
although the latter is the most relevant to downtime due to failure events.
A few attempts have been made to analyse work orders that have been generated
for condition based maintenance policy. For instance, Moreira and Junior [8] proposed
a method of performing prognostics on aircraft components based on an SVM
classification algorithm. Flight data and maintenance logs were used to classify the
training data into healthy and unhealthy states. The degradation index was finally
created from the classification result to prepare a future schedule of aircraft
maintenance. Bastos and associates [45] developed statistical data extraction methods
to extract failure-related information from their chosen datasets: equipment condition
monitoring data and maintenance data (containing both corrective and preventive
maintenance). The prediction model was able to forecast future failure based on the
existing maintenance records and also to estimate the possibility of machine
breakdown. Prytz and colleagues in [3] proposed a data-driven method of predicting
future failures of the air compressor of a commercial vehicle. The method was derived
from available warranty and vehicle maintenance log data and combined pattern
recognition with a remaining useful life (RUL) prediction to estimate the vehicle repair
work.
Although these text classification methods have been successfully used for purposes relevant to this study, in real industrial cases there is always a scarcity of labelled data with which to train such classification methods. A supervised learning method uses only labelled training data; however, many semi-supervised learning methods employ a large amount of unlabelled data along with some labelled samples to train classifiers and induce better performance [115]. This will now be
discussed in more detail as it is highly relevant to the real life industrial contexts of the
current study.
2.14 SEMI-SUPERVISED MACHINE LEARNING
Text classification methods typically employ supervised learning approaches
and so are reliant on the quality of the labelled historic data used to train them. Though
labelled data are expensive to obtain because it often involves human annotators, a
large number of unlabelled data are easy to get [115]. In such cases, the well-known
technique of semi-supervised learning (SSL) can make use of both labelled and
unlabelled data to learn (i.e. train) a classifier efficiently [116]. Popular SSL methods
include: active learning [117, 118], self-training [118, 119], co-training [116],
transductive support vector machine (TSVM) and graph-based method [120]. In order
to reduce the manual labelling workload, two SSL methods are widely used: active learning tries to overcome the labelling bottleneck by posing queries to an oracle (e.g. a human annotator) [121], while self-training labels samples using the classifier itself. The details of active learning
and semi-supervised self-training will be discussed in the following sub-sections.
2.14.1 Active Learning
Some studies have made use of such unlabelled data to improve text categorisation through active learning (AL) [119]. AL is an iterative machine learning process that builds a text classifier by selecting, from a larger set of unlabelled data, only the most informative samples for labelling by an expert [117], which is particularly advantageous when labelling is expensive. AL attempts to select informative samples so as to maximise the accuracy of the text classifier with less training data. When training samples are instead selected at random in advance and the classifier is trained on them, the learning is
called passive learning (see Figure 2-12). In passive learning, the classifier is thus built
using the given training data in advance. However, acquiring training data in advance
is difficult and time consuming. In this regard, AL is effective since it queries only the
informative samples and constructs the classifier accordingly.
Figure 2-12. General schema for passive and active learning [122]
Several studies show that AL greatly reduces the labelling efforts in various
applications, including text categorisation [116-119], sentiment analysis [123], image
recognition [124, 125], and high-dimensional boundary identification [126]. In an
active learning framework, the active learner is initially trained with a small amount
of labelled training data and with access to a large set of unlabelled data. Using the
trained model, new informative samples from the pool of unlabelled data are selected
for labelling. The selected data samples are subsequently added to the labelled training
data and the learner is re-trained. This iterative process is repeated until the stopping
criterion has been met [127].
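The iterative loop just described can be sketched as follows, assuming scikit-learn; the function name and the `oracle` callable standing in for the human annotator are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def active_learning_loop(labelled, labels, pool, oracle, rounds=5, batch=2):
    """Pool-based AL: each round, query the least-certain pool samples,
    have the oracle (human expert) label them, and re-train the classifier."""
    labelled, labels, pool = list(labelled), list(labels), list(pool)
    clf, vec = None, None
    for _ in range(rounds):
        vec = TfidfVectorizer()
        clf = LinearSVC().fit(vec.fit_transform(labelled), labels)
        if not pool:
            break
        # Distance to the SVM hyperplane as an (un)certainty measure
        margins = np.abs(clf.decision_function(vec.transform(pool)))
        for i in sorted(np.argsort(margins)[:batch], reverse=True):
            text = pool.pop(int(i))
            labels.append(oracle(text))   # expert supplies the true label
            labelled.append(text)
    return clf, vec
```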
Thus, a major consideration in AL is the choice of the selection strategy by which the learner asks its queries; the main options are membership query synthesis, stream-based AL and pool-based AL [121]. Figure 2-13 illustrates these three selection strategies. Membership query synthesis requests the labels of unlabelled samples in the input space that the learner generates de novo, i.e. from scratch.
Figure 2-13. Three main active learning query selection strategies [121]
In stream-based AL, one unlabelled sample is considered at a time and the
learner decides whether or not to query its label and send this query to the oracle. On
the other hand, pool-based AL considers a large unlabelled pool of samples, ranks
them based on a selection criterion and selects a number of the best samples. Stream-
based AL makes the query decision by individually processing every datum while
pool-based AL evaluates and ranks the entire unlabelled dataset before selecting the
best query.
In all cases, one must select the most informative unlabelled sample(s). Many ways of formulating such query strategies have been proposed in the literature, including uncertainty sampling, query by committee, expected model change, expected error reduction, variance reduction, and density-weighted methods [121, 128, 129].
The most common of these is uncertainty sampling, which was first introduced
by Lewis and Gale [130]. The key idea is that the samples that the text classifier is
most uncertain about provide the greatest insight into the underlying data distribution
and should be selected for labelling. In theory, AL is possible with any classifier that is capable of passive learning. Since SVM has been proven to provide highly accurate results in the passive learning scenario (see Section 2.11.5), it is utilized here as the classifier for active learning. Given the labelled training data $(\mathbf{x}_i, y_i)$ and the centre $w_i$ of the largest hypersphere that can fit inside the current version space $\gamma_i$, the position of $w_i$ clearly depends on the shape of the region $\gamma_i$. Each unlabelled example $\mathbf{x}$ can then be tested to see how close its corresponding hyperplane lies to the centrally placed $w_i$: the closer a hyperplane is to the point $w_i$, the more centrally it is placed in the version space. Following Eq. 2-37, the shortest distance between the hyperplane induced by an unlabelled example and the vector $w_i$ is simply the distance between the feature vector $\phi(\mathbf{x})$ and the hyperplane $w_i$. Therefore, we want to query the example $\mathbf{x}^*$ that induces a hyperplane as close to $w_i$ as possible [128]:

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \left|\phi(\mathbf{x}) \cdot w_i\right| = \arg\min_{\mathbf{x}} \left|f(\mathbf{x})\right| \tag{2-46}$$
This strategy is called the simple margin: it queries the sample closest to the current decision boundary (see Figure 2-14). The circle represents the largest-radius hypersphere that can fit in the version space. The white area in the version space, bounded by solid lines, corresponds to the labelled training data, while the five dotted lines (instances "a", "b", "c", "d" and "e") represent unlabelled data from the pool. According to the simple margin strategy, instance "b" induces the hyperplane closest to the SVM centre $w_i$, and so we choose to query "b".
Figure 2-14. Uncertainty-based active learning that queries “b”
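A minimal sketch of this query strategy, assuming a fitted scikit-learn classifier (e.g. LinearSVC) whose decision_function returns the signed distance f(x); the function name is hypothetical.

```python
import numpy as np

def simple_margin_query(clf, X_pool):
    """Simple margin (Eq. 2-46): return the index of the unlabelled sample whose
    feature vector lies closest to the current SVM hyperplane, i.e. the sample
    with the smallest |f(x)| (instance "b" in Figure 2-14)."""
    return int(np.argmin(np.abs(clf.decision_function(X_pool))))
```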
Another theory-motivated query selection strategy is the query by committee, in
which a committee is formed where each committee member is allowed to vote on the
queries. This strategy develops a method to evaluate the disagreement among the
committee members and thus chooses the most informative sample [131]. To estimate
the disagreement, two approaches are widely used: vote entropy (see Eq. 2-47) and
Kullback-Leibler (KL) divergence (Eqs. 2-48 and 2-49) [121]:
$$x^*_{VE} = \arg\max_{x} \, -\sum_{i} \frac{V(y_i)}{C} \log \frac{V(y_i)}{C} \tag{2-47}$$

where $y_i$ ranges over the candidate labels, $V(y_i)$ is the number of votes that label receives from the committee members and $C$ is the committee size.

$$x^*_{KL} = \arg\max_{x} \frac{1}{C} \sum_{c=1}^{C} D\left(P_{\theta^{(c)}} \,\|\, P_C\right) \tag{2-48}$$

$$D\left(P_{\theta^{(c)}} \,\|\, P_C\right) = \sum_{i} P_{\theta^{(c)}}(y_i|x) \log \frac{P_{\theta^{(c)}}(y_i|x)}{P_C(y_i|x)} \tag{2-49}$$
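As a small numerical illustration of Eq. 2-47 (a sketch; the function name and the vote counts are hypothetical):

```python
import numpy as np

def vote_entropy(votes, committee_size):
    """Vote entropy disagreement (Eq. 2-47) for a single candidate sample.
    votes: mapping from each candidate label to the number of committee votes."""
    probs = np.array([v / committee_size for v in votes.values()])
    probs = probs[probs > 0]            # 0 * log(0) is taken as 0
    return float(-np.sum(probs * np.log(probs)))

# A 5-member committee split 3 vs 2 over the two maintenance classes
print(vote_entropy({"failure": 3, "non-failure": 2}, 5))  # ~0.673 (high disagreement)
```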
Expected error reduction, which is also called the decision-theoretic approach, estimates the future error of the model retrained on $\mathcal{L} \cup \langle x, y \rangle$ over the remaining unlabelled data $\mathcal{U}$, and queries the sample with the minimum expected error (Eq. 2-50):

$$x^*_{0/1} = \arg\min_{x} \sum_{i} P_\theta(y_i|x) \left( \sum_{u=1}^{U} 1 - P_{\theta^{+\langle x, y_i \rangle}}\left(\hat{y} \,\middle|\, x^{(u)}\right) \right) \tag{2-50}$$

where $P_{\theta^{+\langle x,y\rangle}}$ is the model retrained after adding $\langle x, y \rangle$ to the labelled set $\mathcal{L}$.
Active learning first garnered serious research attention in the 1980s [132] and it has remained a vibrant research area since. Moon and associates [133] proposed a new AL algorithm based on a cost-driven decision framework in which the learner chooses to query either the labels or the missing attributes of the unlabelled data points. Considering both common and domain-specific features, Li and associates in [134] proposed a novel multi-domain active learning framework which queried information duplicated across domains and then converted such information into a reduction of the model loss. Their method reduced the human labelling effort by 33.2%, 42.9% and 68.7% on sentiment classification, newsgroup classification and email spam filtering respectively. Novak and colleagues [135] presented a comparison between the two most common query selection strategies, simple margin and error reduction sampling, evaluating their performance on a range of categories from the Reuters Corpus. Simple margin sampling performed better than error reduction sampling on the larger news article categories. Moreover, the active learning method proved more efficient, requiring only half of the samples needed by passive learning for the same outcome.
In theory, AL is possible with any classifier that is capable of passive learning.
However, over the years, SVM has proven to be particularly effective, especially in
text classification [128, 136] and can easily identify uncertain samples as those that
are closest to the hyperplane. Goudjil and associates in [137] proposed a novel active
learning method based on the posterior probability estimation within SVM classifiers.
Applying the method on three well-known real world datasets: R8, 20ng and WebKB,
their experiments demonstrated that the method significantly reduced the labelling
efforts while simultaneously increasing classification accuracy. Using SVM-based
AL, Silva and Ribeiro [118] compared the effectiveness of AL with a baseline
classifier considering the deficit of labelled data.
Using naïve Bayes (NB) and k-nearest neighbour (k-NN) algorithms along with
SVM in AL, Hu and associates in [117] developed a novel exploration of the
reusability problem in text categorisation. In their experiments, they found that SVM performed the best for text categorisation in an active learning setting. Since SVM has been proven effective at selecting the samples closest to the hyperplane [128], the SVM algorithm has been used in this research as the baseline classifier for active learning (see Chapter 6).
2.14.2 Semi-Supervised Self Training
Another commonly used SSL algorithm is self-training. In self-training, a classifier is first trained with a small number of labelled samples and is then used to classify the unlabelled ones. The most confident unlabelled samples, together with their predicted labels, are added to the training data, and the procedure is then repeated [138]. In semi-supervised self-training (SSST), the learner thus automatically labels samples from the unlabelled data and adds the most certain samples to the training data in each learning cycle (a minimal sketch of this loop follows below). Pavlinek and Podgorelec [139] investigated the prediction
accuracy of self-training LDA (ST-LDA) performed on multinomial NB and SVM
algorithms. The method was tested on imbalanced datasets and it was discovered that
ST-LDA with multinomial NB outperformed other methods. For tweet sentiment
classification, Silva proposed a semi-supervised learning framework using a similarity matrix constructed from unlabelled data. Experimental results comparing this method with self-training, co-training and SVM implied that the similarity-based approach performed better than the others on most of the assessed Twitter datasets.
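A minimal sketch of the self-training loop described at the start of this subsection, assuming scikit-learn with a multinomial NB base classifier and a hypothetical confidence threshold; this is an illustration, not the procedure of any of the cited studies.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def self_train(texts_l, y_l, texts_u, threshold=0.95, max_iter=10):
    """Self-training: repeatedly fit, pseudo-label the unlabelled texts, and
    absorb only the most confident predictions into the training set."""
    texts_l, y_l, texts_u = list(texts_l), list(y_l), list(texts_u)
    vec, clf = TfidfVectorizer(), None
    for _ in range(max_iter):
        clf = MultinomialNB().fit(vec.fit_transform(texts_l), y_l)
        if not texts_u:
            break
        proba = clf.predict_proba(vec.transform(texts_u))
        confident = np.where(proba.max(axis=1) >= threshold)[0]
        if confident.size == 0:
            break  # nothing certain enough to absorb this cycle
        for i in sorted(confident, reverse=True):
            y_l.append(clf.classes_[proba[i].argmax()])
            texts_l.append(texts_u.pop(int(i)))
    return clf, vec
```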
However, SSST on its own does not often provide better classifier accuracy. For example, when the initial training data are very weak, many class labels might be wrongly predicted, which introduces incorrectly labelled data into the training set. Moreover, the samples with the highest confidence are not necessarily the most useful ones and do not ensure higher predictive accuracy.
Conversely, if we could integrate other SSL methods with AL, it could further reduce
manual labelling without compromising the desired accuracy of the text classifier.
In this regard, Leng and associates in [119] proposed an active semi-supervised
SVM algorithm by using both active learning (to select class boundary samples) and
semi-supervised learning (to select class central samples). Experimental results
showed that their method finds the boundary samples more precisely than using AL
only. In a similar approach, Zhang and colleagues [116] combined co-training with
active learning. Their approach not only selected the most reliable instances according
to the criteria of high confidence and nearest neighbour, but also exploited the most
informative instances with human annotations to improve text classifier performance.
When using small training datasets, the combined approach of exploiting the most certain instances together with active learning showed better classification performance [118]. Hajmohammadi and colleagues [123] developed a new model combining AL with SSST for cross-lingual sentiment classification. The experimental results showed that the model outperformed the baseline methods of active learning, self-training, random sampling and support vector machine on three book review datasets.
2.15 SUMMARY AND RESEARCH GAP
This section summarises the limitations of typically collected maintenance data
and explains the lack of literature on information requirement specification for a
maintenance optimisation model. The literature review on optimisation models and
data mining methods suggests that there is a lack of discussion on identifying the
information requirement specifications for reliability and maintenance optimisation
models and methods to extract information from multiple maintenance databases. The
details are summarised below:
• Lack of data and information requirement specifications for
reliability and maintenance optimisation models
There has been a vast quantity of literature on modelling maintenance optimisation based on reliability or fault diagnostic analysis. However, what data are needed, and how industry data can be extracted to meet the requirements of these models, are rarely discussed in the literature. Failure and scheduled maintenance times, covariates, cost
and down times are the information required for maintenance optimisation models.
These types of information can be buried in a number of different information systems
or databases in various raw forms in asset intensive industries.
Without requirement specification, this information cannot be used directly in
optimisation models. For example, failure time is essential data needed by reliability
prediction and maintenance decision models. In most cases, it is not readily available
from industry databases. Such information normally can be extracted from multiple
information systems such as work order (WO) and outage records and even from the
digital control system (DCS). However, an accurate definition of failure is needed as
a pre-requisite of this type of information extraction. This necessity implies that data
needs to be collected according to the information requirement specifications. In the
past, data were collected without such an objective in place. There is no systematic
research on this to date.
• Unavailability of required information in typically collected
maintenance databases
As mentioned in Section 2.7, the main reason for the unavailability of the information required by optimisation models might be, as pointed out by Louit and associates [2], that datasets are usually collected to record maintenance activity rather than to support reliability analysis. Furthermore, data collected during the maintenance process are usually incomplete. These characteristics make it challenging for standard approaches to discover useful information.
• Methods for extracting required information from multiple
databases of maintenance data are unavailable
Most of the methods and models (as mentioned in Sections 2.3, 2.4 and 2.5)
assume that information required for maintenance optimisation models is available or
can be generated through simulation methods. Few efforts have been made to extract
this information from real industry databases in both numerical and text format.
The following chapter will provide the information requirement specification for
reliability and maintenance optimisation models. Available maintenance databases and the information requirements of existing models will be investigated, and the proposed requirement specification will provide a guideline for recording correct data in existing maintenance databases. The requirement for an information extraction methodology will then be discussed, enabling readers to identify the thesis contribution in approaching these problems.
Chapter 3: Information Requirement Specifications for Reliability and Maintenance Optimisation Models
This chapter aims to investigate the existing maintenance databases to obtain the
information requirements specification necessary for reliability and maintenance
optimisation models. In this regard, the limitations of existing maintenance databases
will be critically discussed and the sufficiency of existing maintenance databases
(required for maintenance optimisation models) will be analysed. This investigation
serves two purposes: 1) to identify key challenges that are to be addressed in the
remainder of the thesis; and 2) to improve the industry data collection practices to meet
the requirements of the models.
3.1 ARE CURRENT MAINTENANCE DATABASES SUFFICIENT FOR MAINTENANCE OPTIMISATION?
Basic asset information is typically available in maintenance databases, for example, the asset identification number and maintenance work identification number.
However, the primary information that is required for failure time models is the
effective age of the asset and historical failure and preventive maintenance times. More
importantly, one needs to distinguish the work orders into reactive and preventive
maintenance. Manzini et al. [13] discuss this issue and advise that a failure report and work order should be collected for proper recording of corrective and planned maintenance. The fundamental pieces of information to be collected are the date and time of failure, the machine and component that failed, and the characteristics of the maintenance action performed (time to repair, spare parts if used, and workload employed). The failure report and work order are intended to trace the corrective (CM) and preventive (PM) maintenance histories separately, so that these can be linked with maintenance decisions.
However, many industrial datasets do not conform to this standard and only record work orders. These work order data are often recorded primarily for maintenance workflow and planning purposes, not for reliability analysis [2, 15, 54]
which often leads to the existing maintenance database only being used as a “work
order system” without the power of analysis and reporting [14]. Thus, typically
available maintenance data only record the maintenance work done, the work descriptions and the work priority, but do not specify whether the necessary work causes downtime. This ambiguity as to whether a work order represents a "failure" or a preventive action is thus a key issue in the analysis of typically available maintenance data.
Another possible data source is the outage/downtime data. However, like work
orders, this data is incomplete from a reliability analysis viewpoint. Outage/downtime
data contain asset stoppage time, i.e. “when”, but the information as to why it was
down is often specified only at a very high level and is not sufficiently detailed for
reliability analysis (e.g. the system that causes the issue may be specified, but not the
component). Thus, downtime data is incomplete from an analysis point of view, where
one needs to know if this downtime is unplanned (i.e. a possible indication of a failure
event) or planned (i.e. due to preventive maintenance).
The free text descriptions of the work orders and downtime data can often
provide insight into the motivation for the maintenance action, however these
descriptions can be ambiguous, difficult to interpret (e.g. sometimes the descriptions
just state the component of the system that was maintained), or laborious to analyse.
So, depending on the working definition of functional failure of the asset and the recording practices of maintenance personnel, it may be difficult to ascertain whether a work order is due to reactive or preventive maintenance. Perhaps more importantly, it is very laborious and time-intensive to manually examine tens of thousands of work order free-text descriptions.
Like failure time models, degradation models require failure times and
additionally require condition indicators (as discussed in Section 2.4). Thus, condition
data is needed and must be properly aligned with work orders in time (i.e. both must
have reliable time stamps). An important difficulty is of course collecting and storing
of massive amount of high frequency condition data [17]. However, it is still necessary
to obtain failure times, even if they are simply used to infer the failure degradation
level from condition data.
Maintenance costs are also required for both TBM and CBM. As
discussed in Section 2.5, maintenance cost consists of direct cost, downtime cost and
product reject cost. While exact downtime and reject costs can be difficult to establish
for operationally complex manufacturing operations, they can often be approximated
from expert knowledge and/or historical production data. On the other hand, direct
maintenance costs are quite easy to acquire, since they can mostly be obtained from work order data or accounting information.
So, it is evident from the above discussion that basic asset information might be available in maintenance databases, while further analysis is required to identify
failure and planned maintenance times, which are required for both failure time models
(for TBM) and degradation models (for CBM). This suggests that a reliable and
general methodology for identifying failure and preventive maintenance times from
maintenance databases is a key enabler to identifying accurate models and subsequent
optimisation of maintenance.
3.1.1 Identifying Failure and Planned Maintenance Times
The common difficulty for both failure time and degradation models is to identify the failure and planned maintenance times, since it is ambiguous whether a work order is reactive or preventive. In order to identify failure times, one needs to know when the asset was installed and when a reactive maintenance event occurs. Basic asset information, i.e. the asset identification number and functional location, the asset installation date/time, and fault characteristics, is typically available in maintenance databases. However, in many industries, the asset installation date/time is difficult to obtain. In this circumstance, the first historical failure time can be used as a substitute for the installation time (albeit with the drawback of introducing bias into the parameter estimation). Other alternative ways to identify the installation time of the asset through maintenance data fields are:
• Free Text Descriptions: Such text descriptions are available in most maintenance databases, including work orders, stop data and permit to work (PTW) records. The free texts contain keywords (for instance, "major overhaul", "commissioning", "installation" etc.) which may indicate the installation of assets.
• Maintenance Type: Maintenance work type labels such as "Preventive"/ "Overhaul" can be an alternative source of such information. In this regard, one may consider overhauls as an effective installation, and a work order may be available to indicate the time/date (depending on the specific model).
Another important task is to link reactive maintenance events with failure modes. If multiple failure modes exist, they need to be identified clearly and modelled separately. This is particularly useful for developing failure time or degradation models for specific failure mode/s of an asset [5, 44]. Failure modes can be obtained by examining the work orders.
Now, the reactive maintenance times can be identified from a few potential sources. Ideally, a field similar to "functional loss" would be directly recorded and would indicate whether the asset was able to function with the recorded defect. If such a data field exists in maintenance databases, one can easily identify the reactive maintenance time. This loss can be classified into three categories: complete, partial and potential loss of function [1, 140]. In actual industrial settings, only two categories tend to be used: complete and delayed loss of function [31, 34]. However, since work orders are often used only for management purposes, this "functional loss" field is not available in many existing maintenance databases and indicators of asset function are often buried in the free texts of maintenance work orders/logs.
Other potential sources to identify failure and planned maintenance times from
maintenance databases are:
• Work Order Type: Data recorded in this field are normally created due to fault indication, fault inspection, maintenance work to repair defects, as well as preventive maintenance. Depending on the type of work order, the maintenance events can be classified as failure (related to repairing defects) and planned maintenance (related to overhaul or preventive maintenance).
• Maintenance Type: This data field stores the type of maintenance action applied to the asset, which could range from inspection, modification and repair of defects to preventive maintenance [44]. Maintenance types falling within repair of defects and preventive maintenance can be classified as failure and planned maintenance events respectively.
• Maintenance Priority: This data field is recorded to tag the urgency and level of emergency of the maintenance work. The urgency levels are usually divided into Level 1 (immediate repair of the asset), Level 2 (urgent repair of the asset) and Level 3 (planned and scheduled repair of the asset). Depending on the urgency, maintenance events can be distinguished into failure (i.e., Levels 1 and 2) and planned maintenance events (i.e., Level 3).
• Maintenance Work Descriptions (Free Text Format): Such short text data fields may contain reliable failure and planned maintenance information for the asset [5]. Generally, this data field is common across multiple maintenance databases. In work orders, this field contains the intended repair work that needs to be performed on the asset. The detailed work descriptions in another database (often referred to as outage, stop time or downtime data) describe the maintenance work when the asset is offline. Work descriptions recorded for repair work may contain some significant keywords, for example, "leak", "block", "stopped" or "overhaul", "planned stoppage" etc. Using such distinctive keywords, maintenance events can be classified into two groups: corrective maintenance due to failure or preventive maintenance (a heuristic sketch of this keyword-based labelling follows this list). Moreover, work descriptions from downtime data can be used to determine the CM events with stops.
• On/Off Season Production: This data field is especially applicable in industries with season-based production (e.g., sugar processing). Most often, planned maintenance activities are performed in the off season when the assets are idle or non-productive. However, maintenance activities during on-season production do not imply failure events only. During this time, maintenance jobs consist of failure events as well as regular inspections and non-maintenance activities. One needs to separate and identify failure events from the other maintenance events.
• Maintenance Cost: This refers to the total maintenance cost required to perform the maintenance work associated with the issued work order. Depending on the requirements for resources, materials and spare parts, the total maintenance cost can be classified into low, medium and high. In general, maintenance activities related to routine inspections or minor repairs are relatively low in cost due to minimal resource consumption. However, the maintenance cost starts to increase for events which repair failures. In many industrial practices, major overhaul and/or preventive maintenance events cost relatively far more than failure events. Based on such classifications, failure and preventive maintenance events can be identified.
• Maintenance Work Duration: Like maintenance cost, failure and planned maintenance events can also be distinguished using maintenance work durations. Major overhaul and/or preventive maintenance events take a longer time to complete; such jobs include complete inspections, tests, measurements, adjustments, and replacements on the asset. Thus, preventive maintenance tasks require a longer time to complete compared to corrective maintenance. Corrective maintenance, on the other hand, is applied to repair unexpected failures and usually requires less time. Using the maintenance time duration, failure and preventive maintenance events can be determined.
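As noted in the work descriptions item above, a simple keyword-matching heuristic can sketch this classification; the keyword sets below are hypothetical examples taken from the prose, not a validated dictionary.

```python
# Hypothetical keyword sets built from the examples mentioned above
FAILURE_KEYWORDS = {"leak", "block", "stopped", "broken", "fail"}
PLANNED_KEYWORDS = {"overhaul", "planned stoppage", "inspection"}

def label_by_keywords(description: str) -> str:
    """Heuristic event labelling from a free-text work description."""
    text = description.lower()
    if any(k in text for k in FAILURE_KEYWORDS):
        return "corrective (failure)"
    if any(k in text for k in PLANNED_KEYWORDS):
        return "preventive (planned)"
    return "unknown"

print(label_by_keywords("Pump seal leak - unit stopped"))  # corrective (failure)
```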
However, the limitations of existing maintenance databases can be avoided by recording correct maintenance data, so that industry data can be used directly in the existing models. Since no extensive research has been conducted on identifying a standard data recording technique, this study proposes initial recommendations for failure time models. Table 3-1 shows the required data fields and the related maintenance databases necessary for a failure time model.
Firstly, the installation date of the asset needs to be recorded and a related data
field is necessary (as proposed in Table 3-1). Using this initial commissioning date,
the complete maintenance history (including both failure and preventive maintenance
times) can be determined. To correctly identify the historical failure times, the most
crucial data field (e.g. complete functional loss) is required to be documented. This
data field can be filled out at the end of maintenance work as mentioned in Section
3.1. At the end of every maintenance work, the detailed repair tasks and technical
findings are documented in the work order. At this point, the condition of the asset and
the functional loss can be determined.
Table 3-1. Suggested data recording to support reliability modelling (i.e. failure time modelling)

Type of Information Required | Definition of the Required Information | Required Data Fields | Asset and Maintenance Database/s
Basic Asset Data | Installation date/time of the asset | Installation/Commissioning Date* | Work order
Failure times | Asset stoppage due to unplanned/urgent maintenance | Work Order Type, Maintenance Type, Maintenance Priority, Functional Loss (complete)*, Creation and Completion Dates | Work order, Downtime Data
Scheduled/Preventive maintenance times | Asset stoppage due to planned/scheduled maintenance | Work Order Type, Maintenance Type, Maintenance Priority, Creation and Completion Dates | Work order, Downtime Data
Maintenance Effect | Distinctive effects between minimal, perfect and imperfect repair | Maintenance Type, Maintenance Priority, Maintenance Effect (Text Description)* | Work order, Downtime Data

* Data fields proposed to be included in the existing databases.
It is comparatively easier to determine the planned maintenance times using the
existing data fields (e.g. work order type, maintenance type and work priority). Finally, a new data field, "maintenance effect", is proposed in this study. Along with maintenance type and priority, this data field can be effective in distinguishing maintenance effects between minimal, perfect and imperfect repair using standardised text descriptions. The maintenance effect can also be documented by observing the effectiveness of the repair work at the component or system level. The maintenance personnel need to differentiate between minimal repair and perfect repair and document their effects on the asset (i.e. at the component or system level) accordingly. However, it is comparatively difficult to record the effect of imperfect repair (which normally varies between zero and one). In that case, industries can apply their own tagging standards to document the maintenance effects.
3.2 REQUIREMENT FOR INFORMATION EXTRACTION METHODOLOGY
Although Section 3.1 discusses the large number of data fields available in maintenance databases, such data fields are often incomplete or sometimes unreliable. Required information is often buried in different maintenance databases. Moreover, the vast majority of such data require analysis and processing before reliability and maintenance optimisation models can be implemented in real world industrial settings. Given the mismatch between what is required and what is available, this research develops a method to analyse the existing data fields available in multiple maintenance databases and to link those databases to identify historical failure and non-failure planned maintenance times. Since no detailed investigations have been conducted on information requirements for reliability and maintenance optimisation models, the current study begins by investigating methods of identifying failure and non-failure maintenance times (see Figure 3-1).
Figure 3-1. Overview of information extraction methodology
This thesis presents an investigation of the information requirements for the
reliability and maintenance optimisation models and data available in existing
maintenance databases, as described in this chapter. Using related maintenance
information, a methodology is formulated to identify failure and non-failure
maintenance times. In this regard, both supervised and semi-supervised machine
learning methods have been utilised and the methodology is tested on multiple real
world case studies. The detailed discussions on such methodologies and case studies
have been presented in the following chapters.
Chapter 4: Failure Time Extraction Methodology Using Text Mining
Information related to failure and preventive maintenance times is needed to
develop reliability and maintenance optimisation models. This, as a matter of course,
is a requirement specification for such modelling. However, due to changes in
accounting, maintenance practices, and information technology, organisations have
different databases that possess parts of the information needed to define a failure
event. Some common databases (e.g. “work orders”) contain the record of every
maintenance activity (i.e., repair, check or routine inspection) performed on an asset.
However, these databases rarely contain information on the motivation for the activity (i.e. was the issue raised to fix a "failure" or due to planned maintenance, and did this event cause any downtime?). In other words, it is often the case that individual databases do not possess the necessary information to identify a "failure" event, where one needs to know both when the asset is down and whether this downtime was unplanned. One of the possibilities for linking such databases is to analyse the text descriptions (used to describe the repair process). Such free texts are common in most databases and contain the information requirement outlined in Chapter 3.
In this study, a text mining approach is proposed to extract accurate failure time,
using the free texts from multiple maintenance databases. This study automatically
labels fields in one maintenance database using existing data fields and links them with
those in another one through the free text descriptions. The proposed method thus
identifies downtime events whose descriptions are consistent with urgent and
unplanned maintenance.
This chapter begins with some further details of the motivation for linking
multiple maintenance databases using the text mining method. The challenges and incompleteness of individual maintenance databases are described here. A methodology is then proposed which applies a text classification method to commonly available real world databases to attribute each asset stoppage to one of two classes: failure or non-failure. The method is finally validated using a series of analyses and experiments. As an intuitive means to validate the method, three different
validation indicators are applied as described in Section 4.3. The method is applied as
described in Section 4.4. The performance of the classifier constructed from the
proposed application can then be measured.
4.1 MOTIVATION
Real world maintenance databases are typically set up to focus on the work
process of maintenance activities (e.g. communicating what needs to be done by the
maintenance crew and when). In such settings, Database A contains descriptions of the
tasks to be performed and an estimation of how these are prioritised. Another database
(e.g. Database B) specifies the actual permit to work and safety precautions to be taken
for all maintenance activities. A further separate database, Database Z, contains
detailed information on when the asset is not operating, but lacks an unambiguous
indication of the reason for the stoppage. The challenge in uncovering failure time data
(FTD) is thus to identify failure events which correspond to forced downtime of the
asset. Neither database is complete from this perspective: Database A provides
information on what needs to be done and if the work was planned, while Database Z
contains reliable time stamps for when the work (may have) forced the downtime.
Thus, it is important that the databases are interpreted jointly, to identify FTD.
In maintenance databases, different types of data almost always contain free text,
and if the entries in respective databases describe the same event, it is reasonable to
expect that these descriptions will have similar contents. Thus, it is essential to develop
a method for linking Database A and Database Z using these descriptive texts.
4.2 METHODOLOGY
The main goal of the proposed method is to utilize these prescribed maintenance
databases to jointly determine when the asset has “failed”. According to the definition
of failure (i.e., any unplanned and urgent maintenance need which causes an outage)
each database is incomplete in itself in order to decide when the asset is down and if
the downtime is unplanned. If maintenance event in Database A is unplanned (e.g. the
result of a “defect”) and urgent (i.e. “high priority”), the related data in Database A is
considered to describe a potential failure event. Thus, the free texts will likely use
words that the organisation would use to describe a failure. Free texts from Database
A have been used to build a keyword dictionary and further used to construct text
classifiers. Text classifiers based on the free texts of unplanned and urgent
maintenance works are applied to Database Z to determine which maintenance event
is unplanned and/or urgent. Using both databases helps to identify the actual failure and non-failure maintenance times. The details of each step will be discussed in the following subsections.
Figure 4-1. Methodology to extract failure and non-failure maintenance times
The methodology is based on four main steps:
• Definition of failure
• Data filter and Database A labelling
• Features extraction and Construction of keyword dictionary
• Classifier construction and extraction of failure time
4.2.1 Definition of Failure
Failure of an asset can be defined in many ways depending on maintenance notifications, failure types etc. In this research, failure is defined considering two major criteria: unplanned maintenance tasks and outage time. This definition is based on an extensive review of maintenance data; failure time and the existing data types have been discussed in Section 2.7. In this section, a downtime analysis for different maintenance types is presented.
To determine time related reliability parameters, e.g. MTBF, component life etc.,
the equipment surveillance period is typically used. For many units, the operating/in-
service period is less than the surveillance period due to maintenance, sparing of
equipment or intermittent operation of the unit. Although total downtime can be due to both planned and urgent maintenance, as discussed in Section 3.1, Table 4-1 is presented here to illustrate the phases of the different maintenance types within total downtime. Table 4-1 shows that an outage/downtime can be triggered by both planned and unplanned maintenance. Corrective or any other unplanned maintenance is raised due to failure, whereas planned maintenance is usually planned downtime conducted to prevent failure.
Table 4-1. Downtime classifications based on planned and unplanned maintenance

Total Operating Time
  - Downtime
    - Planned Downtime
      - Preventive Maintenance: time of preparation and actual work being done
      - Other Planned Outages: modification, testing, checking
    - Unplanned Downtime
      - Corrective Maintenance: time of preparation and actual work being done
      - Other Unplanned Outages: shutdown, operational problems
  - Uptime: running
It goes without saying that preventive/planned maintenance is usually conducted
before any failure happens while corrective maintenance is issued immediately after
failure occurs. So maintenance records with corrective maintenance denote a high
priority for repair, show urgency and clearly indicate the maintenance tasks to fix a
failure. It logically follows that any maintenance event which contains unplanned maintenance work can be defined as relating to the correction of a failure event. Therefore, the definition of failure in this research is:
“Any unplanned maintenance which causes downtime”
4.2.2 Database A Labelling
To expand on the distinction between two of the databases of concern in the
modelling related to this research, Database A tells us if the maintenance work is
triggered by a certain fault or by a predetermined routine plan. However, it does not indicate whether this maintenance event constitutes a "failure", i.e. whether the fault stops the operation of the asset. Therefore, failure time information is not explicitly available in
Database A. On the other hand, Database Z contains asset stoppage information, but
the information as to why it was down is often unavailable.
Data fields in Database A such as work order type, maintenance type and work
priority have been considered to classify the maintenance event. Initially, any
unplanned maintenance event (identified by the “maintenance type” data field) has
been classified as a “failure” event. However, some unplanned maintenance events
may be issued not to repair a failure, but rather to check or inspect any anomaly or to
conduct routine maintenance actions. This situation suggests that the urgency of any
unplanned maintenance is similarly important. In addition to that, any unplanned
urgent maintenance that is raised and fixed within overhaul times (not during
unplanned downtime) needs to be classified as a non-failure (or scheduled
maintenance) event. Overall, using such data fields, Database A texts have been
classified into urgent and unplanned maintenance (possible failure) and planned
maintenance (possible non-failure) events (see Figure 4-2).
Thus in this thesis it is assumed that there are two types of maintenance events:
failure and non-failure. An important limitation exists in the case where opportunistic
maintenance is significant. For instance, if the boiler fan fails, the boiler must be shut
down for repair. One might take this opportunity to repair another part of the boiler
that is down as a result, e.g. the boiler pump. By our working definition of failure, this is a corrective maintenance only for the boiler fan, and a preventive (opportunistic) maintenance for the pump. Thus, opportunistic maintenance may result in a confusion of
preventive and corrective maintenance at the component level, depending on the work
order recording practices of the organisation. Thus, the presence of opportunistic
repairs necessitates clear description and/or tags to indicate which components drove
the downtime event (if any). Nevertheless, at the system level, the identification of this
event as a failure is still correct: the boiler has still failed due to a fan failure. However,
the resolution of the diagnosis of sub-system and component failures depends on the
recording and tagging practices of the organisation.
So, it is argued that Database A events classified as “urgent and unplanned
maintenance” and “planned maintenance” are the probable candidates for “failure” and
“non-failure maintenance” events. Database A, labelled with these two types of events,
will then be used to construct the text classifiers.
Figure 4-2. Data filter and Database "A" labelling
4.2.3 Features Extraction and Construction of Keyword Dictionary
The next step is to extract the features using the free texts used in Database A.
At this stage, it is necessary to transform the terms and sentences (i.e., those used in the free texts) into a matrix form that machine learning algorithms can understand. This can be done by splitting the text documents into individual words, which is called tokenization. A token is a single element of a text string (a keyword). The text classifier requires data in the form of a matrix where each row corresponds to a document and each column to a keyword (here, keywords are all of the tokens/terms within the dictionary). Such a tokenized matrix can be constructed using different features,
including BOW and N-Gram (see Section 2.10).
Table 4-2 lists the criteria used to extract features and the different keyword dictionaries (see Column 3). For instance, the keyword dictionary denoted $tf_1$ is constructed from all the keywords that appear at least once in the training data. Based on the number of times a keyword appears in the training data, the other two keyword dictionaries (i.e., $tf_5$ and $tf_{10}$) are created, as indicated in Table 4-2. Similarly, $(tf\text{-}idf)_1$, $(tf\text{-}idf)_5$ and $(tf\text{-}idf)_{10}$ measure the relative frequency of a keyword in a specific document through an inverse proportion of that keyword over the entire set of text documents. The CS and IG based keyword dictionaries have been sized based on the total number of features appearing in the training data.
Bi-Gram, Tri-Gram and their three possible combinations have been chosen for the N-Gram based features. Bi-Gram refers to N-Gram features of size 2 and considers two consecutive keywords to predict the next one; similarly, Tri-Gram considers three consecutive keywords for prediction, and so on. To exploit a more efficient N-Gram approach, combinations of Uni-Gram, Bi-Gram and Tri-Gram features have also been extracted (see Table 4-2). Finally, the keyword dictionaries that contain the defined features have been used for the construction of the text classifier and for the overall classification task.
Table 4-2. Criteria used for feature extraction and construction of keyword dictionary

Feature Type | Keyword Dictionary | Criteria Used for Feature Extraction
BOW (TF) | tf_1 | Each keyword appears at least once in the training data
BOW (TF) | tf_5 | Each keyword appears at least five times in the training data
BOW (TF) | tf_10 | Each keyword appears at least ten times in the training data
BOW (TF-IDF) | (tf-idf)_1 | Relative weight of each keyword, based on the selection of tf_1
BOW (TF-IDF) | (tf-idf)_5 | Relative weight of each keyword, based on the selection of tf_5
BOW (TF-IDF) | (tf-idf)_10 | Relative weight of each keyword, based on the selection of tf_10
BOW (Chi-square) | χ²_1608 | Top 1608 keywords that contain class information of the training data
BOW (Chi-square) | χ²_1000 | Top 1000 keywords that contain class information of the training data
BOW (Chi-square) | χ²_500 | Top 500 keywords that contain class information of the training data
BOW (Information Gain) | IG_1608 | Most significant and informative 1608 keywords in the training data
BOW (Information Gain) | IG_1000 | Most significant and informative 1000 keywords in the training data
BOW (Information Gain) | IG_500 | Most significant and informative 500 keywords in the training data
N-Gram (Bi-Gram) | NG_2 | Depends on the immediately previous keyword
N-Gram (Tri-Gram) | NG_3 | Depends on the two immediately previous consecutive keywords
N-Gram (Uni-Bi-Gram) | NG_1,2 | Combination of Uni-Gram and Bi-Gram
N-Gram (Bi-Tri-Gram) | NG_2,3 | Combination of Bi-Gram and Tri-Gram
N-Gram (Uni-Bi-Tri-Gram) | NG_1,2,3 | Combination of Uni-Gram, Bi-Gram and Tri-Gram
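To illustrate how keyword dictionaries of the kinds listed in Table 4-2 can be built, the following is a minimal sketch assuming scikit-learn, where min_df and ngram_range play the roles of the tf_n and NG criteria; the documents are toy examples.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["replace leaking pump seal", "planned overhaul of mill gearbox"]  # toy documents

# tf_1 analogue: count features for every keyword seen at least once
tf1 = CountVectorizer(min_df=1).fit(docs)
# tf_5 would use min_df=5; (tf-idf)_1 re-weights the tf_1 dictionary
tfidf1 = TfidfVectorizer(min_df=1).fit(docs)

# NG_{1,2} analogue: combined Uni-Gram and Bi-Gram features
ng12 = CountVectorizer(ngram_range=(1, 2)).fit(docs)
print(ng12.get_feature_names_out())  # the keyword dictionary (tokens and token pairs)

# Chi-square dictionaries (e.g. the top 500 keywords) need class labels y:
# chi500 = SelectKBest(chi2, k=500).fit(tf1.transform(docs), y)
```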
4.2.4 Classifier Construction and Failure Time Extraction
The text classifier is constructed using two separate algorithms: NB and SVM
(discussed in more detail in Section 2.11). Their performances (using different
measures and metrics mentioned in Section 2.12) are tested on testing data separately.
The detailed information required to construct the text classifier is given below (see
Figure 4-1),
• Labelled data from Database A (Class 1: urgent and unplanned
maintenance and Class 2: all other, including planned maintenance)
• Encoded text from Database A after text cleaning and noise reduction
• Keyword dictionary which is constructed using different features as
explained in Section 4.2.3 above
Given a set of training data from Database A labelled as unplanned or planned maintenance ($D_f$ and $D_p$ respectively), the NB classifier can be trained as in Algorithm 4-1. To nullify the zero-frequency words in the training data, the Laplace estimator (the "+1" in Line 8) has been used.
Algorithm 4-1. Training the naive Bayes classifier

TrainNB(D_f, D_p)    (D_f = text fields labelled as failure; D_p = text fields labelled as preventive)
1   Extract keywords from D_f → V_f
2   Extract keywords from D_p → V_p
3   for each c ∈ {f, p} do
4       N_c ← |D_c|    (number of documents in class c)
5       prior[c] ← N_c / N
6       for each t ∈ V_c do
7           T_ct ← count of occurrences of word t in D_c
8           condprob[t][c] ← (T_ct + 1) / Σ_t′ (T_ct′ + 1)
9   return prior, condprob
Using the output of Algorithm 4-1, the text fields in Database Z have been classified as failure and non-failure in the following manner. Suppose a free text field in Database Z contains the words $w_1, w_2, \ldots, w_C$. Using the NB classifier, the class label $c^*$ can be predicted by [90]:

$$c^* = \arg\max_{c \in \{f,p\}} \; \mathrm{prior}[c] \prod_{i=1}^{C} \mathrm{condprob}[w_i][c] \tag{4-1}$$

where class $c^* = p$ indicates planned maintenance and $c^* = f$ indicates unplanned maintenance (failure).
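The following is a compact, runnable rendering of Algorithm 4-1 together with the prediction rule of Eq. 4-1, written in plain Python as a sketch (log-probabilities are used to avoid numerical underflow; the documents are toy data):

```python
import math
from collections import Counter

def train_nb(D_f, D_p):
    """Algorithm 4-1: two-class naive Bayes with Laplace (+1) smoothing.
    D_f / D_p are lists of tokenised documents (lists of keywords)."""
    N = len(D_f) + len(D_p)
    model = {}
    for c, D in (("f", D_f), ("p", D_p)):
        counts = Counter(t for doc in D for t in doc)   # T_ct
        denom = sum(counts.values()) + len(counts)      # sum over t' of (T_ct' + 1)
        model[c] = {"prior": len(D) / N, "counts": counts, "denom": denom}
    return model

def classify(model, words):
    """Eq. 4-1 in log space: argmax over c of log prior + sum of log condprob."""
    def log_score(c):
        m = model[c]
        return math.log(m["prior"]) + sum(
            math.log((m["counts"][t] + 1) / m["denom"]) for t in words)
    return max(("f", "p"), key=log_score)

# Toy tokenised work descriptions (hypothetical data)
model = train_nb(D_f=[["pump", "seal", "leak"], ["motor", "stopped"]],
                 D_p=[["planned", "overhaul"], ["routine", "inspection"]])
print(classify(model, ["seal", "leak"]))  # -> "f" (failure)
```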
Similarly, the SVM classifier is trained on the tokenized text data from Database A to recognize descriptions of urgent and unplanned maintenance jobs (see Section 4.2.2). Subsequently, the trained SVM decision function in Eq. (2-40) can be used to categorise the similarly tokenized $\mathbf{y}_i$ (the text descriptions from Database Z, i.e. the unlabelled data $\mathcal{U}$) as planned ($C_p$) and unplanned ($C_f$) work:

$$C_f = \{\mathbf{y}_i \mid \mathbf{y}_i \in \mathcal{U} \ \text{and} \ f(\mathbf{y}_i) \geq 0\} \tag{4-2}$$

$$C_p = \{\mathbf{y}_i \mid \mathbf{y}_i \in \mathcal{U} \ \text{and} \ f(\mathbf{y}_i) < 0\} \tag{4-3}$$
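A minimal sketch of this SVM-based split, assuming scikit-learn's LinearSVC as the decision function $f$; the database contents and variable names are toy stand-ins:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy stand-ins for the two databases (hypothetical contents)
texts_A = ["pump seal leak repair", "planned overhaul of mill",
           "motor stopped urgent fix", "routine inspection of rollers"]
labels_A = [1, -1, 1, -1]                 # +1 unplanned, -1 planned
texts_Z = ["mill stopped seal leak", "overhaul of rollers"]

vec = TfidfVectorizer()
svm = LinearSVC().fit(vec.fit_transform(texts_A), labels_A)

# Eqs. 4-2 and 4-3: split Database Z by the sign of the decision function
scores = svm.decision_function(vec.transform(texts_Z))
C_f = [t for t, s in zip(texts_Z, scores) if s >= 0]   # unplanned (failure)
C_p = [t for t, s in zip(texts_Z, scores) if s < 0]    # planned
print(C_f, C_p)
```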
4.3 VALIDATION OF THE METHODOLOGY
The performance of the classifier constructed from Database A can be compared
with the actual data labels mentioned in Section 4.2.2. According to the methodology
outlined in Figure 4-1, the text classifier (e.g. trained on Database A) is applied to
Database Z to identify failure events. The performance of such a prediction can be
measured by comparing the predicted labels of Database Z with the actual ones.
Although the true labels of Database Z were not directly recorded, this study estimated
the actual true labels using the existing data fields and thus measured the performance
of the classifier. The detailed validation process is discussed in Sections 5.1.7 and 5.2.7
for the case studies.
In addition, a series of different independent (albeit more "intuitive") validation indicators are employed:
• Word Cloud: Failure and non-failure word clouds were constructed from the training data from Database A and compared. Keywords appearing in the two classes (failure and non-failure) can be visually distinguished (a minimal plotting sketch follows this list).
• Text Descriptions: Randomly selected text descriptions from the failure and non-failure data in Database Z were analysed to compare the keywords associated with fixing "failure" events against other keywords related to scheduled maintenance or routine inspection tasks.
• Cumulative Number of Failures: To analyse the accuracy of the
estimated prediction, the cumulative number of failure data items (from
Database Z) was compared before and after applying a text-mining
approach. The predicted failure data was compared with two naïve failure
estimates, i.e. those of Database A and Database Z.
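For the word cloud indicator above, a minimal plotting sketch, assuming the third-party Python wordcloud package and toy class texts:

```python
from wordcloud import WordCloud  # third-party package (pip install wordcloud)

# Toy stand-ins for the labelled Database A training texts
failure_text = "pump seal leak motor stopped urgent repair leak"
non_failure_text = "planned overhaul routine inspection scheduled service"

# One cloud per class; frequent class-specific keywords appear larger
WordCloud(width=400, height=300).generate(failure_text).to_file("failure_cloud.png")
WordCloud(width=400, height=300).generate(non_failure_text).to_file("non_failure_cloud.png")
```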
4.4 APPLICATION OF THE METHODOLOGY
The main idea of the proposed failure time extraction methodology was to train
a text classifier using labelled data from Database A; this was then applied to another database, Database Z, which was not labelled at all. The purpose of this text
classification was to classify maintenance events in Database Z into failure and non-
failure labels (see Figure 4-1). In order to intuitively validate such a method, a new
application of the method was proposed and this is shown in Figure 4-3. In this
proposed application of the method, the newly labelled data from Database Z were
used to construct a new classifier. This time, the classifier was re-trained on text
descriptions used in Database A (training data) and those in the newly labelled
Database Z together. The updated classifier was further applied to testing data in
Database A. In this case, the predicted labels of testing data in Database A can be
compared with the actual ones. Moreover, the performance of the new classifier can
be compared with the previous one constructed from training data in Database A. The
detailed application process is highlighted in Figure 4-3.
Figure 4-3. Application of the methodology
The following steps have been considered for the application of the method,
• Re-train a new classifier using the newly labelled data from Database Z
augmented with the initial training data from Database A
• Label testing data from Database A using the new classifier
• Compare the predicted labels of testing data from Database A with the
actual ones
Suppose a new set of training data z_k ∈ ℝ^d with class labels c_k ∈ {−1, 1}, where
z_k are the work descriptions extracted from the training data of Database A combined
with the text data of Database Z. The class labels of Database Z have been taken from
C_f and C_p (see Eqs. 4-2 and 4-3). Thus, the new classifier can be expressed as:

ĉ = +1 if f(z_k) ≥ 0,  −1 if f(z_k) < 0                                    4-4
The decision function in Eq. (4-4) was applied to t_t (the test data from
Database A in the set 𝒯) to classify them as planned (C_p^t) and unplanned (C_f^t) maintenance work:

C_f^t = { t_t | t_t ∈ 𝒯 and f(t_t) ≥ 0 }                                   4-5

C_p^t = { t_t | t_t ∈ 𝒯 and f(t_t) < 0 }                                   4-6
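A sketch of this re-training and re-application step, continuing the previous fragment; all names (including the one-item test set texts_test_A) are illustrative assumptions, not the thesis's actual data:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC

    # Continuing the previous sketch: texts_A / labels_A are the Database A training
    # work orders, texts_Z the Database Z texts, and C_f their predicted failure subset.
    texts_aug  = texts_A + texts_Z                          # z_k in Eq. 4-4
    labels_aug = labels_A + [1 if y in C_f else -1 for y in texts_Z]

    vec2 = CountVectorizer()
    clf2 = LinearSVC().fit(vec2.fit_transform(texts_aug), labels_aug)

    # Apply the updated classifier to held-out test work orders t_t (Eqs. 4-5 / 4-6)
    texts_test_A = ["top pyrites gate not closing"]         # assumed test data
    s = clf2.decision_function(vec2.transform(texts_test_A))
    C_f_t = [t for t, v in zip(texts_test_A, s) if v >= 0]  # unplanned
    C_p_t = [t for t, v in zip(texts_test_A, s) if v < 0]   # planned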
C_f^t and C_p^t can be compared with their true labels as mentioned in Section 4.2.2.
The classifier mentioned in Eq. (4-4) was constructed using the text descriptions of
the training data from Database A and the newly labelled data from Database Z (i.e.,
labelled by the classifier in Eqs. 4-2 and 4-3). Thus the performance of the
new classifier can be compared with the classifier trained on the training data from
Database A.
4.5 SUMMARY
This chapter presents a methodology for the extraction of failure and non-failure
maintenance times using commonly available maintenance databases. A text mining
approach is engaged to determine the keywords indicative of the source of unplanned
and urgent maintenance (discussed in Section 4.2.2). This study analyses the text
descriptions of one database to construct the keyword dictionary, which is in turn used
to classify each maintenance stoppage into failure or non-failure maintenance time.
Most common text features (Section 4.2.3) and classification algorithms (Section
4.2.4) have been used to construct the keyword dictionary and thus to formulate the
text classifier. A validation method (mentioned in Section 4.3) is proposed to estimate
the actual labels of Database Z, which are then compared with the predicted labels
(identified by the proposed text classifier). In addition, a series of independent
validation methods and a new application of the methodology have been employed.
Chapter 5: Case Studies on Failure Time Extraction
In this chapter the applicability of the methodology proposed in Chapter 4 is
demonstrated on maintenance databases from an Australian electricity company and
an organisation which has overall responsibility for processing in the Australian sugar
industry. Analysis of the identified failure time appears to confirm the accurate
estimation of failure events in Database Z. The results are expected to be immediately
useful in improving the estimation of the failure time (and thus the reliability models)
for real world assets. The following outlines the main steps for the case studies:
There are six phases: (1) Summarise the basic information and working process
of the two industrial systems. (2) Conduct data description and text cleaning. This
introduces the common databases used in asset and maintenance records. General text
cleaning techniques are applied, including the removal of numbers, punctuation, extra
spaces, stop words, and non-discriminating words. (3) Perform data labelling
and feature extraction. This involves the method of labelling data from Database A. In
doing this, different features are considered to develop a tokenized, vectorized matrix
of keywords and to construct keyword dictionaries based on those features.
(4) Conduct text classification and performance evaluation. This focuses
on the construction of SVM and NB based text classifiers using different keyword
dictionaries. (5) Extract failure time. Here text classifiers are applied to categorise data
from Database Z into failure or non-failure. (6) Validate the extracted results. In this
final step, comparisons are made between the performance of the text classifier with
the estimated actual values, a new application of the method (see Section 4.4), word
clouds, manually observed text descriptions and the cumulative number of failures
before and after text mining.
5.1 CASE STUDY 1: COAL FIRED POWER GENERATION COMPANY
5.1.1 Overview of a Coal Mill
Power generation industry studies have shown that coal pulverisers are an area
where extensive research to improve equipment reliability is essential. The Electric
Research Institute (ERI) has determined that 1% of plant availability is lost on average
due to pulveriser-related problems. The ERI also identified oil contamination and
excessive leakage as two major problem areas, with pulveriser drive train failures
accounting for 53% of pulveriser problems. Coal mills are generally one of three types: low, medium and
high speed mills [141]. Low and medium speed mills are the most prevalent. Some
examples of medium speed mills include vertical spindle bowl, vertical roller and ring
and ball mills. The physical structure of a typical bowl mill (used in this case study) is
presented in Figure 5-1.
Figure 5-1. Overview of medium-speed (vertical spindle bowl) mill [141]
Pulverization is currently the favoured method of preparing coal for burning.
Mechanically pulverizing coal into a fine powder enables it to be burned like a gas,
thus allowing more efficient combustion. Transported by an air or an air and gas
mixture, pulverized coal can be introduced directly into the boiler for combustion.
5.1.2 Data Description and Text Cleaning
Maintenance data coming from the coal pulverized mills of an Australian power
plant over a 21 year period are used here to illustrate the application of the proposed
information extraction methodology. There are two distinct databases: work
orders/notifications (WO’s) and downtime data (DD). Using some selected data fields
in the WO database (i.e., maintenance type and work priority), one may determine
whether the work is issued to fix a “defect”, whether it is planned maintenance or
whether there is a level of urgency to perform the maintenance work. However, it does
not tell us whether the work is issued to fix a “failure”, i.e., whether the maintenance work requires
stopping the operation of the asset. On the other hand, the DD database contains asset
stoppage information without stating whether the downtime is forced (i.e., unplanned)
or planned. Such incompleteness in the WO and DD is in line with our assumptions
presented in Section 4.2.2. The WO and DD for 12 mills were used. Figure 5-2 shows
the process of recording the WO and DD during the maintenance process.
Figure 5-2. Recording of two databases (WO and DD) during maintenance process (coal mill)
Table 5-1 shows five randomly selected records from each of the two databases.
Table 5-1 (a) shows that WOs contain information about the maintenance event (e.g.
maintenance type, work priority). However, on examining the WO text descriptions
from this table, one can see that the maintenance work descriptions are not
straightforward to interpret and there are few “obvious” descriptions of failure events
to the non-expert. Moreover, in the DD entries seen in Table 5-1 (b), “failure” is even
harder to recognize by inspecting work descriptions, particularly without any tags
denoting the urgency or type of the event. Thus, while both databases contain relevant
information for failure events, they should be interpreted together to ascertain failure
times of the asset.
Table 5-1. Five randomly selected records from (a) WO and (b) DD during maintenance process (data is slightly edited to protect proprietary information)

(a) Work Order
Maintenance Type | Work Priority | Work Description
Defect | Urgent | XF PF mill PF leak repair on top of mill
Preventive Maintenance | Scheduled | YF Major overhaul of mill
Defect | Immediate | Xf mill pyrites sluiceway is blocking wi
Modification | Planned | Mill starting interlock bypass
Defect | Immediate | XD top pyrt. gate not closing

(b) Downtime Data
Work Descriptions
Mech XD Mill, Fdr. Mech. Maint. From hot air gate
Mech YE Mill Pyrites Doors. Hydraulic isolation only
Mech.- YD Mill. Repair PF Leak. Primary and Seal Air between Hot and Cold Air Gates. Electrical isolation of Mill and Seal Air Fan
415V AC, Air, Supplies Isolated
Elect. & mech. isolation of mill & feeder
First, the text descriptions of WO’s and DD from coal mills were cleaned
according to well-known techniques (see Section 2.10). One of the important parts in
text cleaning is to exclude non-discriminating words. In this case study, keywords such
as “mill” and other location information text were excluded (see Table 5-2) since,
although they are quite common in the free text, they contain no information associated
with failure/non-failure (i.e. they are non-discriminating). Table 5-2 also shows that
punctuation, numbers, and white space are removed from the data.
Table 5-2. Comparing a few WO data before and after the cleaning process
Documents Before Cleaning | Documents After Cleaning
PF LEAK ON XF MILL. | pf leak
PF leak on XF mill | pf leak
PF Leak on 1st joint above riffle box | pf leak st joint riffle box
XMill spilling badly | spilling badly
Mill Windbox Scraper Upgrade Trial | windbox scraper upgrade trial
mill windbox has a hole where inspection | windbox hole inspection
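A minimal sketch of such a cleaning step, assuming Python; the stop-word and non-discriminating word lists here are toy examples, not the lists actually used in this study:

    import re

    STOP_WORDS = {"on", "of", "the", "a", "has", "is", "where"}   # toy stop-word list
    NON_DISCRIMINATING = {"mill", "xf", "xd", "yf"}               # location/asset words

    def clean(doc):
        doc = re.sub(r"[^a-z\s]", " ", doc.lower())   # lowercase; drop numbers/punctuation
        tokens = [t for t in doc.split()              # splitting also removes extra spaces
                  if t not in STOP_WORDS and t not in NON_DISCRIMINATING]
        return " ".join(tokens)

    print(clean("PF LEAK ON XF MILL."))               # -> "pf leak"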
The cleaned texts can be viewed as a word cloud, an example of which for the
original work order data is shown in Figure 5-3 (the size of the word indicates its
relative frequency). Only keywords that appear more than 100 times in the WO
database have been displayed in the word cloud. In Figure 5-3, the most frequently
occurring word is “air” while “seal”, “fan” and “leak” also occur quite commonly. One
can see both the words that are likely to indicate failure (e.g. “repair”, “leak”, or
“block”) and those that indicate planned tasks (e.g. “inspect” and “overhaul”).
Interestingly, words that obviously indicate failure appear to be fairly infrequent in the
WO text.
Figure 5-3. Word cloud representing the keywords appearing in WO
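A cloud like Figure 5-3 can be generated along the following lines; a sketch assuming Python with the third-party wordcloud package, where cleaned_texts (the cleaned WO descriptions) and the output file name are assumptions:

    from collections import Counter
    from wordcloud import WordCloud   # third-party package (pip install wordcloud)

    # cleaned_texts: list of cleaned WO descriptions (assumed available)
    counts = Counter(" ".join(cleaned_texts).split())
    frequent = {w: n for w, n in counts.items() if n > 100}   # keywords with >100 occurrences

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequent)   # word size reflects relative frequency
    cloud.to_file("wo_wordcloud.png")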
5.1.3 Work Order Labelling and Feature Extraction
Using the labelling technique mentioned in Section 4.2.2, the WOs (9436 records in total)
were classified as unplanned and urgent maintenance or planned maintenance (5053
and 4383 records respectively). 80% of the WO data was used as training data while the
remainder was kept as testing data. The keyword dictionary was then formulated by
using the different features mentioned in Section 4.2.3. Performance of text
classification largely depends on the selected features. For instance, keyword
dictionaries were constructed from the features tf_1 and NG_{1,2}, and parts of such
dictionaries are shown in Table 5-3 and Table 5-4 respectively. Keyword dictionaries
constructed from other features are presented in Appendix A.
Table 5-3. A portion of keyword dictionary (tf_1) for Case Study 1
Record No. | Keywords
[26-30] | "airborne" "airdust" "airflow" "airoil" "alarm"
[31-35] | "alm" "amount" "adjacent" "analog" "analysis"
[36-40] | "annubar" "apart" "appear" "adjust" "applied"
[41-45] | "appo" "approx" "april" "aprox" "araldite"
[46-50] | "adrift" "areas" "arm" "armdamp" "around"
Table 5-4. A portion of Mixed-Gram keyword dictionary (NG_{1,2}) for Case Study 1
Record No. | Keywords
[1000-1007] | "coal outag" "coal pf" "coal probe" "coal pyrites" "coal sampl" "coal spilling" "coil"
[1008-1014] | "coil blown" "coil burnt" "cold" "cold air" "collar" "collar nrv" "collected"
[1015-1021] | "collected mil" "colour" "colour pl" "com" "come" "come adrift" "come adriftr"
[1022-1028] | "come away" "coming" "coming abo" "coming adrift" "coming apart" "coming away" "coming baffles"
5.1.4 Training and Testing Text Classifiers
In this research, using the WO database, performances of NB and SVM based
classifiers are compared. Initially both the classifiers are separately trained using
different keyword dictionaries (as mentioned in Section 4.2.3) and their performances
are tested by comparing the predicted values of failure and non-failure work orders
with actual ones not utilized in the training data. Table 5-5 and Table 5-6 show the
performances (i.e. accuracy, precision, recall and F-Measure) of two models, support
vector machine (SVM) and Naïve Bayes (NB) on the testing data using different
keyword dictionaries constructed from various feature selection methods. It can be
seen from Table 5-5 that the SVM based classifier outperforms the NB model for all
the features. N-Gram based keyword dictionaries are superior to all other feature based
dictionary types. The TF based SVM performs comparatively better than TF-IDF. Among
the TF methods, tf_5 is superior to the other two (i.e., tf_1, tf_10), which implies that
the classifier containing the keywords that appear at least five times in the training
documents shows the best performance. Moreover, the performances of the CS and IG based
SVM are comparatively similar to each other. It should be noted that the IG based
SVM performance is superior if the keyword dictionary contains all the keywords that
appear in the training data.
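The training and evaluation loop behind Table 5-5 and Table 5-6 can be sketched as follows, assuming Python/scikit-learn; wo_texts and wo_labels (the labelled work orders, with labels assumed binary: 1 = failure, 0 = non-failure) are assumed available, and the Mixed-Gram feature choice is shown as one example:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC

    X_tr, X_te, y_tr, y_te = train_test_split(wo_texts, wo_labels,
                                              train_size=0.8, random_state=0)
    vec = CountVectorizer(ngram_range=(1, 2))        # Mixed-Gram NG_{1,2} features
    A_tr, A_te = vec.fit_transform(X_tr), vec.transform(X_te)

    for name, model in [("SVM", LinearSVC()), ("NB", MultinomialNB())]:
        pred = model.fit(A_tr, y_tr).predict(A_te)
        print(name, accuracy_score(y_te, pred), precision_score(y_te, pred),
              recall_score(y_te, pred), f1_score(y_te, pred))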
Table 5-5. Performances between SVM and NB classifiers using different keyword dictionaries

Keyword Dictionary | SVM Accuracy | SVM Precision | SVM Recall | NB Accuracy | NB Precision | NB Recall
tf_1 | 96.29 | 96.23 | 96.81 | 92.32 | 93.59 | 91.82
tf_5 | 96.82 | 96.45 | 97.60 | 92.79 | 92.37 | 94.21
tf_10 | 96.77 | 96.45 | 97.50 | 92.32 | 91.00 | 94.91
(tf-idf)_1 | 96.13 | 96.13 | 96.61 | 92.32 | 93.59 | 91.82
(tf-idf)_5 | 96.71 | 96.26 | 97.60 | 92.79 | 92.37 | 94.21
(tf-idf)_10 | 96.71 | 96.35 | 97.50 | 92.32 | 91.00 | 94.91
χ²_1608 | 96.66 | 95.69 | 98.23 | 92.32 | 93.59 | 91.82
χ²_1000 | 96.50 | 95.50 | 98.13 | 92.26 | 93.67 | 91.62
χ²_500 | 94.60 | 92.32 | 98.13 | 92.58 | 92.01 | 94.21
IG_1608 | 96.71 | 95.69 | 98.33 | 92.32 | 93.59 | 91.82
IG_1000 | 96.50 | 95.50 | 98.13 | 92.21 | 93.67 | 91.52
IG_500 | 94.59 | 92.32 | 98.13 | 92.47 | 92.07 | 93.91
NG_2 | 96.93 | 95.45 | 99.02 | 93.03 | 91.05 | 96.41
NG_3 | 91.10 | 92.34 | 90.05 | 87.33 | 81.24 | 99.00
NG_{1,2} | 97.18 | 95.99 | 98.82 | 93.48 | 91.90 | 96.21
NG_{2,3} | 95.71 | 93.74 | 98.62 | 93.22 | 89.80 | 98.40
NG_{1,2,3} | 96.45 | 95.97 | 98.72 | 93.48 | 91.19 | 97.11
Table 5-6. Comparison of Accuracy and F-Measures between SVM and NB using Different Keyword Dictionaries for Case Study 1 (Coal Mills)

Keyword Dictionary | SVM Accuracy | SVM F-Measure | NB Accuracy | NB F-Measure
tf_1 | 96.29 | 96.52 | 92.32 | 92.70
tf_5 | 96.82 | 97.02 | 92.79 | 93.28
tf_10 | 96.77 | 96.97 | 92.32 | 92.91
(tf-idf)_1 | 96.13 | 96.37 | 92.32 | 92.70
(tf-idf)_5 | 96.71 | 96.93 | 92.79 | 93.28
(tf-idf)_10 | 96.71 | 96.92 | 92.32 | 92.91
χ²_1608 | 96.66 | 96.94 | 92.32 | 92.70
χ²_1000 | 96.50 | 96.80 | 92.26 | 92.63
χ²_500 | 94.60 | 95.14 | 92.58 | 93.10
IG_1608 | 96.71 | 96.99 | 92.32 | 92.70
IG_1000 | 96.50 | 96.80 | 92.21 | 92.58
IG_500 | 94.59 | 95.14 | 92.47 | 92.98
NG_2 | 96.93 | 97.20 | 93.03 | 93.65
NG_3 | 91.10 | 91.18 | 87.33 | 89.25
NG_{1,2} | 97.18 | 97.38 | 93.48 | 94.01
NG_{2,3} | 95.71 | 96.12 | 93.22 | 93.90
NG_{1,2,3} | 96.45 | 97.33 | 93.48 | 94.06
As shown in Table 5-5 and Table 5-6, the accuracy and F-Measure obtained
using Bi-Gram is better than Uni-Gram and Tri-Gram. In other words the SVM
classifier using two consecutive keywords (i.e., Bi-Gram) performs better compared
to considering a single keyword or three consecutive keywords. In the case of the NB
classifier, the accuracy obtained using Bi-Gram is better than that when using Uni-
Gram or Tri-Gram. Nevertheless, NB and SVM classifiers constructed from the
Mixed-Gram method outperform other feature selection methods. That is to say, a
combination of Uni-Gram and Bi-Gram shows better accuracy compared to the other
two combinations. This implies that combinations of two, three or more consecutive
words make the data noisy and sparse and adversely affect the performance.
In summary, it was shown that the SVM performs the best using a keyword
dictionary constructed from the N-Gram feature. Interestingly, the SVM with a
combination of Uni-Gram and Bi-Gram shows the best accuracy for Case Study 1 (see
Table 5-5). The confusion matrices for both SVM and NB text classifiers using all the
text features are summarised in Appendix B.
5.1.5 Comparison between Failure and Non-Failure Work Orders
After training the classifier on the WOs, the resulting classifier may be used to
assign to each DD the category of either failure or non-failure. One can examine the
word clouds as a simple intuitive validation of the results. Figure 5-4 (a) and (b), which
are the failure and non-failure word clouds, respectively, illustrate this for the coal mill
data. While “air” tends to appear in both, one can clearly see that the words that
intuitively indicate failure are more prevalent in the failure word cloud (e.g. “repair”
and “leak”). On the other hand, Figure 5-4 (b) also shows some words that clearly
indicate non-failure (e.g. “change” and “inspect”).
(a)
(b)
Figure 5-4. Word clouds for (a) failure and (b) non-failure WO’s for coal mills
5.1.6 Failure Time Extraction
Since the Mixed-Gram (i.e., combination of Uni-Gram and Bi-Gram) based
SVM classifier shows higher performance than the NB classifier (see Section 5.1.4),
failure times were identified using this classifier. In this regard, the Mixed-Gram SVM
classifier was applied to the DD to label each text description as either failure or non-
failure. Table 5-7 presents the outcome of the predictions. There are a total of six mills
(referred to as A, B, C, D, E and F) in each of the two units, X and Y. The predicted
results indicate that the proposed text mining method classifies a large number of DD
items as “failures”. Around 90% of the DD are identified to be “failures” which
suggests that most of the maintenance notifications in the DD have been issued to
repair “failure” events.
Table 5-7. Predicted instances of failure and non-failure downtimes (coal mills)
(columns: Mills A–F of Unit X, then Mills A–F of Unit Y)

Instances | XA | XB | XC | XD | XE | XF | YA | YB | YC | YD | YE | YF | Total
Failure | 100 | 133 | 132 | 95 | 110 | 97 | 105 | 98 | 144 | 129 | 146 | 100 | 1389
Non-failure | 13 | 8 | 11 | 15 | 39 | 9 | 8 | 11 | 10 | 15 | 11 | 15 | 165
Total Instances | 113 | 141 | 143 | 110 | 149 | 106 | 113 | 109 | 154 | 144 | 157 | 115 | 1554
Of course, these predicted values cannot be objectively validated since true
failures cannot be independently verified in the historical data. For this reason, an
alternative method (mentioned in Section 4.3) to validate the predicted failure times
was devised and this is discussed further in the next section.
5.1.7 Validation of the Text Classifier
According to the proposed method (see Figure 4-1), the WO trained text
classifier is finally applied to the DD to classify each record as failure or non-failure.
validate the accuracy of the text classifier (which is constructed from WO’s), one may
compare the predicted labels of the DD with the estimated “actual” ones using the
existing data fields. First, the actual labels of the DD were identified by using the data
fields recorded. Although the true labels are not directly recorded in the DD, the data
fields (i.e., cause code, cause description and work description) can be used for such a
purpose. Due to the lack of additional information (other than maintenance work
descriptions) in Case Study 1, failure and non-failure labels in the DD were estimated
by manually examining the maintenance work descriptions. In this way, a total of
1360 DD records were classified as “failure” events while the remaining 194 records
were labelled as “non-failure” events.
Estimated actual labels of the DD were then compared with the ones predicted
by the text classifier and the cross tabulation result is shown in Table 5-8. The accuracy
(88.87%), precision (88.92%) and recall (99.71%) values (using Eqs. 2-42, 2-43, and
2-44) indicate that the WO trained text classifier performs well on the DD too.
Thus the text classifier developed in the proposed methodology can efficiently identify
the actual “failure” events.
Table 5-8. Cross tabulation of the DD comparing predicted labels with the estimated ones

                          Actual
Prediction | Failure | Non-Failure
Failure | 1356 | 169
Non-Failure | 4 | 25
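As a check, these figures follow directly from the counts in Table 5-8 via Eqs. 2-42 to 2-44:

Accuracy = (1356 + 25) / (1356 + 169 + 4 + 25) = 1381/1554 ≈ 88.87%
Precision = 1356 / (1356 + 169) = 1356/1525 ≈ 88.92%
Recall = 1356 / (1356 + 4) = 1356/1360 ≈ 99.71%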
5.1.8 Application of the Methodology
As discussed in Section 4.4, one way to perform an intuitive validation of the
results is to train a (new) classifier using the predicted DD labels and apply it to the
WOs. A new classifier was constructed using the DD (with the predicted DD labels
shown in Table 5-7) augmented with the initial training WO data. A combined N-Gram
(i.e., composed of both Uni-Gram and Bi-Gram) based SVM classifier was formulated
in this case and subsequently applied to testing WO data to label them as failure or
non-failure. A total of 9103 training data items (7549 from the training WO data and
1554 from the DD) were used to construct the N-Gram based keyword dictionary (a
total of 5997 keywords which are a combination of Uni-Gram and Bi-Gram). The new
classifier classified the testing WO: failures (1010 records) and non-failures (877
records). This methodology compares the predicted and actual labels of the WO data
tested in this way (see Table 5-9). It can be seen that the accuracy of the new classifier
(97.24%) is better than that of the previous classifier (97.18%). Due to the inclusion
of the newly labelled DD, the new classifier can be used to identify failure events using
the upcoming DD more accurately.
Table 5-9. Cross tabulation of testing work orders comparing predicted labels to the actual ones

                          Actual
Prediction | Failure | Non-Failure
Failure | 963 | 47
Non-Failure | 5 | 872
5.1.9 Comparison between Failure and Non-Failure DD using Text Descriptions
In the absence of independent validation of the accuracy of the results, one can examine
the text descriptions of the classified DD (shown in Table 5-10 for five randomly
selected records from each of the two classes). The text descriptions in the non-failure
column of Table 5-10 contain some keywords that one would intuitively expect (e.g.
“overhaul”, “change”, “oil”, “cleaning”, etc.) to reflect non-failure maintenance
work. Similarly, text descriptions from the failure column in Table 5-10
maintenance. Similarly, text descriptions from the failure column in Table 5-10
contain some keywords (e.g. “leak”) that clearly indicate maintenance works to fix a
failure. Other descriptions are more ambiguous (e.g. “Mech.- XA Mill Pyrites
Doors.”). However, identifying these descriptions as unplanned is not necessarily
incorrect. If the WOs contain a large number of high priority and unplanned activities
on the “pyrites doors”, then the presence of that text certainly lends evidence to the
DD being a failure. In other words, the lack of obvious failure words may provide
further support for the argument to use the text mining method in this research,
particularly if the failures need to be identified by non-experts.
Table 5-10. Randomly selected predicted downtime data of Unit X Mill A (coal mill)

Failure:
• Mech.- XA Mill. Repair PF Leak. Primary and Seal Air between Hot and Cold Air Gates. Electrical isolation of Mill.
• Mech.- XA Mill Pyrites Doors. Hydraulic isolation only
• ELECT. - 415 V A.C. MECH. - Hot Air Gate Shut, Supplies Isolated, Oil
• Mech.- XA Mill. Repair dust leak. Primary and Seal Air between Hot and Cold Air Gates. Electrical isolation of Mill.
• Hyd Isol v/v's - Shut Door Relays - Pulled.

Non-Failure:
• Major Overhaul - from Feeder Outlet Spade to PF Outlet Flaps on top of Mill AND to Hot & Cold Air Gates. 3.3kV MSD, 240V ac, 220V ac, 110V ac, & 110V dc supplies; Seal Air; RCW; Lube Oil; Hydraulic Oil.
• Change over and Isolate Valve
• Mechanical Maintenance to 40 MICRON GEARBOX LUBE OIL FILTER
• Cleaning & meggering of motor
• CMOP Overhaul of Rolls & Journals, Scrapers, Body internals, Picollo Tube, Thermocouples, Pyrities V/V's and Hyd's, Seal Air Fan, Discharge v/v's, Lube Oil Press switch, Mill DP instrumentation, Replace Riffle Elements, Lube Oil p/p, Seal Air Fan v/v's & PF pipes
5.1.10 Cumulative Number of Failures before and after Text Mining
For another perspective on the failure time data, one may examine the
cumulative number of failures before and after the text mining approach has been
applied, as seen in Figure 5-5. As a comparison, the cumulative number of raw WO
and DD events are plotted (in green and blue, respectively), which may be (naively)
assumed to be failure times if all events are considered unplanned. In Figure 5-5, the
text-mined failure times are plotted in brown and clearly indicate that the number of
cumulative failures is less than the raw number of DD events. However, it can be seen
that analysis of the raw DD data would likely provide a reasonable estimate of the
failure intensity (e.g., average number of failures per unit time), since the text-mined
failure intensity and the raw DD failure intensity are quite similar.
Figure 5-5. Cumulative number of failures for Unit X, Mill A (coal mill)
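A figure in the style of Figure 5-5 can be reproduced along the following lines; a sketch assuming Python with matplotlib, where wo_dates, dd_dates and mined_failure_dates are assumed, pre-sorted lists of event time stamps:

    import numpy as np
    import matplotlib.pyplot as plt

    series = [(wo_dates, "raw WO events", "green"),
              (dd_dates, "raw DD events", "blue"),
              (mined_failure_dates, "text-mined failures", "brown")]
    for dates, label, colour in series:
        # step plot: cumulative count of events up to each time stamp
        plt.step(dates, np.arange(1, len(dates) + 1), where="post",
                 label=label, color=colour)
    plt.xlabel("Time")
    plt.ylabel("Cumulative number of events")
    plt.legend()
    plt.show()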
5.2 CASE STUDY 2: BOILERS IN SUGAR PROCESSING INDUSTRY
5.2.1 Overview of a Boiler System
The second case study used to demonstrate the proposed methodology was based
on the sugar processing industry which not only manufactures sugar but also produces
electricity and methanol. Five nations (including Australia) account for 40% of the
world’s total sugar production [142]. The boiler is the essential component in sugar
mills and failure of such a critical component causes huge production losses. A boiler is a
closed vessel in which water or another fluid is heated under pressure. The steam or hot fluid
is then circulated out of the boiler for use in various processes. The main input fuel used in
the boiler is bagasse, which is a by-product of the sugar extraction process (see Figure
5-6).
Figure 5-6. Functional layout of sugar processing system (adapted from [142])
5.2.2 Data Description and Text Cleaning
Maintenance data for a series of boilers were collected over a 26 year period.
Similarly to Case Study 1, there are two distinct databases, the WOs and the DD, which
once again are incomplete in the manner discussed in the introduction. Table 5-11
shows a few examples of WOs and DD from the data and illustrates that information
from both the databases is essential to interpret the failure events.
Table 5-11. Five randomly selected examples from (a) WO and (b) DD (data is slightly edited to protect proprietary information)

(a) Work Order
Maintenance Type | Work Priority | Work Description
Corrective Maintenance | Urgent | Broken conduit at Ash system
Preventive Maintenance | Scheduled | Annual Washdown - Boiler 2
Corrective Maintenance | Immediate | Repair worn areas of bagasse chute
Preventive Maintenance | Planned | Minor Overhaul Ash System Pumps
Corrective Maintenance | Immediate | Reweld broken deflector Bagcon. 2

(b) Downtime Data
Work Descriptions
No 3 bagasse belt tripped - fire hose on trip cable
Low steam
Removing bolt from no.3 bagasse belt
Esj tank full - low steam pressure
Elect. & mech. isolation of mill & feeder
Text cleaning techniques were applied to remove sparseness, punctuation and
unwanted features from the data as detailed in the methodology and demonstrated in
Case Study 1. In this case, keywords such as “mill”, “bagasse” and other asset
identification texts which are common but non-discriminating were removed. Figure
5-7 displays the word cloud of WO data showing most frequent keywords. In this word
cloud, some keywords clearly indicate maintenance: “replace”, “repair”, etc.; some
suggest possible failure terms: “leak” etc. and some indicate planned or routine
inspections: “overhaul”, “clean”, “inspect”, “test” etc.
Figure 5-7. Word cloud representing the keywords appearing in WO
5.2.3 Work Order Labelling and Feature Extraction
The data labelling techniques for unplanned and urgent maintenance work
(mentioned in Section 4.2.2) were applied to the WO database (total of 2,350 records)
and thus they were classified into urgent (713 records) and planned maintenance
(1,637 records). The keyword dictionary was then formulated by using different
features mentioned in Section 4.2.3. Table 5-12 and Table 5-13 show the keyword
dictionaries constructed from the features 𝑡𝑡𝑡𝑡1 and 𝜒𝜒2500 respectively. The rest of the
keyword dictionaries constructed from other features are presented in Appendix C.
Table 5-12. A portion of keyword dictionary (tf_1) for Case Study 2
Record No. | Keywords
[1000-1009] | "second" "secondary" "sect" "section" "send" "sense" "sensing" "sensor" "sensors"
[1010-1018] | "sensorsin" "sequ" "servic" "service" "services" "set" "sets" "setup" "sewing"
[1019-1027] | "sfty" "sfy" "shaft" "shafts" "sharp" "shear" "sheet" "shield" "shift"
[1028-1031] | "shower" "shrouds" "shtr" "shute"
Table 5-13. A portion of keyword dictionary (χ²_500) for Case Study 2
Record No. | Keywords
[26-30] | "rol" "blr" "access" "actu" "actuat"
[31-35] | "actuator" "actuators" "add" "added" "adjust"
[36-40] | "adjustable" "adjustcheck" "adjustment" "aerofoil" "afan"
[41-45] | "air" "airheat" "airheater" "airheatertubes" "alarm"
[46-50] | "align" "alignment" "allow" "alter" "ammet"
5.2.4 Training and Testing Text Classifiers
Like Case Study 1, performances of SVM and NB classifiers were measured on
testing data and these are shown in Table 5-14 and Table 5-15. Both the classifiers
were separately trained on keyword dictionaries (discussed in Section 5.2.3). Table
5-14 shows the model performances using different keyword dictionaries constructed
from various features. As illustrated in the table, the χ²_500 based SVM classifier is
superior to all other methods. The performances of the TF and TF-IDF based SVM
classifiers appear to be quite similar, and better overall compared to the NB classifiers.
However, the recall values for the SVM are below satisfactory (see Column 4 in Table
5-14). Among the TF-IDF methods, (tf-idf)_1 is superior to the other three
approaches (i.e., (tf-idf)_2, (tf-idf)_5, (tf-idf)_10), which implies that the
classifier whose keyword dictionary contains all keywords appearing at least once in
the training data, weighted by the inverse frequency of each keyword over the entire
training data, shows the best performance.
Table 5-14. Performances between SVM and NB using different keyword dictionaries

Keyword Dictionary | SVM Accuracy | SVM Precision | SVM Recall | NB Accuracy | NB Precision | NB Recall
tf_1 | 71.45 | 67.74 | 12.50 | 70.18 | 51.33 | 45.83
tf_2 | 71.82 | 64.44 | 17.26 | 69.09 | 49.32 | 42.86
tf_5 | 71.45 | 60.00 | 19.64 | 69.45 | 50.00 | 32.14
tf_10 | 70.36 | 56.10 | 13.69 | 69.27 | 49.37 | 23.21
(tf-idf)_1 | 72.91 | 85.19 | 13.69 | 70.18 | 51.33 | 45.83
(tf-idf)_2 | 71.82 | 68.57 | 14.29 | 69.09 | 51.33 | 45.83
(tf-idf)_5 | 71.82 | 66.67 | 15.48 | 69.45 | 50.00 | 32.14
(tf-idf)_10 | 70.91 | 61.76 | 12.50 | 69.27 | 49.37 | 23.21
χ²_1297 | 71.91 | 53.13 | 12.69 | 69.57 | 46.62 | 46.27
χ²_1000 | 72.13 | 53.33 | 17.91 | 70.21 | 47.54 | 43.28
χ²_700 | 72.97 | 56.36 | 23.13 | 71.06 | 49.04 | 38.06
χ²_500 | 74.04 | 60.00 | 26.87 | 71.70 | 50.56 | 33.58
χ²_300 | 74.04 | 65.79 | 18.66 | 71.28 | 49.32 | 26.87
IG_1297 | 71.91 | 53.13 | 12.69 | 69.57 | 46.62 | 46.27
IG_1000 | 72.13 | 56.36 | 23.13 | 70.00 | 47.06 | 41.79
IG_700 | 73.40 | 58.18 | 23.88 | 70.43 | 47.62 | 37.31
IG_500 | 74.04 | 60.00 | 26.87 | 71.49 | 50.00 | 32.84
IG_300 | 74.04 | 65.79 | 18.66 | 72.34 | 52.63 | 29.85
NG_2 | 72.55 | 100 | 3.73 | 69.15 | 41.27 | 19.40
NG_3 | 71.70 | 66.67 | 1.49 | 70.64 | 41.67 | 7.46
NG_{1,2} | 72.98 | 88.89 | 5.97 | 67.02 | 43.71 | 54.48
NG_{2,3} | 72.13 | 80.00 | 2.99 | 67.23 | 37.18 | 21.64
NG_{1,2,3} | 71.91 | 53.57 | 11.19 | 64.47 | 42.47 | 68.66
Table 5-15. Comparison of Accuracy and F-Measures between SVM and NB using Different Keyword Dictionaries for Case Study 2 (Boilers)

Keyword Dictionary | SVM Accuracy | SVM F-Measure | NB Accuracy | NB F-Measure
tf_1 | 71.45 | 21.11 | 70.18 | 48.42
tf_2 | 71.82 | 27.23 | 69.09 | 45.86
tf_5 | 71.45 | 29.59 | 69.45 | 39.13
tf_10 | 70.36 | 22.01 | 69.27 | 31.58
(tf-idf)_1 | 72.91 | 23.59 | 70.18 | 48.42
(tf-idf)_2 | 71.82 | 23.65 | 69.09 | 48.42
(tf-idf)_5 | 71.82 | 25.13 | 69.45 | 39.13
(tf-idf)_10 | 70.91 | 20.79 | 69.27 | 31.58
χ²_1297 | 71.91 | 20.49 | 69.57 | 46.44
χ²_1000 | 72.13 | 26.81 | 70.21 | 45.31
χ²_700 | 72.97 | 32.80 | 71.06 | 42.86
χ²_500 | 74.04 | 37.12 | 71.70 | 40.36
χ²_300 | 74.04 | 29.07 | 71.28 | 34.79
IG_1297 | 71.91 | 20.49 | 69.57 | 46.44
IG_1000 | 72.13 | 32.80 | 70.00 | 44.27
IG_700 | 73.40 | 33.86 | 70.43 | 41.84
IG_500 | 74.04 | 37.12 | 71.49 | 39.64
IG_300 | 74.04 | 29.07 | 72.34 | 38.09
NG_2 | 72.55 | 7.19 | 69.15 | 26.39
NG_3 | 71.70 | 2.91 | 70.64 | 12.65
NG_{1,2} | 72.98 | 11.19 | 67.02 | 48.50
NG_{2,3} | 72.13 | 5.76 | 67.23 | 27.36
NG_{1,2,3} | 71.91 | 18.51 | 64.47 | 52.48
Keyword dictionaries based on both the CS statistic and IG perform best among
all the methods. For instance, a keyword dictionary constructed from the
500 most informative features achieves the highest accuracy (i.e., 74.04%) for the
SVM classifier. In terms of measuring the completeness (i.e., the recall value) of the
classification result, the NB performs better than the SVM. Using the NB classifier,
the recall values of the Mixed-Gram approaches are superior to those of the Bi-Gram
or Tri-Gram methods.
In summary, it is evident from Table 5-14 and Table 5-15 that the SVM based
classifier outperforms the NB model and performs best (i.e., in terms of accuracy and
precision) using a keyword dictionary constructed from χ²_500 features. However, the
recall values in Table 5-14 and the F-Measures in Table 5-15 suggest that the NB
classifier using Mixed-Gram features is the best choice for the extraction. The
confusion matrices for the NB and SVM based text classifiers are shown in Appendix D.
5.2.5 Comparison between Failure and Non-Failure Work Orders
As a part of visual validation of the prediction, one may examine Figure 5-8 (a)
and (b) for boiler WO word clouds. One can clearly note failure words in Figure 5-8
(a) (e.g. “choked” and “failed”). Work order descriptions containing such keywords
indicate asset malfunction due to failure. On the other hand, in Figure 5-8 (b), certain
words (i.e., “overhaul”, “inspect” and “checks”) indicate obvious planned maintenance
events or routine inspection tasks on boilers.
(a)
(b)
Figure 5-8. Word clouds for (a) failure and (b) non-failure WO’s for boilers
5.2.6 Failure Time Extraction
The χ²_500 based SVM classifier is applied to the DD to classify the data items
as failure or non-failure. Table 5-16 presents the outcome of the predictions. The table
clearly indicates that the text mining approach classifies a significant number of the
DD as non-failure, where the vast majority of the maintenance jobs appear to have text
descriptions that indicate non-failure tasks. Table 5-16 (see the values in the BD and
BAG columns) shows that a large number of non-failure jobs were carried out on the
boiler body and bagasse system. Frequent planned maintenance work and continuous
monitoring of the boiler body and bagasse system are vital for safe functioning.
Table 5-16. Predicted instances of failure and non-failure downtimes (boilers)

Instances | ASH | BD | BAG | STM | TA | BF | OT | Total
Failure | 02 | 16 | 20 | 02 | 02 | 02 | 12 | 56
Non-failure | 17 | 299 | 391 | 21 | 64 | 14 | 188 | 994
Total Instances | 19 | 315 | 411 | 23 | 66 | 16 | 200 | 1050

ASH: Ash; BD: Boiler Body; BAG: Bagasse; STM: Steam; TA: Turbo Alternator; BF: Boiler Feed Water; OT: Other
5.2.7 Validation of the Text Classifier
As with Case Study 1, the predicted labels of the DD were compared with the
estimated “actual” ones using the existing data fields. The data fields (e.g. cause code,
cause descriptions and maintenance work descriptions) were used to estimate the
actual labels of the DD. First, cause codes and cause descriptions of the DD events
were utilized to classify them into failure and non-failure. Cause descriptions of the
DD which contain the keywords “leak”, “block”, “jam”, “fail”, “break” etc. were
chosen to denote failure while the keywords “adjustment”, “cleaning”, “incorrect-
setting” were chosen to classify non-failure events (see Table 5-17). However, some
cause codes are hard to classify in this way and contain non-discriminating
characteristics (for instance, “control linkages” or “circuit breaker”). To overcome
such difficulty, an additional data field (i.e., work descriptions) was chosen. Such text
descriptions were manually examined along with the cause codes to finally classify the
DD into one of the two labels: failure and non-failure. Following this approach, a total
of 1047 DD records were classified into failure (242) and non-failure (805) labels.
Table 5-17. Estimating actual DD labels using the existing data fields

Cause Code | Cause Description | Work Description | Actual Labels of DD
27, 28, 16, 19, 26, 25, 15, 32, 35 | Driven Machine Failure, Leak, Choked / Blocked / Jammed, Underspeed / Pressure Trip, Drive / Transmission Fail, Structural Failure, Overload / Overfull, Motor Failure, Fuse / Circuit Breaker | Manual Examination | Failure
22, 20, 29, 99, 36, 30, 34, 70, 3, 23, 72 | Adjustment / Cleaning, Safety Manual Trip, Incorrect Setting / Adjustment, Cane Quality, Software Error, Control / Linkages, Electrical Power Supply, Low Steam - Poor Fuel, Scheduled Mid Week Stop, Derailment, High Juice Levels | Manual Examination | Non-Failure
The estimated labels of the DD were then compared with those predicted
by the text classifier. In this case, the χ²_500 based SVM classifier was used to predict
the DD labels. Table 5-18 shows the cross tabulation outcomes of the prediction.
Table 5-18. Cross tabulation of the DD comparing predicted labels with the estimated ones

                          Actual
Prediction | Failure | Non-Failure
Failure | 111 | 38
Non-Failure | 131 | 767
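As a check, these figures follow directly from the counts in Table 5-18 via Eqs. 2-42 to 2-44:

Accuracy = (111 + 767) / (111 + 38 + 131 + 767) = 878/1047 ≈ 83.86%
Precision = 111 / (111 + 38) = 111/149 ≈ 74.5%
Recall = 111 / (111 + 131) = 111/242 ≈ 45.9%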
The χ²_500 based SVM classifier constructed from the WO's shows good
accuracy (83.86%) and precision (74.5%) when applied to the DD. However, the
recall value (45.88%) indicates that the classifier is still a weak tool for identifying the
true positives. Overall, it is evident that the methodology can identify the failure
events reasonably efficiently.
5.2.8 Application of the Methodology
A new classifier (as described in Section 4.4) was constructed using the DD (with
the predicted DD labels shown in Table 5-16) augmented with the initial training WO
data. A χ²_500 based SVM classifier was formulated in this case and subsequently
applied to the testing WO data to classify them as failures or non-failures. A total of
2930 training data items (1880 training WO data and 1050 DD) were used to construct the
χ²_500 based keyword dictionary. The new classifier classified the testing WO into
failures (60 data items) and non-failures (410 data items). Table 5-19 compares the
predicted and actual labels of the testing WO data. The cross tabulation result implies
that the accuracy of the new classifier (74.26%) is better than that of the previous classifier
(74.04%).
Table 5-19. Cross tabulation of testing work orders comparing predicted labels with the actual ones

                          Actual
Prediction | Failure | Non-Failure
Failure | 38 | 22
Non-Failure | 99 | 311
5.2.9 Comparison between Failure and Non-Failure DD using Work Descriptions
In the absence of independent validation of the accuracy of the results, one can examine
the text descriptions of the classified DD (shown in Table 5-20 for five randomly
selected samples from each of the two classes). The obvious failure keywords,
“stop”, “broken”, “jammed”, etc. are clearly evident in the failure column of Table
5-20. On the other hand, the non-failure column contains some keywords (e.g.,
“checking”) that indicate a planned or routine inspection task to continuously monitor
the asset condition. Otherwise, the non-failure column does not contain obvious
indicators of non-failure maintenance. Nevertheless, the non-failures in the DD clearly
indicate some events that are not a function of the asset condition (e.g. “poor fuel”),
which provides evidence that these events were correctly classified.
Table 5-20. Randomly selected predicted downtime data for boilers

Failure:
• BAGASSE SYSTEM STOP - BROKEN RECLAIMER CHAIN.
• NO. 2 ID FAN HAD A BROKEN SPEED RING.
• REPAIRS TO NO. 1 RECLAIMER CHAIN.
• REPAIRS TO NO. 3 BAGASSE BELT - MAGNET CLEAN.
• REMOVED BROKEN ROLLER FROM NO. 2 BAGASSE BELT.

Non-Failure:
• POWER BLACKOUT CAUSED BY LOW STEAM PRESSURE
• POOR FUEL - LOW BACK PRESSURE
• LOW STEAM PRESSURE
• LOW ON STEAM
• CHECKING LOOSE BOLTS
5.2.10 Cumulative Number of Failures before and after Text Mining
Figure 5-9 displays the cumulative number of failures before and after the text
mining approach. For comparison, the raw WO and DD cumulative number of events
were plotted (in green and blue, respectively). These can be considered to be naïve
estimates of the failure times. The text-mined failure times (Figure 5-9) were plotted
in brown and clearly indicate that the number of cumulative failures is less than the
raw number of DD events. The picture here is quite different from Case Study 1: one
can observe that the cumulative numbers of events for both the WO database and the
DD are much higher than the text-mined failure time estimates. Clearly, both the DD
and WO events appear to overestimate the failure intensity.
Figure 5-9. Cumulative number of failures for boilers in sugar processing industry
5.3 SUMMARY AND DISCUSSION
As shown in Table 5-5, Table 5-6, Table 5-14, and Table 5-15 (considering
accuracy and F-Measures), it is clear that the SVM classifier performs better than
the NB one on maintenance data from both coal mills and sugar boilers. Other than the
recall metrics in the boiler case, the SVM outperforms the NB and shows better accuracy
and precision. On the coal mill data, the Mixed-Gram feature outperforms the other
methods and gives the best performance when used with the SVM text
classifier. On the other hand, the SVM achieves the best accuracy and precision with the
Chi-square feature on the sugar boiler data. In the case of the recall measure, the
Mixed-Gram based NB classifier achieves the best value of all.
The validation method establishes that the text classifier performs well on the DD in
determining failure time information. On the coal mill data, the performance (88.87%
accuracy, 88.92% precision and 99.71% recall) implies that the classifier can
effectively identify the failure and non-failure maintenance events. On the boiler data,
the performance (83.86% accuracy, 74.5% precision and 45.88% recall) of the
classifier is still satisfactory.
It is evident from Table 5-5 and Table 5-6 that the Mixed-Gram feature is effective
for constructing a keyword dictionary and performs best with the SVM classifier on the
coal mill data. Such superior performance suggests that an order-of-words feature
applied with a hyperplane based text classifier (i.e., SVM) performs best on data
containing tens of thousands of maintenance records. The SVM still performs well
with a relatively small amount of maintenance data (the sugar boiler data). In that case,
the Chi-square or Information Gain based features achieve better accuracy and
precision compared to the N-Gram method. This implies that features built from the
most informative keywords are more suitable for the boiler case than the N-Gram
method. This would likely hold true for maintenance data sets in other industries.
However, the recall value is more critical in this text classification task due to the
imbalanced distribution of positive and negative samples. For the boiler case, Table 5-14
and Table 5-15 indicate that the SVM classifier has a poor recall value (18.66%) as
well as F-Measure (29.07%) in spite of its higher accuracy (74.04%). The classifier
shows such a poor recall value because of the large number of false negative (FN)
samples (see Table 5-18). Thus, there is a margin for improvement in recall,
particularly for the boilers. The next chapter of this thesis uses a semi-
supervised approach in which the text classifier is developed using a minimal number
of expert-labelled maintenance data. This new information extraction methodology is
expected to improve the performance of the classifier, especially the recall value for
the boiler case.
Chapter 6: Advanced Information Extraction Methodology Using Text Mining and Active Learning
Although WO data contain tags that can be used to label them as relating
to failure or non-failure, a number of challenges remain. These
include, among others, uncertainty in the dates, noisy labels and irregular recording
[3]. One way to overcome this is to label the WO data (and/or DD) using expert
interpretation of the free texts. However, in large data systems, expert interpretation is
expensive and requires a significant amount of time.
In this regard, many prefer the active learning method and construct text
classifiers using as little labelled data as possible. Compared to standard machine
learning methods, which use only labelled training data, active learning employs
unlabelled data along with a minimum amount of labelled data to train a classifier.
Ideally, active learning could also be applied to either WO’s or the DD to identify
failure time information by using experts to interpret the free texts. However, if expert
judgement is to be used, experts may need information from both databases to form a
reliable opinion about the maintenance in question. This study has thus adopted a
method that mitigates the cost of constructing a text classifier from a limited number
of expert-labelled samples from WO's. The constructed classifier is then applied to
attribute each DD record to a failure or non-failure event.
This chapter starts with the motivation for using an advanced text mining
method. Shortcomings of automatic labelling are mentioned with reference to the
research problem, followed by a discussion of reasons to use a novel method to
mitigate such obstacles. After that, an innovative method is proposed for testing the
feasibility of the active learning concept as applied to maintenance data. The proposed
method is finally demonstrated in maintenance data sets from Australian power
generation and sugar processing industries. The outcome and results have been
presented at the end the chapter.
6.1 MOTIVATION
Text classification methods presented in Chapters 4 and 5 perform well in
analysing free texts and are able to determine failure events by classifying the
maintenance events into failure and non-failure classes. Such methods usually rely on
supervised machine learning algorithms which require that the training data are
labelled. Yet, the availability of labelled training data is problematic, particularly for
historical maintenance records over a considerable period where strict tagging
standards may have been put into place only recently (perhaps as a part of a new IT
system). Thus, it is often preferable to label maintenance data using expert assessment
of each entry, but such a process is time intensive and laborious. Therefore,
training a classifier using as few manual labels as possible is proposed as a
necessary alternative.
One method to increase the efficiency of classifier training is through the use of
semi-supervised learning (SSL) methods which select training samples as a part of the
learning process [116-119]. Using SSL methods, a classifier can be constructed from
one maintenance database and then applied to a different (but related) database to
determine the failure events. As yet, no such SSL methods have been developed to
identify failure events in industrial maintenance databases.
In this chapter, an active learning-based text classification method is proposed
to identify failure time data (FTD) from multiple maintenance databases (WO and
DD), which represents an extension of the information extraction methodology
(proposed in Section 4.2) to incorporate the feedback of experts. The initial classifier
is constructed by manually labelling a small number of free texts of the maintenance
work descriptions from WO data using an SSL approach. New informative samples
are identified using the current classifier and these are added to the initial training data.
To identify failure events, the trained classifier is applied to the free text of DD to
associate each downtime event with a failure (or non-failure) time. The developed
method is tested on two real case studies in Australia: one in power generation and one
in the sugar industry.
6.2 METHODOLOGY
This research seeks to develop a method to identify FTD by linking typically
available maintenance data using text mining and expert feedback and opinion. In
particular, text mining is used to link two databases (WO and DD) containing different
components of data essential to identifying FTD, while expert feedback is used to train
the text classifier. Due to the expense of labelling, an efficient method is developed
for requesting expert opinion based on the text examples that will likely be most
informative to the classifier (i.e. active learning) (see Figure 6-1).
Figure 6-1. Active learning techniques used in the methodology
The overall methodology proposed in this research is illustrated in Figure 6-2.
Since WOs are more plentiful and it is typically easier for experts to establish if a WO
is a failure, the WOs will be used to train a text classifier, which will be applied to the
text field of the DD. First, the free text descriptions from both WOs and the DD are
pre-processed, which includes text cleaning. Subsequently, a base classifier
is trained on a small set of initial WO data ℒ labelled by an expert (Split 1), and this
classifier is then used to label the remaining unlabelled WO training data 𝒰 from the pool
(Split 2). From the newly classified unlabelled data, the most uncertain samples are
selected for expert labelling (i.e. an uncertainty sampling strategy is pursued). The
selected samples are then added to ℒ for the next learning cycle. In the next cycle, the
model is retrained on the augmented ℒ, and this process of uncertainty sampling and
re-training is repeated until a termination condition is satisfied. Finally, the improved
classifier is subsequently applied to the free text of the DD to label each stoppage as
either a failure or non-failure. The details of each step will be discussed in the
following subsections.
Figure 6-2. Uncertainty-based Active Learning text classifier
6.2.1 Text Cleaning and Initial Training Data Formulation
The free texts used in maintenance databases contain a large proportion of non-
informative content, which has been cleaned by removing unwanted spaces, numbers,
punctuation and non-discriminating words (i.e. stop words). A series of typical text
cleaning methods has been used (as mentioned in Section 2.10). The raw texts are then
represented by simple features known as a bag of words. This representation ignores
the order in which the terms appear, providing only a variable indicating whether the
term appears or not. This can be done by splitting the cleaned text data into individual
words, which is called tokenization. The text classifier generally requires text data in
the form of a matrix, where each row contains a maintenance record of text data and
each column presents a keyword.
The pre-processed WOs were divided into training and testing data. The testing
data are only used to evaluate the performance of the classifier at the end of the
experiment and are not utilized for any other purpose. The training data are further
split into two groups: ℒ and 𝒰 (as shown in Figure 6-2). To construct the initial
classifier, a small percentage of the training data is randomly chosen, labelled by the
expert, and placed into ℒ. After selecting the initial labelled data to train the initial
SVM classifier, the remainder is taken as the unlabelled training data 𝒰.
Suppose the initial labelled training data ℒ contains d-dimensional feature vectors
x_i ∈ ℝ^d with class labels y_i ∈ {−1 (non-failure), 1 (failure)}. Each x_i represents
an item of text containing a maintenance work description. A binary classifier f(x)
can be used to predict the label (failure or non-failure) of each description x_i:

ŷ = +1 if f(x_i) ≥ 0,  −1 if f(x_i) < 0                                    6-1
In this work, support vector machines (SVMs) will be utilized to describe the
discriminant function f(x_i) (using Eq. 2-37) with the well-known RBF kernel
(following Eq. 2-41). The values of w and b are determined by maximizing the
classification margin (e.g. Eq. 2-38) in the high-dimensional feature space for a given
misclassification penalty C and RBF width parameter γ. The solution to this
optimisation problem yields the resulting classifier formulated in Eq. 2-40. Here α_i,
i = 1, 2, …, N, are Lagrange multipliers from the optimisation problem; only a relatively
small subset of the data have α_i ≠ 0 (i.e. the “support vectors”) [93].
The parameters C and γ are typically determined through cross-validation. The
learned decision function can then be used to predict the class label of a given text
description x_i.
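A sketch of this training step, assuming Python/scikit-learn; X_L and y_L stand for the vectorized texts and labels of the seed set ℒ, and the parameter grids are illustrative values only:

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    grid = GridSearchCV(SVC(kernel="rbf"),
                        param_grid={"C": [0.1, 1, 10, 100],
                                    "gamma": [1e-3, 1e-2, 1e-1, 1.0]},
                        cv=5)
    grid.fit(X_L, y_L)                          # cross-validated choice of C and gamma
    f = grid.best_estimator_.decision_function  # the learned f(x) of Eq. 6-1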
6.2.2 Active Learning via Uncertainty Sampling
As shown in Figure 6-2, by training on ℒ, the unlabelled training data 𝒰 are
labelled into two classes, failure and non-failure, by the initial classifier:

Y_f = { x_i | x_i ∈ 𝒰 and f(x_i) ≥ 0 }                                     6-2

Y_nf = { x_i | x_i ∈ 𝒰 and f(x_i) < 0 }                                    6-3
During each iteration of the AL process, a classifier is trained on the labelled
samples in ℒ and it queries the unlabelled training sample in 𝒰 that is closest to the
current classification boundary:

X_AL = arg min_{x_i ∈ 𝒰} |f(x_i)|                                          6-4

where |f(x_i)| represents the distance of x_i to the classification boundary. Such
queried samples are the most uncertain ones (due to their proximity to the decision
boundary [119]) and are thus selected for expert labelling and added to ℒ, i.e.
ℒ ← ℒ ∪ X_AL. The classifier is subsequently re-trained using ℒ and the process is
repeated until a termination condition is satisfied (e.g. a number of iterations, or a
minimum accuracy/recall/precision).

The trained classifier is finally used to separate the DD (denoted S_T below) into
failure and non-failure classes:

Z_f = { x_i | x_i ∈ S_T and f(x_i) ≥ 0 }                                   6-5

Z_nf = { x_i | x_i ∈ S_T and f(x_i) < 0 }                                  6-6
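One cycle of this uncertainty-sampling loop, followed by the final DD labelling of Eqs. 6-5 and 6-6, can be sketched as below, assuming Python with scikit-learn and scipy; ask_expert is a hypothetical callback standing in for the expert query, and n_iterations, C, gamma, X_pool (the vectorized pool 𝒰), X_L, y_L and X_dd are assumed to be defined:

    import numpy as np
    import scipy.sparse
    from sklearn.svm import SVC

    for _ in range(n_iterations):                       # termination condition (assumed)
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_L, y_L)
        dist = np.abs(clf.decision_function(X_pool))    # distance to the boundary
        i = int(np.argmin(dist))                        # Eq. 6-4: most uncertain sample
        X_L = scipy.sparse.vstack([X_L, X_pool[i]])     # L <- L U X_AL
        y_L = np.append(y_L, ask_expert(i))             # hypothetical expert label
        keep = np.arange(X_pool.shape[0]) != i
        X_pool = X_pool[keep]                           # remove the queried sample

    scores = clf.decision_function(X_dd)                # apply to the DD texts
    Z_f, Z_nf = X_dd[scores >= 0], X_dd[scores < 0]     # Eqs. 6-5 / 6-6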
The algorithm is summarized in Figure 6-3.
Figure 6-3. The Uncertainty-based AL algorithm
6.3 CASE STUDIES
The above methodology will be demonstrated on two case studies discussed in
Chapter 5. In both, there are two distinct databases: work order/notifications (WO) and
downtime data (DD). As mentioned earlier, the WOs and DD are individually
incomplete, in the sense that both are required to identify failure times.
Table 6-1 shows randomly selected data entries from the WOs for the two case
studies. The table shows that WOs contain information useful for expert
interpretation of the maintenance event (e.g. maintenance type, work priority).
However, when examining the WO text descriptions, one can see that the maintenance
work descriptions are not straightforward to interpret and there are few “obvious”
descriptions of failure events to the non-expert. In many cases, the work descriptions
are misleading with respect to the maintenance type and priority when determining
“failure” events. For example, there is a clear absence of unambiguous failure terms in
the text descriptions of both the 2nd and 9th entries in Table 6-1 (a), even though these
WOs are tagged as defect and high-priority maintenance work.
Table 6-1. A few randomly selected data entries from WO for (a) coal mills and (b) boilers

(a) Coal Mills

Entry No.  Maintenance Type        Work Priority  Work Description
1          Defect                  Immediate      Had trips to manual
2          Defect                  Urgent         bottom pyrities door local hydraulic plu
3          Preventive Maintenance  Planned        Weld overlay of mill table
4          Modification            Planned        XD Bunker Trial Nozzle
5          Defect                  Immediate      Xf mill pyrites sluiceway is blocking wi
6          Preventive Maintenance  Planned        thermocouple pocket
7          Defect                  Immediate      Missing switchboard label
8          Non-Maintenance         Planned        Cantilever PF Spades
9          Defect                  Urgent         YD Mill hot air gate.
10         Preventive Maintenance  Planned        Mill Air Blasters

(b) Boilers

Entry No.  Maintenance Type        Work Priority        Work Description
1          Corrective Maintenance  Urgent               lights on no3 boiler
2          Inspection              Planned & Scheduled  Replace V-Belts on Bag.Conveyor Drives
3          Corrective Maintenance  Immediate            Collect oil samples from G-boxs
4          Corrective Maintenance  Urgent               Modifications to FD Fan inlet
5          Non-Maintenance         Planned & Scheduled  Bagasse Bin- Fab. 2 Sets Sprockets
6          Corrective Maintenance  Urgent               Boiler 3 ID fan - clean
7          Non-Maintenance         Planned & Scheduled  Cyclone Repairs Replace 5 bagasse tarps
8          Corrective Maintenance  Planned & Scheduled  Replace no3 lift pump suction valve
9          Urgent Maintenance      Immediate            Clean around bagasse conveyor 3
10         Inspection              Planned & Scheduled  Remove feeder chains - Boiler 2
On the other hand, the maintenance type and work priority of the 7th entry might
suggest that it is a “failure”, but the description clearly indicates an issue that has little
to do with the operation of the asset (“missing switchboard label”). Thus, the
descriptions and tags need to be interpreted together by an expert to reliably identify
failure events in the work orders.
Meanwhile, in the DD entries, as seen in Table 5-1 (b) for coal mills and Table 5-11 (b) for boilers, "failure" is even harder to recognize by inspecting the work descriptions, particularly without any tags denoting the urgency or type of event. Thus, both databases contain useful information about failure events and should be interpreted together to ascertain the failure times of the asset.
Next, the text descriptions of the WOs and the DD from both case studies were cleaned using well-known techniques. The detailed cleaning process and outcomes are discussed in Section 5.1.2 for coal mills and Section 5.2.2 for boilers.
6.3.1 Classifier Formulation and Benchmark Algorithm
The cleaned WOs are divided into two sets: 80% for training and 20% for testing. An initial labelled set ℒ, comprising 5% of the training WOs chosen at random and labelled by the expert, is used to construct the initial classifier. This classifier is then applied to the unlabelled set 𝒰 and the procedure is repeated as described in Section 6.2.2.
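A minimal sketch of this split and seed-set construction in R (wo stands for the matrix of cleaned WO feature vectors; the proportions follow the text, and all object names are illustrative):

```r
set.seed(1)                                      # for reproducibility
n        <- nrow(wo)
train_id <- sample(n, size = round(0.80 * n))    # 80% of WOs for training
wo_test  <- wo[-train_id, , drop = FALSE]        # remaining 20% for testing
wo_train <- wo[train_id, , drop = FALSE]

# 5% of the training WOs, chosen at random, are labelled by the expert
seed_id <- sample(nrow(wo_train), size = round(0.05 * nrow(wo_train)))
X_L <- wo_train[seed_id, , drop = FALSE]         # initial labelled set L
X_U <- wo_train[-seed_id, , drop = FALSE]        # unlabelled pool U
```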
The performance of the proposed AL-based text classifier is compared with three
other models:
• The first model is the standard SVM classifier, which is treated as the baseline classifier and denoted SVM_100% expert labelled. For this model, the WOs are used as training data for an SVM classifier in which all the data samples (ℒ and 𝒰) are manually labelled.
• The second model is the hybrid model combining AL and SSL proposed by Leng and colleagues [119], which is denoted AL-SSST. While standard machine learning methods use only labelled training data, this model employs unlabelled data along with some labelled data to train classifiers with improved accuracy [120]. This model not only selects the most reliable samples but also exploits the most informative ones, with human annotations, to improve text classifier performance. Logically, such reliable samples are likely to be in the non-queried unlabelled data (i.e., the data not queried by the classifier in the active learning process):

\[ \mathcal{U}_{nq} = \mathcal{U} \setminus \{X_{al}\} \tag{6-7} \]
During the process of selecting reliable samples, the most certain samples are chosen from 𝒰_nq, namely those at the furthest distance from the classification boundary:

\[ X_{st} = \arg\max_{\mathbf{x}_i \in \mathcal{U}_{nq}} |t(\mathbf{x}_i)| \tag{6-8} \]

Thus, in each training cycle AL-SSST queries both X_al and X_st and adds them to ℒ, i.e. ℒ ← ℒ ∪ X_al ∪ X_st, until a termination condition is satisfied.
• The last classifier, denoted SSST, trains primarily on ℒ, queries the samples X_st furthest from the classification boundary, adds them to ℒ (i.e. ℒ ← ℒ ∪ X_st) and repeats until a termination condition is met. The selection rules of the three strategies are contrasted in the sketch following this list.
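The three benchmark strategies differ only in which samples are queried each cycle and in who supplies the label. A minimal R sketch of the selection rules, assuming dv holds the signed decision values t(x_i) of the current SVM over 𝒰 (variable names illustrative):

```r
# AL (Eq. 6-4): most uncertain sample, closest to the boundary; expert-labelled
i_al <- which.min(abs(dv))

# For AL-SSST, X_st is drawn from U_nq = U \ {X_al} (Eq. 6-7) ...
dv_nq <- dv[-i_al]
# ... as the most certain sample, furthest from the boundary (Eq. 6-8);
# it keeps the label predicted by the classifier itself
i_st <- which.max(abs(dv_nq))

# Per training cycle:
#   AL      : L <- L U {X_al}          (expert label only)
#   AL-SSST : L <- L U {X_al, X_st}    (expert + classifier labels)
#   SSST    : L <- L U {X_st}          (classifier label only)
```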
To comprehensively evaluate the performance of the different models mentioned above, classification accuracy has been chosen as the evaluation criterion [3]:

\[ \text{Accuracy} = \frac{\text{number of correctly classified data}}{\text{number of all testing data}} \tag{6-9} \]
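This criterion (and the precision and recall reported later in Section 6.4) can be computed directly from a 2x2 confusion matrix. A small R sketch, using the tf_1 confusion matrix of Appendix B as the worked example (rows = predicted, columns = actual):

```r
cm <- matrix(c(970, 32, 38, 847), nrow = 2,
             dimnames = list(predicted = c("failure", "non-failure"),
                             actual    = c("failure", "non-failure")))

accuracy  <- sum(diag(cm)) / sum(cm)   # Eq. 6-9: (970 + 847) / 1887 ~ 0.963
precision <- cm[1, 1] / sum(cm[1, ])   # TP / (TP + FP) = 970 / 1008
recall    <- cm[1, 1] / sum(cm[, 1])   # TP / (TP + FN) = 970 / 1002
```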
6.3.2 Accuracy of the Text Classifier
In this analysis, RStudio¹ is used to analyse the free texts from the WOs in Case Study 1 and Case Study 2, as well as to train and test the SVM-based classifiers. After constructing the different classifiers based on the models mentioned in Section 6.3.1, their accuracies are tested by comparing the predicted failure and non-failure labels with the actual ones held in the testing data. Although active learning and semi-supervised learning methods are established approaches in the literature and have been tested in different application domains, such methods are relatively new in industrial applications for determining failure time events from maintenance databases.

¹ RStudio: Integrated Development Environment for R. RStudio, Inc., Boston, MA. URL http://www.rstudio.com/.
In this study, the proposed model (i.e., the AL based text classifier) is constructed using the training data from the WO maintenance databases in each case study. The accuracy of the proposed model is then compared with the three other models used in text classification: SVM_100% expert labelled, AL-SSST and SSST. Figure 6-4 shows how the classification accuracy of each model increases with the percentage of labelled data when the different SSL models are applied to the training data from the two case studies.
On coal mill data, AL outperforms AL-SSST and SSST and matches the accuracy of SVM_100% expert labelled when the percentage of labelled data is equal to or above 35%. In principle, SSST-based models query X_st (i.e., samples selected by the initial classifier itself) and use the classifier-assigned labels accordingly. The poor accuracies of the SSST-based models imply that the initial classifier itself is not good enough to categorise 𝒰 automatically without expert assessment.
Figure 6-4. Classification accuracies of each model as the percentage of labelled data increases: (a) coal mills in power generation; (b) boilers in sugar processing industry
On boiler data, AL is also superior to AL-SSST and SSST, achieving comparable accuracy to SVM_100% expert labelled when the percentage of labelled data is between 55% and 80%, and similar accuracy when the percentage of labelled data is above 80%. Moreover, AL-SSST reaches comparable accuracy to AL once the percentage of labelled data exceeds 80%. This implies that a classifier constructed from X_st and its labels is also effective once the majority of the data is labelled. The performance measures of the active learning based text classifiers for both case studies are shown in Appendix E.
A further analysis identifies the maximum accuracy achieved by each model and the total number of training samples required to achieve it (see Table 6-2). The models are based on both AL (data labelled by the expert) and SSST (data labelled by the classifier) methods; Column 4 of Table 6-2 reports these percentages of labelled data.
Table 6-2. Classification accuracies of different models over two case studies

                                                                Data Labelled (%)
Case Study              Model                     Total No.   By the       By the   Maximum
                                                  of Data     classifier   expert   Accuracy (%)
Power Generation        AL                        964         -            39.5     95.56
Company (Coal Mills)    SVM_100% expert labelled  2439        -            100      95.21
                        AL-SSST                   2375        67.61        32.39    88.56
                        SSST                      1154        100          -        78.30
Sugar Processing        AL                        974         -            54.08    78.72
Industry (Boilers)      SVM_100% expert labelled  1801        -            100      80.43
                        AL-SSST                   1658        65.56        34.44    77.02
                        SSST                      567         100          -        70.44

Numbers in bold: maximum accuracy achieved and the corresponding number of data items required to achieve it.
On coal mill data, AL achieves the highest accuracy (95.56%) using only 39.5% labelled data, which is even higher than the 95.21% obtained with 100% labelled data using the SVM_100% expert labelled method. Instead of manually labelling all the data, one may use AL-SSST to achieve a maximum accuracy of 88.56% with a labelling ratio of 67.61% (classifier) to 32.39% (expert). Table 6-2 also shows that a maximum accuracy of 78.30% can be achieved by the SSST classifier using data labelled entirely by the classifier.
In the case of boiler data, the accuracies achieved by the three models (AL, AL-SSST and SSST) are comparatively close to each other. However, the accuracy obtained with the SVM_100% expert labelled method (80.43%) is higher than that of the other three models: AL (78.72%), AL-SSST (77.02%) and SSST (70.44%). As in Case Study 1, AL achieves its maximum accuracy of 78.72% using around 54% labelled data. It is worth noting that the accuracy of AL converges toward, and finally equals, that of SVM_100% expert labelled as 100% of the data becomes labelled [119].
In sum, to test the feasibility of active learning with expert assessment, all the WO data were labelled manually and the feasibility of AL and the other SSST methods was tested, with their accuracies compared against the 100% expert labelled case. In a real scenario, manually labelling all the data is impractical. The results imply that AL can achieve the maximum accuracy with only 40% of the coal mill data and 54% of the boiler data labelled by the expert. Although AL-SSST requires less manually labelled data than AL, the saving is very limited compared to the sacrifice in accuracy.
6.3.3 Validation of the Classifier
To validate the accuracy of the text classifier constructed from the WOs, the predicted labels of the DD are compared with the estimated "actual" labels discussed in Section 5.1.7 for coal mills and Section 5.2.7 for boilers.

Table 6-3 shows the accuracies of the AL classifiers (trained with different percentages of labelled WO data) when applied to the DD. First, AL classifiers are constructed over various percentages of labelled WO data (Column 2 of Table 6-3). Each classifier is then applied to the DD to label the entries as failure or non-failure. The predicted labels are finally compared with the estimated actual ones, and the resulting accuracies are shown in Column 3 of Table 6-3.
For coal mills, the accuracies of the different classifiers are quite similar to each other. Across labelled WO percentages from 10% to 100%, the accuracies range from 87.64% to 88.55%. Notably, the classifier trained with 40% labelled WO data shows an accuracy (88.15%) very close to that of the classifier trained with 100% labelled WO data (88.55%). This implies that one may manually label only 40% of the WO data and achieve almost the same outcome as constructing the classifier from 100% labelled WOs.
Table 6-3. Accuracies of the AL-based classifiers (WO-trained classifiers) on DD

Case Study                               % of Labelled WO   Accuracy on DD (%)
Power Generation Company (Coal Mills)    10                 87.64
                                         20                 87.97
                                         30                 87.77
                                         40                 88.15
                                         50                 88.20
                                         100                88.55
Sugar Processing Industry (Boilers)      30                 60.84
                                         35                 68.09
                                         40                 68.00
                                         45                 80.61
                                         50                 80.42
                                         55                 83.40
                                         100                85.58
For the boiler case, the classifier constructed with 30% labelled WO data shows poor accuracy (60.84%) compared with the 85.58% accuracy obtained with 100% labelled WO data. However, the accuracy starts to increase significantly once 45% of the WO data is labelled. At 55% labelled WO data, the accuracy (83.40%) is very close to the maximum accuracy achieved at 100% labelled WO data. As in the previous case, one may manually label only 55% of the WO data and achieve almost the same outcome as constructing the classifier from 100% labelled WOs.
It is clear from Table 6-3 that only a minimum number of labelled WO data items is required to construct the text classifier. The detailed performance measures for both case studies are given in Appendix F. In short, the proposed method requires less manually labelled data with only a limited sacrifice in accuracy.
6.3.4 Failure Time Identification Using DD
According to the outcomes of Section 6.3.3, the AL_40% (using 40% labelled WO data) and AL_55% (using 55% labelled WO data) classifiers are applied to the DD to label each entry as failure or non-failure for coal mills and boilers, respectively. Table 6-4 and Table 6-5 present the prediction outcomes. The tables clearly indicate that the AL based text mining approach categorises a significant number of DD items as non-failure, particularly in the boiler case, where the vast majority of the maintenance actions have text descriptions indicating non-failure actions.
Table 6-4. Predicted instances of failure and non-failure downtimes (coal mills)

                             Unit X                           Unit Y
Mill               A    B    C    D    E    F      A    B    C    D    E    F    Total Instances
Failure          103  130  135   97  141  100    111   99  140  134  146  107   1443
Non-failure       10   11    8   13    8    6      2   10   14   10   11    8    111
Total Instances  113  141  143  110  149  106    113  109  154  144  157  115   1554
Table 6-5. Predicted instances of failure and non-failure downtimes (boilers)

Functional Locations   ASH   BD    BAG   STM   TA   BF   OT   Total Instances
Failure                  4    28   176     5    2    6    4    225
Non-failure             15   380   240    23   66   11   87    822
Total Instances         19   408   416    28   68   17   91   1047

ASH: Ash; BD: Boiler Body; BAG: Bagasse; STM: Steam; TA: Turbo Alternator; BF: Boiler Feed Water; OT: Other
6.4 BENEFITS OF INCLUDING EXPERT LABELLING
This study uses the most informative samples from the WOs to construct the text classifier. In each training cycle, the informative examples are labelled by the expert and added to the training set, and the classifier is updated through this iterative process. To evaluate the benefit of expert labelling, this study constructs a mixed classifier using both expert and automatically labelled WOs. Automatic labelling uses the existing data fields in the WOs without any expert assessment. Based on our working definition of failure, if a WO is unplanned (e.g. "defect") and urgent (i.e. "high priority"), the work order is considered to describe a potential failure event; its free text is therefore likely to use the keywords that the organisation would use to describe a failure. Using the urgency and the source of the maintenance request, all the data samples (ℒ and 𝒰) are labelled automatically (a sketch of such a rule is given below).
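A minimal R sketch of such an automatic labelling rule, assuming the WO table carries the maintenance-type and work-priority fields shown in Table 6-1 (the field names and level strings are illustrative, not the organisation's actual schema):

```r
# Working definition: unplanned ("Defect") AND urgent/high-priority work
# orders are treated as potential failure events; everything else is not.
is_failure <- wo$maintenance_type == "Defect" &
              wo$work_priority %in% c("Immediate", "Urgent")

wo$auto_label <- factor(ifelse(is_failure, "failure", "non-failure"))
```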
To compare performance, the first classifier is constructed using automatic labelling of all the WO data samples, with no expert labelled samples. The next classifier is formulated with 20% expert labelled data, the remainder being automatically labelled. Following the same procedure, the percentage of expert labelled data is increased in steps of 20%, with a corresponding reduction in the percentage of automatically labelled data. The last classifier is constructed entirely from expert labelled data, with no automatically labelled samples.
Table 6-6. Performance of the mixed classifier over percentages of automatic and expert labelled (uncertainty-based) WO

Mixed classifier performance on the expert labelled test set (%):

Case Study              % of WO using        % of WO using expert            Accuracy  Precision  Recall
                        automatic labelling  labelling (uncertainty-based)
Power Generation        100                  0                               94.72     94.57      95.00
Company (Coal Mills)    80                   20                              94.89     94.95      95.00
                        60                   40                              95.18     96.28      94.10
                        40                   60                              95.39     96.39      94.02
                        20                   80                              95.41     96.73      94.10
                        0                    100                             95.41     96.73      94.10
Sugar Processing        100                  0                               69.57     55.17      10.96
Industry (Boilers)      80                   20                              70.85     61.54      16.44
                        60                   40                              73.62     72.00      24.66
                        40                   60                              75.11     71.01      33.56
                        20                   80                              76.81     73.42      39.73
                        0                    100                             78.30     75.58      44.52
An uncertainty-based query selection strategy has been followed to construct the expert labelled classifier. Table 6-6 shows the performance of the mixed classifiers for both case studies. On coal mill data, the improvement of the expert labelled classifier over the automatically labelled one is marginal: although the expert labelled classifier achieves better accuracy and precision, its recall decreases slightly. On boiler data, the improvement is significant: the expert labelled classifier shows consistent improvements in accuracy, precision and recall over the automatically labelled classifier. It is worth noting that the classifier trained on 100% expert labelled data improves on the 100% automatically labelled classifier by 12.55% in accuracy, 37% in precision and 300% in recall.
In a further experiment, another mixed classifier is constructed using different percentages of automatically and expert labelled WOs, this time with the expert labelled samples chosen randomly rather than through the uncertainty-based query selection strategy. The performance of this mixed classifier is presented in Table 6-7.
Table 6-7. Performance of the mixed classifier over percentages of automatic and expert labelled (randomly selected) WO

Mixed classifier performance on the expert labelled test set (%):

Case Study              % of WO using        % of WO using expert          Accuracy  Precision  Recall
                        automatic labelling  labelling (randomly selected)
Power Generation        100                  0                             94.72     94.57      95.00
Company (Coal Mills)    80                   20                            94.83     94.91      94.39
                        60                   40                            94.49     94.14      95.00
                        40                   60                            94.95     95.18      94.89
                        20                   80                            94.97     95.39      94.77
                        0                    100                           95.11     95.77      94.80
Sugar Processing        100                  0                             69.57     55.17      10.96
Industry (Boilers)      80                   20                            70.42     68.00      22.13
                        60                   40                            71.32     70.85      22.89
                        40                   60                            73.55     70.55      28.13
                        20                   80                            74.99     71.66      31.00
                        0                    100                           77.93     73.39      37.71
The performance of the uncertainty-based expert labelled classifier is superior to that of the randomly selected expert labelled classifier (compare Table 6-6 and Table 6-7).
Figure 6-5. Classification accuracies of each mixed classifier over the percentages of automatic and expert labelled data: (a) coal mills in power generation; (b) boilers in sugar processing industry
Selecting the expert labelled samples through the uncertainty-based strategy (as proposed in active learning) clearly yields a steeper increase in accuracy than random selection of expert labelled samples (see Figure 6-5). The active learning based method achieves a significant level of accuracy using only 40% and 60% of expert labelled data for coal mills and boilers, respectively.
6.5 SUMMARY
To assess the importance of unlabelled data in situations where labelled data are rare and costly, the accuracies of AL, AL-SSST, SSST and SVM_100% expert labelled have been evaluated by varying the percentage of labelled data items from 5% to 100%. The AL based text classifier achieves accuracy similar to SVM_100% expert labelled when the percentage of labelled data is above 35% for coal mills and above 80% for boilers. Figure 6-4 shows that AL outperforms AL-SSST and SSST in both cases. Even when the number of labelled data items is small (39.5% for coal mills and 54.08% for boilers, as shown in Table 6-2), the classification accuracy of AL is comparable with SVM_100% expert labelled and superior to AL-SSST and SSST.
Before identifying the FTD from the DD, the performance of the AL classifier is validated by comparing the predicted labels produced by AL classifiers (constructed from different percentages of labelled WO data) on the DD with the estimated "actual" labels. Table 6-3 shows that one may manually label 40% and 55% of the WO data for coal mills and boilers, respectively, and still achieve comparable accuracy to the case where 100% of the WO data is manually labelled. Finally, the AL_40% and AL_55% classifiers for coal mills and boilers, respectively, are applied to the DD to identify the historical FTD.
Table 6-6 suggests significant benefits of the expert labelled text classifier over the automatically labelled one. Although the improvement is only marginal on coal mill data, the expert labelled classifier performs well on boiler data, showing significant performance improvements (12.55% in accuracy, 37% in precision and 300% in recall). Moreover, compared with randomly selected expert labelled samples, the proposed active learning method (using uncertainty sampling) achieves significant accuracy from a small percentage of labelled data (as shown in Figure 6-5).
On boiler data, the performance of the expert labelled text classifier is significantly improved over the automatically labelled one. In particular, the expert labelled classifier achieves a much higher recall (44.52%) than the automatically labelled classifier (10.96%), thereby mitigating the difficulty of the large false negative counts mentioned in Section 5.3.
Chapter 7: Conclusion and Future Research Directions
7.1 CONCLUSION
In practice, an essential input to reliability and maintenance optimisation analysis is the historical failure times of an asset. However, this information is not always readily available in industry maintenance databases due to incorrect recording and historically poor data management [15]. In other words, there is a gap between the information required for reliability and optimization modelling and the actual data typically available in industry databases. This study attempts to bridge this significant gap by identifying information requirement specifications for reliability and maintenance optimization models, as well as by developing novel methods to analyse the typically collected maintenance data (i.e., data available in both numerical and text formats) so as to meet the requirements of the models. In addition to analysing the maintenance data, this study developed a new information extraction methodology to increase the accuracy of estimating historical failure times for reliability analysis using real world maintenance data. The key idea of this methodology was the use of work orders (WOs) and downtime data (DD) to jointly determine when an asset has "failed" historically.
In the first method, the WOs were automatically labelled using data fields and were linked with the DD using the free texts shared across both data systems. To overcome the shortcoming of automatic labelling (due to unreliable data fields), a modified method based on semi-supervised text mining was proposed. This method utilised the active learning concept to minimise the number of data items to be labelled by an expert. Both methods were demonstrated on two real world case studies, and the results showed that they are promising.
The main contributions of this thesis are summarised as follows:
• Information Requirement Specification. Reliability and maintenance optimisation models often require accurate "failure times" of an asset, and these models are frequently complex. Due to a lack of the required information, using such models often leads to poor reliability estimation and inaccurate maintenance decisions. Moreover, the existing literature has rarely investigated the link between information requirements and the asset and maintenance data typically available across multiple databases. This research surveyed the existing literature to identify the information necessary for the models, as well as the availability of that information in existing maintenance databases, and proposed a framework establishing information requirement specifications for reliability and maintenance optimization models. These requirement specifications provide a guideline for recording asset and maintenance data correctly (Chapter 3).
• Failure and Non-Failure Maintenance Times Identification. Asset intensive organisations have different databases, each of which may contain part of the information needed to define a failure event. Some common databases (e.g. WOs) contain a record of every maintenance activity (i.e., a repair, check or routine inspection) performed on an asset. However, these databases rarely contain information on the motivation for the activity (i.e. was the issue raised to fix a "failure" or due to planned maintenance, and did this event cause any downtime?) or on the impact of the activity on the system. Conversely, some DD systems contain detailed information on when the asset was not operating but lack an unambiguous indication of the reason for the stoppage. Thus, it is often the case that no individual database possesses the complete information needed to identify a "failure" event, for which one needs to know both when the asset is down and whether this downtime was unplanned. This thesis developed a novel method to identify failure and non-failure maintenance times using multiple maintenance databases: WO and DD. Using the urgent and high priority maintenance work descriptions in the WOs, a text classifier was constructed and applied to assign each DD event to one of two classes: failure and non-failure. The proposed method thus identified DD events whose work descriptions are consistent with urgent and high priority WOs. Validation of the text classifier and analysis of the identified failure events confirmed the accurate identification of failures in the DD. Using this method, reliability and optimization models can be applied in existing industrial settings.
• Improvement in Classification Accuracy. This thesis tested the failure and non-failure maintenance time methodology and the advanced text classification methodology on maintenance data from two real world industrial case studies. The methodologies proved effective in accurately identifying failure and non-failure times. Although SVM based text classifiers were found to outperform NB based classifiers for both the coal mill and boiler cases, the performance on the boilers was modest (maximum accuracy 74.04% and precision 60%) or even poor (maximum recall 26.87%). In this regard, the active learning based text classifier showed significant improvement in classification performance, especially for the boilers. By querying the most informative samples from the WOs, which were then labelled by an expert, this thesis constructed a text classifier whose performance improvements for the boilers were significant in terms of recall (66%), accuracy (6%) and precision (26%). The improved recall allowed the correct labels to be identified and thus reduced the large false negative count predicted by the text classifier.
• Advanced Information Extraction Methodology. Training classifiers to recognize failure/non-failure descriptions requires a set of labelled data, which is often provided by an expert. A key challenge is thus the "expense" of labelling: an expert must assess each text description individually, so labelling all of the data is infeasible. Furthermore, if expert judgement is to be used, experts may need information from both the WOs and DD to form a reliable opinion about the maintenance in question; it is therefore important that the WOs and the DD are interpreted jointly by the expert. This thesis thus developed an advanced method to identify historical failure times by linking the WOs and DD using both text mining and expert judgement. A text classifier was constructed using expert judgement: the active learning concept allowed the maintenance data most informative to the classifier to be labelled manually, in the least possible amount. Active learning thus played a crucial role in mitigating the cost of constructing a text classifier from a limited number of expert-labelled samples. The constructed classifier was then applied to label each DD item as failure or non-failure. Results from the case study demonstrations imply that active learning can decrease the number of labelled samples by approximately 50% while achieving the same classification accuracy. The outcomes of this study can be used to develop statistical models of failure times from historical maintenance databases and maintenance records where the only consistently available data is a free text description.
Chapter 1 provided an overview of this thesis. Chapter 2 broadly discussed previous studies on reliability and maintenance optimization models, data recording techniques and text mining, and identified and described in detail a potential gap between the information required for reliability and optimization models and the actual data available in maintenance databases. Chapter 3 investigated reliability and maintenance optimization models and constructed an information requirement framework for those models, as well as for identifying the required information in existing maintenance databases.
Chapter 4 proposed a novel information extraction methodology that identifies the "failure" events in historical maintenance databases whose text descriptions are consistent with the definition of failure. Chapter 5 demonstrated the proposed methodology on two real world case studies, with results showing that the methodology is promising. Performance validation of the text classifiers (Table 5-8 and Table 5-18), the randomly selected text mined maintenance data (Table 5-10 and Table 5-20) and the word clouds (Figure 5-4 and Figure 5-8) largely validate that the proposed approach is capable of identifying failure and non-failure text descriptions well.
Chapter 6 presented a novel approach to extract failure time information using a minimum number of maintenance data items labelled by an expert. An active learning based text classifier was constructed using the work orders (WOs) and subsequently applied to the downtime data (DD) to jointly determine when an asset has "failed". As in Chapter 5, the method was demonstrated on two real world case studies, with results showing that the methodology is effective in such situations. Figure 6-4 indicates that active learning based text classifiers have superior accuracy over the other semi-supervised methods. Table 6-2 suggests that the maximum accuracy of the text classifiers can be achieved by manually labelling around 50% of the data. The findings from the validation of the text classifier (Table 6-3) and the benefits of the expert labelled classifier (Table 6-6 and Table 6-7) reveal that the active learning concept can be applied to maintenance data once an expert manually labels a minimum number of unlabelled data items. With such minimal effort from an expert, the method can effectively identify failure time information from a large set of historical maintenance databases.
7.2 FUTURE RESEARCH
Through the approaches discussed in this thesis, substantial progress and several contributions have been made; however, some deficiencies remain. Accordingly, future research directions are outlined as follows:
• Future research might focus on developing methods to recognise new vocabulary and to update the text classifier with the new keywords that may arise as different personnel fill out the maintenance logs. Although a data fusion approach has been proposed in this study (a text mining method to link the WOs and the DD), the classifier could be constructed from both the WOs and the DD; such a classifier might be more effective at labelling future data entries from either the WOs or the DD.
• This thesis constructed the active learning based text classifier using a minimum number of expert-labelled WOs. To overcome shortcomings due to variations in how different individuals fill in the maintenance logs, the classifier might be constructed using both the WOs and the DD. However, given the lack of information beyond work descriptions and work entry dates, an expert might find it difficult to label the DD. In this regard, the expert could utilise other available information (e.g. cause codes and cause descriptions) or another related maintenance database (e.g. the plan to work, generally denoted PTW).
• Regarding the active learning query selection strategy, this thesis used the most general strategy, querying the data sample with the least confident label prediction. However, this does not consider the remaining label distribution, which could be addressed by using margin sampling or an entropy measure. Future work could be directed towards testing more sophisticated query strategies, e.g., query by committee (QBC), expected model change and expected error reduction. Furthermore, the query strategies could also be employed with probabilistic classification models (e.g., naïve Bayes) or non-probabilistic ones (e.g., k-nearest neighbour).
• Selecting the best features is a crucial part of a text classification model. More sophisticated features could be explored (e.g. via frequent and sequential pattern mining) with the goal of improving classification accuracy. Such classifiers would be constructed from complex maintenance data (i.e., features consisting of frequent and sequential patterns) to extract failure time information more precisely. Another effective way of constructing a text classifier is to train multiple levels of text features using deep learning algorithms, e.g. deep neural networks (DNN), convolutional neural networks (CNN) and recurrent neural networks (RNN). Such a network would consist of input, output and hidden layers and would be able to handle text documents with high dimensional features. In future, different features (e.g. TF-IDF, CS, IG and N-Gram) may be utilised and incorporated into the expert labelled classifier (e.g. the active learning based text classifier); the best extracted features could then be used in a deep learning or expert-based text classifier to improve classification performance.
Appendices
Appendix A
Keyword Dictionaries Constructed from Different Text Features for Case Study 1 (Coal Mills)

A portion of the keyword dictionary (χ²_1608) using the top 35 CS features

Record No.   Keywords
[1-7]        "air" "fan" "seal" "leak" "chang" "filter" "blast"
[8-14]       "unit" "tapping" "point" "pocket" "bunker" "pyrites" "thermocouple"
[15-21]      "flow" "greas" "filters" "gbox" "top" "bottom" "micron"
[22-28]      "inspect" "lube" "limit" "please" "gate" "repair" "replace"
[29-35]      "feeder" "outlet" "open" "routin" "cold" "faulti" "temp"

A portion of the keyword dictionary (NG_1,2,3) using Mixed-Gram (mixed Uni-, Bi- and Tri-Gram) features

Record No.   Keywords
[100-104]    "coal feed" "coal feeder" "coal flow" "coal leak" "cold"
[105-109]    "cold air" "cold air damp" "cold air damper" "cold air g" "cold air gate"
[110-114]    "coming" "comp" "computer" "computer point" "continually"
[115-119]    "control" "control damper" "control valv" "conveyor" "cooler"
[120-124]    "cooler bypass" "corner" "corner pf" "corners" "cost"
[125-129]    "coupling" "current" "damp" "damper" "damper gland"
[130]        "damper gland check"
Appendix B
Confusion Matrix for SVM Text Classifier using Different Text Features for Case Study 1 (Coal Mills)

Term frequency feature using keyword dictionary (tf_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure            970                 38
Predicted Non-Failure         32                847

Term frequency feature using keyword dictionary (tf_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure            978                 36
Predicted Non-Failure         24                849

Term frequency feature using keyword dictionary (tf_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure            977                 36
Predicted Non-Failure         25                849

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure            968                 39
Predicted Non-Failure         34                846
Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure            978                 38
Predicted Non-Failure         24                847

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure            977                 37
Predicted Non-Failure         25                848

Chi-square feature using keyword dictionary (χ²_1608)
                        Actual Failure   Actual Non-Failure
Predicted Failure            999                 45
Predicted Non-Failure         18                825

Chi-square feature using keyword dictionary (χ²_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure            998                 47
Predicted Non-Failure         19                823

Chi-square feature using keyword dictionary (χ²_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure            998                 83
Predicted Non-Failure         19                787
Information Gain feature using keyword dictionary (IG_1608)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1000                 45
Predicted Non-Failure         17                825

Information Gain feature using keyword dictionary (IG_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure            998                 47
Predicted Non-Failure         19                823

Information Gain feature using keyword dictionary (IG_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure            998                 83
Predicted Non-Failure         19                787

Bi-Gram feature using keyword dictionary (NG_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1007                 48
Predicted Non-Failure         10                822

Tri-Gram feature using keyword dictionary (NG_3)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1005                 42
Predicted Non-Failure         12                828
Uni-Bi-Gram feature using keyword dictionary (NG_1,2)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1005                 42
Predicted Non-Failure         12                828

Bi-Tri-Gram feature using keyword dictionary (NG_2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1003                 67
Predicted Non-Failure         14                803

Uni-Bi-Tri-Gram feature using keyword dictionary (NG_1,2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1005                 42
Predicted Non-Failure         12                828
Confusion Matrix for NB Text Classifier using Different Text Features for Case Study 1 (Coal Mills)

Term frequency feature using keyword dictionary (tf_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure            920                 63
Predicted Non-Failure         82                822

Term frequency feature using keyword dictionary (tf_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure            944                 78
Predicted Non-Failure         58                807

Term frequency feature using keyword dictionary (tf_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure            951                 94
Predicted Non-Failure         51                791

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure            920                 63
Predicted Non-Failure         82                822
Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure            944                 78
Predicted Non-Failure         58                807

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure            951                 94
Predicted Non-Failure         51                791

Chi-square feature using keyword dictionary (χ²_1608)
                        Actual Failure   Actual Non-Failure
Predicted Failure            920                 63
Predicted Non-Failure         82                822

Chi-square feature using keyword dictionary (χ²_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure            918                 62
Predicted Non-Failure         84                823

Chi-square feature using keyword dictionary (χ²_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure            944                 82
Predicted Non-Failure         58                803
Information Gain feature using keyword dictionary (IG_1608)
                        Actual Failure   Actual Non-Failure
Predicted Failure            920                 63
Predicted Non-Failure         82                822

Information Gain feature using keyword dictionary (IG_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure            917                 62
Predicted Non-Failure         85                823

Information Gain feature using keyword dictionary (IG_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure            941                 81
Predicted Non-Failure         61                804

Bi-Gram feature using keyword dictionary (NG_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure            966                 95
Predicted Non-Failure         36                790

Tri-Gram feature using keyword dictionary (NG_3)
                        Actual Failure   Actual Non-Failure
Predicted Failure            992                229
Predicted Non-Failure         10                656
Uni-Bi-Gram feature using keyword dictionary (NG_1,2)
                        Actual Failure   Actual Non-Failure
Predicted Failure            964                 85
Predicted Non-Failure         38                800

Bi-Tri-Gram feature using keyword dictionary (NG_2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure            986                112
Predicted Non-Failure         16                773

Uni-Bi-Tri-Gram feature using keyword dictionary (NG_1,2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure            973                 94
Predicted Non-Failure         29                791
Appendix C
Keyword Dictionaries Constructed from Different Text Features for Case Study 2 (Boilers)

A portion of the keyword dictionary (NG_2) using Bi-Gram features

Record No.    Keywords
[2000-2005]   "makeup valv" "man coolers" "man start" "manifold leak" "manifold submerged"
[2006-2010]   "manual door" "manufacture cover" "manufacture install" "manufacture rol" "manufacture spare"
[2011-2015]   "mark drill" "mc spare" "mech elec" "mech seal" "mech service"
[2016-2020]   "mecseal leak" "mesh floor" "meter clean" "min flow" "minimum flow"
[2021-2025]   "minor overhaul" "missing refractory" "mixing chamb" "mm airheater" "mm long"

A portion of the keyword dictionary (NG_3) using Tri-Gram features

Record No.    Keywords
[3000-3003]   "spray nozzles rod" "spray valves limit" "sprays main steam"
[3004-3006]   "sprays pipes blr" "spreader air duct" "sprhtr control valve"
[3007-3009]   "sprhtr loops outsourc" "sprhtr sfty lh" "sprockets fabricate sprocket"
[3010-3012]   "sprockets reclaim bag" "sprockets slat con" "square hole steel"
[3013-3015]   "ss pipe ton" "ss tube nitrogen" "stack drain sid"
Appendix D
Confusion Matrix for SVM Text Classifier using Different Text Features for Case Study 2 (Boilers)

Term frequency feature using keyword dictionary (tf_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure             21                 10
Predicted Non-Failure        147                372

Term frequency feature using keyword dictionary (tf_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             29                 16
Predicted Non-Failure        139                366

Term frequency feature using keyword dictionary (tf_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure             33                 22
Predicted Non-Failure        135                360

Term frequency feature using keyword dictionary (tf_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure             23                 18
Predicted Non-Failure        145                364
Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure             23                  4
Predicted Non-Failure        145                378

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             24                 11
Predicted Non-Failure        144                371

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure             26                 13
Predicted Non-Failure        142                369

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure             21                 13
Predicted Non-Failure        147                369

Chi-square feature using keyword dictionary (χ²_1297)
                        Actual Failure   Actual Non-Failure
Predicted Failure             17                 15
Predicted Non-Failure        117                321
Chi-square feature using keyword dictionary (χ²_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure             24                 21
Predicted Non-Failure        110                315

Chi-square feature using keyword dictionary (χ²_700)
                        Actual Failure   Actual Non-Failure
Predicted Failure             31                 24
Predicted Non-Failure        103                312

Chi-square feature using keyword dictionary (χ²_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure             36                 24
Predicted Non-Failure         98                312

Chi-square feature using keyword dictionary (χ²_300)
                        Actual Failure   Actual Non-Failure
Predicted Failure             25                 13
Predicted Non-Failure        109                323

Information Gain feature using keyword dictionary (IG_1297)
                        Actual Failure   Actual Non-Failure
Predicted Failure             17                 15
Predicted Non-Failure        117                321
Information Gain feature using keyword dictionary (IG_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure             24                 21
Predicted Non-Failure        110                315

Information Gain feature using keyword dictionary (IG_700)
                        Actual Failure   Actual Non-Failure
Predicted Failure             32                 23
Predicted Non-Failure        102                313

Information Gain feature using keyword dictionary (IG_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure             36                 24
Predicted Non-Failure         98                312

Information Gain feature using keyword dictionary (IG_300)
                        Actual Failure   Actual Non-Failure
Predicted Failure             25                 13
Predicted Non-Failure        109                323

Bi-Gram feature using keyword dictionary (NG_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure              5                  0
Predicted Non-Failure        129                336
Tri-Gram feature using keyword dictionary (NG_3)
                        Actual Failure   Actual Non-Failure
Predicted Failure              2                  1
Predicted Non-Failure        132                336

Uni-Bi-Gram feature using keyword dictionary (NG_1,2)
                        Actual Failure   Actual Non-Failure
Predicted Failure              8                  1
Predicted Non-Failure        126                335

Bi-Tri-Gram feature using keyword dictionary (NG_2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure              4                  1
Predicted Non-Failure        130                335

Uni-Bi-Tri-Gram feature using keyword dictionary (NG_1,2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure             15                 13
Predicted Non-Failure        119                323
Confusion Matrix for NB Text Classifier using Different Text Features for Case Study 2 (Boilers)

Term frequency feature using keyword dictionary (tf_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure             77                 73
Predicted Non-Failure         91                309

Term frequency feature using keyword dictionary (tf_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             72                 74
Predicted Non-Failure         96                308

Term frequency feature using keyword dictionary (tf_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure             54                 54
Predicted Non-Failure        114                328

Term frequency feature using keyword dictionary (tf_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure             39                 40
Predicted Non-Failure        129                342
Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure             77                 73
Predicted Non-Failure         91                309

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             91                 73
Predicted Non-Failure         91                309

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure             54                 54
Predicted Non-Failure        114                328

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure             39                 40
Predicted Non-Failure        129                342

Chi-square feature using keyword dictionary (χ²_1297)
                        Actual Failure   Actual Non-Failure
Predicted Failure             62                 71
Predicted Non-Failure         72                265
Chi-square feature using keyword dictionary (χ²_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure             58                 64
Predicted Non-Failure         76                272

Chi-square feature using keyword dictionary (χ²_700)
                        Actual Failure   Actual Non-Failure
Predicted Failure             51                 53
Predicted Non-Failure         83                283

Chi-square feature using keyword dictionary (χ²_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure             45                 44
Predicted Non-Failure         89                292

Chi-square feature using keyword dictionary (χ²_300)
                        Actual Failure   Actual Non-Failure
Predicted Failure             36                 37
Predicted Non-Failure         98                299

Information Gain feature using keyword dictionary (IG_1297)
                        Actual Failure   Actual Non-Failure
Predicted Failure             62                 71
Predicted Non-Failure         72                265
Information Gain feature using keyword dictionary (IG_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure             56                 63
Predicted Non-Failure         78                273

Information Gain feature using keyword dictionary (IG_700)
                        Actual Failure   Actual Non-Failure
Predicted Failure             50                 55
Predicted Non-Failure         84                281

Information Gain feature using keyword dictionary (IG_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure             44                 44
Predicted Non-Failure         90                292

Information Gain feature using keyword dictionary (IG_300)
                        Actual Failure   Actual Non-Failure
Predicted Failure             40                 36
Predicted Non-Failure         94                300

Bi-Gram feature using keyword dictionary (NG_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             26                 37
Predicted Non-Failure        108                299
Tri-Gram feature using keyword dictionary (NG_3)
                        Actual Failure   Actual Non-Failure
Predicted Failure             10                 14
Predicted Non-Failure        124                322

Uni-Bi-Gram feature using keyword dictionary (NG_1,2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             73                 94
Predicted Non-Failure         61                242

Bi-Tri-Gram feature using keyword dictionary (NG_2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure             29                 49
Predicted Non-Failure        105                287

Uni-Bi-Tri-Gram feature using keyword dictionary (NG_1,2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure             92                125
Predicted Non-Failure         42                211
Appendix E
Confusion Matrix for SVM Text Classifier in Active Learning using Different Percentages of Labelled WO for Case Study 1 (Coal Mills)

Comparing the predicted WO test labels with the actual values (using 5% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            175                  3
Predicted Non-Failure        154                253

Comparing the predicted WO test labels with the actual values (using 8.2% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            301                 14
Predicted Non-Failure         28                242

Comparing the predicted WO test labels with the actual values (using 11.11% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            311                 23
Predicted Non-Failure         18                233

Comparing the predicted WO test labels with the actual values (using 18.57% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            314                 43
Predicted Non-Failure         15                213
Comparing the predicted WO test labels with the actual values (using 21.94% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            314                 47
Predicted Non-Failure         15                209

Comparing the predicted WO test labels with the actual values (using 34.77% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            312                 13
Predicted Non-Failure         17                243

Comparing the predicted WO test labels with the actual values (using 39.52% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            312                 11
Predicted Non-Failure         17                245

Comparing the predicted WO test labels with the actual values (using 51.05% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            313                 12
Predicted Non-Failure         16                244

Comparing the predicted WO test labels with the actual values (using 63.96% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            312                  9
Predicted Non-Failure         17                247
Comparing the predicted WO test labels with the actual values (using 66.79% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            312                  9
Predicted Non-Failure         17                247

Comparing the predicted WO test labels with the actual values (using 91.47% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            311                  9
Predicted Non-Failure         18                247

Comparing the predicted WO test labels with the actual values (using 100% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            310                  9
Predicted Non-Failure         19                247
Confusion Matrix for SVM Text Classifier in Active Learning using Different Percentages of Labelled WO for Case Study 2 (Boilers)

Comparing the predicted WO test labels with the actual values (using 5% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             33                 65
Predicted Non-Failure         99                253

Comparing the predicted WO test labels with the actual values (using 18.88% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             22                  8
Predicted Non-Failure        122                318

Comparing the predicted WO test labels with the actual values (using 30.6% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             41                 10
Predicted Non-Failure        103                316

Comparing the predicted WO test labels with the actual values (using 41.37% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             52                 16
Predicted Non-Failure         92                310
Comparing the predicted WO test labels with the actual values (using 46.31% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             59                 21
Predicted Non-Failure         85                305

Comparing the predicted WO test labels with the actual values (using 54.08% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             61                 17
Predicted Non-Failure         83                309

Comparing the predicted WO test labels with the actual values (using 65.29% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             63                 18
Predicted Non-Failure         81                308

Comparing the predicted WO test labels with the actual values (using 67.85% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             62                 20
Predicted Non-Failure         82                306

Comparing the predicted WO test labels with the actual values (using 80.79% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             67                 17
Predicted Non-Failure         77                309
Comparing the predicted WO test labels with the actual values (using 84.34% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             69                 18
Predicted Non-Failure         75                308

Comparing the predicted WO test labels with the actual values (using 85.29% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             68                 18
Predicted Non-Failure         76                308

Comparing the predicted WO test labels with the actual values (using 100% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             70                 18
Predicted Non-Failure         74                308
Appendix F
Confusion Matrix for Active Learning-based Text Classifier Applied over DD for Case Study 1 (Coal Mills)

Comparing the predicted DD labels with the actual ones (using 10% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1335                162
Predicted Non-Failure         25                 32

Comparing the predicted DD labels with the actual ones (using 20% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1324                156
Predicted Non-Failure         36                 38

Comparing the predicted DD labels with the actual ones (using 30% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1326                156
Predicted Non-Failure         34                 38

Comparing the predicted DD labels with the actual ones (using 40% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1335                156
Predicted Non-Failure         25                 38
Comparing the predicted DD labels with the actual ones (using 50% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1337                179
Predicted Non-Failure         23                 15

Comparing the predicted DD labels with the actual ones (using 100% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1356                174
Predicted Non-Failure          4                 20
Confusion Matrix for Active Learning-based Text Classifier Applied over DD for Case Study 2 (Boilers)

Comparing the predicted DD labels with the actual ones (using 30% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             72                240
Predicted Non-Failure        170                565

Comparing the predicted DD labels with the actual ones (using 35% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             75                167
Predicted Non-Failure        167                638

Comparing the predicted DD labels with the actual ones (using 40% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             69                162
Predicted Non-Failure        173                643

Comparing the predicted DD labels with the actual ones (using 45% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             69                 30
Predicted Non-Failure        173                775
Comparing the predicted DD labels with the actual ones (using 50% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             71                 34
Predicted Non-Failure        171                771

Comparing the predicted DD labels with the actual ones (using 55% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            101                 33
Predicted Non-Failure        141                772

Comparing the predicted DD labels with the actual ones (using 100% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            112                 21
Predicted Non-Failure        130                784
Bibliography
1. Latino, K., Understanding event data collection: Part 1. Plant Engineering,
2004: p. 31-32.
2. Louit, D.M., R. Pascual, and A.K.S. Jardine, A practical procedure for the
selection of time-to-failure models based on the assessment of trends in
maintenance data. Reliability Engineering & System Safety, 2009. 94(10): p.
1618-1628.
3. Prytz, R., S. Nowaczyk, T. Rognvaldsson, and S. Byttner, Predicting the need
for vehicle compressor repairs using maintenance records and logged vehicle
data. Engineering Applications of Artificial Intelligence, 2015. 41: p. 139-150.
4. Devaney, M., A. Ram, H. Qiu, and J. Lee. Preventing failures by mining
maintenance logs with case-based reasoning. in 59th Meeting of the Society
for Machinery Failure Prevention Technology (MFPT-59). 2005.
5. Hodkiewicz, M. and M.T.W. Ho, Cleaning historical maintenance work order
data for reliability analysis. Journal of Quality in Maintenance Engineering,
2016. 22(2): p. 146-163.
6. Sipos, R., D. Fradkin, F. Moerchen, and Z. Wang, Log-based predictive
maintenance, in Proceedings of the 20th ACM SIGKDD international
conference on Knowledge discovery and data mining. 2014, ACM: New York,
New York, USA. p. 1867-1876.
7. Edwards, B., M. Zatorsky, and R. Nayak, Clustering and classification of
maintenance logs using text data mining, in Proceedings of the 7th
Australasian Data Mining Conference. 2008, Australian Computer Society,
Inc.: Glenelg, Australia. p. 193-199.
8. Moreira, R.d.P. and C.L.N. Junior. Prognostics of aircraft bleed valves using a
SVM classification algorithm. in Aerospace Conference, 2012 IEEE. 2012.
9. Ruiz, P.P., B.K. Foguem, and B. Grabot, Improving Maintenance Strategies
from Experience Feedback. IFAC Proceedings Volumes, 2013. 46(9): p. 625-
630.
10. Waeyenbergh, G. and L. Pintelon, A framework for maintenance concept
development. International Journal of Production Economics, 2002. 77(3): p.
299-313.
11. Duffuaa, S.O., A. Raouf, and J.D. Campbell, Planning and control of
maintenance systems: Modelling & Analysis. Second ed. 2010: Springer.
12. Muchiri, P., L. Pintelon, L. Gelders, and H. Martin, Development of
maintenance function performance measurement framework and indicators.
International Journal of Production Economics, 2011. 131(1): p. 295-302.
13. Manzini, R., A. Regattieri, H. Pham, and E. Ferrari, Maintenance for industrial
systems. 2009: Springer.
14. Wienker, M., K. Henderson, and J. Volkerts, The Computerized Maintenance
Management System: An Essential Tool for World Class Maintenance.
SYMPHOS 2015 - 3rd International Symposium on Innovation and Technology
in the Phosphate Industry, 2016. 138: p. 413-420.
15. Ahmad, R. and S. Kamaruddin, An overview of time-based and condition-
based maintenance in industrial application. Computers & Industrial
Engineering, 2012. 63(1): p. 135-149.
16. Guillén, A.J., A. Crespo, J.F. Gómez, and M.D. Sanz, A framework for
effective management of condition based maintenance programs in the context
of industrial development of E-Maintenance strategies. Computers in Industry,
2016. 82: p. 170-185.
17. Alrabghi, A. and A. Tiwari, State of the art in simulation-based optimisation
for maintenance systems. Computers & Industrial Engineering, 2015. 82: p.
167-182.
18. Ding, S.H. and S. Kamaruddin, Maintenance policy optimization-literature
review and directions. International Journal of Advanced Manufacturing
Technology, 2015. 76(5-8): p. 1263-1283.
19. Dekker, R., Applications of maintenance optimization models: a review and
analysis. Reliability Engineering & System Safety, 1996. 51(3): p. 229-240.
20. Sharma, A., G.S. Yadava, and S.G. Deshmukh, A literature review and future
perspectives on maintenance optimization. Journal of Quality in Maintenance
Engineering, 2011. 17(1): p. 5-25.
21. Horenbeek, A.V., L. Pintelon, and P. Muchiri, Maintenance optimization
models and criteria. International Journal of System Assurance Engineering
and Management, 2011. 1(3): p. 189-200.
22. Alaswad, S. and Y. Xiang, A review on condition-based maintenance
optimization models for stochastically deteriorating system. Reliability
Engineering & System Safety, 2017. 157: p. 54-63.
23. Gabbar, H.A., H. Yamashita, K. Suzuki, and Y. Shimada, Computer-aided
RCM-based plant maintenance management system. Robotics and Computer-
Integrated Manufacturing, 2003. 19(5): p. 449-458.
24. Prajapati, A., J. Bechtel, and S. Ganesan, Condition based maintenance: a
survey. Journal of Quality in Maintenance Engineering, 2012. 18(4): p. 384-
400.
25. Barabadi, A., J. Barabady, and T. Markeset, Maintainability analysis
considering time-dependent and time-independent covariates. Reliability
Engineering & System Safety, 2011. 96(1): p. 210-217.
26. Chen, G. and T.T. Pham, Introduction to Fuzzy Systems. CRC Applied
Mathematics and Nonlinear Science Series. 2005: Taylor & Francis Group.
27. Mccall, J.J., Maintenance Policies for Stochastically Failing Equipment - a
Survey. Management Science, 1965. 11(5): p. 493-524.
28. Perakis, A.N. and B. Inozu, Optimal Maintenance, Repair, and Replacement
for Great-Lakes Marine Diesels. European Journal of Operational Research,
1991. 55(2): p. 165-182.
29. Sherif, Y.S., Reliability Analysis - Optimal Inspection and Maintenance
Schedules of Failing Systems. Microelectronics and Reliability, 1982. 22(1):
p. 59-115.
30. Kijima, M., Some results for repairable systems with general repair.
Journal of Applied Probability, 1989. 26: p. 89-102.
31. Doyen, L. and O. Gaudoin, Imperfect repair models with planned preventive
maintenance. 2009.
32. Doyen, L. and O. Gaudoin, Classes of imperfect repair models based on
reduction of failure intensity or virtual age. Reliability Engineering & System
Safety, 2004. 84(1): p. 45-56.
33. Shin, I., T.J. Lim, and C.H. Lie, Estimating parameters of intensity function
and maintenance effect for repairable unit. Reliability Engineering & System
Safety, 1996. 54(1): p. 1-10.
34. Ramírez, P.A.P. and I.B. Utne, Decision support for life extension of technical
systems through virtual age modelling. Reliability Engineering & System
Safety, 2013. 115: p. 55-69.
35. Wang, H. and H. Pham, Reliability and optimal maintenance. 2006: Springer
Science & Business Media.
36. Pulcini, G., On the prediction of future failures for a repairable equipment
subject to overhauls. Communications in Statistics-Theory and Methods, 2001.
30(4): p. 691-706.
37. Altun, M. and S.V. Comert, A change-point based reliability prediction model
using field return data. Reliability Engineering & System Safety, 2016. 156: p.
175-184.
38. Guo, H.R.R., H.T. Liao, W.B. Zhao, and A. Mettas, A new stochastic model
for systems under general repairs. IEEE Transactions on Reliability, 2007.
56(1): p. 40-49.
39. Muhammad, M., A.A. Mokhtar, and H. Hussin. Reliability assessment
framework for repairable system. in Business, Engineering and Industrial
Applications (ISBEIA), 2012 IEEE Symposium on. 2012.
40. Regattieri, A., R. Manzini, and D. Battini, Estimating reliability characteristics
in the presence of censored data: A case study in a light commercial vehicle
manufacturing system. Reliability Engineering & System Safety, 2010. 95(10):
p. 1093-1102.
41. Si, X.S., W.B. Wang, C.H. Hu, and D.H. Zhou, Remaining useful life
estimation - A review on the statistical data driven approaches. European
Journal of Operational Research, 2011. 213(1): p. 1-14.
42. de Jonge, B., R. Teunter, and T. Tinga, The influence of practical factors on
the benefits of condition-based maintenance over time-based maintenance.
Reliability Engineering & System Safety, 2017. 158: p. 21-30.
43. Zhang, Q., C. Hua, and G.H. Xu, A mixture Weibull proportional hazard model
for mechanical system failure prediction utilising lifetime and monitoring data.
Mechanical Systems and Signal Processing, 2014. 43(1-2): p. 103-112.
44. Raouf, A., S. Duffuaa, M. Ben-Daya, A.H.C. Tsang, W.K. Yeung, A.K.S.
Jardine, and B.P.K. Leung, Data management for CBM optimization. Journal
of Quality in Maintenance Engineering, 2006. 12(1): p. 37-51.
45. Bastos, P., I. Lopes, and L.C.M. Pires. Application of data mining in a
maintenance system for failure prediction. in Safety, Reliability and Risk
Analysis: Beyond the Horizon: 22nd European Safety and Reliability. 2014.
Taylor & Francis Group.
46. Márquez, A.C., The maintenance management framework: models and
methods for complex systems maintenance. 2007: Springer Science &
Business Media.
47. Jardine, A.K.S. and A.H.C. Tsang, Maintenance, replacement, and reliability:
theory and applications. 2013: CRC press.
48. Tian, Z.G. and H.T. Liao, Condition based maintenance optimization for multi-
component systems using proportional hazards model. Reliability Engineering
& System Safety, 2011. 96(5): p. 581-589.
49. Jardine, A., V. Makis, D. Banjevic, D. Braticevic, and M. Ennis, A decision
optimization model for condition-based maintenance. Journal of Quality in
Maintenance Engineering, 1998. 4(2): p. 115-121.
50. Lin, J., J. Pulido, and M. Asplund, Reliability analysis for preventive
maintenance based on classical and Bayesian semi-parametric degradation
approaches using locomotive wheel-sets as a case study. Reliability
Engineering & System Safety, 2015. 134: p. 143-156.
51. Jardine, A.K.S., D.M. Lin, and D. Banjevic, A review on machinery
diagnostics and prognostics implementing condition-based maintenance.
Mechanical Systems and Signal Processing, 2006. 20(7): p. 1483-1510.
52. Kelly, A., Maintenance Systems and Documentation. 2006, Burlington, MA,
USA: Elsevier.
53. Hong, Y., Reliability prediction based on complicated data and dynamic data.
2009, Iowa State University. p. 1-130.
54. Meeker, W.Q. and Y. Hong, Reliability Meets Big Data: Opportunities and
Challenges. Quality Engineering, 2013. 26(1): p. 102-116.
55. Skoogh, A., T. Perera, and B. Johansson, Input data management in simulation
- Industrial practices and future trends. Simulation Modelling Practice and
Theory, 2012. 29: p. 181-192.
56. Moore, W.J. and A.G. Starr, An intelligent maintenance system for continuous
cost-based prioritisation of maintenance activities. Computers in Industry,
2006. 57(6): p. 595-606.
57. Madhikermi, M., S. Kubler, J. Robert, A. Buda, and K. Främling, Data quality
assessment of maintenance reporting procedures. Expert Systems with
Applications, 2016. 63: p. 145-164.
58. Alkali, B.M., T. Bedford, J. Quigley, and J. Gaw, Failure and maintenance data
extraction from power plant maintenance management databases. Journal of
Statistical Planning and Inference, 2009. 139(5): p. 1766-1776.
59. Ittoo, A., L.M. Nguyen, and A. van den Bosch, Text analytics in industry:
Challenges, desiderata and trends. Computers in Industry, 2016. 78: p. 96-107.
60. Hogenboom, F., F. Frasincar, U. Kaymak, F. de Jong, and E. Caron, A Survey
of event extraction methods from text for decision support systems. Decision
Support Systems, 2016. 85: p. 12-22.
61. Wu, X.D., X.Q. Zhu, G.Q. Wu, and W. Ding, Data Mining with Big Data. IEEE
Transactions on Knowledge and Data Engineering, 2014. 26(1): p. 97-107.
62. Gurbuz, F., L. Ozbakir, and H. Yapici, Data mining and preprocessing
application on component reports of an airline company in Turkey. Expert
Systems with Applications, 2011. 38(6): p. 6618-6626.
63. Han, J., Data mining: Concepts and techniques. 2001, Simon Fraser
University.
64. Fayyad, U., G.P. Shapiro, and P. Smyth, From Data Mining to Knowledge
Discovery in Databases, in Artificial Intelligence. 1996, American Association
for Artificial Intelligence. p. 37-54.
65. Alkharboush, N.A., A Data Mining Approach to Improve the Automated
Quality of Data, in School of Electrical Engineering and Computer Science.
2013, Queensland University of Technology: Brisbane. p. 1-193.
66. Kotu, V. and B. Deshpande, Predictive analytics and data mining: concepts and
practice with RapidMiner. 2014, Burlington: Elsevier Science.
67. Sharma, S., K.M. Osei-Bryson, and G.M. Kasper, Evaluation of an integrated
Knowledge Discovery and Data Mining process model. Expert Systems with
Applications, 2012. 39(13): p. 11335-11348.
68. Moro, S., P. Cortez, and P. Rita, Business intelligence in banking: A literature
analysis from 2002 to 2013 using text mining and latent Dirichlet allocation.
Expert Systems with Applications, 2015. 42(3): p. 1314-1324.
69. Low, W.L., M.L. Lee, and T.W. Ling, A knowledge-based approach for
duplicate elimination in data cleaning. Information Systems, 2001. 26(8): p.
585-606.
70. Ur-Rahman, N. and J.A. Harding, Textual data mining for industrial
knowledge management and text classification: A business oriented approach.
Expert Systems with Applications, 2012. 39(5): p. 4729-4739.
71. Munkova, D., M. Munk, and M. Vozar, Data Pre-Processing Evaluation for
Text Mining: Transaction/Sequence Model. 2013 International Conference on
Computational Science, 2013. 18: p. 1198-1207.
72. Sriurai, W., Improving text categorization by using a topic model. Advanced
Computing, 2011. 2(6): p. 21.
73. Borrajo, L., A.S. Vieira, and E.L. Iglesias, TCBR-HMM: An HMM-based text
classifier with a CBR system. Applied Soft Computing, 2015. 26: p. 463-473.
74. Mathew, T. Text categorization using N-grams and Hidden-Markov-Models.
2006.
75. Yang, J.M., Y.N. Liu, Z. Liu, X.D. Zhu, and X.X. Zhang, A new feature
selection algorithm based on binomial hypothesis testing for spam filtering.
Knowledge-Based Systems, 2011. 24(6): p. 904-914.
76. Yang, Y. and J.O. Pedersen. A comparative study on feature selection in text
categorization. in ICML. 1997.
77. Rogati, M. and Y. Yang. High-performing feature selection for text
classification. in Proceedings of the eleventh international conference on
Information and knowledge management. 2002. ACM.
78. Zhang, L., J. Zhu, and T. Yao, An evaluation of statistical spam filtering
techniques. ACM Transactions on Asian Language Information Processing
(TALIP), 2004. 3(4): p. 243-269.
79. Liu, Y.N., J.-W. Bi, and Z.-P. Fan, Multi-class sentiment classification: The
experimental comparisons of feature selection and machine learning
algorithms. Expert Systems with Applications, 2017. 80: p. 323-339.
80. Wang, W.B., F. Zhao, and R. Peng, A preventive maintenance model with a
two-level inspection policy based on a three-stage failure process. Reliability
Engineering & System Safety, 2014. 121: p. 207-220.
81. Chen, J., H. Huang, S. Tian, and Y. Qu, Feature selection for text classification
with Naïve Bayes. Expert Systems with Applications, 2009. 36(3): p. 5432-
5435.
82. Chen, K., Z. Zhang, J. Long, and H. Zhang, Turning from TF-IDF to TF-IGM
for term weighting in text classification. Expert Systems with Applications,
2016. 66: p. 245-260.
83. Trstenjak, B., S. Mikac, and D. Donko, KNN with TF-IDF Based Framework
for Text Categorization. 24th DAAAM International Symposium on Intelligent
Manufacturing and Automation, 2013, 2014. 69: p. 1356-1364.
84. Escalante, H.J., M.A. Garcia-Limon, A. Morales-Reyes, M. Graff, M. Montes-
y-Gomez, E.F. Morales, and J. Martinez-Carranza, Term-weighting learning
via genetic programming for text classification. Knowledge-Based Systems,
2015. 83: p. 176-189.
85. Wang, D.Q., H. Zhang, R. Liu, W.F. Lv, and D.T. Wang, t-Test feature
selection approach based on term frequency for text categorization. Pattern
Recognition Letters, 2014. 45: p. 1-10.
86. Tripathy, A., A. Agrawal, and S.K. Rath, Classification of sentiment reviews
using n-gram machine learning approach. Expert Systems with Applications,
2016. 57: p. 117-126.
87. Ogada, K., W. Mwangi, and W. Cheruiyot, N-gram based text categorization
method for improved data mining. Journal of Information Engineering and
Applications, 2015. 5(8): p. 35-43.
88. Pang, B. and L. Lee. A sentimental education: Sentiment analysis using
subjectivity summarization based on minimum cuts. in Proceedings of the
42nd annual meeting on Association for Computational Linguistics. 2004.
Association for Computational Linguistics.
89. Saleh, M.R., M.T. Martin-Valdivia, A. Montejo-Raez, and L.A. Urena-Lopez,
Experiments with SVM to classify opinions in different domains. Expert
Systems with Applications, 2011. 38(12): p. 14799-14804.
90. Lantz, B., Machine Learning with R. Vol. 1. 2013, GB: Packt Publishing.
Bibliography 167
91. Vo, D.T. and C.Y. Ock, Learning to classify short text from scientific
documents using topic models with various types of knowledge. Expert
Systems with Applications, 2015. 42(3): p. 1684-1698.
92. Fragos, K., P. Belsis, and C. Skourlas, Combining Probabilistic Classifiers for
Text Classification. 3rd International Conference on Integrated Information
(IC-ININFO), 2014. 147: p. 307-312.
93. Cortes, C. and V. Vapnik, Support-Vector Networks. Machine Learning, 1995.
20(3): p. 273-297.
94. Joachims, T. Text categorization with support vector machines: Learning with
many relevant features. in European conference on machine learning. 1998.
Springer.
95. Basu, A., C. Walters, and M. Shepherd. Support vector machines for text
categorization. in System Sciences, 2003. Proceedings of the 36th Annual
Hawaii International Conference on. 2003. IEEE.
96. Colas, F. and P. Brazdil. Comparison of SVM and some older classification
algorithms in text classification tasks. in IFIP International Conference on
Artificial Intelligence in Theory and Practice. 2006. Springer.
97. Ozgur, L., T. Gungor, and F. Gurgen, Adaptive anti-spam filtering for
agglutinative languages: a special case for Turkish. Pattern Recognition
Letters, 2004. 25(16): p. 1819-1831.
98. Yu, B. and Z.B. Xu, A comparative study for content-based dynamic spam
classification using four machine learning algorithms. Knowledge-Based
Systems, 2008. 21(4): p. 355-362.
99. Lai, C.C., An empirical study of three machine learning methods for spam
filtering. Knowledge-Based Systems, 2007. 20(3): p. 249-254.
100. Webb, S., S. Chitti, and C. Pu. An experimental evaluation of spam filter
performance and robustness against attack. in Collaborative Computing:
Networking, Applications and Worksharing, 2005 International Conference
on. 2005. IEEE.
101. Androutsopoulos, I., G. Paliouras, and E. Michelakis, Learning to filter
unsolicited commercial e-mail. 2004.
102. Yadav, S.K., Sentiment analysis and classification: a survey. International
Journal of Advance Research in Computer Science and Management Studies,
2015. 3(3): p. 113-121.
103. Agarwal, B. and N. Mittal, Prominent feature extraction for review analysis:
an empirical study. Journal of Experimental & Theoretical Artificial
Intelligence, 2016. 28(3): p. 485-498.
104. Omar, N., M. Albared, T. Al-Moslmi, and A. Al-Shabi. A comparative study
of feature selection and machine learning algorithms for Arabic sentiment
classification. in Asia Information Retrieval Symposium. 2014. Springer.
105. Sharma, A. and S. Dey. A comparative study of feature selection and machine
learning techniques for sentiment analysis. in Proceedings of the 2012 ACM
Research in Applied Computation Symposium. 2012. ACM.
106. Tan, S. and J. Zhang, An empirical study of sentiment analysis for Chinese
documents. Expert Systems with Applications, 2008. 34(4): p. 2622-2629.
107. Liu, K. and N. El-Gohary, Ontology-based semi-supervised conditional
random fields for automated information extraction from bridge inspection
reports. Automation in Construction, 2017.
108. Lavergne, T., J.M. Crego, A. Allauzen, and F. Yvon. From n-gram-based to
crf-based translation models. in Proceedings of the Sixth Workshop on
Statistical Machine Translation. 2011. Association for Computational
Linguistics.
109. Druck, G., B. Settles, and A. McCallum. Active learning by labeling features.
in Proceedings of the 2009 Conference on Empirical Methods in Natural
Language Processing: Volume 1-Volume 1. 2009. Association for
Computational Linguistics.
110. Gupta, R., Conditional random fields. Unpublished report, IIT Bombay, 2006.
111. Khan, A., B. Baharudin, L.H. Lee, and K. Khan, A review of machine learning
algorithms for text-documents classification. Journal of Advances in
Information Technology, 2010. 1(1): p. 4-20.
112. Mitra, V., C.J. Wang, and S. Banerjee, Text classification: A least square
support vector machine approach. Applied Soft Computing, 2007. 7(3): p. 908-
914.
113. Drucker, H., D. Wu, and V.N. Vapnik, Support vector machines for spam
categorization. IEEE Transactions on Neural Networks, 1999. 10(5): p. 1048-1054.
114. Rouhani, M. and D.S. Javan, Two fast and accurate heuristic RBF learning
rules for data classification. Neural Networks, 2016. 75: p. 150-161.
115. Maulik, U. and D. Chakraborty, A self-trained ensemble with semisupervised
SVM: An application to pixel classification of remote sensing imagery. Pattern
Recognition, 2011. 44(3): p. 615-623.
116. Zhang, Y.H., J.H. Wen, X.B. Wang, and Z. Jiang, Semi-supervised learning
combining co-training with active learning. Expert Systems with Applications,
2014. 41(5): p. 2372-2378.
117. Hu, R., B. Mac Namee, and S.J. Delany, Active learning for text classification
with reusability. Expert Systems with Applications, 2016. 45: p. 438-449.
118. Silva, C. and B. Ribeiro, On text-based mining with active learning and
background knowledge using SVM. Soft Computing, 2007. 11(6): p. 519-530.
119. Leng, Y., X.Y. Xu, and G.H. Qi, Combining active learning and semi-
supervised learning to construct SVM classifier. Knowledge-Based Systems,
2013. 44: p. 121-131.
120. Wang, X.B., J.H. Wen, S. Alam, Z. Jiang, and Y.B. Wu, Semi-supervised
learning combining transductive support vector machine with active learning.
Neurocomputing, 2016. 173: p. 1288-1298.
121. Settles, B., Active learning literature survey. Computer Sciences Technical
Report 1648, University of Wisconsin-Madison, 2010.
122. Tong, S., Active learning: theory and applications. 2001, Citeseer.
123. Hajmohammadi, M.S., R. Ibrahim, A. Selamat, and H. Fujita, Combination of
active learning and self-training for cross-lingual sentiment classification with
density analysis of unlabelled samples. Information Sciences, 2015. 317: p. 67-
77.
124. Saito, P.T.M., P.J. de Rezende, A.X. Falcao, C.T.N. Suzuki, and J.F. Gomes,
An active learning paradigm based on a priori data reduction and organization.
Expert Systems with Applications, 2014. 41(14): p. 6086-6097.
125. Calma, A., J.M. Leimeister, P. Lukowicz, S. Oeste-Reiß, T. Reitmaier, A.
Schmidt, B. Sick, G. Stumme, and K.A. Zweig. From active learning to
dedicated collaborative interactive learning. in ARCS 2016, 29th International
Conference on Architecture of Computing Systems, Proceedings. 2016. VDE.
126. Cholette, M.E., P. Borghesani, E. Di Gialleonardo, and F. Braghin, Using
support vector machines for the computationally efficient identification of
acceptable design parameters in computer-aided engineering applications.
Expert Systems with Applications, 2017. 81: p. 39-52.
127. Vlachos, A., A stopping criterion for active learning. Computer Speech and
Language, 2008. 22(3): p. 295-312.
128. Kremer, J., K.S. Pedersen, and C. Igel, Active learning with support vector
machines. Wiley Interdisciplinary Reviews-Data Mining and Knowledge
Discovery, 2014. 4(4): p. 313-326.
129. Olsson, F., A literature survey of active machine learning in the context of
natural language processing. 2009.
130. Lewis, D.D. and W.A. Gale. A sequential algorithm for training text classifiers.
in Proceedings of the 17th annual international ACM SIGIR conference on
Research and development in information retrieval. 1994. Springer-Verlag
New York, Inc.
131. Fu, Y.F., X.Q. Zhu, and B. Li, A survey on instance selection for active
learning. Knowledge and Information Systems, 2013. 35(2): p. 249-283.
132. Angluin, D., Queries and concept learning. Machine Learning, 1988. 2(4): p.
319-342.
133. Moon, S., C. McCarter, and Y.-H. Kuo, Active learning with partially featured
data, in Proceedings of the 23rd International Conference on World Wide Web.
2014, ACM: Seoul, Korea. p. 1143-1148.
134. Li, L., X. Jin, S.J. Pan, and J.T. Sun. Multi-domain active learning for text
classification. in Proceedings of the 18th ACM SIGKDD international
conference on Knowledge discovery and data mining. 2012. ACM.
135. Novak, B., D. Mladenič, and M. Grobelnik, Text classification with active
learning, in From Data and Information Analysis to Knowledge Engineering.
2006, Springer. p. 398-405.
136. Brinker, K., Active learning with kernel machines. 2004, Citeseer.
137. Goudjil, M., M. Koudil, M. Bedda, and N. Ghoggali, A novel active learning
method using SVM for text classification. International Journal of Automation
and Computing: p. 1-9.
138. Zhu, X., Semi-supervised learning literature survey. 2005.
139. Pavlinek, M. and V. Podgorelec, Text classification method based on self-
training and LDA topic models. Expert Systems with Applications, 2017. 80:
p. 83-93.
Bibliography 171
140. Hodkiewicz, M., P. Kelly, J. Sikorska, and L. Gouws. A framework to assess
data quality for reliability variables. in 1st World Congress of Engineering
Asset Management. 2006. Gold Coast, Queensland, Australia.
141. Agrawal, V., B.K. Panigrahi, and P.M.V. Subbarao, Review of control and
fault diagnosis methods applied to coal mills. Journal of Process Control,
2015. 32: p. 138-153.
142. Chauhan, M.K., Varun, S. Chaudhary, S. Kumar, and Samar, Life cycle
assessment of sugar industry: A review. Renewable & Sustainable Energy
Reviews, 2011. 15(7): p. 3445-3453.