FAILURE AND MAINTENANCE INFORMATION EXTRACTION
METHODOLOGY USING MULTIPLE DATABASES FROM INDUSTRY: A NEW
DATA FUSION APPROACH
Kazi Arif-Uz-Zaman
Master in Engineering (Research)

Supervisors:
Principal: Professor Lin Ma
Associate: Dr. Michael E. Cholette, A/Prof. Yue Xu, Dr. Azharul Karim
Submitted in fulfilment of the requirements for the degree of
Doctor of Philosophy
School of Chemistry, Physics and Mechanical Engineering
Faculty of Science and Engineering
Queensland University of Technology
2018
Keywords
Text mining, work order (WO) analysis, naïve Bayes, support vector machine,
failure time, text classification, active learning, information requirement
specifications, semi-supervised learning, reliability models, maintenance optimization
models.
Abstract
Maintenance planning, budgeting, and optimisation continue to attract
significant research and practical attention. At the centre of all of these methodologies
are statistical models for the reliability and/or degradation of key assets. Yet, these
statistical models require accurate event times (e.g. failure times) and for many
industrial applications, such information is often scattered in many historical
maintenance databases. Additionally, real world databases have often been set up for
purposes other than statistical modelling and are focused on the process of
maintenance activities (e.g. communicating what needs to be done by the maintenance
crew and when) rather than on detailed cataloguing of downtime causes, degradation,
and failure events. In addition, different aspects of maintenance activities themselves
are often dispersed across different databases. Some databases contain descriptions of the work that needs to be conducted and indications of the priority of maintenance activities, while others may contain detailed information on when the asset was operating and when it stopped, without noting the reason. Thus, the existing data
cannot be interpreted individually, since each database provides an incomplete picture
of the asset performance, condition, and reliability.
This study aims to establish methods for linking relevant data and information
in separate maintenance databases to support reliability and maintenance decision
modelling, in particular, Time to Failure or Failure Time Information. First, the data
requirements for reliability and maintenance optimisation modelling are established
and possible sources of information to satisfy such requirements are investigated. To
link different databases, this thesis proposes an innovative text mining approach. To
establish such links, some organisations may use dedicated data fields or may cross-check dates between databases. However, in many historical databases (especially for a long-lived asset), such links do not exist. Though different databases provide their own sides
of the picture of maintenance, the most commonly available and detailed maintenance
information is often recorded in the free texts of maintenance work descriptions.
Therefore, one may expect that different maintenance databases can be linked by
mining the free text to identify and extract the information necessary for asset
reliability and optimisation modelling.
A text mining approach is employed to extract Failure Time using keywords (present in the free text descriptions of various databases) indicative of the nature and
characteristics of the maintenance events. This study automatically labels the
maintenance data of one database using data fields and links them with another
database through the free texts. The proposed method thus identifies the “failure”
events whose text descriptions are consistent with the definition of failure across
multiple maintenance databases.
An alternative approach to identifying failure times is to use an expert’s
interpretation of the free texts. In this case, the key challenge is the “expense” of
labelling; the expert must assess each text description individually and thus labelling
all of the data is infeasible. To mitigate this, an active learning approach is proposed
to construct a text classifier from a limited number of expert labelled samples.
The applicability of the methodologies is demonstrated on maintenance data sets
from electricity and sugar processing companies. The performance of the text
classifiers is assessed in terms of their accuracy, precision and recall measures.
Analysis of the text of the identified failure events seems to confirm the accurate
identification of failures. The results are expected to be immediately useful in
improving the estimation of failure times (and thus the reliability models) for real
world assets. Furthermore, the findings from the active learning based approach
demonstrated on industrial maintenance data reveal that failure time information can
be identified, allowing minimum maintenance data to be interpreted by the expert.
Active learning can decrease the number of labelled samples by approximately 50%,
while achieving the same classification accuracy. The outcomes of this study can be
used to develop statistical models of failure times from older historical maintenance databases, where the only consistently available data is a free text description.
Table of Contents

Keywords
Abstract
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Statement of Original Authorship
Acknowledgements
Chapter 1: Introduction
1.1 Background and Motivation
1.2 Research Questions and Objectives
1.3 Research Contribution, Innovation and Significance
1.4 Publications
1.5 Thesis Organisation
Chapter 2: Literature Review
2.1 Overview of the Maintenance Process
2.1.1 Maintenance Policies and Strategies
2.2 Maintenance Optimisation
2.3 Failure Time Models
2.3.1 Virtual Age (VA) Model
2.4 Degradation Models
2.5 Maintenance Objectives and Costs
2.6 Advantages and Disadvantages of the Models
2.7 Typically Available Maintenance Databases in Industry
2.8 Knowledge Discovery
2.9 Text Mining
2.10 Text Cleaning and Feature Extraction
2.10.1 Bag of Words
2.10.2 Term Frequency (TF)-Inverse Document Frequency (IDF)
2.10.3 Chi-square (CS) Statistic
2.10.4 Information Gain (IG)
2.10.5 Language Model
2.11 Text Classification Algorithms
2.11.1 Naïve Bayes
2.11.2 Maximum Entropy
2.11.3 Conditional Random Fields
2.11.4 K-Nearest Neighbour
2.11.5 Support Vector Machine
2.12 Performance Evaluation
2.13 Supervised Machine Learning
2.14 Semi-Supervised Machine Learning
2.14.1 Active Learning
2.14.2 Semi-Supervised Self Training
2.15 Summary and Research Gap
Chapter 3: Information Requirement Specifications for Reliability and Maintenance Optimisation Models
3.1 Are Current Maintenance Databases Sufficient for Maintenance Optimisation?
3.1.1 Identifying Failure and Planned Maintenance Times
3.2 Requirement for Information Extraction Methodology
Chapter 4: Failure Time Extraction Methodology Using Text Mining
4.1 Motivation
4.2 Methodology
4.2.1 Definition of Failure
4.2.2 Database A Labelling
4.2.3 Features Extraction and Construction of Keyword Dictionary
4.2.4 Classifier Construction and Failure Time Extraction
4.3 Validation of the Methodology
4.4 Application of the Methodology
4.5 Summary
Chapter 5: Case Studies on Failure Time Extraction
5.1 Case Study 1: Coal Fired Power Generation Company
5.1.1 Overview of a Coal Mill
5.1.2 Data Description and Text Cleaning
5.1.3 Work Order Labelling and Feature Extraction
5.1.4 Training and Testing Text Classifiers
5.1.5 Comparison between Failure and Non-Failure Work Orders
5.1.6 Failure Time Extraction
5.1.7 Validation of the Text Classifier
5.1.8 Application of the Methodology
5.1.9 Comparison between Failure and Non-Failure DD using Text Descriptions
5.1.10 Cumulative Number of Failures before and after Text Mining
5.2 Case Study 2: Boilers in Sugar Processing Industry
5.2.1 Overview of a Boiler System
5.2.2 Data Description and Text Cleaning
5.2.3 Work Order Labelling and Feature Extraction
5.2.4 Training and Testing Text Classifiers
5.2.5 Comparison between Failure and Non-Failure Work Orders
5.2.6 Failure Time Extraction
5.2.7 Validation of the Text Classifier
5.2.8 Application of the Methodology
5.2.9 Comparison between Failure and Non-Failure DD using Work Descriptions
5.2.10 Cumulative Number of Failures before and after Text Mining
5.3 Summary and Discussion
Chapter 6: Advanced Information Extraction Methodology Using Text Mining and Active Learning
6.1 Motivation
6.2 Methodology
6.2.1 Text Cleaning and Initial Training Data Formulation
6.2.2 Active Learning via Uncertainty Sampling
6.3 Case Studies
6.3.1 Classifier Formulation and Benchmark Algorithm
6.3.2 Accuracy of the Text Classifier
6.3.3 Validation of the Classifier
6.3.4 Failure Time Identification Using DD
6.4 Benefits of Including Expert Labelling
6.5 Summary
Chapter 7: Conclusion and Future Research Directions
7.1 Conclusion
7.2 Future Research
Appendices
Bibliography
List of Figures

Figure 1-1. Overview of the research questions
Figure 2-1. Production and maintenance process [11]
Figure 2-2. Maintenance management workflow [13]
Figure 2-3. Evolution of maintenance strategies [14]
Figure 2-4. Minimal, perfect and imperfect repair [35]
Figure 2-5. Deterioration model
Figure 2-6. Process of knowledge discovery in databases [63]
Figure 2-7. Data mining tasks
Figure 2-8. Raw text data with causes of errors and anomalies
Figure 2-9. Features commonly used for text classification
Figure 2-10. Commonly used classification algorithms for text classification
Figure 2-11. A framework for supervised machine learning text classification
Figure 2-12. General schema for passive and active learning [121]
Figure 2-13. Three main active learning query selection strategies [120]
Figure 2-14. Uncertainty-based active learning that queries "b"
Figure 3-1. Overview of information extraction methodology
Figure 4-1. Methodology to extract failure and non-failure maintenance times
Figure 4-2. Data filter and Database "A" labelling
Figure 4-3. Application of the methodology
Figure 5-1. Overview of medium-speed (vertical spindle bowl) mill [140]
Figure 5-2. Recording of two databases (WO and DD) during maintenance process (coal mill)
Figure 5-3. Word cloud representing the keywords appearing in WO
Figure 5-4. Word clouds for (a) failure and (b) non-failure WOs for coal mills
Figure 5-5. Cumulative number of failures for Unit X, Mill A (coal mill)
Figure 5-6. Functional layout of sugar processing system (adapted from [141])
Figure 5-7. Word cloud representing the keywords appearing in WO
Figure 5-8. Word clouds for (a) failure and (b) non-failure WOs for boilers
Figure 5-9. Cumulative number of failures for boilers in sugar processing industry
Figure 6-1. Active learning techniques used in the methodology
Figure 6-2. Uncertainty-based Active Learning text classifier
Figure 6-3. The Uncertainty-based AL algorithm
Figure 6-4. Classification accuracies of each model over the percentage of labelled data increase
Figure 6-5. Classification accuracies of each mixed-classifier over the percentages of automatic and expert labelled data
List of Tables

Table 2-1. Elements of maintenance optimization considering different maintenance strategies [17, 21-24]
Table 2-2. Types of information used for TBM based Failure Time Models
Table 2-3. Data processing steps in KDD
Table 2-4. Text cleaning technique with different transformation processes [67, 69, 70]
Table 2-5. Performance evaluation metrics
Table 3-1. Suggested data recording to support reliability modelling (i.e. failure time modelling)
Table 4-1. Downtime classifications based on planned and unplanned maintenance
Table 4-2. Criteria used for feature extraction and construction of keyword dictionary
Table 5-1. Five randomly selected data from (a) WO and (b) DD during maintenance process (data is slightly edited to protect proprietary information)
Table 5-2. Comparing a few WO data before and after cleaning process
Table 5-3. A portion of keyword dictionary ($tf_1$) for Case Study 1
Table 5-4. A portion of Mixed-Gram keyword dictionary ($NG_{12}$) for Case Study 1
Table 5-5. Performances between SVM and NB classifiers using different keyword dictionaries
Table 5-6. Comparison of Accuracy and F-Measures between SVM and NB using Different Keyword Dictionaries for Case Study 1 (Coal Mills)
Table 5-7. Predicted instances of failure and non-failure downtimes (coal mills)
Table 5-8. Cross tabulation of the DD comparing predicted labels with the estimated ones
Table 5-9. Cross tabulation of testing work orders comparing predicted labels to the actual ones
Table 5-10. Randomly selected predicted downtime data of Unit X Mill A (coal mill)
Table 5-11. Five randomly selected examples from (a) WO and (b) DD (data is slightly edited to protect proprietary information)
Table 5-12. A portion of keyword dictionary ($tf_1$) for Case Study 2
Table 5-13. A portion of keyword dictionary ($\chi^2_{500}$) for Case Study 2
Table 5-14. Performances between SVM and NB using different keyword dictionaries
Table 5-15. Comparison of Accuracy and F-Measures between SVM and NB using Different Keyword Dictionaries for Case Study 2 (Boilers)
Table 5-16. Predicted instances of failure and non-failure downtimes (boilers)
Table 5-17. Estimating actual DD labels using the existing data fields
Table 5-18. Cross tabulation of the DD comparing predicted labels with the estimated ones
Table 5-19. Cross tabulation of testing work orders comparing predicted labels with the actual ones
Table 5-20. Randomly selected predicted downtime data for boilers
Table 6-1. A few randomly selected data entries from WO for (a) coal mills and (b) boilers
Table 6-2. Classification accuracies of different models over two case studies
Table 6-3. Accuracies of the AL-based classifiers (WO trained classifiers) on DD
Table 6-4. Predicted instances of failure and non-failure downtimes (coal mills)
Table 6-5. Predicted instances of failure and non-failure downtimes (boilers)
Table 6-6. Performance of the mixed classifier over percentages of automatic and expert labelled (uncertainty-based) WO
Table 6-7. Performance of the mixed classifier over percentages of automatic and expert labelled (randomly selected) WO
List of Abbreviations
AL: Active learning
ANN: Artificial neural network
ARA: Arithmetic reduction of age
ARI: Arithmetic reduction of intensity
BOW: Bag-of-words
CBM: Condition-based maintenance
CM: Corrective maintenance
CMMS: Computerized maintenance management system
CMS: Condition monitoring system
CS: Chi-square statistic
DCS: Digital control system
DD: Downtime Data
DM: Data mining
ECE: Expected cross-entropy
ERI: Electric research institute
FN: False negative
FP: False positive
FPM: Failure process modelling
FTD: Failure time data
IG: Information gain
KDD: Knowledge discovery in databases
kNN: k-nearest neighbour
LDA: Latent Dirichlet Allocation
LM: Language model
LR: Logistic regression
MTBF: Mean time between failures
MTTR: Mean time to repair
NB: Naïve Bayes
OEE: Overall equipment efficiency
OR: Operations research
PAR: Proportional age reduction
PHM: Proportional Hazard Model
PI: Proportional intensity
PM: Preventive maintenance
QBC: Query-by-committee
RBF: Radial basis function
RCM: Reliability-centered maintenance
SSL: Semi-supervised learning
SSST: Semi-supervised self-training
SVM: Support vector machine
TBM: Time-based maintenance
TC: Text cleaning
TF: Term frequency
TF-IDF: Term frequency-inverse document frequency
TM: Text mining
TN: True negative
TP: True positive
TPM: Total productive maintenance
TSVM: Transductive support vector machine
WO: Work Order/Notifications
Statement of Original Authorship

[Signed: QUT Verified Signature]
Acknowledgements
My sincere thanks go to the Australian Government and the Australian people for providing me with a scholarship, which enabled me to maintain a stable life for myself and my family and to complete the whole PhD program with confidence in a great learning environment.
I would like to express my great admiration to Professor Lin Ma for his supervision and support through my doctoral studies at the Queensland University of Technology. I would like to express my gratitude to Dr. Michael E. Cholette for all the help he provided with the concept, research approach, data analysis and discussion of results, which made working in the area of text mining an enjoyable experience. My thanks also go to Associate Professor Yue Xu for her diligent work and meticulous help. In addition, I appreciate the counselling skills of Dr. Azharul Karim, who helped me to remain patient and not to worry about the obstacles I met during my candidature. Thanks go also to all the other people who have helped me during my PhD journey.
I would like to acknowledge Julie Martyn, a professional editor from the Society of Editors, Queensland, who provided thesis editing and proofreading services according to the guidelines laid out in the University-endorsed national policy guidelines.
I am grateful to my wife, Dr. Asma Akther, who has dedicated her love and selfless help and joined hands with me through so many years of ups and downs.
Chapter 1: Introduction
1.1 BACKGROUND AND MOTIVATION
Industrial organisations are continuously seeking new strategies to improve the
performance of their assets. Reliability analysis and maintenance optimisation play an
important role in the efficiency of industrial assets. Proper planning and timely
maintenance have been proved effective in improving asset reliability and
performance. In most cases, information required to develop such models is often
recorded in extensive asset and maintenance data and organised to support accounting
and basic analysis of maintenance decisions. Most asset owners and their maintenance
departments use a computerised maintenance management system (CMMS) to keep
records of all maintenance activities performed on the asset [1]. Ideally, one may wish
to exploit the vast quantities of such historical asset data to develop more sophisticated
analytics, including asset reliability and maintenance optimisation models.
Yet, asset data in CMMSs are often collected in a manner that is inconsistent
with reliability modelling, focusing on maintenance record keeping and accounting
without the specific intent of identifying asset failure times [2, 3]. To see why data
collection practices are often insufficient to identify failure times, consider a typical
example of a definition of asset failure: the inability of an asset to perform the required
function at a given time. To identify a “failure” event in maintenance data, one needs
to know when the asset was down due to unplanned maintenance. Unfortunately,
organisations often use different databases that only possess part of the information
needed to define a failure event. Databases usually contain the record of every
maintenance activity (i.e., repair, check or routine inspection) on an asset. However,
this does not always tell us which maintenance work is raised to fix a failure event or
whether the maintenance was actually planned. Thus, each database is incomplete and
insufficient to identify a “failure” event. One needs to know both when the asset is
down and if this downtime was unplanned. Existing data collection practices are
unable to provide such complete information.
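To make the problem concrete, the following minimal sketch (purely illustrative; the tables, column names and records are invented and do not come from the thesis data) shows two miniature databases of the kind described above: a downtime log that records stops without reasons, and a work order log that records free-text descriptions without exact stop times. The date-proximity join is shown only as one naive linking option; the free-text descriptions are what this thesis proposes to mine.

```python
import pandas as pd

# Hypothetical miniature of the two-database problem: neither table alone
# identifies a "failure". The downtime log lacks the reason for each stop,
# and the work order log lacks the exact stop times.
downtime = pd.DataFrame({
    "asset": ["mill_A", "mill_A"],
    "start": pd.to_datetime(["2016-03-01 04:10", "2016-03-09 22:05"]),
    "end":   pd.to_datetime(["2016-03-01 09:40", "2016-03-10 06:30"]),
})
work_orders = pd.DataFrame({
    "asset": ["mill_A", "mill_A"],
    "raised": pd.to_datetime(["2016-03-01 04:30", "2016-03-09 12:00"]),
    "text": ["mill tripped, replace sheared drive pin",
             "scheduled liner change during outage"],
})

# One naive link: join each stop to the nearest work order in time. This is
# fragile (several WOs can be raised per stop, and dates are often missing
# or inconsistent), which motivates linking the records via the free text.
linked = pd.merge_asof(downtime.sort_values("start"),
                       work_orders.sort_values("raised"),
                       left_on="start", right_on="raised",
                       by="asset", direction="nearest")
print(linked[["asset", "start", "end", "text"]])
```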
Research literature regarding the theory of reliability and maintenance
optimisation often lacks discussion on how to analyse and identify the required
information from maintenance databases. Past research has rarely discussed what information is needed to satisfy the requirements of such models. Hodkiewicz and Ho [5] used a case study approach to analyse data collection practices and developed a data cleansing method to extract the information required for reliability analysis, exploring in detail the identification and collection of the required maintenance data. However, without an actual definition of the required
information (often referred to as the requirement specification), such information
cannot be identified accurately nor used in the reliability models. Most importantly,
one needs to define “failure” before trying to identify failure time information (for
instance asset downtime due to unplanned maintenance and whether maintenance was
unscheduled) in maintenance databases. Such necessities imply that maintenance data
need to be collected according to these requirement specifications.
Evidence also suggests that very few methodologies exist to extract the required
information from maintenance databases. Various ontology-based learning methods using case-based reasoning (CBR) approaches have been proposed to categorise industry maintenance logs [4]. However, such methods have not been applied in real industrial settings. In recent years, some researchers [5, 6] have tried to identify failure
time information from maintenance data using asset replacement information. Major
reasons behind any asset replacement include end of asset life cycle, non-repairable
asset or failure of an asset. Without specifying the “failure” events in the maintenance
data and separating them from replacement information, actual failure time
information cannot be determined. Even a classification model [7] based on a
clustering technique was unable to provide a clear distinction between unscheduled
(i.e., failure) and scheduled maintenance on maintenance text data.
Using condition data, Moreira and Junior [8] proposed a method of performing
prognostics on aircraft components. Flight data and maintenance logs have been used
to classify the training data into healthy and unhealthy states. The degradation index
was finally created from the classification results to prepare a future schedule of
aircraft maintenance. A machine breakdown prediction model was also developed using condition monitoring and maintenance (e.g. corrective and preventive) data.
1.2 RESEARCH QUESTIONS AND OBJECTIVES
No effective solutions have been found so far to accurately identify the historical
failure time information from industrial maintenance data. Thus, the central question
of this thesis is how to bridge the gap between the information required for common
reliability and maintenance optimisation models and the information commonly
available in real world industrial maintenance databases. The advancement of big data and machine learning tools (e.g. text mining) provides a unique opportunity to develop intelligent linking that makes use of the vast historical databases that many industries maintain.
Based on the above discussion, the following research questions have been posed
and these are also outlined in Figure 1-1:
1. What information regarding the reliability and maintenance optimisation
models needs to be specified for improving the applicability of these
models?
2. How can typically available maintenance data be analysed and
transformed into the information required (as specified in Research
Question 1) for the models?
Figure 1-1. Overview of the research questions
To address the above questions, firstly, a preliminary literature review was conducted to analyse the requirement specifications regarding reliability and maintenance optimisation models and state-of-the-art methodologies for text mining and classification. Different models were investigated to propose a framework for the requirement specifications. Based on the outcomes of these information models, a methodology for analysing the text descriptions across multiple databases was developed. For the construction of the text classifier, both supervised and semi-supervised machine learning methods were tested. Initially, this study labelled one
database automatically using different data fields and linked it to another database
through the free texts. To overcome the shortcomings of automatic labelling due to
unreliable data fields, the proposed method was further modified by including a semi-
supervised text mining method. However, such methods often require expert
judgement to label the data, which is a time-consuming process. This study used
expert-labelled maintenance data and tested the feasibility of such an advanced text
mining method to determine the failure time information in maintenance databases.
Performance measures of both the methods were tested in two real world scenarios.
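As a rough, self-contained illustration of the supervised variant of this idea (not the thesis' actual pipeline, which is developed in Chapters 4 and 5; all texts, labels and parameter choices below are invented), a classifier can be trained on work orders that were labelled automatically from structured data fields and then applied to the free text of a second database:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Work order texts with labels derived automatically from data fields
# (e.g. priority or work type codes); purely illustrative examples.
wo_texts = [
    "replace worn mill bearing after trip",
    "scheduled inspection of feeder belt",
    "pump seized, unplanned outage to repair",
    "routine lubrication and oil change",
]
wo_labels = ["failure", "non-failure", "failure", "non-failure"]

# TF-IDF features over unigrams and bigrams feeding a linear SVM, one of
# the classifier families reviewed in Chapter 2.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(wo_texts, wo_labels)

# Link to the second database: classify its free-text downtime records.
downtime_texts = ["mill tripped on high vibration", "planned boiler washout"]
print(clf.predict(downtime_texts))  # e.g. ['failure' 'non-failure']
```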
The specific objectives of this research are as follows:
1. To systematically identify the information requirement specifications for
improving the applicability of the reliability and maintenance
optimisation models.
2. To develop a novel method to analyse typically available maintenance
data from industry and transform them into the information required for
the reliability and maintenance optimisation models.
1.3 RESEARCH CONTRIBUTION, INNOVATION AND SIGNIFICANCE
The overall contributions of the proposed research are classified into three areas.
Firstly, the information requirement specifications summarise the information
necessary for different reliability and maintenance optimisation models. Such a new
analysis within a requirement specification framework gives a direction to
maintenance practitioners for recording maintenance data in a standard way. Secondly,
a novel methodology has been developed to identify the failure time information using
data mining techniques. This novel method can extract information from multiple
maintenance databases in both numerical and text formats and is expected to lead to
more reliable failure time identification. Thirdly, the information requirement
specification and extracted failure time information can serve as useful modelling tools
in analysing stochastic repairable maintenance problems.
The outcomes of this research are significant in both industrial applications and
research activities. Firstly, the identification of information requirement specifications
has a revolutionary effect not only on accurate reliability and maintenance
optimisation decisions but also on maintenance work itself, in that people can collect
the appropriate required data. Secondly, the methods developed here enable improved
identification of events necessary for maintenance models, and thus improve the
quality of reliability models identified from this information.
1.4 PUBLICATIONS
• Kazi Arif-Uz-Zaman, Michael E. Cholette, Fengfeng Li, Lin Ma,
Azharul Karim. (2015). A data fusion approach of multiple maintenance
data sources for real-world reliability modelling. In Proceeding of 10th
World Congress on Engineering Asset Management (WCEAM-2015),
September 28-30, 2015, Tampere, Finland.
• Kazi Arif-Uz-Zaman, Michael E. Cholette, Lin Ma, Azharul Karim.
(2017). Extracting failure time data from industrial maintenance records
using text mining. Advanced Engineering Informatics, 33, p. 388-396,
Q1 Journal, Impact Factor: 2, SJR: 1.26.
• Kazi Arif-Uz-Zaman, Michael E. Cholette, Yue Xu, Lin Ma, Azharul
Karim. (To be submitted in February, 2018) Failure time identification
from a minimum set of industrial maintenance records using text mining
and active learning, intended for publication in Computers in Industry, Q1 Journal, Impact Factor: 1.685, SJR: 0.93.
1.5 THESIS ORGANISATION
To ensure a complete discussion of the research problems and objectives,
existing literature, proposed methodology and conclusions, the thesis is arranged as
follows:
• Chapter 1 provides a general overview of the thesis including research
motivations, problems, objectives, contributions, innovations and
significance as well as the organisation of the thesis.
• Chapter 2 reviews the related literature regarding maintenance
optimisation models, industrial maintenance data structure and text
mining methods. This chapter also lists the information required for time-
based and condition-based maintenance optimisation models. Research
gaps are identified and discussed.
• Chapter 3 investigates the sufficiency of existing maintenance databases
to satisfy the data requirement of reliability and maintenance
optimisation models. Information requirement specification gives a
direction to maintenance practitioners for recording maintenance data
accurately, which can then be used in reliability and optimisation models.
• Chapter 4 proposes a novel method to extract failure and non-failure
maintenance time information using typically available maintenance
data. Initially, a text classifier is constructed using the free texts of one
maintenance database. The classifier is subsequently applied to another
database to categorise them into two classes: failure and non-failure.
Thus, the method jointly uses multiple databases to determine asset
failure time information. This proposed method can analyse the huge
quantity of historical maintenance data and make it useful for more sophisticated analytics, i.e., reliability and maintenance optimisation.
• Chapter 5 investigates the proposed method in two industrial settings.
Case studies suggest that the method can efficiently bridge the gap
between maintenance optimisation models and available maintenance
data by identifying failure time information. A group of different
validation methods is also tested for both case studies. The method proves effective for identifying failure time information for industrial assets.
• Chapter 6 develops an advanced text mining method, an alternative way
to identify failure time information using expert judgement. In this
regard, active learning based text classifiers are constructed by querying
a minimum number of maintenance data and labelling them by an expert.
This method is tested on two case studies and can accurately identify the
historical asset failure times.
• Chapter 7 summarises the conclusion of the thesis and suggests possible
future research directions.
Chapter 2: Literature Review
This chapter reviews some of the significant literature on maintenance processes,
modelling techniques and optimisation. A critical limitation in the application of maintenance optimisation is the gap between the information required by maintenance optimisation models and the way data on the maintenance process is typically collected in many organisations. As will be seen later in this thesis, a possible remedy to this gap is analysing the free text descriptions in different maintenance databases, and the text mining literature will thus also be reviewed. Three main research areas are discussed here:
• Reliability and Maintenance Optimisation Models: Investigate various models and techniques used for reliability and maintenance optimisation under different maintenance strategies and policies. Particular attention is paid to the data and information needed for these models.
• Data available in typically-collected maintenance databases:
Summarise the available industry databases and asset and maintenance
data collected during the maintenance process.
• Data and Text Mining: Review the text mining literature regarding text cleansing methods, features, text classifiers, and supervised and semi-supervised machine learning methods.
2.1 OVERVIEW OF THE MAINTENANCE PROCESS
Maintenance is the function that monitors and keeps plant, equipment and
facilities working. According to EN 13306: 2001 Maintenance Terminology,
“Maintenance is the combination of all technical, administrative and managerial
actions during the life cycle of an item intended to retain it in or restore it to a state in
which it can perform the required function” [9]. The recognition of the value of
maintenance is a recent development. Waeyenbergh and Pintelon [10] noted that
maintenance became an integrated part of other operating functions and the concept
had shifted from failure-based repair to use-based repair and eventually towards
condition based maintenance. Maintenance is a process whose activities are carried
out simultaneously with the production process [11, 12]. Figure 2-1 shows the
relationship between different objectives relating to production and maintenance
processes.
Figure 2-1. Production and maintenance process [11]
Production systems (which are the focus of most maintenance literature) usually convert inputs (raw materials, energy, workload, etc.) into a product that satisfies
customer needs. The maintenance system, as a mix of when-what actions, labour, and
spare parts, together with other resources, aims to maintain equipment in good working
order, i.e., the equipment is able to provide the appropriate level of production
capacity. In a maintenance system, feedback control, planning, and organisation
activities are very critical and strategic issues. The first of these deals with the
production system and control of maintenance activity (e.g., workload allocations,
spare parts management). The various actions which must be taken to control
production and maintenance activities and to resolve breakdowns must be planned in
advance whenever possible.
Clearly feedback control requires maintenance action in downtime periods or
during an unexpected breakdown, to put the plant back into working order. In
unexpected breakdowns the planning phase is skipped and the maintenance work is
carried out as soon as possible. This is breakdown/corrective maintenance. Definitive
maintenance work is scheduled in a previously planned stop period. Maintenance
activities are so numerous and complex that they require effective management and
well-structured organisation (see Figure 2-2).
Figure 2-2. Maintenance management workflow [13]
2.1.1 Maintenance Policies and Strategies
Maintenance strategies can be divided into two major types: proactive and
reactive (see Figure 2-3). Reactive or corrective maintenance (CM) occurs when the asset breaks down, resulting in unexpected shutdowns and high maintenance costs.
Figure 2-3. Evolution of maintenance strategies [14]
On the other hand, proactive maintenance (PM) is scheduled in order to minimise
the impact of a sudden breakdown, usually consumes fewer resources than CM and
can be accommodated in the production plans. In fact PM can be as simple as cleaning
filters, lubricating and changing oil, thus preventing the failure of a critical component
that is costly and takes time to be delivered. Because the operation schedules and
environment change dynamically in the real world, PM can take place unnecessarily.
To ensure PM occurs only when needed, condition-based maintenance (CBM) has
been introduced [15]. This can either take the form of regular inspections to evaluate
the assets’ wear or of sensors streaming data to diagnostic software. Therefore
maintenance tasks can be triggered only when the wear reaches a certain level. It is
worth mentioning that CBM is included under the general category of proactive
maintenance. CBM is defined by EN 13306:2010 as “preventative maintenance that
includes a combination of condition monitoring and/or inspection and/or testing,
analysis and subsequent maintenance actions” [16].
2.2 MAINTENANCE OPTIMISATION
Maintenance aims to retain assets in their operational states [17] or to improve system availability; however, since maintenance incurs cost, there is a need to optimally balance these objectives. In maintenance theory, this is the critical problem of maintenance optimisation. A recent study on maintenance policy and optimisation models revealed that maintenance cost can reach between 15% and 70% of the total production expenditure, or in many cases might even exceed annual net profit [18].
Thus an appropriate and optimised maintenance policy is essential for the financial
health of asset-intensive businesses and organisations. The main question faced by a
maintenance manager is this: what maintenance actions to take, and when to take them,
to gain an appropriate level of production from an asset.
Optimal maintenance policies provide a deliberate plan of action which guides
maintenance management by seeking the optimal balance between the costs and
benefits of the maintenance, while taking all kinds of constraints into account [19]. In
almost all cases, maintenance benefits consist of saving on costs which would be
incurred otherwise (e.g., lower failure costs). Early maintenance policies were based on the sole aim of reducing the maintenance cost itself, without considering other
factors which were equally important, such as reliability [20]. Much of the time,
minimising total maintenance cost will limit reliability to an unacceptable level in
practice. Therefore, to obtain the best performance and a balance between these aims,
total maintenance costs, reliability estimations, as well as other factors should be
considered simultaneously when devising maintenance policies.
A general optimisation problem comprises two major elements: the objective function and the decision variable. The objective function is a mathematical expression describing a relationship among the optimisation parameters or the result of an operation that uses the optimisation parameters as inputs. A decision variable is a quantity that the decision-maker controls. For example, the number of workers to employ during the morning shift on a production floor may be a decision variable in an optimisation model for labour scheduling. Minimising cost was reported as an objective function in more than 70% of the studies [17]. In addition to minimising cost, optimisation objectives such as maximising availability, maximising production throughput and overall equipment efficiency (OEE) were identified by Horenbeek and colleagues in [21].
and objective functions) necessary for maintenance optimisation.
Table 2-1. Elements of maintenance optimization considering different maintenance strategies [17, 21-24]

Maintenance strategies: Proactive (Predictive, on-condition; Preventive, scheduled) and Reactive (Corrective, run-to-failure).

Possible decision variables:
• Predictive (on-condition): inspection frequency; maintenance threshold on condition
• Preventive (scheduled): PM frequency; maintenance schedule on time
• Both proactive strategies: spare parts (reorder level and maximum stock level); maintenance priorities; production (buffer size)
• Corrective (run-to-failure): N/A

Possible objectives: minimise cost; maximise availability; maximise throughput; maximise profit; OEE
Determining how frequently assets should be maintained to achieve the best possible solution is a continuing concern within the field. In cases where PM is considered, the decision variable is the PM frequency. Depending on the maintenance policy, the PM frequency can be periodic, age-based or constant-interval. However, when the system of interest incorporates CBM or opportunistic
maintenance, the decision variable is the maintenance threshold that triggers
maintenance actions. If information on asset degradation is not streamed by on-line systems, inspections are needed to evaluate this deterioration. Thus, according to this logic, inspection intervals were included as a decision variable in [23]. In addition, the priority of maintenance tasks is also included as a decision variable.
Spare parts management is an important component in the maintenance system
and has a considerable impact on cost and availability. Several attempts have also been
made to investigate the effect of production parameters on maintenance systems in
manufacturing settings [25].
In general, maintenance optimisation covers the following aspects [19, 20]:
• A model description of a technical system and its function: a model of the deterioration of the system over time and the possible consequences for the system.
• A description of the available information about the system and the
action open to the management.
• Objective function and an optimisation technique which helps in finding
the best balance.
According to Chen and Pham [26], a model is a description of a process, system,
or concept, in a simple and systematic way which usually involves an explicit
mathematical formalisation of the process being studied. In maintenance policy
optimisation, a model is a description of a process to analyse and determine the optimal
maintenance policy under pre-determined maintenance objectives and criteria. An
example has been made by Dekker [19] regarding the description of the available
information in an age replacement model. If, upon failure, any component is replaced
by an identical one, the information regarding failure cost, 𝑐𝑐𝑓𝑓, preventive replacement
cost, 𝑐𝑐𝑝𝑝 and hazard rate, 𝑟𝑟(⋅) is required. The key result of optimisation is to identify
values of the decision variables that either maximise or minimise the objective
function. The stochastic behaviour of systems is mainly represented by the system
reliability estimates: availability, mean time between failure (MTBF) and failure
frequency, and the system maintenance cost measures: maintenance cost rate and
Chapter 2: Literature Review 15
discounted cost rate. Generally, an optimal maintenance policy may be the one which
either
• Minimises the system maintenance cost rate
• Maximises the system reliability estimates
• Minimises the system maintenance cost rate while the system reliability
requirements are satisfied, or
• Maximises the system reliability estimates when the requirements for the
system maintenance cost are satisfied.
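To make these elements concrete, the sketch below numerically optimises the age replacement example attributed to Dekker above. It is an illustration under stated assumptions, not a model from the thesis: the reliability function is assumed Weibull and the costs are invented. The decision variable is the replacement age $T$; the objective is the long-run cost rate $C(T) = \left(c_p R(T) + c_f [1 - R(T)]\right) / \int_0^T R(t)\,dt$, which follows from renewal-reward reasoning.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Assumed inputs (invented for illustration): Weibull reliability R(t),
# failure replacement cost c_f, preventive replacement cost c_p.
alpha, beta = 1000.0, 2.5     # Weibull scale [h] and shape
c_f, c_p = 5000.0, 800.0      # costs of failure vs. preventive replacement

def R(t):
    """Reliability (survival) function of the component."""
    return np.exp(-(t / alpha) ** beta)

def cost_rate(T):
    """Long-run cost per hour of replacing at age T or at failure."""
    expected_cycle_cost = c_p * R(T) + c_f * (1.0 - R(T))
    expected_cycle_length, _ = quad(R, 0.0, T)   # E[min(X, T)]
    return expected_cycle_cost / expected_cycle_length

res = minimize_scalar(cost_rate, bounds=(1.0, 5000.0), method="bounded")
print(f"optimal replacement age T* ~ {res.x:.0f} h, "
      f"cost rate ~ {res.fun:.3f} per h")
```

Replacing too early wastes preventive cost, while replacing too late incurs the higher failure cost, so the cost rate has an interior minimum when the hazard rate is increasing (here, Weibull shape greater than one).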
Work on maintenance optimisation was initiated in the early 1960s [27, 28]. In the literature, optimal maintenance models are based on different parameters classified into various categories such as information availability, single-unit or multi-unit systems, and time events and state events, while the model types are classified by optimality criteria, methods of solution and planning time [20].
Maintenance optimisation has been considered to fall into one of two kinds: qualitative and quantitative. The former includes techniques such as total productive maintenance (TPM) [20] and reliability centred maintenance (RCM) [20], while the latter incorporates various deterministic/stochastic models such as Markov decision processes and Bayesian models. Maintenance techniques have undergone a long evolution, from corrective maintenance in the 1940s, through various operations research (OR) models for maintenance, to today's proactive reliability-based approach [14].
Sherif [29] classified the models according to the modelling of the deterioration into 1) deterministic models and 2) stochastic models. In the case of CBM-based stochastic deterioration models, Alaswad and Xiang [22] produced further sub-classifications such as discrete-state deterioration, the proportional hazard model (PHM) and continuous-state deterioration. A unique classification based on certainty theory is mentioned by Ding and Kamaruddin [18], in terms of the degree of certainty: certainty, risk, and uncertainty. To investigate the modelling criteria and information requirements, different types of reliability, time-based and condition-based maintenance models are discussed in the following sections.
2.3 FAILURE TIME MODELS
2.3.1 Virtual Age (VA) Model
Kijima [30] used the idea of the virtual age process of a repairable system to develop
an imperfect repair model. If a system has the virtual age $V_{k-1} = y$ immediately
after the $(k-1)$th maintenance, the $k$th failure time $X_k$ is distributed as

$$\Pr[X_k \le x \mid V_{k-1} = y] = \frac{F(x+y) - F(y)}{1 - F(y)} \tag{2-1}$$

where $F(x)$ is the failure-time distribution of the system. The failure rate $\lambda(t)$ of such
a model can be expressed as [30]:

$$\lambda(t) = \lambda(A_k + t - C_k) \tag{2-2}$$

where $C_k$ are the maintenance times (for any maintenance process, PM or CM), $A_k$ is
the effective age at time $t$, and $V_k = A_k + t - C_k$ represents the virtual age at
time $t$. The effective age is the virtual age of the asset after the last maintenance action.
Using the effective age $A_k$ and virtual age $V_k$, different maintenance effects can be
constructed using virtual age models [31]. In the case of minimal repair (often referred to
as ABAO, as-bad-as-old), the effective age and the last maintenance time are the same, i.e., $A_k = C_k$ for all
$k \ge 1$. The failure rate is then only a function of time, $\lambda(t)$ (i.e. it follows a non-homogeneous
Poisson process) [2, 32]. Perfect repair, or AGAN (as-good-as-new), is modelled as a
complete resetting of the effective age to zero (i.e., $A_k = 0$ for all $k \ge 1$), as shown in
Figure 2-4. In that case, the failure rate is [2, 32]:

$$\lambda(t) = \lambda(t - C_k) \tag{2-3}$$

In between minimal and perfect repair, the maintenance effects can be
modelled using imperfect repair: the maintenance effect is supposed to reduce the
virtual age by an amount proportional to the age accumulated since the
last maintenance. When both maintenance actions have the same effect, the
effective age $A_k$ (i.e., the last maintenance action is presumed to reduce the operating
time [33]) can be written as [31, 32, 34] $A_k = (1 - \rho)C_k$. The failure rate after the
maintenance can then be shown to be [33]:

$$\lambda(t) = \lambda(t - \rho C_k) \tag{2-4}$$
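To make the role of the maintenance effect $\rho$ concrete, the following minimal Python sketch (a hypothetical illustration, not taken from [30-34]) simulates successive failure times under a Kijima-type virtual age update in which each repair retains a fraction $(1-\rho)$ of the accumulated age, consistent with the convention above that $\rho = 1$ corresponds to AGAN and $\rho = 0$ to ABAO:

```python
# A minimal sketch, assuming a Weibull underlying failure-time distribution;
# beta, eta and rho are hypothetical values.
import numpy as np

rng = np.random.default_rng(0)
beta, eta = 2.5, 100.0   # Weibull shape and scale
rho = 0.6                # maintenance effect: rho = 1 -> AGAN, rho = 0 -> ABAO

def next_failure(v):
    """Draw X_k given virtual age V_{k-1} = v by inverse transform on the
    conditional survival S(x + v) / S(v) implied by Eq. 2-1."""
    u = rng.uniform()
    return eta * ((v / eta) ** beta - np.log(u)) ** (1 / beta) - v

v, t, failures = 0.0, 0.0, []
for _ in range(10):
    x = next_failure(v)
    t += x                     # calendar failure time
    v = (1 - rho) * (v + x)    # imperfect repair retains (1 - rho) of the age
    failures.append(t)
print(np.round(failures, 1))
```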
Figure 2-4. Minimal, perfect and imperfect repair [35]
Pulcini [36] proposed a Bayes approach within a proportional age reduction-power law
process (PAR-PLP) framework. With the initial failure rate

$$\lambda_1(t) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta-1},$$

the conditional failure rate can be expressed as [33, 36]:

$$\lambda_k(t) = \frac{\beta}{\alpha}\left(\frac{t - \rho C_k}{\alpha}\right)^{\beta-1} \tag{2-5}$$
The types of information used in the failure time models in the selected literature
are summarised in Table 2-2.
Table 2-2. Types of information used for TBM-based failure time models

| Model/Method | Maintainable System/Unit | Failure/Preventive Maintenance (PM) Times | Failure History Completeness | Maintenance Type (PM; CM with Stops; CM with Delay) |
|---|---|---|---|---|
| Arithmetic Reduction of Age (ARA) [31] | Inlet Header of Heat Exchanger | Failure Times, PM Times | Censored | PM, CM with Stops or Delay |
| Arithmetic Reduction of Intensity (ARI) [34] | Water Pump | Failure Times, PM Times | Left Truncated | PM, CM with Delay |
| Change Point Detection [37] | Electronic Board | Failure Times | Complete | CM with Delay |
| Bayes Approach [36] | Cooler System in Power Plant | Failure Times, PM Times | Censored | PM, CM with Stops |
| Proportional Age Reduction (PAR) [33] | Cooler System in Power Plant | Failure Times, PM Times | Censored | PM, CM with Stops |
| Proportional Intensity (PI) [38] | Airplane Air Conditioning | Failure Times | Complete | CM with Stops |
| Reliability Assessment Framework [39] | Centrifugal Pump | Failure Times | Truncated | CM with Stops |
| Failure Process Modelling (FPM) [40] | Light Commercial Vehicle | Failure Times | Censored | CM with Delay |
The above discussion suggests that failure time models require the effective age $A_k$,
the historical maintenance times $C_k$ (i.e., failure/CM times and preventive maintenance/PM
times), the maintenance effect $\rho$ and the current age of the asset.
2.4 DEGRADATION MODELS
When a system condition is directly observable, stochastic deterioration models
are usually considered. Any system subject to a deterioration process has an increasing
failure rate and is usually considered for both corrective and preventative maintenance
(see Figure 2-5) [19].
Figure 2-5. Deterioration model
$X(t)$ represents the deterioration state over time $t$. At the beginning ($t = 0$), the
system is said to be in an as-good-as-new state. Since the system has an increasing failure
rate, it is considered for preventive maintenance until the failure level is reached. It is
important to note that the inspection times can be chosen either arbitrarily or imposed
by the optimal maintenance decision. When the deterioration exceeds a failure
level $L$, the system is said to be in a "failed" state. With a condition-based
maintenance strategy, one has to decide whether to replace the deteriorated system
through preventive or corrective maintenance, and to choose the date of the next
inspection of the system.
Deterioration models can be classified into three types: proportional hazard
model (PHM), discrete state deterioration and continuous state deterioration [22]
models. Models have also been reviewed for both single and multi-unit systems.
Discrete state models are often formulated by a Markov model [41]. To relax the rather
strict conditions of the Markov process or to model partially available system
information, a semi-Markov model or a hidden Markov model may be used. To model
continuous state deterioration, three processes are widely used: Wiener, gamma and
inverse Gaussian processes [22]. In particular, gamma process models are widely
applied as a model of a monotonic degradation process. A homogeneous gamma
process can be formulated by Eq. 2-6 [42]:

$$f_{\alpha,\beta}(t) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, t^{\alpha-1} e^{-t/\beta}, \quad t > 0 \tag{2-6}$$
in which $\Gamma(\alpha) = \int_0^{\infty} z^{\alpha-1} e^{-z}\, dz$ denotes the gamma function, with shape parameter
$\alpha > 0$ and scale parameter $\beta > 0$. To model heterogeneous degradation, an inverse
Gaussian process is usually suitable [22].
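As a concrete illustration of a gamma process degradation model, the following minimal Python sketch (with hypothetical parameters) simulates sample paths with independent gamma-distributed increments and estimates the first passage time of a failure threshold $L$ by Monte Carlo:

```python
# A minimal sketch, assuming a stationary gamma process: over each step of
# length dt the increment is Gamma(alpha * dt, beta). All values are
# hypothetical.
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 0.5, 2.0          # shape rate per unit time, scale
L, dt, t_max = 20.0, 1.0, 200.0 # failure level, step, simulation horizon

def first_passage():
    """Return the first time the cumulative degradation X(t) exceeds L."""
    x, t = 0.0, 0.0
    while t < t_max:
        x += rng.gamma(alpha * dt, beta)
        t += dt
        if x >= L:
            return t
    return np.inf

hitting_times = [first_passage() for _ in range(1000)]
print("mean time to exceed L:", np.mean(hitting_times))
```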
PHM is one of the most popular degradation paradigms. It was first introduced
by Cox in 1972 and since then it has become an important statistical regression model
[43]. PHM is commonly applicable to multivariate failure models while the system
deteriorates by the effects of covariates. This is an approach to model an asset’s hazard
using condition monitoring data. The PHM model can be formulated and is shown by
Eq. 2-7 [44]:
$$h(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \exp[\gamma_1 z_1(t) + \gamma_2 z_2(t) + \cdots + \gamma_m z_m(t)] \tag{2-7}$$

where $h(t)$ is the hazard rate of failure at time $t$, given the $m$ covariate values
$z_1(t), z_2(t), \ldots, z_m(t)$. The baseline hazard $h_0(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}$ is that of the Weibull
model. Each $z_i(t)$ is a covariate or explanatory variable, representing a monitored
condition data item at the time of inspection $t$, for instance voltage, current,
temperature, humidity, or a measure of stress. The product of $z_i$ and $\gamma_i$ determines the
influence of the covariates on the hazard rate of failure. According to the principles of
reliability analysis, the reliability and failure probability density can be estimated as
[43]:

$$R(t) = \exp\left[-\int_0^t h(u, z_i)\,du\right] = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta} \exp\left(\sum_{i=1}^{m}\gamma_i z_i\right)\right] \tag{2-8}$$

$$f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \exp[\gamma_1 z_1(t) + \cdots + \gamma_m z_m(t)]\,\exp\left[-\left(\frac{t}{\eta}\right)^{\beta} \exp\left(\sum_{i=1}^{m}\gamma_i z_i\right)\right] \tag{2-9}$$
According to Ahmad and Kamaruddin [15], condition monitoring data can be
classified into three types: the value type, waveform type, and multi-dimensional type.
Value type data exist in a single value, examples of which include oil analysis data,
temperature, pressure, humidity, and quality scale. Waveform and multi-dimensional
type data can also be referred to as signal and image forms, respectively. Examples of
signal forms include vibration and acoustic data, which are typical of waveform type
data. Image forms such as infrared thermographs, visual images and X-ray images are
examples of multi-dimensional type data. Value type CBM data can be exemplified by
the research of Moreira and Junior [8] who investigated the prognostics of the aircraft
bleed valve using a kind of condition monitoring data called the air management
system (AMS). This included cabin temperature, pressurization and air renewing and
cycling. Another example based on condition monitoring data is that of Bastos and
associates [45], who applied a prediction algorithm to estimate the probability of machine
breakdown, which supported decisions about maintenance interventions.
The data required for degradation modelling is quite different from failure time
models. For the stochastic process models (Markov, Gamma), a direct condition
indicator (e.g. thickness of a protective coating) is used and a threshold on this quantity
is used to define failure. When such a direct observation is not available, hidden
Markov Models can be used to infer the unobservable condition from imperfect
observations (e.g. vibration data). In addition to these condition data (i.e. covariates),
PHM models require failure time information to identify the hazard rate parameters.
Yet, such failure times may also be needed for stochastic process models to define a
failure threshold if one cannot be developed from first principles. Thus, degradation
models likely require the identification of failure times and key condition indicators for
modelling.
2.5 MAINTENANCE OBJECTIVES AND COSTS
Maintenance optimisation couples either a failure time model or a degradation model
with a cost model. Generally, the objective functions of maintenance optimisation are
cost rates. For TBM, the cost rate can be computed as [46, 47]:

$$TC(T) = \frac{C_{cm}\,F(T) + C_{pm}\,R(T)}{\int_0^T R(t)\,dt} \tag{2-10}$$

where $C_{pm}$ is the preventive maintenance unit cost, $C_{cm}$ is the corrective maintenance
unit cost, $T$ is the optimum time of replacement, $F(T)$ is the cumulative distribution
function, $R(t)$ is the reliability function and $TC(T)$ is the total maintenance cost per
unit time. For CBM, the cost rate may be computed as [47-49]:
$$TC(d) = \frac{C_{pm}\,(1 - Q(d)) + (C_{pm} + K)\,Q(d)}{W(d)} = \frac{C_{pm}\,(1 - Q(d)) + C_{cm}\,Q(d)}{W(d)} \tag{2-11}$$

where $d$ is a threshold on the condition indicator, $C_{cm} = C_{pm} + K$ is the failure
replacement cost, $Q(d)$ is the probability that a failure replacement will occur and
$W(d)$ is the expected time until replacement (whether preventive or corrective).
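The quantities $Q(d)$ and $W(d)$ are rarely available in closed form; a common approach is to estimate them by simulating the degradation process. The following minimal Python sketch (hypothetical parameters; periodic inspections every $\tau$ time units) evaluates the cost rate of Eq. 2-11 over a grid of thresholds for a gamma degradation process of the kind discussed in Section 2.4:

```python
# A minimal sketch, assuming a gamma degradation process, a failure level L
# and periodic inspections; all values are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
alpha, scale, L, tau = 0.5, 2.0, 20.0, 5.0   # process, failure level, inspection period
C_pm, K = 1.0, 9.0                            # so C_cm = C_pm + K = 10

def one_cycle(d):
    """Run until an inspection finds X >= d; corrective if X >= L by then."""
    x, t = 0.0, 0.0
    while True:
        x += rng.gamma(alpha * tau, scale)
        t += tau
        if x >= d:
            return t, x >= L          # cycle length, failure indicator

for d in (10.0, 14.0, 18.0):
    cycles = [one_cycle(d) for _ in range(2000)]
    W = np.mean([c[0] for c in cycles])   # expected time until replacement
    Q = np.mean([c[1] for c in cycles])   # probability of failure replacement
    TC = (C_pm * (1 - Q) + (C_pm + K) * Q) / W
    print(f"d = {d}: Q = {Q:.2f}, W = {W:.1f}, cost rate = {TC:.3f}")
```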
For both TBM and CBM, the maintenance cost rate is the total maintenance cost
over the cycle time. All of the models have the total maintenance cost in common, which
is composed of the following costs. In TBM, the total maintenance cost can be
calculated using Eqs. 2-12 and 2-13 [15]:

$$C_{CM} = C_{m,CM} + C_c + C_{d,CM} \tag{2-12}$$

$$C_{PM} = C_{m,PM} + C_{d,PM} \tag{2-13}$$

where $C_{CM}$ is the CM cost (or total failure cost), $C_{PM}$ is the PM cost, $C_{m,CM}$ ($C_{m,PM}$)
are the direct maintenance costs (e.g. labour and spare parts) when corrective
(preventive) maintenance is undertaken, $C_c$ is the product reject cost, or cost of
product loss when the machine fails, and $C_{d,CM}$ ($> C_{d,PM}$) is the downtime cost when a
corrective (preventive) maintenance action is taken.
Direct maintenance costs $C_m$ can mostly be obtained from maintenance logs.
Considering the overhead cost of each maintenance person and the cost of the parts
replaced during maintenance, direct maintenance costs are well captured in
accounting records. However, the product reject and downtime costs are mostly
related to the operational context. The product reject cost $C_c$ can be determined from
the production of the machines, depending on the production process (e.g. sequential or
parallel). The downtime cost $C_d$ can be calculated using the total downtime in
production. So, product reject and downtime costs are usually approximated with an
understanding of the operational context and can be ascertained from the production
data.
For CBM, let $d(t)$ denote the time spent in a failed state in $[0, t]$; then the total
maintenance cost $TC(t)$ can be shown to be [22]:

$$TC(t) = C_{cm} N_{cm}(t) + C_{pm} N_{pm}(t) + C_i N_i(t) + C_d\, d(t) \tag{2-14}$$

where $C_{cm}$ is the corrective replacement cost, $C_{pm}$ denotes the preventive
replacement cost, $C_i$ is the inspection cost, and $N_{cm}$, $N_{pm}$ and $N_i$ represent the (random)
numbers of corrective repairs, preventive repairs and inspections, respectively, in $[0, t]$.
2.6 ADVANTAGES AND DISADVANTAGES OF THE MODELS
The main advantage of TBM is that it requires only maintenance (failure and preventive)
times. Most TBM optimisation approaches thus require historical failure times,
recorded throughout the lifetime of the asset [31, 33, 34, 36-40, 50], and the total
maintenance cost. Although the total maintenance costs are easily approximated, there
are still challenges in obtaining reliable failure time data. Due to incorrect recording,
such data are not always available, or are sometimes unusable [15]. For instance, asset
maintenance may be planned (i.e. not reactive), which must be treated differently from
failure times in estimation. Without careful data recording practices, such data might
be confused with failure time data, which can lead to inaccurate estimation of
reliability models. Another challenge is the connection of maintenance data with actual
asset stoppages. In most cases, the failure time models [33, 34] assume that a
maintenance event leads to an immediate break-down (e.g., stoppage) of the asset.
On the other hand, the main advantage of CBM is that it is economically superior
in terms of total maintenance cost [51]. Unlike TBM, which employs a fixed
maintenance time interval, in CBM maintenance is performed only when it is needed.
CBM helps to reduce maintenance setup costs and unnecessary maintenance actions
[19]. So, the various degradation models and optimisation methods of CBM are preferable
if all the data (i.e., failure time data and condition data) are available. However, there are
challenges and difficulties in obtaining condition and failure time data. The major
challenge is data availability [15]. Condition data can be expensive to collect
and store, requiring specialised sensors (e.g. vibration or acoustic emission sensors)
and data acquisition equipment. Moreover, significant expertise and effort are needed
to develop degradation models. Systems that are subject to multiple degradation
processes, caused by both internal and external failures, can create great mathematical
complexity in degradation modelling [19].
It is therefore essential that such required data are properly recorded in the typically
available maintenance databases. However, in many organisations' current practice,
maintenance data are not properly recorded and linked so as to identify failure events. It is
thus challenging to analyse industrial maintenance databases to identify failure times,
which are needed for both CBM and TBM.
2.7 TYPICALLY AVAILABLE MAINTENANCE DATABASES IN INDUSTRY
A maintenance documentation system for recording and conveying information
is an essential operational requirement for all the elements of the maintenance
management process. Maintenance documentation can be defined as [52]: any record,
catalogue, manual, drawing or computer file containing information that might be
required to facilitate maintenance work. Simultaneously, a maintenance information
system can be defined as [52]: the formal mechanism for collecting, storing, analysing,
interrogating and reporting maintenance information. This information could come
from a variety of data sources.
Nowadays, two main data collection systems are implemented in many
maintenance departments: the computer maintenance management system (CMMS)
and the condition monitoring system (CMS) (or the distributed control system (DCS)
database). The former is the core of traditional maintenance record-keeping practices
and often facilitates the usage of textual descriptions of faults and actions performed
on an asset. On the other hand, CMS/DCS data can be used to directly monitor asset
and component parameters.
Hong [53] mentioned three main sources of reliability data: laboratory life tests,
field tracking studies and warranty databases. Laboratory reliability testing is often
used to inform product design decisions, whereas the "real" reliability data come
from the field, often in the form of warranty returns or specially designed field-tracking
studies [54]. Although warranty data are a very rich source of reliability
information, they have common problems: failure mode information is
sometimes unavailable and the data are censored in nature.
In many industries (e.g. production facilities), data on maintenance occurrence
and workflow are directly recorded. This is the second major source of reliability data,
where companies and organisations keep detailed records of the costs of maintenance
for their assets (e.g., information about the reliability for a fleet of automobiles or
locomotives or transformers).
However, maintenance data generally lack important engineering information
due to their reporting rules. In short, such databases were designed for financial
reporting and maintenance workflow control rather than answering engineering
questions. Additionally, challenges arise from several data collection difficulties
including accuracy, correctness, duplication, consistency, timeliness, validity,
reliability and completeness [55]. Recognising this, Moore and Starr [56] identified
production schedules and financial records as complements to CMMS and CMS/DCS
data to inform maintenance practices.
However, for effective maintenance decision making, it is necessary to
have reliable and consistent data in maintenance databases [57]. To optimise maintenance,
databases should contain data related to equipment functioning, failures and their
consequences, as well as maintenance operations and their costs. Ideally, such
information would be collected from the same equipment (specific
failure data) or from analogous equipment in similar conditions. The analysis and
treatment of the collected data then allow maintenance optimisation models to be
calculated and validated, and production and maintenance operations or actions to be
re-planned.
Most process industry and manufacturing plants use CMMS databases to help
manage maintenance performed on plant assets. CMMS benefits include asset
information, maintenance work planning and scheduling, maintenance history and
maintenance reporting etc. One of the most widely available sources of maintenance
data is the so-called work order (sometimes called work notifications/event). This type
of data documents the history of all maintenance events that occur. The events may
include inspection, repair and replacement, and may be corrective or preventative
actions. Work orders typically include data related to maintenance planning,
scheduling, and execution, together with free text work descriptions. Describing work
order notifications, Sipos and colleagues termed these log data: a collection of
events recorded during various maintenance applications which have been run on the
equipment [6].
Outage/stop data is another source of information that can also be available
in a CMMS and is one of the important sources for identifying failure time information.
Generally, asset stoppage information due to failure or planned outage is recorded here.
For example, Alkali et al. [58] used a plant information (PI) database to extract failure
information for a coal-fired power plant, using the PI records of the motor's current to
indicate the mill's on or off status. To predict vehicle compressor failures,
the service record (SR) database, which contains repair information
including previous failure records, has also been widely used [3].
A limited number of methods have been proposed to bridge the incompleteness
among maintenance databases. The most effective source of reliable information is
often in the free text descriptions (used to describe the repair process) in different
maintenance databases [5], which are difficult to quantitatively analyse. Text
classification methods provide a possible set of tools for analysing such free texts [59].
Text mining methods seek to execute a series of natural language processing (NLP)
steps to extract useful information [60]. Exploring potential linkages through analysing
and extracting numerical and text data from maintenance databases, via knowledge
discovery and machine learning methods, is investigated in the next sections.
2.8 KNOWLEDGE DISCOVERY
Every day, 2.5 quintillion bytes of data are created and 90 percent of the data in
the world today were produced within the past two years [61]. A fundamental
challenge is to explore large volumes of data and extract useful information or
knowledge from that. Data mining (DM) is the analysis of (often large) observational
datasets to find novel relationships and to summarise the data in novel ways that are
both understandable and useful to the data owner [62]. DM, a term often used
interchangeably with knowledge discovery in databases (KDD), is the
automated extraction of patterns representing knowledge implicitly stored in large
databases, data warehouses, and other massive information repositories [63], and it has
become an increasingly important research area. Generally, the entire KDD process
follows the steps shown in Figure 2-6, from data selection through data processing and
data mining to knowledge acquisition [62, 64, 65].
Figure 2-6. Process of knowledge discovery in databases [64]
KDD is an umbrella for all those methods that aim to discover relationships
and regularities among observed data. It includes various stages, from the selection
of the required datasets to the interpretation of the results obtained from the techniques applied.
KDD refers to the overall process of finding and discovering knowledge
from data, of which DM is one step, consisting of applying data analysis
and discovery algorithms. KDD comprises three general processing steps, as shown in
Table 2-3. The first stage is data pre-processing, which entails data collection, data
smoothing, data cleansing, data transformation and data reduction.
Table 2-3. Data processing steps in KDD

| Step 1 | Step 2 | Step 3 |
|---|---|---|
| Data Pre-Processing | Data Mining | Data Post-Processing |
The second step, normally called DM, involves data modelling and prediction.
DM can involve either data classification or prediction. The classification methods
include deviation detection, database segmentation, clustering (and so on); the
predictive methods include: (a) mathematical operation solutions such as linear
scoring, nonlinear scoring (neural nets), and advanced statistical methods like the
multiple adaptive regression by splines; (b) distance solutions, which involve the
nearest-neighbour approach; (c) logic solutions, which involve decision trees and
decision rules. The third step is data post-processing, which is the interpretation,
conclusion, or inferences drawn from the analysis in Step Two. So, KDD is a
multiphase process that includes business understanding, data preparation, modelling,
evaluation and deployment [66, 67]. According to the application and tasks, DM can
be categorised into eight distinctive branches (see Figure 2-7) [66]. One of the
important tasks of DM is text mining which is slightly different from the general KDD
process. Text mining can be defined as discovering useful information and knowledge
from textual databases through the application of data mining techniques.
Figure 2-7. Data mining tasks
2.9 TEXT MINING
Text mining (TM) is a particular type of DM that is focused on handling
unstructured or semi-structured datasets, such as text documents on paper, Excel
reports, web pages, messages, notes etc. So TM can be defined as textual data mining
or knowledge discovery from textual databases. Although the text mining process
relies heavily on applications of data mining techniques for discovering useful
knowledge, it is also focused on handling more unstructured data formats which pose
more challenges for pattern discovery than numerical data formats do.
Text documents usually consist of terms or keywords in sentences. The primary
step in TM is to cleanse the text documents by applying several conversion processes
[68]. To aid the DM methods on text data, keywords/terms present in each text
document are converted to term vectors. A term vector is an algebraic expression that
describes the relationship between text words and documents, and is commonly used
as a dataset for text-mining based analysis. In this method, each dimension of the
vector corresponds to an individual term, which can be a single word or keyword or
sometimes a longer phrase. If a specific document includes a specific term, the vector
value of that term should be more than zero. In using term vectors, which are based on
the keywords selected for text-mining analysis, it is imperative to consider how they
should be standardised. After text cleansing and term vector formulation, the keywords
can be stored in a keyword dictionary. Detailed discussions of text cleansing, text
features and machine learning algorithms and methods are presented in the following
sections.
2.10 TEXT CLEANING AND FEATURE EXTRACTION
Text cleaning (TC) represents the most time-consuming phase in text mining, and its
complexity depends on the data sources used. Text documents usually consist
of words or terms in sentences, containing unwanted sparseness in the text corpus,
spelling variations, extra spaces, numbers, punctuation and, most importantly,
non-discriminating words. Common causes of uncleansed data are mentioned by
Low and associates in [69] and are presented in Figure 2-8.
Figure 2-8. Raw text data with causes of errors and anomalies
TC retains the most frequent words in large text documents by removing and
excluding the less frequent ones; it is also used to remove punctuation, numbers and
other characters within texts that may clutter results. In this section, a series of
transformation functions used to cleanse text data is discussed; these are summarised
in Table 2-4.
Table 2-4. Text cleaning technique with different transformation processes [68, 70, 71]

| Function | Purpose of the Transformation |
|---|---|
| tolower | Transform all upper case letters to lower case |
| removeNumbers | Remove all numbers |
| Stopwords (language='english') | Remove stop words commonly used in the English language |
| myStopwords | Remove non-discriminating words that have negative effects on text classification |
| removeWords, myStopwords | Remove words which are common, non-discriminating and mean little to the model |
| removePunctuation | Remove punctuation symbols |
| stripWhitespace | Remove extra spaces |
| word stemming | Reduce similar words to a single term |
First, all the text documents are transformed into lower case, followed by
removing numbers, punctuation and extra spaces between words. A common
practice when analysing text data is to remove filler words such as "to", "and",
"where", "or" and "when". These are known as stop words. The function Stopwords
(language='english') shown in Table 2-4 covers 174 words, which were
excluded from the text documents. Apart from that, some keywords were considered
common but non-discriminating for both data types (failure and non-failure), and
were also excluded through the cleansing process. Specific examples and a list of those
keywords are discussed in the case studies (Chapter 5).
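For illustration, the transformations of Table 2-4 can be sketched in Python as follows (scikit-learn's built-in English stop word list stands in for the 174-word list; the extra domain stop words are hypothetical examples):

```python
# A minimal sketch of the cleansing steps in Table 2-4 applied to work
# order text; myStopwords contains hypothetical non-discriminating words.
import re
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

my_stopwords = {"please", "asap"}            # hypothetical domain stop words

def clean(text):
    text = text.lower()                      # tolower
    text = re.sub(r"\d+", " ", text)         # removeNumbers
    text = re.sub(r"[^\w\s]", " ", text)     # removePunctuation
    tokens = [w for w in text.split()        # stripWhitespace + tokenise
              if w not in ENGLISH_STOP_WORDS and w not in my_stopwords]
    return " ".join(tokens)                  # stemming (e.g. Porter) could follow

print(clean("Replaced PUMP seal #2, leaking badly. Please re-check ASAP!"))
```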
Text classification is different from other classification methods because text
feature space is often sparse and high-dimensional. For instance, the dimensionality of
a moderate-sized text corpus can reach up to tens or hundreds of thousands. The high
dimensionality of feature space will cause the ‘‘curse of dimensionality’’, increase the
training time and affect the accuracy of classifiers. Therefore, feature extraction is
performed, which aims to reduce the dimensionality under the premise of guaranteeing
the performance of classifiers. The main idea is to select a subset of terms occurring
in the training set and use this subset as features in text classification. Two important
advantages of text features include:
• Reduction of the "feature space" dimensionality by choosing the most
valuable features
• Improvement of the performance of text classifiers
Commonly used text features include bag of words (BOW) [72] and language
model/word order/N-Gram features [73, 74] (see Figure 2-9). The most effective (for text
classification) and widely used features of both types are discussed in the following
sections.
Figure 2-9. Features commonly used for text classification
2.10.1 Bag of Words
Usually, a text is represented as a vector of weighted terms, involving a two
phase conversion process. Firstly, a vector space model [75], namely a bag of words
(BOW), is built, which covers all unique items occurring in the training corpus.
Secondly, the text is mapped into a feature vector based on both the BOW and the
contents of the text. A BOW representation scheme is widely used in text classification
due to its simplicity and efficiency. Under this scheme, documents are represented by
bags of terms, each term being an independent feature of its own.
A document can be represented as a vector. Each item in the vector corresponds
to an individual term and its value can be defined as a binary indicator or the absolute
frequency. Many features have been explored, among which are term frequency (TF),
term frequency-inverse document frequency (TF-IDF), information gain (IG), Chi-
square (CS) statistics, mutual information (MI), Gini-Index (GI) and expected cross-
entropy (ECE). Most of these methods use the frequency of every feature in the BOW
or assign a score based on probability; features are then ranked by their frequency,
probability or score, and the top-ranked features are selected.
According to Yang and Pedersen [76], IG and CS outperform MI and ECE and
achieve better accuracy. Rogati and Yang [77] also conducted a comparative study
of feature selection methods (i.e., IG, CS and TF) for different text classification
algorithms (i.e., NB, SVM, k-NN) on two well-known datasets: Reuters 21578 and
Reuters Corpus version RCV-1. Their experimental results indicated that CS based
feature selection method outperformed other methods for classifiers and both the
datasets.
Comparing the performances of IG, CS and document frequency (DF) features,
Zhang and colleagues [78] conducted experiments on the SpamAssassin, LingSpam
and PU1 spam corpora and on the Chinese corpus ZH1. Their experiments verified
that IG led to the best performance, followed by CS. Liu and
associates in [79] investigated experimental comparisons of four feature selection
methods: DF, CS, IG and gain ratio (GR) over five text classification algorithms: NB,
SVM, k-NN, radial basis function neural network (RBFNN) and decision tree (DT)
for multi-class sentiment classification. In terms of achieving best classification
accuracy within the shortest execution time, IG and GR outperformed other methods.
However, some have argued that the method using TF could achieve a
comparable performance to CS [80]. In response to such heterogeneous findings,
Cheng and colleagues [81] suggested using different methods for different applications
since good features should consider problem domain and algorithm characteristics.
Such key features are discussed below.
2.10.2 Term Frequency (TF)-Inverse Document Frequency (IDF)
Term frequency (TF) is the most common BOW feature; it computes the
frequency, or number of times, a keyword/feature appears in a document. Unlike a
binary term value (0 or 1), this method is more informative since it assigns more weight to
frequent keywords than to rare ones [82]. It considers the repetition of a keyword
$i$ in a document $j$ [83]:

$$tf_{ij} = \text{frequency of keyword } i \text{ in document } j \tag{2-15}$$
However, by counting the frequency of each keyword, TF normally assigns a
large weight to common keywords and less weight to unique ones, which results
in weak text-discriminating power. To avoid this shortcoming, another
factor, the inverse document frequency (IDF), is introduced alongside the term
frequency method. TF-IDF measures the relative frequency of a keyword in a
specific document through an inverse proportion of that keyword over the entire set of text
documents. So for any keyword $t_i$, the tf-idf weight in a document can be calculated as shown
by Eq. 2-16 [82, 84]:

$$w_{ij} = tf_{ij} \times \log\left(\frac{N}{df(t_i)}\right) \tag{2-16}$$

where $tf_{ij}$ is the frequency of the keyword $t_i$ in the document, $df(t_i)$ is the number
of documents containing $t_i$ and $N$ is the total number of documents in the entire text
corpus.
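As an illustrative sketch, scikit-learn's TfidfVectorizer implements a (smoothed) variant of the weighting in Eq. 2-16; the toy work order corpus below is hypothetical:

```python
# A minimal sketch of TF-IDF weighting on hypothetical work order texts.
# Note that scikit-learn uses a smoothed idf, a slight variant of Eq. 2-16.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["repair pump seal leak",
        "routine inspection of pump",
        "replace motor bearing failure"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)                  # documents x terms sparse matrix
print(dict(zip(vec.get_feature_names_out(), X.toarray()[0].round(2))))
```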
2.10.3 Chi-square (CS) Statistic
The CS statistic measures the lack of independence between a feature $t_i$
and a category $c_j$ in the training documents; the critical values of this statistic can be
found using the $\chi^2$ distribution with one degree of freedom to judge extremeness [76].
The statistic is defined as [75]:

$$\chi^2(t_i, c_j) = \frac{N \times (a_{ij} d_{ij} - b_{ij} c_{ij})^2}{(a_{ij} + b_{ij})(a_{ij} + c_{ij})(b_{ij} + d_{ij})(c_{ij} + d_{ij})} \tag{2-17}$$

where $N$ is the total number of training documents (the sum of $a_{ij}$, $b_{ij}$, $c_{ij}$ and $d_{ij}$);
$a_{ij}$ is the frequency with which feature $t_i$ and category $c_j$ co-occur; $b_{ij}$ is the frequency
with which feature $t_i$ occurs without belonging to category $c_j$; $c_{ij}$ is the frequency with
which category $c_j$ occurs without containing feature $t_i$; and $d_{ij}$ is the number of
times neither $c_j$ nor $t_i$ occurs. However, the CS statistic is not reliable for low
frequency terms [85].
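For illustration, term ranking by the CS statistic can be sketched with scikit-learn's chi2 scorer on hypothetical failure/non-failure work orders:

```python
# A minimal sketch: ranking bag-of-words terms by the chi-square statistic
# of Eq. 2-17; the documents and labels are hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

docs = ["pump seal leak repaired", "routine pump inspection",
        "motor bearing failure", "scheduled lubrication service"]
labels = [1, 0, 1, 0]                        # 1 = failure-related work order

vec = CountVectorizer()
X = vec.fit_transform(docs)
scores, p_values = chi2(X, labels)
ranking = sorted(zip(vec.get_feature_names_out(), scores),
                 key=lambda kv: -kv[1])
print(ranking[:5])                           # top-ranked discriminating terms
```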
2.10.4 Information Gain (IG)
IG is generally employed to quantify the amount of information that a feature can
offer to the classification algorithm. It measures the amount of information obtained
for category prediction by determining the presence or absence of a term
in a document. Assuming $\{c_k\}_{k=1}^{C}$ is the set of categories in the target space, the
IG of a term $t_i$ is given by Eq. 2-18 [76]:

$$IG(t_i) = -\sum_{k=1}^{C} p(c_k)\log p(c_k) + p(t_i)\sum_{k=1}^{C} p(c_k \mid t_i)\log p(c_k \mid t_i) + p(\bar{t}_i)\sum_{k=1}^{C} p(c_k \mid \bar{t}_i)\log p(c_k \mid \bar{t}_i) \tag{2-18}$$

where $p(c_k) = \frac{a_{ik} + c_{ik}}{N}$; $p(t_i) = \frac{a_{ik} + b_{ik}}{N}$; $p(\bar{t}_i) = \frac{c_{ik} + d_{ik}}{N}$; $p(c_k \mid t_i) = \frac{a_{ik}}{a_{ik} + b_{ik}}$; and $p(c_k \mid \bar{t}_i) = \frac{c_{ik}}{c_{ik} + d_{ik}}$.
The larger the value of the IG, the more informative the feature is. On the other
hand, it has a tendency to select non-discriminating terms which are common over
multiple categories [85].
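As a minimal illustration, the IG of Eq. 2-18 can be computed directly from the $a$, $b$, $c$, $d$ counts of the previous section for the two-class (e.g. failure/non-failure) case; the counts below are hypothetical:

```python
# A minimal sketch of Eq. 2-18 for two classes; the document counts are
# hypothetical. IG(t) = H(class) - H(class | term present/absent).
import numpy as np

def entropy_term(p):
    return 0.0 if p <= 0 else -p * np.log2(p)

def information_gain(a, b, c, d):
    """a: term & class co-occur, b: term without class,
    c: class without term, d: neither."""
    N = a + b + c + d
    prior = entropy_term((a + c) / N) + entropy_term((b + d) / N)
    p_t, p_not = (a + b) / N, (c + d) / N
    cond = (p_t * (entropy_term(a / (a + b)) + entropy_term(b / (a + b)))
            + p_not * (entropy_term(c / (c + d)) + entropy_term(d / (c + d))))
    return prior - cond

print(information_gain(a=40, b=5, c=10, d=45))   # informative term
print(information_gain(a=25, b=25, c=25, d=25))  # uninformative term -> 0.0
```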
2.10.5 Language Model
Language can be viewed as a stream of words. Due to syntactic and semantic
constraints, these words are not independent. Language models (LM) have been proposed to
capture this characteristic of natural language. N-Gram models are a widely used kind of
LM which assume that the probability of a word in a document depends on its
previous $n-1$ words. Given a word sequence $W = w_1, w_2, \ldots, w_U$, the probability of
$W$ can be calculated as:

$$p(W) = \prod_{i=1}^{U} p(w_i \mid w_1, \ldots, w_{i-1}) \tag{2-19}$$
Under the word dependency assumption, the only words relevant to predicting
$p(w_i \mid w_1, \ldots, w_{i-1})$ are the previous $n-1$ words, so Eq. 2-19 can be written as:

$$p(W) = \prod_{i=1}^{U} p(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) \tag{2-20}$$
In particular, an N-Gram model uses the previous $n-1$ words to predict the
next one, following Eq. 2-21:

$$p(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) = \frac{p(w_{i-n+1} \ldots w_i)}{p(w_{i-n+1} \ldots w_{i-1})} \tag{2-21}$$

The probability $p(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$ can be estimated from a text corpus using maximum
likelihood, as shown by Eq. 2-22:

$$p(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) = \frac{c(w_{i-n+1} \ldots w_i)}{c(w_{i-n+1} \ldots w_{i-1})} \tag{2-22}$$
where $c(\cdot)$ denotes the number of occurrences. Generally, N-Gram means a "sequence
of length $n$". In this respect, a one-word sequence is called a Uni-Gram, a two-word
sequence (e.g., "repair leak") a Bi-Gram, a three-word sequence (e.g., "repair
pf leak") a Tri-Gram, and so on. Although the N-Gram was first mentioned by
Shannon in 1948 in the context of communication theory, it has since attracted
significant research attention in many applications, including text classification.
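For illustration, Uni-Gram, Bi-Gram and mixed-gram features can be extracted with scikit-learn's CountVectorizer; the work order text below is hypothetical:

```python
# A minimal sketch of N-Gram feature extraction on a hypothetical work
# order description.
from sklearn.feature_extraction.text import CountVectorizer

doc = ["repair pf leak on boiler"]
for n_range in [(1, 1), (2, 2), (1, 2)]:     # uni-, bi- and mixed grams
    vec = CountVectorizer(ngram_range=n_range)
    vec.fit(doc)
    print(n_range, list(vec.get_feature_names_out()))
```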
Tripathy and associates in [86] attempted to classify movie reviews using the N-Gram
approach and found better accuracy as the N-Gram order increased from Uni-Gram to
Bi-Gram. By analysing the efficiency of N-Grams on a social media dataset, Ogada and
colleagues found that a Bi-Gram approach achieves better accuracy compared to a Uni-
combination of Uni-Gram, Bi-Gram and Tri-Gram) outperformed a classic N-Gram
one (i.e., Uni-Gram, Bi-Gram or Tri-Gram used exclusively) in many cases [86]. Pang
and Lee [88] found both Uni-Gram and a mix of Uni-Gram and Bi-Gram approaches
effective in sentiment analysis.
Saleh and associates in [89] conducted experiments with different N-Gram
features on three different corpora. The results of the N-Gram schemes suggested that
a Tri-Gram model outperformed Uni-Gram and Bi-Gram models in both 3-fold and
10-fold cross validation. However, in N-Gram models the number of parameters grows
exponentially with $n$; patterns thus tend to be more sparsely distributed across
the documents, which ultimately adversely affects classification performance
[74].
2.11 TEXT CLASSIFICATION ALGORITHMS
Text classification (TC) is the task of automatically sorting a set of documents
into categories from a predefined set. Machine learning techniques are commonly
applied to construct a classification model from training documents with known class
labels. The constructed model can then be used to classify new documents. The
booming interest in TC over the last decades is due to [87]:
• The increased availability of documents in digital formats
• Considerable savings in terms of expert labour, since no intervention is
required from either a knowledge engineer or domain experts
The main purpose of TC is to train a classifier which performs the category
assignments automatically. These techniques have been used in many applications
including email filtering [75, 90], topic categorisation [90], document indexing and
clustering [91]. Many classification algorithms are proposed for TC, e.g., Naïve Bayes
(NB) [75, 90, 92], support vector machine (SVM) [85, 91, 93-95], k-nearest neighbour
(k-NN) [85, 94], decision trees and artificial neural network (ANN) [95].
Joachims [94] showed that SVM scales well, has a good performance on large
datasets and outperforms NB and k-NN substantially. Nevertheless, with efficient data
pre-processing, a k-NN algorithm was found to be able to achieve good classification
performance and scaled up well with the number of documents used for the
investigation [96]. In similar research, Basu and associates compared SVM with ANN.
Their results showed better performance of SVM in a reduced feature set [95]. Ozgur
and colleagues [97] investigated spam filtering using ANN and NB on Turkish
messages. For a small feature set, the binary representation produced better
performance for the NB classifier.
Yu and Xu [98] compared the performances of four classification algorithms:
NB, NN, SVM and relevance vector machine (RVM), considering different training
and feature sizes of spam filtering corpora. Experimental results showed that SVM and
RVM outperformed NB, while NN was not suitable for such classification. Applying
k-NN, SVM and NB algorithms to different parts of the email (i.e., header,
subject and body) for classification purposes, Lai [99] concluded that NB and SVM
yielded better performance than k-NN. However, noisy and non-informative body
features caused poorer performance (compared to SVM) for the NB classifier.
Importantly, SVM with a TF-IDF approach outperformed all other techniques and this
was suggested to be the best classifier and feature combination for spam filtering and
email classification.
Webb and colleagues in [100] used a large-scale corpus (assembled by the
authors, with more than 1 million messages) to evaluate four spam filters: SpamProbe,
SVM, regression-based boosting and NB. Their experiment verified that SpamProbe,
SVM and regression-based boosting performed similarly, followed by NB. In another
experiment on spam filtering, Zhang and associates in [78] found that SVM, AdaBoost
and logistic regression (LR) attained the best performance, while NB and the lazy
learning approach (for example, k-NN) were not feasible in a cost-sensitive scenario.
In general, SVM and boosting were found to be slower to train on large datasets but
faster in classifying new instances [101].
Pang and Lee [88] classified a polarity dataset using machine learning
algorithms (i.e., NB, SVM and ME) and N-Gram features (i.e., Uni-Gram, Bi-Gram
and both). Their findings implied that SVM worked well when using Uni-Gram and
both Uni-Gram and Bi-Gram together. Yadav [102] reviewed various classification
techniques and identified that in most cases SVM performed better than NB; however,
with a small feature set, NB might be effective in some applications.
Recently, Tripathy and associates in [86] examined the accuracies of four
machine learning algorithms: SVM, NB, ME and stochastic gradient descent (SGD)
on human sentiment classifications and found SVM to be the most effective.
Conducting experiments on a multi-class sentiment analysis problem on three public
datasets, Liu and colleagues [79] achieved the best classification accuracy using an
SVM classifier. Moreover, several other investigations [103-106] on classification
performances revealed that SVM is the best classification algorithm and performs well
in the high feature space.
Liu and associates in [107] developed a conditional random field (CRF) based
information extraction approach to semantically model the dependency in labelled data.
Their experimental results indicated improved performance over baseline models. For
the statistical machine translation problem, Lavergne and colleagues [108] proposed a new
approach by adapting sequence labelling tasks through CRFs. Overall, the CRF
algorithm is particularly effective for information extraction.
Considering the usability of text classifiers for maintenance data and their
effectiveness in terms of classification performance and computational speed, TC
algorithms can be grouped into two main approaches: discriminative methods (i.e.,
SVM and logistic regression) and probabilistic methods (i.e., NB and maximum entropy)
(see Figure 2-10).
Figure 2-10. Commonly used classification algorithms for text classification
Among these, the five most common classification algorithms, NB, ME, CRF, k-NN
and SVM, are discussed in more detail in the following sections.
2.11.1 Naïve Bayes
A Naïve Bayes (NB) classifier is used to find the joint probabilities of words and
classes within a set of free text. The probability of a class $A$ given a text field $B$
can be calculated using Bayes' law:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{P(A \cap B)}{P(B)} \tag{2-23}$$

Since $P(B)$ is constant for all classes, only the numerator needs to be
maximised. It is assumed that the features are independent of each other given the class
(the naïve assumption). The classification task is done by combining prior probability
information with the likelihood of the incoming information to form a posterior
probability model of classification. The NB model is effectively applied to [90]:
• Text classification such as email filtering, topic categorisation etc.
• Problems in which the information from numerous attributes should be
considered simultaneously in order to estimate the probability of an
outcome
The NB classifier is typically trained on data with categorical features. A sparse
matrix indicates the frequency of appearance of each keyword in the bag of words
for each class in the training data. When new text arrives, the classifier trained
with NB is used to derive the class probability of the new text, given the terms that
appear in the training data.
2.11.2 Maximum Entropy
Maximum entropy (ME) is a general purpose model widely applied to text
processing tasks such as text classification. Given a training sample $\mathcal{T} = \{(d_1, c_1), (d_2, c_2), \ldots, (d_N, c_N)\}$, the maximum entropy model in exponential
form can be written as shown in Eq. 2-24 [78, 86]:

$$P(c \mid d) = \frac{1}{Z(d)} \exp\left(\sum_{i=1}^{n} \lambda_i f_i(d, c)\right) \tag{2-24}$$

where $d_i$ is the feature vector, $c_i$ is the target class, $P(c \mid d)$ is the probability of
document $d$ belonging to class $c$, $f_i(d, c)$ is the feature/class function for feature
$f_i$ and class $c$, $\lambda_i$ is the estimated parameter and $Z(d)$ is the normalising factor.
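In practice, the ME model of Eq. 2-24 with bag-of-words features is equivalent to (multinomial) logistic regression, so a minimal sketch can reuse scikit-learn's LogisticRegression on the same hypothetical data:

```python
# A minimal sketch: a maximum entropy text classifier realised as logistic
# regression over TF-IDF features; data are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train = ["pump seal leak repaired", "routine pump inspection",
         "motor bearing failure", "scheduled lubrication service"]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(train), labels)
print(clf.predict_proba(vec.transform(["bearing failure reported"])))
```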
2.11.3 Conditional Random Fields
The CRF is widely used for sequence labelling and is especially effective in information
extraction applications. The model is generally useful for:
• Representing natural language dependencies
• Preventing the label bias problem
Given an input sequence $x \in \mathcal{X}^n$ and a label sequence $c \in \mathcal{C}^n$, the first-order
conditional probability of the labels can be determined by Eq. 2-25 [108, 109]:

$$p(c \mid x; \theta) = \frac{1}{Z_x} \exp\left(\sum_{i}\sum_{j} \theta_j f_j(c_i, c_{i+1}, x, i)\right) \tag{2-25}$$

where the $f_j$ are feature functions and $Z_x$ is the partition function. The output sequence is
usually determined using the Viterbi algorithm and forward-backward variants [110]. Using the
empirical training data distribution, the log likelihood of the training data can be
determined as [109]:

$$\mathcal{L}(\theta) = E_{\tilde{p}(x,c)}[\log p(c \mid x; \theta)] \tag{2-26}$$
2.11.4 K-Nearest Neighbour
This is an instance-based learning algorithm used to classify data by employing
distance measures [79]. The Euclidean distance is typically used in k-NN to compute the
distance between text data points [83]; the distance function can be formulated by Eq.
2-27 [70]:

$$D(a, b) = \sqrt{\sum_{i=1}^{k} (a_i - b_i)^2} \tag{2-27}$$

where $a_i$ and $b_i$ are two keyword vectors in the Euclidean space of the text data. In the training
set, the k closest samples (among the text data) are considered along with their categories [70].
In the classification phase, the distances between a new text sample and all stored training
samples are measured, and the k closest samples are chosen following Eqs. 2-28 and 2-29 [79,
111]:

$$\arg\max_{j} \sum_{i=1}^{k} \mathrm{sim}(D_i \mid D)\,\delta(C(D_i), j) \tag{2-28}$$

$$\delta(C(D_i), j) = \begin{cases} 1, & \text{if the class of text } D_i \text{ is } j \\ 0, & \text{if the class of text } D_i \text{ is not } j \end{cases} \tag{2-29}$$

where $D_i$ is the $i$th nearest training text ($i = 1, 2, \ldots, k$), $\mathrm{sim}(D_i \mid D)$ is the similarity
between text $D_i$ and the new text $D$, and $\delta(C(D_i), j)$ indicates whether the training text
$D_i$ belongs to class $j$.
The k-NN method is particularly effective for [111]:
• Non-parametric features
• Classification tasks with multi-categorised documents.
However, as the size of the training set increases, k-NN incurs a major
computational cost. Moreover, the performance of a k-NN classifier is
often affected by irrelevant and noisy features present in text documents.
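A minimal sketch of k-NN text classification on TF-IDF vectors with the Euclidean metric of Eq. 2-27 (hypothetical data):

```python
# A minimal sketch: k-NN over TF-IDF vectors; documents and labels are
# hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

train = ["pump seal leak repaired", "routine pump inspection",
         "motor bearing failure", "scheduled lubrication service"]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(vec.fit_transform(train).toarray(), labels)
print(knn.predict(vec.transform(["pump bearing leak"]).toarray()))
```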
2.11.5 Support Vector Machine
SVM classifiers were originally developed by Cortes and Vapnik [93] and
applied to spam filtering. SVMs accommodate some key properties that are prevalent
in text [94]:
• High dimensional feature spaces
• Few irrelevant features (dense concept vector), and
• Sparse instance vectors
An SVM seeks the hyperplane in a dataset that best separates the classes of the
data. The aim of an SVM is to orient this hyperplane so that it lies as far as
possible from the closest members of all classes (i.e. to maximise the margin). Suppose
a training dataset contains $d$-dimensional feature vectors $\mathbf{x}_i \in R^d$ with class labels
$c_i \in \{-1, 1\}$. A binary classifier $f(\mathbf{x})$ can be expressed as in Eq. 2-30:

$$\begin{cases} f(\mathbf{x}_i) \ge 0 & \text{for } c_i = +1 \\ f(\mathbf{x}_i) < 0 & \text{for } c_i = -1 \end{cases} \tag{2-30}$$

In a traditional SVM, this function is parameterised by a hyperplane, $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$, and the class labels can be predicted from the feature vector as shown in
Eqs. 2-31 to 2-33:

$$\mathbf{x}_i \cdot \mathbf{w} + b \ge +1 - \xi_i \quad \text{for } c_i = +1 \tag{2-31}$$

$$\mathbf{x}_i \cdot \mathbf{w} + b < -1 + \xi_i \quad \text{for } c_i = -1 \tag{2-32}$$

$$\xi_i \ge 0 \quad \forall i \tag{2-33}$$

where $\mathbf{w}$ is the weight coefficient vector, $b$ is the bias of the hyperplane and $\xi_i$
is a positive slack variable that allows for (some) misclassifications. The combined form
is given by Eq. 2-34:

$$c_i(\mathbf{x}_i \cdot \mathbf{w} + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \;\;\forall i \tag{2-34}$$
The hyperplane can, however, produce only planar boundaries, which are
often insufficient in non-linear situations. By choosing an appropriate non-linear
transformation (i.e. the "kernel trick"), it is possible to use the same approach to
define more general boundaries. Such a kernel non-linearly maps samples into a higher
dimensional space [112] and, unlike the linear kernel, it can handle cases in which the
relation between class labels and attributes is non-linear:

$$\phi(\mathbf{x}_i) \cdot \mathbf{w} + b \ge 1 - \xi_i \quad \text{for } c_i = +1 \tag{2-35}$$

$$\phi(\mathbf{x}_i) \cdot \mathbf{w} + b < -1 + \xi_i \quad \text{for } c_i = -1 \tag{2-36}$$

The combined form can be expressed as the decision value $f(\mathbf{x})$ in Eq. 2-37:

$$f(\mathbf{x}) = \phi(\mathbf{x}) \cdot \mathbf{w} + b \tag{2-37}$$
According to the definition of the SVM [93, 113], the optimal values of $\mathbf{w}$ and $b$ can
be found by solving the optimisation problem in Eq. 2-38:

$$\min \; \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad c_i(\phi(\mathbf{x}_i) \cdot \mathbf{w} + b) \ge 1 - \xi_i \;\;\forall i \tag{2-38}$$
where $C$ is a regularisation parameter that penalises misclassifications. In this
formulation, a minimal $\|\mathbf{w}\|$ corresponds to the maximum distance between the
boundary and the training points of the two classes. The optimisation problem can be
simplified using the Lagrangian function, and the optimal value of $\mathbf{w}$ can be shown
by Eq. 2-39:

$$\mathbf{w} = \sum_{i=1}^{N} c_i \alpha_i \phi(\mathbf{x}_i) \tag{2-39}$$

where the $\alpha_i$ are the Lagrange multipliers. By combining Eqs. 2-37 and 2-39 and defining
the kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j)$, the kernel-based SVM can be
formulated as shown by Eq. 2-40:

$$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i c_i K(\mathbf{x}, \mathbf{x}_i) + b \tag{2-40}$$
The most commonly used kernel functions are the linear, polynomial and radial
basis function (RBF) kernels. In this research, the well-known RBF kernel has been
used [8]:

$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{1}{2\sigma^2}\,\|\mathbf{x}_i - \mathbf{x}_j\|^2\right) \tag{2-41}$$
In general, it is widely agreed in the classification literature that the RBF kernel is a
reasonable first choice. It allows a trade-off between the complexity of the decision
boundaries and the generalisation capability of the classifier [114], which is tuned
through cross-validation of $\gamma$ and $C$.
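A minimal sketch of an RBF-kernel SVM with $C$ and $\gamma$ tuned by cross-validation, as described above, on a hypothetical work order sample:

```python
# A minimal sketch: RBF-kernel SVM (Eqs. 2-40 and 2-41) with C and gamma
# tuned by grid-search cross-validation; data are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

train = ["pump seal leak repaired", "routine pump inspection",
         "motor bearing failure", "scheduled lubrication service",
         "replaced broken impeller", "monthly filter change"]
labels = [1, 0, 1, 0, 1, 0]

X = TfidfVectorizer().fit_transform(train)
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
                    cv=3)
grid.fit(X, labels)
print(grid.best_params_, grid.best_score_)
```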
2.12 PERFORMANCE EVALUATION
To obtain an accurate assessment of the performance of a prediction classifier, it
must be tested on data that were not used for training. Machine learning algorithms are
typically evaluated by accuracy, recall and precision. The different performance
evaluation metrics, and how they can be calculated from the outcomes of
machine learning models, are shown in Table 2-5; TP, FP, FN and TN are
document counts.
Table 2-5. Performance evaluation metrics

| | Actual: In the Class | Actual: Not in the Class |
|---|---|---|
| Predicted: In the Class | True Positive (TP) | False Positive (FP) |
| Predicted: Not in the Class | False Negative (FN) | True Negative (TN) |
According to the definition in [86], “precision” is the ratio of the number of
documents correctly labelled as positive to the total number of positively classified
documents and “recall” is the ratio of the total number of positively labelled documents
to the total number of documents that are truly positives. “Accuracy” can be calculated
as the ratio of correctly classified documents to the total number of documents. F-
Measure is the harmonic mean of precision and recall. Such performance measures
can be formulated and shown by Eqs. 2-42 to 2-45:
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2-42}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2-43}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{2-44}$$

$$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{2-45}$$
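For illustration, these metrics can be computed from held-out predictions as follows (the label vectors are hypothetical):

```python
# A minimal sketch of Eqs. 2-42 to 2-45 on hypothetical held-out labels.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))      # [[TN, FP], [FN, TP]]
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
print("F-measure:", f1_score(y_true, y_pred))
```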
2.13 SUPERVISED MACHINE LEARNING
Supervised machine learning techniques typically label new events based on
given labelled examples. Such learning algorithms deduce event properties and
characteristics from training data, and use these to generalize to unseen situations. To
classify textual data into predefined classes it is necessary to partition a labelled set of
training data into different classes to test the performances of the classifiers. A
categorical attribute is set as a class attribute or target variable. The given data is
therefore first divided into pre-defined classes to interpret the terms defined in the
textual databases. A framework for the supervised machine learning text classification
method is presented in Figure 2-11. It can be seen that the supervised process has two
steps: training and prediction. In the first process, text data is usually pre-processed
through a text cleaning method and then converted into vector representation through
different features. The text cleaning method is usually applied to reduce the number of
features while features extraction methods convert the text data into vectors where
each vector represents a keyword in the text data (details of these processes have been
discussed in Section 2.10).
Figure 2-11. A framework for supervised machine learning text classification
After that, a text classifier is constructed using different text classification
algorithms, which have been elaborated on in Section 2.11. However, the most
important information required to construct a classifier is the “labelled text data”.
Supervised machine learning methods largely depend on such information. Finally, the
constructed classifier can be applied to the unseen new text data to predict their classes
(as shown in Figure 2-11).
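A minimal end-to-end sketch of the two-step framework in Figure 2-11, assembling feature extraction and a classifier into a single train/predict pipeline (hypothetical data; the specific feature and classifier choices are illustrative rather than the thesis's final method):

```python
# A minimal sketch of the training and prediction steps of Figure 2-11;
# work orders and labels are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

train = ["pump seal leak repaired", "routine pump inspection",
         "motor bearing failure", "scheduled lubrication service"]
labels = ["failure", "non-failure", "failure", "non-failure"]

model = make_pipeline(TfidfVectorizer(stop_words="english"),
                      SVC(kernel="rbf"))
model.fit(train, labels)                              # training step
print(model.predict(["compressor bearing seized"]))   # prediction step
```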
Most organisations use maintenance logs or work orders to keep records of all
maintenance and repair activities performed on the machine or asset. Among different
data fields, the most significant and consistently filled in information is maintenance
work descriptions. A few studies have analysed such free text descriptions in industry
maintenance logs and/or work orders. Devaney and associates in [4] proposed
analysing the free texts from maintenance logs using domain ontology. The authors
allowed a system to learn over a large set of unlabelled data using a small subset which
was manually labelled; the learned categories were then used to construct a case
library database of maintenance patterns. These patterns were used by a case-based
reasoning (CBR) engine to predict future failures as well as allowing more efficient troubleshooting and
diagnosis in the event of a failure. While the authors proposed an analysis framework
based on the construction of a case library, no case study was presented on real world
data.
Edwards and colleagues [7] categorised maintenance logs using a clustering
algorithm on a small data subset, manually labelling the data as failure or non-failure,
based on an expert opinion. Sipos and colleagues used operational logs and component
replacement data (and assumed that each replacement constituted a failure) to
construct a classifier that can anticipate the imminent failure of the equipment [6].
Developing a failure and maintenance data extraction methodology, Alkali and
associates in [58] utilised hourly readings of motor current to determine whether the
mills were running and assumed all downtime was related to failure. An
exploratory data analysis was conducted using mill uptime and downtime. However,
their analysis did not distinguish between planned and unplanned downtime,
although the latter is the most relevant to downtime due to failure events.
A few attempts have been made to analyse work orders that have been generated
for condition based maintenance policy. For instance, Moreira and Junior [8] proposed
a method of performing prognostics on aircraft components based on an SVM
classification algorithm. Flight data and maintenance logs were used to classify the
training data into healthy and unhealthy states. The degradation index was finally
created from the classification result to prepare a future schedule of aircraft
maintenance. Bastos and associates [45] developed statistical data extraction methods
to extract failure-related information from their chosen datasets: equipment condition
monitoring data and maintenance data (containing both corrective and preventive
maintenance). The prediction model was able to forecast future failure based on the
existing maintenance records and also to estimate the possibility of machine
breakdown. Prytz and colleagues in [3] proposed a data-driven method of predicting
future failures of the air compressor of a commercial vehicle. The method was derived
from available warranty and vehicle maintenance log data and combined pattern
recognition with a remaining useful life (RUL) prediction to estimate the vehicle repair
work.
Although these text classification methods have been successfully used for purposes relevant to this study, in real industrial cases there is always a scarcity of labelled data with which to train such classification methods. A supervised learning method uses only labelled training data; however, many semi-supervised learning methods employ a large amount of unlabelled data along with some labelled samples to train classifiers and induce better performance [115]. This will now be
discussed in more detail as it is highly relevant to the real life industrial contexts of the
current study.
2.14 SEMI-SUPERVISED MACHINE LEARNING
Text classification methods typically employ supervised learning approaches
and so are reliant on the quality of the labelled historic data used to train them. Though
labelled data are expensive to obtain because it often involves human annotators, a
large number of unlabelled data are easy to get [115]. In such cases, the well-known
technique of semi-supervised learning (SSL) can make use of both labelled and
unlabelled data to learn (i.e. train) a classifier efficiently [116]. Popular SSL methods
include: active learning [117, 118], self-training [118, 119], co-training [116],
transductive support vector machine (TSVM) and graph-based method [120]. In order
to reduce the manual labelling workload, two SSL methods are widely used: active learning tries to overcome the labelling bottleneck by posing queries to an oracle (e.g. a human annotator) [121], while self-training labels samples using the classifier itself. The details of active learning
and semi-supervised self-training will be discussed in the following sub-sections.
2.14.1 Active Learning
Some studies have made use of such unlabelled data to improve text categorisation through active learning (AL) [119]. AL is an iterative machine learning process that builds a text classifier by selecting, from a larger set of unlabelled data, only the most informative samples for labelling by an expert [117], which is particularly advantageous when labelling is expensive. AL attempts to select informative samples so as to maximise the accuracy of the text classifier with less training data. When training samples are instead selected at random in advance and the classifier is trained on them, the learning is
called passive learning (see Figure 2-12). In passive learning, the classifier is thus built
using the given training data in advance. However, acquiring training data in advance
is difficult and time consuming. In this regard, AL is effective since it queries only the
informative samples and constructs the classifier accordingly.
Figure 2-12. General schema for passive and active learning [122]
Several studies show that AL greatly reduces the labelling efforts in various
applications, including text categorisation [116-119], sentiment analysis [123], image
recognition [124, 125], and high-dimensional boundary identification [126]. In an
active learning framework, the active learner is initially trained with a small amount
of labelled training data and with access to a large set of unlabelled data. Using the
trained model, new informative samples from the pool of unlabelled data are selected
for labelling. The selected data samples are subsequently added to the labelled training
data and the learner is re-trained. This iterative process is repeated until the stopping
criterion has been met [127].
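The iterative loop just described can be sketched as follows, assuming scikit-learn; the function name and the `oracle` callable standing in for the human annotator are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def active_learning_loop(labelled, labels, pool, oracle, rounds=5, batch=2):
    """Pool-based AL: each round, query the least-certain pool samples,
    have the oracle (human expert) label them, and re-train the classifier."""
    labelled, labels, pool = list(labelled), list(labels), list(pool)
    clf, vec = None, None
    for _ in range(rounds):
        vec = TfidfVectorizer()
        clf = LinearSVC().fit(vec.fit_transform(labelled), labels)
        if not pool:
            break
        # Distance to the SVM hyperplane as an (un)certainty measure
        margins = np.abs(clf.decision_function(vec.transform(pool)))
        for i in sorted(np.argsort(margins)[:batch], reverse=True):
            text = pool.pop(int(i))
            labels.append(oracle(text))   # expert supplies the true label
            labelled.append(text)
    return clf, vec
```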
Thus, a major consideration in AL is the choice of the selection strategy by which the learner asks its queries; the main options are membership query synthesis, stream-based AL and pool-based AL [121]. Figure 2-13 illustrates these three selection strategies. Membership query synthesis requests the labels of unlabelled samples in the input space that the learner generates de novo, i.e. from scratch.
Figure 2-13. Three main active learning query selection strategies [121]
In stream-based AL, one unlabelled sample is considered at a time and the
learner decides whether or not to query its label and send this query to the oracle. On
the other hand, pool-based AL considers a large unlabelled pool of samples, ranks
them based on a selection criterion and selects a number of the best samples. Stream-
based AL makes the query decision by individually processing every datum while
pool-based AL evaluates and ranks the entire unlabelled dataset before selecting the
best query.
In all cases, one must select the most informative unlabelled sample(s). Many ways of formulating such query strategies have been proposed in the literature, including uncertainty sampling, query by committee, expected model change, expected error reduction, variance reduction, and density-weighted methods [121, 128, 129].
The most common of these is uncertainty sampling, which was first introduced
by Lewis and Gale [130]. The key idea is that the samples that the text classifier is
most uncertain about provide the greatest insight into the underlying data distribution
and should be selected for labelling. In theory, AL is possible with any classifier that is capable of passive learning. Since SVM has been proven to provide highly accurate results in the passive learning scenario (see Section 2.11.5), it is utilized here as the classifier for active learning. Given the labelled training data $(\mathbf{x}_i, y_i)$ and the centre $w_i$ of the largest hypersphere that can fit inside the current version space $\gamma_i$, the position of $w_i$ clearly depends on the shape of the region $\gamma_i$. Each unlabelled example $\mathbf{x}$ can then be tested to see how close its corresponding hyperplane lies to the centrally placed $w_i$: the closer a hyperplane is to the point $w_i$, the more centrally it is placed in the version space. Following Eq. 2-37, the shortest distance between the hyperplane induced by an unlabelled example and the vector $w_i$ is simply the distance between the feature vector $\phi(\mathbf{x})$ and the hyperplane $w_i$. Therefore, we want to query the example $\mathbf{x}^*$ that induces a hyperplane as close to $w_i$ as possible [128]:

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \left|\phi(\mathbf{x}) \cdot w_i\right| = \arg\min_{\mathbf{x}} \left|f(\mathbf{x})\right| \tag{2-46}$$
This strategy is called the simple margin: it queries the sample closest to the current decision boundary (see Figure 2-14). The circle represents the largest-radius hypersphere that can fit in the version space. The white area in the version space, bounded by solid lines, corresponds to the labelled training data, while the five dotted lines (instances "a", "b", "c", "d" and "e") represent unlabelled data from the pool. According to the simple margin strategy, instance "b" induces the hyperplane closest to the SVM centre $w_i$, and so we choose to query "b".
Figure 2-14. Uncertainty-based active learning that queries “b”
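A minimal sketch of this query strategy, assuming a fitted scikit-learn classifier (e.g. LinearSVC) whose decision_function returns the signed distance f(x); the function name is hypothetical.

```python
import numpy as np

def simple_margin_query(clf, X_pool):
    """Simple margin (Eq. 2-46): return the index of the unlabelled sample whose
    feature vector lies closest to the current SVM hyperplane, i.e. the sample
    with the smallest |f(x)| (instance "b" in Figure 2-14)."""
    return int(np.argmin(np.abs(clf.decision_function(X_pool))))
```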
Another theory-motivated query selection strategy is the query by committee, in
which a committee is formed where each committee member is allowed to vote on the
queries. This strategy develops a method to evaluate the disagreement among the
committee members and thus chooses the most informative sample [131]. To estimate
the disagreement, two approaches are widely used: vote entropy (see Eq. 2-47) and
Kullback-Leibler (KL) divergence (Eqs. 2-48 and 2-49) [121]:
$$x^*_{VE} = \arg\max_{x} \, -\sum_{i} \frac{V(y_i)}{C} \log \frac{V(y_i)}{C} \tag{2-47}$$

where $y_i$ ranges over the candidate labels, $V(y_i)$ is the number of votes that label receives from the committee members and $C$ is the committee size.

$$x^*_{KL} = \arg\max_{x} \frac{1}{C} \sum_{c=1}^{C} D\left(P_{\theta^{(c)}} \,\|\, P_C\right) \tag{2-48}$$

$$D\left(P_{\theta^{(c)}} \,\|\, P_C\right) = \sum_{i} P_{\theta^{(c)}}(y_i|x) \log \frac{P_{\theta^{(c)}}(y_i|x)}{P_C(y_i|x)} \tag{2-49}$$
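As a small numerical illustration of Eq. 2-47 (a sketch; the function name and the vote counts are hypothetical):

```python
import numpy as np

def vote_entropy(votes, committee_size):
    """Vote entropy disagreement (Eq. 2-47) for a single candidate sample.
    votes: mapping from each candidate label to the number of committee votes."""
    probs = np.array([v / committee_size for v in votes.values()])
    probs = probs[probs > 0]            # 0 * log(0) is taken as 0
    return float(-np.sum(probs * np.log(probs)))

# A 5-member committee split 3 vs 2 over the two maintenance classes
print(vote_entropy({"failure": 3, "non-failure": 2}, 5))  # ~0.673 (high disagreement)
```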
Expected error reduction, which is also called the decision-theoretic approach, estimates the future error of the model retrained on $\mathcal{L} \cup \langle x, y \rangle$ over the remaining unlabelled data $\mathcal{U}$, and queries the sample with the minimum expected error (Eq. 2-50):

$$x^*_{0/1} = \arg\min_{x} \sum_{i} P_\theta(y_i|x) \left( \sum_{u=1}^{U} 1 - P_{\theta^{+\langle x, y_i \rangle}}\left(\hat{y} \,\middle|\, x^{(u)}\right) \right) \tag{2-50}$$

where $P_{\theta^{+\langle x,y\rangle}}$ is the model retrained after adding $\langle x, y \rangle$ to the labelled set $\mathcal{L}$.
Active learning first garnered serious research attention in the 1980s [132] and it has remained a vibrant research area since. Moon and associates [133] proposed a new AL algorithm based on a cost-driven decision framework in which the learner chooses to query either the labels or the missing attributes of the unlabelled data points. Considering both common and domain-specific features, Li and associates in [134] proposed a novel multi-domain active learning framework which queried information duplicated across domains and then converted such information into a reduction of the model loss. Their method reduced the human labelling effort by 33.2%, 42.9% and 68.7% on sentiment classification, newsgroup classification and email spam filtering respectively. Novak and colleagues [135] presented a comparison between the two most common query selection strategies, simple margin and error reduction sampling, evaluating their performance on a range of categories from the Reuters Corpus. Simple margin sampling performed better than error reduction sampling on the larger news article categories. Moreover, the active learning method proved more efficient, requiring only half of the samples needed by passive learning for the same outcome.
In theory, AL is possible with any classifier that is capable of passive learning.
However, over the years, SVM has proven to be particularly effective, especially in
text classification [128, 136] and can easily identify uncertain samples as those that
are closest to the hyperplane. Goudjil and associates in [137] proposed a novel active
learning method based on the posterior probability estimation within SVM classifiers.
Applying the method on three well-known real world datasets: R8, 20ng and WebKB,
their experiments demonstrated that the method significantly reduced the labelling
efforts while simultaneously increasing classification accuracy. Using SVM-based
AL, Silva and Ribeiro [118] compared the effectiveness of AL with a baseline
classifier considering the deficit of labelled data.
Using naïve Bayes (NB) and k-nearest neighbour (k-NN) algorithms along with
SVM in AL, Hu and associates in [117] developed a novel exploration of the
reusability problem in text categorisation. In their experiments, they found that SVM performed the best for text categorisation in an active learning setting. Since SVM has been proven effective at selecting the samples closest to the hyperplane [128], the SVM algorithm has been used in this research as the baseline classifier for active learning (see Chapter 6).
2.14.2 Semi-Supervised Self Training
Another commonly used SSL algorithm is self-training. In self-training, a classifier is first trained with a small number of labelled samples and is then used to classify the unlabelled ones. The most confident unlabelled samples, together with their predicted labels, are added to the training data, and the procedure is then repeated [138]. In semi-supervised self-training (SSST), the learner thus automatically labels samples from the unlabelled data and adds the most certain samples to the training data in each learning cycle (a minimal sketch of this loop follows below). Pavlinek and Podgorelec [139] investigated the prediction
accuracy of self-training LDA (ST-LDA) performed on multinomial NB and SVM
algorithms. The method was tested on imbalanced datasets and it was discovered that
ST-LDA with multinomial NB outperformed other methods. For tweet sentiment
classification, Silva proposed a semi-supervised learning framework using a similarity matrix constructed from unlabelled data. Experimental results comparing this method with self-training, co-training and SVM implied that the similarity-based approach performed better than the others on most of the assessed Twitter datasets.
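A minimal sketch of the self-training loop described at the start of this subsection, assuming scikit-learn with a multinomial NB base classifier and a hypothetical confidence threshold; this is an illustration, not the procedure of any of the cited studies.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def self_train(texts_l, y_l, texts_u, threshold=0.95, max_iter=10):
    """Self-training: repeatedly fit, pseudo-label the unlabelled texts, and
    absorb only the most confident predictions into the training set."""
    texts_l, y_l, texts_u = list(texts_l), list(y_l), list(texts_u)
    vec, clf = TfidfVectorizer(), None
    for _ in range(max_iter):
        clf = MultinomialNB().fit(vec.fit_transform(texts_l), y_l)
        if not texts_u:
            break
        proba = clf.predict_proba(vec.transform(texts_u))
        confident = np.where(proba.max(axis=1) >= threshold)[0]
        if confident.size == 0:
            break  # nothing certain enough to absorb this cycle
        for i in sorted(confident, reverse=True):
            y_l.append(clf.classes_[proba[i].argmax()])
            texts_l.append(texts_u.pop(int(i)))
    return clf, vec
```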
However, SSST on its own does not often provide better classifier accuracy. For example, when the initial training data are very weak, many class labels might be wrongly predicted, which introduces incorrectly labelled data into the training set. Moreover, the samples with the highest confidence are not necessarily the most useful ones and do not ensure higher predictive accuracy.
Conversely, if we could integrate other SSL methods with AL, it could further reduce
manual labelling without compromising the desired accuracy of the text classifier.
In this regard, Leng and associates in [119] proposed an active semi-supervised
SVM algorithm by using both active learning (to select class boundary samples) and
semi-supervised learning (to select class central samples). Experimental results
showed that their method finds the boundary samples more precisely than using AL
only. In a similar approach, Zhang and colleagues [116] combined co-training with
active learning. Their approach not only selected the most reliable instances according
to the criteria of high confidence and nearest neighbour, but also exploited the most
informative instances with human annotations to improve text classifier performance.
When using small training datasets, the combined approach of exploiting the most certain instances together with active learning showed better classification performance [118]. Hajmohammadi and colleagues [123] developed a new model combining AL with SSST for cross-lingual sentiment classification. The experimental results showed that the model outperformed the baseline methods of active learning, self-training, random sampling and support vector machine on three book review datasets.
2.15 SUMMARY AND RESEARCH GAP
This section summarises the limitations of typically collected maintenance data
and explains the lack of literature on information requirement specification for a
maintenance optimisation model. The literature review on optimisation models and
data mining methods suggests that there is a lack of discussion on identifying the
information requirement specifications for reliability and maintenance optimisation
models and methods to extract information from multiple maintenance databases. The
details are summarised below:
• Lack of data and information requirement specifications for
reliability and maintenance optimisation models
There has been a vast quantity of literature on modelling maintenance optimisation based on reliability or fault diagnostic analysis. However, what data are needed, and how industry data can be extracted to meet the requirements of these models, are rarely discussed in the literature. Failure and scheduled maintenance times, covariates, cost
and down times are the information required for maintenance optimisation models.
These types of information can be buried in a number of different information systems
or databases in various raw forms in asset intensive industries.
Without requirement specification, this information cannot be used directly in
optimisation models. For example, failure time is essential data needed by reliability
prediction and maintenance decision models. In most cases, it is not readily available
from industry databases. Such information normally can be extracted from multiple
information systems such as work order (WO) and outage records and even from the
digital control system (DCS). However, an accurate definition of failure is needed as
a pre-requisite of this type of information extraction. This necessity implies that data
needs to be collected according to the information requirement specifications. In the
past, data were collected without such an objective in place. There is no systematic
research on this to date.
• Unavailability of required information in typically collected
maintenance databases
As mentioned in Section 2.7, the main reason for the unavailability of the information required by optimisation models might be, as pointed out by Louit and associates [2], that datasets are usually collected to record maintenance activity rather than to support reliability analysis. Furthermore, data collected during the maintenance process are usually incomplete. These characteristics make it challenging for standard approaches to discover useful information.
• Methods for extracting required information from multiple
databases of maintenance data are unavailable
Most of the methods and models (as mentioned in Sections 2.3, 2.4 and 2.5)
assume that information required for maintenance optimisation models is available or
can be generated through simulation methods. Few efforts have been made to extract
this information from real industry databases in both numerical and text format.
The following chapter will provide the information requirement specification for
reliability and maintenance optimisation models. Available maintenance databases and the information requirements of existing models will be investigated, and the proposed requirement specification will provide a guideline for recording correct data in existing maintenance databases. The requirement for an information extraction methodology will then be discussed, enabling readers to identify the thesis contribution in approaching these problems.
Chapter 3: Information Requirement Specifications for Reliability and Maintenance Optimisation Models
This chapter aims to investigate the existing maintenance databases to obtain the
information requirements specification necessary for reliability and maintenance
optimisation models. In this regard, the limitations of existing maintenance databases
will be critically discussed and the sufficiency of existing maintenance databases
(required for maintenance optimisation models) will be analysed. This investigation
serves two purposes: 1) to identify key challenges that are to be addressed in the
remainder of the thesis; and 2) to improve the industry data collection practices to meet
the requirements of the models.
3.1 ARE CURRENT MAINTENANCE DATABASES SUFFICIENT FOR MAINTENANCE OPTIMISATION?
Basic asset information is typically available in maintenance databases, for example, the asset identification number and maintenance work identification number.
However, the primary information that is required for failure time models is the
effective age of the asset and historical failure and preventive maintenance times. More
importantly, one needs to distinguish the work orders into reactive and preventive
maintenance. Manzini et al. [13] discuss this issue and advise that a failure report and work order should be collected for proper recording of corrective and planned maintenance. The fundamental pieces of information to be collected are the date and time of failure, the machine and component that failed, and the characteristics of the maintenance action performed (time to repair, spare parts if used, and workload employed). The failure report and work order are intended to trace the corrective (CM) and preventive (PM) maintenance histories separately, so that these can be linked with maintenance decisions.
However, many industrial datasets do not conform to this standard and only record work orders. These work order data are often recorded primarily for maintenance workflow and planning purposes, not for reliability analysis [2, 15, 54]
which often leads to the existing maintenance database only being used as a “work
order system” without the power of analysis and reporting [14]. Thus, typically
available maintenance data only record the maintenance work done, the work descriptions and the work priority, but do not specify whether the necessary work causes downtime. This ambiguity as to whether a work order represents a "failure" or a preventive action is thus a key issue in the analysis of typically available maintenance data.
Another possible data source is the outage/downtime data. However, like work
orders, this data is incomplete from a reliability analysis viewpoint. Outage/downtime
data contain asset stoppage time, i.e. “when”, but the information as to why it was
down is often specified only at a very high level and is not sufficiently detailed for
reliability analysis (e.g. the system that causes the issue may be specified, but not the
component). Thus, downtime data is incomplete from an analysis point of view, where
one needs to know if this downtime is unplanned (i.e. a possible indication of a failure
event) or planned (i.e. due to preventive maintenance).
The free text descriptions of the work orders and downtime data can often
provide insight into the motivation for the maintenance action, however these
descriptions can be ambiguous, difficult to interpret (e.g. sometimes the descriptions
just state the component of the system that was maintained), or laborious to analyse.
So, depending on the working definition of functional failure of the asset and the recording practices of maintenance personnel, it may be difficult to ascertain whether a work order is due to reactive or preventive maintenance. Perhaps more importantly, it is very laborious and time-intensive to manually examine tens of thousands of work order free-text descriptions.
Like failure time models, degradation models require failure times and
additionally require condition indicators (as discussed in Section 2.4). Thus, condition
data is needed and must be properly aligned with work orders in time (i.e. both must
have reliable time stamps). An important difficulty is of course collecting and storing
of massive amount of high frequency condition data [17]. However, it is still necessary
to obtain failure times, even if they are simply used to infer the failure degradation
level from condition data.
Maintenance costs are also required for both TBM and CBM. As
discussed in Section 2.5, maintenance cost consists of direct cost, downtime cost and
product reject cost. While exact downtime and reject costs can be difficult to establish
for operationally complex manufacturing operations, they can often be approximated
from expert knowledge and/or historical production data. On the other hand, direct
maintenance costs are quite easy to acquire, since they can mostly be obtained from work order data or accounting information.
So, it is evident from the above discussion that basic asset information might be available in maintenance databases, while further analysis is required to identify
failure and planned maintenance times, which are required for both failure time models
(for TBM) and degradation models (for CBM). This suggests that a reliable and
general methodology for identifying failure and preventive maintenance times from
maintenance databases is a key enabler to identifying accurate models and subsequent
optimisation of maintenance.
3.1.1 Identifying Failure and Planned Maintenance Times
The common difficulty for both failure time and degradation models is to identify the failure and planned maintenance times, since it is ambiguous whether a work order is reactive or preventive. In order to identify failure times, one needs to know when the asset was installed and when a reactive maintenance event occurs. Basic asset information, i.e. the asset identification number and functional location, the asset installation date/time, and fault characteristics, is typically available in maintenance databases. However, in many industries, the asset installation date/time is difficult to obtain. In this circumstance, the first historical failure time can be used as a substitute for the installation time (albeit with the drawback of introducing bias into the parameter estimation). Other alternative ways to identify the installation time of the asset through maintenance data fields are:
• Free Text Descriptions: Such text descriptions are available in most maintenance databases, including work orders, stop data and permit to work (PTW) records. The free texts contain keywords (for instance, "major overhaul", "commissioning", "installation" etc.) which may indicate the installation of assets.
• Maintenance Type: Maintenance work type labels such as "Preventive"/ "Overhaul" can be an alternative source of such information. In this regard, one may consider overhauls as an effective installation, and a work order may be available to indicate the time/date (depending on the specific model).
Another important task is to link reactive maintenance events with failure modes. If multiple failure modes exist, they need to be identified clearly and modelled separately. This is particularly useful for developing failure time or degradation models for specific failure mode/s of an asset [5, 44]. Failure modes can be obtained by examining the work orders.
Now, the reactive maintenance times can be identified from a few potential sources. Ideally, a field similar to "functional loss" would be directly recorded and would indicate whether the asset was able to function with the recorded defect. If such a data field exists in maintenance databases, one can easily identify the reactive maintenance time. This loss can be classified into three categories: complete, partial and potential loss of function [1, 140]. In actual industrial settings, only two categories tend to be used: complete and delayed loss of function [31, 34]. However, since work orders are often used only for management purposes, this "functional loss" field is not available in many existing maintenance databases and indicators of asset function are often buried in the free texts of maintenance work orders/logs.
Other potential sources to identify failure and planned maintenance times from
maintenance databases are:
• Work Order Type: Data recorded in this field are normally created due to fault indication, fault inspection, maintenance work to repair defects, as well as preventive maintenance. Depending on the type of work order, the maintenance events can be classified as failure (related to repairing defects) and planned maintenance (related to overhaul or preventive maintenance).
• Maintenance Type: This data field stores the type of maintenance action applied to the asset, which could range from inspection, modification and repair of defects to preventive maintenance [44]. Maintenance types falling within repair of defects and preventive maintenance can be classified as failure and planned maintenance events respectively.
• Maintenance Priority: This data field is recorded to tag the urgency and level of emergency of the maintenance work. The urgency levels are usually divided into Level 1 (immediate repair of the asset), Level 2 (urgent repair of the asset) and Level 3 (planned and scheduled repair of the asset). Depending on the urgency, maintenance events can be distinguished into failure (i.e., Levels 1 and 2) and planned maintenance events (i.e., Level 3).
• Maintenance Work Descriptions (Free Text Format): Such short text data fields may contain reliable failure and planned maintenance information for the asset [5]. Generally, this data field is common across multiple maintenance databases. In work orders, this field contains the intended repair work that needs to be performed on the asset. The detailed work descriptions in another database (often referred to as outage, stop time or downtime data) describe the maintenance work when the asset is offline. Work descriptions recorded for repair work may contain some significant keywords, for example, "leak", "block", "stopped" or "overhaul", "planned stoppage" etc. Using such distinctive keywords, maintenance events can be classified into two groups: corrective maintenance due to failure or preventive maintenance (a heuristic sketch of this keyword-based labelling follows this list). Moreover, work descriptions from downtime data can be used to determine the CM events with stops.
• On/Off Season Production: This data field is especially applicable in industries with season-based production (e.g., sugar processing). Most often, planned maintenance activities are performed in the off season when the assets are idle or non-productive. However, maintenance activities during on-season production do not imply failure events only. During this time, maintenance jobs consist of failure events as well as regular inspections and non-maintenance activities. One needs to separate and identify failure events from the other maintenance events.
• Maintenance Cost: This refers to the total maintenance cost required to perform the maintenance work associated with the issued work order. Depending on the requirements for resources, materials and spare parts, the total maintenance cost can be classified into low, medium and high. In general, maintenance activities related to routine inspections or minor repairs are relatively low in cost due to minimal resource consumption. However, the maintenance cost starts to increase for events which repair failures. In many industrial practices, major overhaul and/or preventive maintenance events cost relatively far more than failure events. Based on such classifications, failure and preventive maintenance events can be identified.
• Maintenance Work Duration: Like maintenance cost, failure and planned maintenance events can also be distinguished using maintenance work durations. Major overhaul and/or preventive maintenance events take a longer time to complete; such jobs include complete inspections, tests, measurements, adjustments, and replacements on the asset. Thus, preventive maintenance tasks require a longer time to complete compared to corrective maintenance. Corrective maintenance, on the other hand, is applied to repair unexpected failures and usually requires less time. Using the maintenance time duration, failure and preventive maintenance events can be determined.
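As noted in the work descriptions item above, a simple keyword-matching heuristic can sketch this classification; the keyword sets below are hypothetical examples taken from the prose, not a validated dictionary.

```python
# Hypothetical keyword sets built from the examples mentioned above
FAILURE_KEYWORDS = {"leak", "block", "stopped", "broken", "fail"}
PLANNED_KEYWORDS = {"overhaul", "planned stoppage", "inspection"}

def label_by_keywords(description: str) -> str:
    """Heuristic event labelling from a free-text work description."""
    text = description.lower()
    if any(k in text for k in FAILURE_KEYWORDS):
        return "corrective (failure)"
    if any(k in text for k in PLANNED_KEYWORDS):
        return "preventive (planned)"
    return "unknown"

print(label_by_keywords("Pump seal leak - unit stopped"))  # corrective (failure)
```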
However, the limitations of existing maintenance databases can be avoided by recording correct maintenance data, so that industry data can be used directly in the existing models. Since no extensive research has been conducted on identifying a standard data recording technique, this study proposes initial recommendations for failure time models. Table 3-1 shows the required data fields and the related maintenance databases necessary for a failure time model.
Firstly, the installation date of the asset needs to be recorded and a related data
field is necessary (as proposed in Table 3-1). Using this initial commissioning date,
the complete maintenance history (including both failure and preventive maintenance
times) can be determined. To correctly identify the historical failure times, the most
crucial data field (e.g. complete functional loss) is required to be documented. This
data field can be filled out at the end of maintenance work as mentioned in Section
3.1. At the end of every maintenance work, the detailed repair tasks and technical
findings are documented in the work order. At this point, the condition of the asset and
the functional loss can be determined.
Table 3-1. Suggested data recording to support reliability modelling (i.e. failure time modelling)

Type of Information Required | Definition of the Required Information | Required Data Fields | Asset and Maintenance Database/s
Basic Asset Data | Installation date/time of the asset | Installation/Commissioning Date* | Work order
Failure times | Asset stoppage due to unplanned/urgent maintenance | Work Order Type, Maintenance Type, Maintenance Priority, Functional Loss (complete)*, Creation and Completion Dates | Work order, Downtime Data
Scheduled/Preventive maintenance times | Asset stoppage due to planned/scheduled maintenance | Work Order Type, Maintenance Type, Maintenance Priority, Creation and Completion Dates | Work order, Downtime Data
Maintenance Effect | Distinctive effects between minimal, perfect and imperfect repair | Maintenance Type, Maintenance Priority, Maintenance Effect (Text Description)* | Work order, Downtime Data

* Data fields proposed to be included in the existing databases.
It is comparatively easier to determine the planned maintenance times using the
existing data fields (e.g. work order type, maintenance type and work priority). Finally, a new data field, "maintenance effect", is proposed in this study. Along with maintenance type and priority, this data field can be effective in distinguishing maintenance effects between minimal, perfect and imperfect repair using standardised text descriptions. The maintenance effect can also be documented by observing the effectiveness of the repair work at the component or system level. The maintenance personnel need to differentiate between minimal repair and perfect repair and document their effects on the asset (i.e. at the component or system level) accordingly. However, it is comparatively difficult to record the effect of imperfect repair (which normally varies between zero and one). In that case, industries can apply their own tagging standards to document the maintenance effects.
3.2 REQUIREMENT FOR INFORMATION EXTRACTION METHODOLOGY
Although Section 3.1 discusses the large number of data fields available in maintenance databases, such data fields are often incomplete or sometimes unreliable. Required information is often buried in different maintenance databases. Moreover, the vast majority of such data require analysis and processing before reliability and maintenance optimisation models can be implemented in real world industrial settings. Given the mismatch between what is required and what is available, this research develops a method to analyse the existing data fields available in multiple maintenance databases and to link those databases to identify historical failure and non-failure planned maintenance times. Since no detailed investigations have been conducted on information requirements for reliability and maintenance optimisation models, the current study begins by investigating methods of identifying failure and non-failure maintenance times (see Figure 3-1).
Figure 3-1. Overview of information extraction methodology
This thesis presents an investigation of the information requirements for the
reliability and maintenance optimisation models and data available in existing
maintenance databases, as described in this chapter. Using related maintenance
information, a methodology is formulated to identify failure and non-failure
maintenance times. In this regard, both supervised and semi-supervised machine
learning methods have been utilised and the methodology is tested on multiple real
world case studies. The detailed discussions on such methodologies and case studies
have been presented in the following chapters.
Chapter 4: Failure Time Extraction Methodology Using Text Mining
Information related to failure and preventive maintenance times is needed to
develop reliability and maintenance optimisation models. This, as a matter of course,
is a requirement specification for such modelling. However, due to changes in
accounting, maintenance practices, and information technology, organisations have
different databases that possess parts of the information needed to define a failure
event. Some common databases (e.g. “work orders”) contain the record of every
maintenance activity (i.e., repair, check or routine inspection) performed on an asset.
However, these databases rarely contain information on the motivation for the activity (i.e. was the issue raised to fix a "failure" or due to planned maintenance, and did this event cause any downtime?). In other words, it is often the case that individual databases do not possess the necessary information to identify a "failure" event, where one needs to know both when the asset is down and whether this downtime was unplanned. One of the possibilities for linking such databases is to analyse the text descriptions (used to describe the repair process). Such free texts are common in most databases and contain the information requirement outlined in Chapter 3.
In this study, a text mining approach is proposed to extract accurate failure time,
using the free texts from multiple maintenance databases. This study automatically
labels fields in one maintenance database using existing data fields and links them with
those in another one through the free text descriptions. The proposed method thus
identifies downtime events whose descriptions are consistent with urgent and
unplanned maintenance.
This chapter begins with some further details of the motivation for linking
multiple maintenance databases using the text mining method. The challenges and incompleteness of individual maintenance databases are described here. A methodology is then proposed which applies a text classification method to commonly available real world databases to attribute each asset stoppage to one of two classes: failure or non-failure. The method is finally validated using a series of analyses and experiments. As an intuitive means to validate the method, three different
validation indicators are applied as described in Section 4.3. The method is applied as
described in Section 4.4. The performance of the classifier constructed from the
proposed application can then be measured.
4.1 MOTIVATION
Real world maintenance databases are typically set up to focus on the work
process of maintenance activities (e.g. communicating what needs to be done by the
maintenance crew and when). In such settings, Database A contains descriptions of the
tasks to be performed and an estimation of how these are prioritised. Another database
(e.g. Database B) specifies the actual permit to work and safety precautions to be taken
for all maintenance activities. A further separate database, Database Z, contains
detailed information on when the asset is not operating, but lacks an unambiguous
indication of the reason for the stoppage. The challenge in uncovering failure time data
(FTD) is thus to identify failure events which correspond to forced downtime of the
asset. Neither database is complete from this perspective: Database A provides
information on what needs to be done and if the work was planned, while Database Z
contains reliable time stamps for when the work (may have) forced the downtime.
Thus, it is important that the databases are interpreted jointly, to identify FTD.
In maintenance databases, different types of data almost always contain free text,
and if the entries in respective databases describe the same event, it is reasonable to
expect that these descriptions will have similar contents. Thus, it is essential to develop
a method for linking Database A and Database Z using these descriptive texts.
4.2 METHODOLOGY
The main goal of the proposed method is to utilize these prescribed maintenance
databases to jointly determine when the asset has “failed”. According to the definition
of failure (i.e., any unplanned and urgent maintenance need which causes an outage)
each database is incomplete in itself in order to decide when the asset is down and if
the downtime is unplanned. If maintenance event in Database A is unplanned (e.g. the
result of a “defect”) and urgent (i.e. “high priority”), the related data in Database A is
considered to describe a potential failure event. Thus, the free texts will likely use
words that the organisation would use to describe a failure. Free texts from Database
A have been used to build a keyword dictionary and further used to construct text
classifiers. Text classifiers based on the free texts of unplanned and urgent
maintenance works are applied to Database Z to determine which maintenance event
is unplanned and/or urgent. Using both databases helps to identify the actual failure and non-failure maintenance times. The details of each step will be discussed in the following subsections.
Figure 4-1. Methodology to extract failure and non-failure maintenance times
The methodology is based on four main steps:
• Definition of failure
• Data filter and Database A labelling
• Features extraction and Construction of keyword dictionary
• Classifier construction and extraction of failure time
4.2.1 Definition of Failure
Failure of an asset can be defined in many ways depending on maintenance notifications, failure types etc. In this research, failure is defined considering two major criteria: unplanned maintenance tasks and outage time. This definition is based on an extensive review of maintenance data; failure time and the existing data types have been discussed in Section 2.7. In this section, a downtime analysis for different maintenance types is presented.
To determine time related reliability parameters, e.g. MTBF, component life etc.,
the equipment surveillance period is typically used. For many units, the operating/in-
service period is less than the surveillance period due to maintenance, sparing of
equipment or intermittent operation of the unit. Although total downtime can be due to both planned and urgent maintenance, as discussed in Section 3.1, Table 4-1 is presented here to illustrate the phases of the different maintenance types within total downtime. Table 4-1 shows that an outage/downtime can be triggered by both planned and unplanned maintenance. Corrective or any other unplanned maintenance is raised due to failure, whereas planned maintenance is usually planned downtime conducted to prevent failure.
Table 4-1. Downtime classifications based on planned and unplanned maintenance

Total Operating Time
  - Downtime
    - Planned Downtime
      - Preventive Maintenance: time of preparation and actual work being done
      - Other Planned Outages: modification, testing, checking
    - Unplanned Downtime
      - Corrective Maintenance: time of preparation and actual work being done
      - Other Unplanned Outages: shutdown, operational problems
  - Uptime: running
It goes without saying that preventive/planned maintenance is usually conducted
before any failure happens while corrective maintenance is issued immediately after
failure occurs. So maintenance records with corrective maintenance denote a high
priority for repair, show urgency and clearly indicate the maintenance tasks to fix a
failure. It logically follows that any maintenance event which contains unplanned maintenance work can be defined as relating to the correction of a failure event. Therefore, the definition of failure in this research is:
“Any unplanned maintenance which causes downtime”
4.2.2 Database A Labelling
To expand on the distinction between two of the databases of concern in the
modelling related to this research, Database A tells us if the maintenance work is
triggered by a certain fault or by a predetermined routine plan. However, it does not indicate whether this maintenance event constitutes a "failure", i.e. whether the fault stops the operation of the asset. Therefore, failure time information is not explicitly available in
Database A. On the other hand, Database Z contains asset stoppage information, but
the information as to why it was down is often unavailable.
Data fields in Database A such as work order type, maintenance type and work
priority have been considered to classify the maintenance event. Initially, any
unplanned maintenance event (identified by the “maintenance type” data field) has
been classified as a “failure” event. However, some unplanned maintenance events
may be issued not to repair a failure, but rather to check or inspect any anomaly or to
conduct routine maintenance actions. This situation suggests that the urgency of any
unplanned maintenance is similarly important. In addition to that, any unplanned
urgent maintenance that is raised and fixed within overhaul times (not during
unplanned downtime) needs to be classified as a non-failure (or scheduled
maintenance) event. Overall, using such data fields, Database A texts have been
classified into urgent and unplanned maintenance (possible failure) and planned
maintenance (possible non-failure) events (see Figure 4-2).
Thus in this thesis it is assumed that there are two types of maintenance events:
failure and non-failure. An important limitation exists in the case where opportunistic
maintenance is significant. For instance, if the boiler fan fails, the boiler must be shut
down for repair. One might take this opportunity to repair another part of the boiler
that is down as a result, e.g. the boiler pump. By our working definition of failure, this is a corrective maintenance only for the boiler fan, and a preventive (opportunistic) maintenance for the pump. Thus, opportunistic maintenance may result in a confusion of
preventive and corrective maintenance at the component level, depending on the work
order recording practices of the organisation. Thus, the presence of opportunistic
repairs necessitates clear description and/or tags to indicate which components drove
the downtime event (if any). Nevertheless, at the system level, the identification of this
event as a failure is still correct: the boiler has still failed due to a fan failure. However,
the resolution of the diagnosis of sub-system and component failures depends on the
recording and tagging practices of the organisation.
So, it is argued that Database A events classified as “urgent and unplanned
maintenance” and “planned maintenance” are the probable candidates for “failure” and
“non-failure maintenance” events. Database A, labelled with these two types of events,
will then be used to construct the text classifiers.
Figure 4-2. Data filter and Database "A" labelling
4.2.3 Features Extraction and Construction of Keyword Dictionary
The next step is to extract the features using the free texts used in Database A.
At this stage, it is necessary to transform the terms and sentences (i.e., those used in the free texts) into a matrix form that machine learning algorithms can understand. This can be done by splitting the text documents into individual words, which is called tokenization. A token is a single element of a text string (a keyword). The text classifier requires data in the form of a matrix where each row corresponds to a document and each column to a keyword (here, keywords are all of the tokens/terms within the dictionary). Such a tokenized matrix can be constructed using different features,
including BOW and N-Gram (see Section 2.10).
Table 4-2 lists the criteria used to extract features and the different keyword dictionaries (see Column 3). For instance, the keyword dictionary denoted $tf_1$ is constructed from all the keywords that appear at least once in the training data. Based on the number of times a keyword appears in the training data, the other two keyword dictionaries (i.e., $tf_5$ and $tf_{10}$) are created, as indicated in Table 4-2. Similarly, $(tf\text{-}idf)_1$, $(tf\text{-}idf)_5$ and $(tf\text{-}idf)_{10}$ measure the relative frequency of a keyword in a specific document through an inverse proportion of that keyword over the entire set of text documents. The CS and IG based keyword dictionaries have been sized based on the total number of features appearing in the training data.
Bi-Gram, Tri-Gram and their three possible combinations have been chosen for the N-Gram based features. Bi-Gram refers to N-Gram features of size 2 and considers two consecutive keywords to predict the next one; similarly, Tri-Gram considers three consecutive keywords for prediction, and so on. To exploit a more efficient N-Gram approach, combinations of Uni-Gram, Bi-Gram and Tri-Gram features have also been extracted (see Table 4-2). Finally, the keyword dictionaries that contain the defined features have been used for the construction of the text classifier and for the overall classification task.
Table 4-2. Criteria used for feature extraction and construction of keyword dictionary

Feature Type | Keyword Dictionary | Criteria Used for Feature Extraction
BOW (TF) | tf_1 | Each keyword appears at least once in the training data
BOW (TF) | tf_5 | Each keyword appears at least five times in the training data
BOW (TF) | tf_10 | Each keyword appears at least ten times in the training data
BOW (TF-IDF) | (tf-idf)_1 | Relative weight of each keyword, based on the selection of tf_1
BOW (TF-IDF) | (tf-idf)_5 | Relative weight of each keyword, based on the selection of tf_5
BOW (TF-IDF) | (tf-idf)_10 | Relative weight of each keyword, based on the selection of tf_10
BOW (Chi-square) | χ²_1608 | Top 1608 keywords that contain class information of the training data
BOW (Chi-square) | χ²_1000 | Top 1000 keywords that contain class information of the training data
BOW (Chi-square) | χ²_500 | Top 500 keywords that contain class information of the training data
BOW (Information Gain) | IG_1608 | Most significant and informative 1608 keywords in the training data
BOW (Information Gain) | IG_1000 | Most significant and informative 1000 keywords in the training data
BOW (Information Gain) | IG_500 | Most significant and informative 500 keywords in the training data
N-Gram (Bi-Gram) | NG_2 | Depends on the immediately previous keyword
N-Gram (Tri-Gram) | NG_3 | Depends on the two immediately previous consecutive keywords
N-Gram (Uni-Bi-Gram) | NG_1,2 | Combination of Uni-Gram and Bi-Gram
N-Gram (Bi-Tri-Gram) | NG_2,3 | Combination of Bi-Gram and Tri-Gram
N-Gram (Uni-Bi-Tri-Gram) | NG_1,2,3 | Combination of Uni-Gram, Bi-Gram and Tri-Gram
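To illustrate how keyword dictionaries of the kinds listed in Table 4-2 can be built, the following is a minimal sketch assuming scikit-learn, where min_df and ngram_range play the roles of the tf_n and NG criteria; the documents are toy examples.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["replace leaking pump seal", "planned overhaul of mill gearbox"]  # toy documents

# tf_1 analogue: count features for every keyword seen at least once
tf1 = CountVectorizer(min_df=1).fit(docs)
# tf_5 would use min_df=5; (tf-idf)_1 re-weights the tf_1 dictionary
tfidf1 = TfidfVectorizer(min_df=1).fit(docs)

# NG_{1,2} analogue: combined Uni-Gram and Bi-Gram features
ng12 = CountVectorizer(ngram_range=(1, 2)).fit(docs)
print(ng12.get_feature_names_out())  # the keyword dictionary (tokens and token pairs)

# Chi-square dictionaries (e.g. the top 500 keywords) need class labels y:
# chi500 = SelectKBest(chi2, k=500).fit(tf1.transform(docs), y)
```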
4.2.4 Classifier Construction and Failure Time Extraction
The text classifier is constructed using two separate algorithms: NB and SVM
(discussed in more detail in Section 2.11). Their performances (using different
measures and metrics mentioned in Section 2.12) are tested on testing data separately.
The detailed information required to construct the text classifier is given below (see
Figure 4-1),
• Labelled data from Database A (Class 1: urgent and unplanned
maintenance and Class 2: all other, including planned maintenance)
• Encoded text from Database A after text cleaning and noise reduction
• Keyword dictionary which is constructed using different features as
explained in Section 4.2.3 above
Given a set of training data from Database A labelled as unplanned or planned maintenance ($D_f$ and $D_p$ respectively), the NB classifier can be trained as in Algorithm 4-1. To nullify the zero-frequency words in the training data, the Laplace estimator (the "+1" in Line 8) has been used.
Algorithm 4-1. Training the naive Bayes classifier

TrainNB(D_f, D_p)    (D_f = text fields labelled as failure; D_p = text fields labelled as preventive)
1   Extract keywords from D_f → V_f
2   Extract keywords from D_p → V_p
3   for each c ∈ {f, p} do
4       N_c ← |D_c|    (number of documents in class c)
5       prior[c] ← N_c / N
6       for each t ∈ V_c do
7           T_ct ← count of occurrences of word t in D_c
8           condprob[t][c] ← (T_ct + 1) / Σ_t′ (T_ct′ + 1)
9   return prior, condprob
Using the output of Algorithm 4-1, the text fields in Database Z have been classified as failure and non-failure in the following manner. Suppose a free text field in Database Z contains the words $w_1, w_2, \ldots, w_C$. Using the NB classifier, the class label $c^*$ can be predicted by [90]:

$$c^* = \arg\max_{c \in \{f,p\}} \; \mathrm{prior}[c] \prod_{i=1}^{C} \mathrm{condprob}[w_i][c] \tag{4-1}$$

where class $c^* = p$ indicates planned maintenance and $c^* = f$ indicates unplanned maintenance (failure).
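The following is a compact, runnable rendering of Algorithm 4-1 together with the prediction rule of Eq. 4-1, written in plain Python as a sketch (log-probabilities are used to avoid numerical underflow; the documents are toy data):

```python
import math
from collections import Counter

def train_nb(D_f, D_p):
    """Algorithm 4-1: two-class naive Bayes with Laplace (+1) smoothing.
    D_f / D_p are lists of tokenised documents (lists of keywords)."""
    N = len(D_f) + len(D_p)
    model = {}
    for c, D in (("f", D_f), ("p", D_p)):
        counts = Counter(t for doc in D for t in doc)   # T_ct
        denom = sum(counts.values()) + len(counts)      # sum over t' of (T_ct' + 1)
        model[c] = {"prior": len(D) / N, "counts": counts, "denom": denom}
    return model

def classify(model, words):
    """Eq. 4-1 in log space: argmax over c of log prior + sum of log condprob."""
    def log_score(c):
        m = model[c]
        return math.log(m["prior"]) + sum(
            math.log((m["counts"][t] + 1) / m["denom"]) for t in words)
    return max(("f", "p"), key=log_score)

# Toy tokenised work descriptions (hypothetical data)
model = train_nb(D_f=[["pump", "seal", "leak"], ["motor", "stopped"]],
                 D_p=[["planned", "overhaul"], ["routine", "inspection"]])
print(classify(model, ["seal", "leak"]))  # -> "f" (failure)
```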
Similarly, the SVM classifier is trained on the tokenized text data from Database A to recognize descriptions of urgent and unplanned maintenance jobs (see Section 4.2.2). Subsequently, the trained SVM decision function in Eq. (2-40) can be used to categorise the similarly tokenized $\mathbf{y}_i$ (the text descriptions from Database Z, i.e. the unlabelled data $\mathcal{U}$) as planned ($C_p$) and unplanned ($C_f$) work:

$$C_f = \{\mathbf{y}_i \mid \mathbf{y}_i \in \mathcal{U} \ \text{and} \ f(\mathbf{y}_i) \geq 0\} \tag{4-2}$$

$$C_p = \{\mathbf{y}_i \mid \mathbf{y}_i \in \mathcal{U} \ \text{and} \ f(\mathbf{y}_i) < 0\} \tag{4-3}$$
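A minimal sketch of this SVM-based split, assuming scikit-learn's LinearSVC as the decision function $f$; the database contents and variable names are toy stand-ins:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy stand-ins for the two databases (hypothetical contents)
texts_A = ["pump seal leak repair", "planned overhaul of mill",
           "motor stopped urgent fix", "routine inspection of rollers"]
labels_A = [1, -1, 1, -1]                 # +1 unplanned, -1 planned
texts_Z = ["mill stopped seal leak", "overhaul of rollers"]

vec = TfidfVectorizer()
svm = LinearSVC().fit(vec.fit_transform(texts_A), labels_A)

# Eqs. 4-2 and 4-3: split Database Z by the sign of the decision function
scores = svm.decision_function(vec.transform(texts_Z))
C_f = [t for t, s in zip(texts_Z, scores) if s >= 0]   # unplanned (failure)
C_p = [t for t, s in zip(texts_Z, scores) if s < 0]    # planned
print(C_f, C_p)
```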
4.3 VALIDATION OF THE METHODOLOGY
The performance of the classifier constructed from Database A can be compared
with the actual data labels mentioned in Section 4.2.2. According to the methodology
outlined in Figure 4-1, the text classifier (e.g. trained on Database A) is applied to
Database Z to identify failure events. The performance of such a prediction can be
measured by comparing the predicted labels of Database Z with the actual ones.
Although the true labels of Database Z were not directly recorded, this study estimated
the actual true labels using the existing data fields and thus measured the performance
of the classifier. The detailed validation process is discussed in Sections 5.1.7 and 5.2.7
for the case studies.
In addition, a series of different independent (albeit more "intuitive") validation indicators are employed:
• Word Cloud: Failure and non-failure word clouds were constructed from the training data from Database A and compared. Keywords appearing in the two classes (failure and non-failure) can be visually distinguished (a minimal plotting sketch follows this list).
• Text Descriptions: Randomly selected text descriptions from the failure and non-failure data in Database Z were analysed to compare the keywords associated with fixing "failure" events against other keywords related to scheduled maintenance or routine inspection tasks.
• Cumulative Number of Failures: To analyse the accuracy of the
estimated prediction, the cumulative number of failure data items (from
Database Z) was compared before and after applying a text-mining
approach. The predicted failure data was compared with two naïve failure
estimates, i.e. those of Database A and Database Z.
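For the word cloud indicator above, a minimal plotting sketch, assuming the third-party Python wordcloud package and toy class texts:

```python
from wordcloud import WordCloud  # third-party package (pip install wordcloud)

# Toy stand-ins for the labelled Database A training texts
failure_text = "pump seal leak motor stopped urgent repair leak"
non_failure_text = "planned overhaul routine inspection scheduled service"

# One cloud per class; frequent class-specific keywords appear larger
WordCloud(width=400, height=300).generate(failure_text).to_file("failure_cloud.png")
WordCloud(width=400, height=300).generate(non_failure_text).to_file("non_failure_cloud.png")
```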
4.4 APPLICATION OF THE METHODOLOGY
The main idea of the proposed failure time extraction methodology was to train
a text classifier using labelled data from Database A; this was then applied to another database, Database Z, which was not labelled at all. The purpose of this text
classification was to classify maintenance events in Database Z into failure and non-
failure labels (see Figure 4-1). In order to intuitively validate such a method, a new
application of the method was proposed and this is shown in Figure 4-3. In this
proposed application of the method, the newly labelled data from Database Z were
used to construct a new classifier. This time, the classifier was re-trained on text
descriptions used in Database A (training data) and those in the newly labelled
Database Z together. The updated classifier was further applied to testing data in
Database A. In this case, the predicted labels of testing data in Database A can be
compared with the actual ones. Moreover, the performance of the new classifier can
be compared with the previous one constructed from training data in Database A. The
detailed application process is highlighted in Figure 4-3.
Figure 4-3. Application of the methodology
The following steps have been considered for the application of the method,
• Re-train a new classifier using the newly labelled data from Database Z
augmented with the initial training data from Database A
• Label testing data from Database A using the new classifier
• Compare the predicted labels of testing data from Database A with the
actual ones
Suppose a new set of training data z_k ∈ ℝ^d with class labels c_k ∈ {−1, 1}, where
z_k are the work descriptions extracted from the training data of Database A combined
with the text data of Database Z. The class labels of Database Z have been taken from
C_f and C_p (see Eqs. 4-2 and 4-3). Thus, the new classifier can be expressed as:

ĉ = +1 if f(z_k) ≥ 0,  −1 if f(z_k) < 0                                    4-4
The decision function in Eq. (4-4) was applied to t_t (the test data from
Database A in the set 𝒯) to classify them as planned (C_p^t) and unplanned (C_f^t) maintenance work:

C_f^t = { t_t | t_t ∈ 𝒯 and f(t_t) ≥ 0 }                                   4-5

C_p^t = { t_t | t_t ∈ 𝒯 and f(t_t) < 0 }                                   4-6
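A sketch of this re-training and re-application step, continuing the previous fragment; all names (including the one-item test set texts_test_A) are illustrative assumptions, not the thesis's actual data:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC

    # Continuing the previous sketch: texts_A / labels_A are the Database A training
    # work orders, texts_Z the Database Z texts, and C_f their predicted failure subset.
    texts_aug  = texts_A + texts_Z                          # z_k in Eq. 4-4
    labels_aug = labels_A + [1 if y in C_f else -1 for y in texts_Z]

    vec2 = CountVectorizer()
    clf2 = LinearSVC().fit(vec2.fit_transform(texts_aug), labels_aug)

    # Apply the updated classifier to held-out test work orders t_t (Eqs. 4-5 / 4-6)
    texts_test_A = ["top pyrites gate not closing"]         # assumed test data
    s = clf2.decision_function(vec2.transform(texts_test_A))
    C_f_t = [t for t, v in zip(texts_test_A, s) if v >= 0]  # unplanned
    C_p_t = [t for t, v in zip(texts_test_A, s) if v < 0]   # planned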
C_f^t and C_p^t can be compared with their true labels as mentioned in Section 4.2.2.
The classifier mentioned in Eq. (4-4) was constructed using the text descriptions of
the training data from Database A and the newly labelled data from Database Z (i.e.,
labelled by the classifier in Eqs. 4-2 and 4-3). Thus the performance of the
new classifier can be compared with the classifier trained on the training data from
Database A.
4.5 SUMMARY
This chapter presents a methodology for the extraction of failure and non-failure
maintenance times using commonly available maintenance databases. A text mining
approach is engaged to determine the keywords indicative of the source of unplanned
and urgent maintenance (discussed in Section 4.2.2). This study analyses the text
descriptions of one database to construct the keyword dictionary, which is in turn used
to classify each maintenance stoppage into failure or non-failure maintenance time.
Most common text features (Section 4.2.3) and classification algorithms (Section
4.2.4) have been used to construct the keyword dictionary and thus to formulate the
text classifier. A validation method (mentioned in Section 4.3) is proposed to estimate
the actual labels of Database Z, which are then compared with the predicted labels
(identified by the proposed text classifier). In addition, a series of independent
validation methods and a new application of the methodology have been employed.
Chapter 5: Case Studies on Failure Time Extraction
In this chapter the applicability of the methodology proposed in Chapter 4 is
demonstrated on maintenance databases from an Australian electricity company and
an organisation which has overall responsibility for processing in the Australian sugar
industry. Analysis of the identified failure time appears to confirm the accurate
estimation of failure events in Database Z. The results are expected to be immediately
useful in improving the estimation of the failure time (and thus the reliability models)
for real world assets. The following outlines the main steps for the case studies:
There are six phases: (1) Summarise the basic information and working process
of the two industrial systems. (2) Conduct data description and text cleaning. This
introduces the common databases used in asset and maintenance records. General text
cleaning techniques are applied, including the removal of numbers, punctuation, extra
spaces, stop words, and non-discriminating words. (3) Perform data labelling
and feature extraction. This involves the method of labelling data from Database A. In
doing this, different features are considered to develop a tokenized, vectorized matrix
of keywords and to construct keyword dictionaries based on those features.
(4) Conduct text classification and performance evaluation. This focuses
on the construction of SVM and NB based text classifiers using different keyword
dictionaries. (5) Extract failure time. Here text classifiers are applied to categorise data
from Database Z into failure or non-failure. (6) Validate the extracted results. In this
final step, comparisons are made between the performance of the text classifier with
the estimated actual values, a new application of the method (see Section 4.4), word
clouds, manually observed text descriptions and the cumulative number of failures
before and after text mining.
5.1 CASE STUDY 1: COAL FIRED POWER GENERATION COMPANY
5.1.1 Overview of a Coal Mill
Power generation industry studies have shown that coal pulverisers are an area
where extensive research to improve equipment reliability is essential. The Electric
Research Institute (ERI) has determined that 1% of plant availability is lost on average
due to pulveriser-related problems. The ERI also identified oil contamination and
excessive leakage as two major problem areas, with pulveriser drive train failures
accounting for 53% of pulveriser problems. Coal mills are generally one of three types: low, medium and
high speed mills [141]. Low and medium speed mills are the most prevalent. Some
examples of medium speed mills include vertical spindle bowl, vertical roller and ring
and ball mills. The physical structure of a typical bowl mill (used in this case study) is
presented in Figure 5-1.
Figure 5-1. Overview of medium-speed (vertical spindle bowl) mill [141]
Pulverization is currently the favoured method of preparing coal for burning.
Mechanically pulverizing coal into a fine powder enables it to be burned like a gas,
thus allowing more efficient combustion. Transported by an air or an air and gas
mixture, pulverized coal can be introduced directly into the boiler for combustion.
5.1.2 Data Description and Text Cleaning
Maintenance data coming from the coal pulverized mills of an Australian power
plant over a 21 year period are used here to illustrate the application of the proposed
information extraction methodology. There are two distinct databases: work
orders/notifications (WO’s) and downtime data (DD). Using some selected data fields
in the WO database (i.e., maintenance type and work priority), one may determine
whether the work is issued to fix a “defect”, whether it is planned maintenance or
whether there is a level of urgency to perform the maintenance work. However, it does
not tell us whether the work is issued to fix a “failure”, i.e., whether the maintenance work requires
stopping the operation of the asset. On the other hand, the DD database contains asset
stoppage information without stating whether the downtime is forced (i.e., unplanned)
or planned. Such incompleteness in the WO and DD is in line with our assumptions
presented in Section 4.2.2. The WO and DD for 12 mills were used. Figure 5-2 shows
the process of recording the WO and DD during the maintenance process.
Figure 5-2. Recording of two databases (WO and DD) during maintenance process (coal mill)
Table 5-1 shows five randomly selected records from each of the two databases.
Table 5-1 (a) shows that WOs contain information about the maintenance event (e.g.
maintenance type, work priority). However, on examining the WO text descriptions
from this table, one can see that the maintenance work descriptions are not
straightforward to interpret and there are few “obvious” descriptions of failure events
to the non-expert. Moreover, in the DD entries seen in Table 5-1 (b), “failure” is even
harder to recognize by inspecting work descriptions, particularly without any tags
denoting the urgency or type of the event. Thus, while both databases contain relevant
information for failure events, they should be interpreted together to ascertain failure
times of the asset.
Table 5-1. Five randomly selected records from (a) WO and (b) DD during maintenance process (data is slightly edited to protect proprietary information)

(a) Work Order
Maintenance Type | Work Priority | Work Description
Defect | Urgent | XF PF mill PF leak repair on top of mill
Preventive Maintenance | Scheduled | YF Major overhaul of mill
Defect | Immediate | Xf mill pyrites sluiceway is blocking wi
Modification | Planned | Mill starting interlock bypass
Defect | Immediate | XD top pyrt. gate not closing

(b) Downtime Data
Work Descriptions
Mech XD Mill, Fdr. Mech. Maint. From hot air gate
Mech YE Mill Pyrites Doors. Hydraulic isolation only
Mech.- YD Mill. Repair PF Leak. Primary and Seal Air between Hot and Cold Air Gates. Electrical isolation of Mill and Seal Air Fan
415V AC, Air, Supplies Isolated
Elect. & mech. isolation of mill & feeder
First, the text descriptions of WO’s and DD from coal mills were cleaned
according to well-known techniques (see Section 2.10). One of the important parts in
text cleaning is to exclude non-discriminating words. In this case study, keywords such
as “mill” and other location information text were excluded (see Table 5-2) since,
although they are quite common in the free text, they contain no information associated
with failure/non-failure (i.e. they are non-discriminating). Table 5-2 also shows that
punctuation, numbers, and white space are removed from the data.
Table 5-2. Comparing a few WO data before and after the cleaning process
Documents Before Cleaning | Documents After Cleaning
PF LEAK ON XF MILL. | pf leak
PF leak on XF mill | pf leak
PF Leak on 1st joint above riffle box | pf leak st joint riffle box
XMill spilling badly | spilling badly
Mill Windbox Scraper Upgrade Trial | windbox scraper upgrade trial
mill windbox has a hole where inspection | windbox hole inspection
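A minimal sketch of such a cleaning step, assuming Python; the stop-word and non-discriminating word lists here are toy examples, not the lists actually used in this study:

    import re

    STOP_WORDS = {"on", "of", "the", "a", "has", "is", "where"}   # toy stop-word list
    NON_DISCRIMINATING = {"mill", "xf", "xd", "yf"}               # location/asset words

    def clean(doc):
        doc = re.sub(r"[^a-z\s]", " ", doc.lower())   # lowercase; drop numbers/punctuation
        tokens = [t for t in doc.split()              # splitting also removes extra spaces
                  if t not in STOP_WORDS and t not in NON_DISCRIMINATING]
        return " ".join(tokens)

    print(clean("PF LEAK ON XF MILL."))               # -> "pf leak"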
The cleaned texts can be viewed as a word cloud, an example of which for the
original work order data is shown in Figure 5-3 (the size of the word indicates its
relative frequency). Only keywords that appear more than 100 times in the WO
database have been displayed in the word cloud. In Figure 5-3, the most frequently
occurring word is “air” while “seal”, “fan” and “leak” also occur quite commonly. One
can see both the words that are likely to indicate failure (e.g. “repair”, “leak”, or
“block”) and those that indicate planned tasks (e.g. “inspect” and “overhaul”).
Interestingly, words that obviously indicate failure appear to be fairly infrequent in the
WO text.
Figure 5-3. Word cloud representing the keywords appearing in WO
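A cloud like Figure 5-3 can be generated along the following lines; a sketch assuming Python with the third-party wordcloud package, where cleaned_texts (the cleaned WO descriptions) and the output file name are assumptions:

    from collections import Counter
    from wordcloud import WordCloud   # third-party package (pip install wordcloud)

    # cleaned_texts: list of cleaned WO descriptions (assumed available)
    counts = Counter(" ".join(cleaned_texts).split())
    frequent = {w: n for w, n in counts.items() if n > 100}   # keywords with >100 occurrences

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequent)   # word size reflects relative frequency
    cloud.to_file("wo_wordcloud.png")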
5.1.3 Work Order Labelling and Feature Extraction
Using the labelling technique mentioned in Section 4.2.2, the WOs (9436 records in total)
were classified as unplanned and urgent maintenance or planned maintenance (5053
and 4383 records respectively). 80% of the WO data was used as training data while the
remainder was kept as testing data. The keyword dictionary was then formulated by
using the different features mentioned in Section 4.2.3. Performance of text
classification largely depends on the selected features. For instance, keyword
dictionaries were constructed from the features tf_1 and NG_{1,2}, and parts of such
dictionaries are shown in Table 5-3 and Table 5-4 respectively. Keyword dictionaries
constructed from other features are presented in Appendix A.
Table 5-3. A portion of keyword dictionary (tf_1) for Case Study 1
Record No. | Keywords
[26-30] | "airborne" "airdust" "airflow" "airoil" "alarm"
[31-35] | "alm" "amount" "adjacent" "analog" "analysis"
[36-40] | "annubar" "apart" "appear" "adjust" "applied"
[41-45] | "appo" "approx" "april" "aprox" "araldite"
[46-50] | "adrift" "areas" "arm" "armdamp" "around"
Table 5-4. A portion of Mixed-Gram keyword dictionary (NG_{1,2}) for Case Study 1
Record No. | Keywords
[1000-1007] | "coal outag" "coal pf" "coal probe" "coal pyrites" "coal sampl" "coal spilling" "coil"
[1008-1014] | "coil blown" "coil burnt" "cold" "cold air" "collar" "collar nrv" "collected"
[1015-1021] | "collected mil" "colour" "colour pl" "com" "come" "come adrift" "come adriftr"
[1022-1028] | "come away" "coming" "coming abo" "coming adrift" "coming apart" "coming away" "coming baffles"
5.1.4 Training and Testing Text Classifiers
In this research, using the WO database, performances of NB and SVM based
classifiers are compared. Initially both the classifiers are separately trained using
different keyword dictionaries (as mentioned in Section 4.2.3) and their performances
are tested by comparing the predicted values of failure and non-failure work orders
with actual ones not utilized in the training data. Table 5-5 and Table 5-6 show the
performances (i.e. accuracy, precision, recall and F-Measure) of two models, support
vector machine (SVM) and Naïve Bayes (NB) on the testing data using different
keyword dictionaries constructed from various feature selection methods. It can be
seen from Table 5-5 that the SVM based classifier outperforms the NB model for all
the features. N-Gram based keyword dictionaries are superior to all other feature based
dictionary types. The TF based SVM performs comparatively better than TF-IDF. Among
the TF methods, tf_5 is superior to the other two (i.e., tf_1, tf_10), which implies that
the classifier containing the keywords that appear at least five times in the training
documents shows the best performance. Moreover, the performances of the CS and IG based
SVM are comparatively similar to each other. It should be noted that the IG based
SVM performance is superior if the keyword dictionary contains all the keywords that
appear in the training data.
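The training and evaluation loop behind Table 5-5 and Table 5-6 can be sketched as follows, assuming Python/scikit-learn; wo_texts and wo_labels (the labelled work orders, with labels assumed binary: 1 = failure, 0 = non-failure) are assumed available, and the Mixed-Gram feature choice is shown as one example:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.svm import LinearSVC

    X_tr, X_te, y_tr, y_te = train_test_split(wo_texts, wo_labels,
                                              train_size=0.8, random_state=0)
    vec = CountVectorizer(ngram_range=(1, 2))        # Mixed-Gram NG_{1,2} features
    A_tr, A_te = vec.fit_transform(X_tr), vec.transform(X_te)

    for name, model in [("SVM", LinearSVC()), ("NB", MultinomialNB())]:
        pred = model.fit(A_tr, y_tr).predict(A_te)
        print(name, accuracy_score(y_te, pred), precision_score(y_te, pred),
              recall_score(y_te, pred), f1_score(y_te, pred))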
Table 5-5. Performances between SVM and NB classifiers using different keyword dictionaries

Keyword Dictionary | SVM Accuracy | SVM Precision | SVM Recall | NB Accuracy | NB Precision | NB Recall
tf_1 | 96.29 | 96.23 | 96.81 | 92.32 | 93.59 | 91.82
tf_5 | 96.82 | 96.45 | 97.60 | 92.79 | 92.37 | 94.21
tf_10 | 96.77 | 96.45 | 97.50 | 92.32 | 91.00 | 94.91
(tf-idf)_1 | 96.13 | 96.13 | 96.61 | 92.32 | 93.59 | 91.82
(tf-idf)_5 | 96.71 | 96.26 | 97.60 | 92.79 | 92.37 | 94.21
(tf-idf)_10 | 96.71 | 96.35 | 97.50 | 92.32 | 91.00 | 94.91
χ²_1608 | 96.66 | 95.69 | 98.23 | 92.32 | 93.59 | 91.82
χ²_1000 | 96.50 | 95.50 | 98.13 | 92.26 | 93.67 | 91.62
χ²_500 | 94.60 | 92.32 | 98.13 | 92.58 | 92.01 | 94.21
IG_1608 | 96.71 | 95.69 | 98.33 | 92.32 | 93.59 | 91.82
IG_1000 | 96.50 | 95.50 | 98.13 | 92.21 | 93.67 | 91.52
IG_500 | 94.59 | 92.32 | 98.13 | 92.47 | 92.07 | 93.91
NG_2 | 96.93 | 95.45 | 99.02 | 93.03 | 91.05 | 96.41
NG_3 | 91.10 | 92.34 | 90.05 | 87.33 | 81.24 | 99.00
NG_{1,2} | 97.18 | 95.99 | 98.82 | 93.48 | 91.90 | 96.21
NG_{2,3} | 95.71 | 93.74 | 98.62 | 93.22 | 89.80 | 98.40
NG_{1,2,3} | 96.45 | 95.97 | 98.72 | 93.48 | 91.19 | 97.11
Table 5-6. Comparison of Accuracy and F-Measures between SVM and NB using Different Keyword Dictionaries for Case Study 1 (Coal Mills)

Keyword Dictionary | SVM Accuracy | SVM F-Measure | NB Accuracy | NB F-Measure
tf_1 | 96.29 | 96.52 | 92.32 | 92.70
tf_5 | 96.82 | 97.02 | 92.79 | 93.28
tf_10 | 96.77 | 96.97 | 92.32 | 92.91
(tf-idf)_1 | 96.13 | 96.37 | 92.32 | 92.70
(tf-idf)_5 | 96.71 | 96.93 | 92.79 | 93.28
(tf-idf)_10 | 96.71 | 96.92 | 92.32 | 92.91
χ²_1608 | 96.66 | 96.94 | 92.32 | 92.70
χ²_1000 | 96.50 | 96.80 | 92.26 | 92.63
χ²_500 | 94.60 | 95.14 | 92.58 | 93.10
IG_1608 | 96.71 | 96.99 | 92.32 | 92.70
IG_1000 | 96.50 | 96.80 | 92.21 | 92.58
IG_500 | 94.59 | 95.14 | 92.47 | 92.98
NG_2 | 96.93 | 97.20 | 93.03 | 93.65
NG_3 | 91.10 | 91.18 | 87.33 | 89.25
NG_{1,2} | 97.18 | 97.38 | 93.48 | 94.01
NG_{2,3} | 95.71 | 96.12 | 93.22 | 93.90
NG_{1,2,3} | 96.45 | 97.33 | 93.48 | 94.06
As shown in Table 5-5 and Table 5-6, the accuracy and F-Measure obtained
using Bi-Gram is better than Uni-Gram and Tri-Gram. In other words the SVM
classifier using two consecutive keywords (i.e., Bi-Gram) performs better compared
to considering a single keyword or three consecutive keywords. In the case of the NB
classifier, the accuracy obtained using Bi-Gram is better than that when using Uni-
Gram or Tri-Gram. Nevertheless, NB and SVM classifiers constructed from the
Mixed-Gram method outperform other feature selection methods. That is to say, a
combination of Uni-Gram and Bi-Gram shows better accuracy compared to the other
two combinations. This implies that combinations of two, three or more consecutive
words make the data noisy and sparse and adversely affect the performance.
In summary, it was shown that the SVM performs the best using a keyword
dictionary constructed from the N-Gram feature. Interestingly, the SVM with a
combination of Uni-Gram and Bi-Gram shows the best accuracy for Case Study 1 (see
Table 5-5). The confusion matrices for both SVM and NB text classifiers using all the
text features are summarised in Appendix B.
5.1.5 Comparison between Failure and Non-Failure Work Orders
After training the classifier on the WOs, the resulting classifier may be used to
assign to each DD the category of either failure or non-failure. One can examine the
word clouds as a simple intuitive validation of the results. Figure 5-4 (a) and (b), which
are the failure and non-failure word clouds, respectively, illustrate this for the coal mill
data. While “air” tends to appear in both, one can clearly see that the words that
intuitively indicate failure are more prevalent in the failure word cloud (e.g. “repair”
and “leak”). On the other hand, Figure 5-4 (b) also shows some words that clearly
indicate non-failure (e.g. “change” and “inspect”).
(a)
(b)
Figure 5-4. Word clouds for (a) failure and (b) non-failure WO’s for coal mills
5.1.6 Failure Time Extraction
Since the Mixed-Gram (i.e., combination of Uni-Gram and Bi-Gram) based
SVM classifier shows higher performance than the NB classifier (see Section 5.1.4),
failure times were identified using this classifier. In this regard, the Mixed-Gram SVM
classifier was applied to the DD to label each text description as either failure or non-
failure. Table 5-7 presents the outcome of the predictions. There are a total of six mills
(referred to as A, B, C, D, E and F) in each of the two units, X and Y. The predicted
results indicate that the proposed text mining method classifies a large number of DD
items as “failures”. Around 90% of the DD are identified to be “failures” which
suggests that most of the maintenance notifications in the DD have been issued to
repair “failure” events.
Table 5-7. Predicted instances of failure and non-failure downtimes (coal mills)
(columns: Mills A–F of Unit X, then Mills A–F of Unit Y)

Instances | XA | XB | XC | XD | XE | XF | YA | YB | YC | YD | YE | YF | Total
Failure | 100 | 133 | 132 | 95 | 110 | 97 | 105 | 98 | 144 | 129 | 146 | 100 | 1389
Non-failure | 13 | 8 | 11 | 15 | 39 | 9 | 8 | 11 | 10 | 15 | 11 | 15 | 165
Total Instances | 113 | 141 | 143 | 110 | 149 | 106 | 113 | 109 | 154 | 144 | 157 | 115 | 1554
Of course, these predicted values cannot be objectively validated since true
failures cannot be independently verified in the historical data. For this reason, an
alternative method (mentioned in Section 4.3) to validate the predicted failure times
was devised and this is discussed further in the next section.
5.1.7 Validation of the Text Classifier
According to the proposed method (see Figure 4-1), the WO trained text
classifier is finally applied to the DD to classify each record as failure or non-failure.
validate the accuracy of the text classifier (which is constructed from WO’s), one may
compare the predicted labels of the DD with the estimated “actual” ones using the
existing data fields. First, the actual labels of the DD were identified by using the data
fields recorded. Although the true labels are not directly recorded in the DD, the data
fields (i.e., cause code, cause description and work description) can be used for such a
purpose. Due to the lack of additional information (other than maintenance work
descriptions) in Case Study 1, failure and non-failure labels in the DD were estimated
by manually examining the maintenance work descriptions. In this way, a total of
1360 DD records were classified as “failure” events while the remaining 194 records
were labelled as “non-failure” events.
Estimated actual labels of the DD were then compared with the ones predicted
by the text classifier and the cross tabulation result is shown in Table 5-8. The accuracy
(88.87%), precision (88.92%) and recall (99.71%) values (using Eqs. 2-42, 2-43, and
2-44) indicate that the WO trained text classifier performs well on the DD too.
Thus the text classifier developed in the proposed methodology can efficiently identify
the actual “failure” events.
Table 5-8. Cross tabulation of the DD comparing predicted labels with the estimated ones

                          Actual
Prediction | Failure | Non-Failure
Failure | 1356 | 169
Non-Failure | 4 | 25
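As a check, these figures follow directly from the counts in Table 5-8 via Eqs. 2-42 to 2-44:

Accuracy = (1356 + 25) / (1356 + 169 + 4 + 25) = 1381/1554 ≈ 88.87%
Precision = 1356 / (1356 + 169) = 1356/1525 ≈ 88.92%
Recall = 1356 / (1356 + 4) = 1356/1360 ≈ 99.71%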
5.1.8 Application of the Methodology
As discussed in Section 4.4, one way to perform an intuitive validation of the
results is to train a (new) classifier using the predicted DD labels and apply it to the
WOs. A new classifier was constructed using the DD (with the predicted DD labels
shown in Table 5-7) augmented with the initial training WO data. A combined N-Gram
(i.e., composed of both Uni-Gram and Bi-Gram) based SVM classifier was formulated
in this case and subsequently applied to testing WO data to label them as failure or
non-failure. A total of 9103 training data items (7549 from the training WO data and
1554 from the DD) were used to construct the N-Gram based keyword dictionary (a
total of 5997 keywords which are a combination of Uni-Gram and Bi-Gram). The new
classifier classified the testing WO: failures (1010 records) and non-failures (877
records). This methodology compares the predicted and actual labels of the WO data
tested in this way (see Table 5-9). It can be seen that the accuracy of the new classifier
(97.24%) is better than that of the previous classifier (97.18%). Due to the inclusion
of the newly labelled DD, the new classifier can be used to identify failure events using
the upcoming DD more accurately.
Table 5-9. Cross tabulation of testing work orders comparing predicted labels to the actual ones

                          Actual
Prediction | Failure | Non-Failure
Failure | 963 | 47
Non-Failure | 5 | 872
5.1.9 Comparison between Failure and Non-Failure DD using Text Descriptions
In the absence of independent validation of the accuracy of the results, one can examine
the text descriptions of the classified DD (shown in Table 5-10 for five randomly
selected records from each of the two classes). The text descriptions in the non-failure
column of Table 5-10 contain some keywords that one would intuitively expect (e.g.
“overhaul”, “change”, “oil”, “cleaning”, etc.) to reflect non-failure maintenance
work. Similarly, text descriptions from the failure column in Table 5-10
maintenance. Similarly, text descriptions from the failure column in Table 5-10
contain some keywords (e.g. “leak”) that clearly indicate maintenance works to fix a
failure. Other descriptions are more ambiguous (e.g. “Mech.- XA Mill Pyrites
Doors.”). However, identifying these descriptions as unplanned is not necessarily
incorrect. If the WOs contain a large number of high priority and unplanned activities
on the “pyrites doors”, then the presence of that text certainly lends evidence to the
DD being a failure. In other words, the lack of obvious failure words may provide
further support for the argument to use the text mining method in this research,
particularly if the failures need to be identified by non-experts.
Table 5-10. Randomly selected predicted downtime data of Unit X Mill A (coal mill)

Failure:
• Mech.- XA Mill. Repair PF Leak. Primary and Seal Air between Hot and Cold Air Gates. Electrical isolation of Mill.
• Mech.- XA Mill Pyrites Doors. Hydraulic isolation only
• ELECT. - 415 V A.C. MECH. - Hot Air Gate Shut, Supplies Isolated, Oil
• Mech.- XA Mill. Repair dust leak. Primary and Seal Air between Hot and Cold Air Gates. Electrical isolation of Mill.
• Hyd Isol v/v's - Shut Door Relays - Pulled.

Non-Failure:
• Major Overhaul - from Feeder Outlet Spade to PF Outlet Flaps on top of Mill AND to Hot & Cold Air Gates. 3.3kV MSD, 240V ac, 220V ac, 110V ac, & 110V dc supplies; Seal Air; RCW; Lube Oil; Hydraulic Oil.
• Change over and Isolate Valve
• Mechanical Maintenance to 40 MICRON GEARBOX LUBE OIL FILTER
• Cleaning & meggering of motor
• CMOP Overhaul of Rolls & Journals, Scrapers, Body internals, Picollo Tube, Thermocouples, Pyrities V/V's and Hyd's, Seal Air Fan, Discharge v/v's, Lube Oil Press switch, Mill DP instrumentation, Replace Riffle Elements, Lube Oil p/p, Seal Air Fan v/v's & PF pipes
5.1.10 Cumulative Number of Failures before and after Text Mining
For another perspective on the failure time data, one may examine the
cumulative number of failures before and after the text mining approach has been
applied, as seen in Figure 5-5. As a comparison, the cumulative number of raw WO
and DD events are plotted (in green and blue, respectively), which may be (naively)
assumed to be failure times if all events are considered unplanned. In Figure 5-5, the
text-mined failure times are plotted in brown and clearly indicate that the number of
cumulative failures is less than the raw number of DD events. However, it can be seen
that analysis of the raw DD data would likely provide a reasonable estimate of the
failure intensity (e.g., average number of failures per unit time), since the text-mined
failure intensity and the raw DD failure intensity are quite similar.
Figure 5-5. Cumulative number of failures for Unit X, Mill A (coal mill)
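A figure in the style of Figure 5-5 can be reproduced along the following lines; a sketch assuming Python with matplotlib, where wo_dates, dd_dates and mined_failure_dates are assumed, pre-sorted lists of event time stamps:

    import numpy as np
    import matplotlib.pyplot as plt

    series = [(wo_dates, "raw WO events", "green"),
              (dd_dates, "raw DD events", "blue"),
              (mined_failure_dates, "text-mined failures", "brown")]
    for dates, label, colour in series:
        # step plot: cumulative count of events up to each time stamp
        plt.step(dates, np.arange(1, len(dates) + 1), where="post",
                 label=label, color=colour)
    plt.xlabel("Time")
    plt.ylabel("Cumulative number of events")
    plt.legend()
    plt.show()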
5.2 CASE STUDY 2: BOILERS IN SUGAR PROCESSING INDUSTRY
5.2.1 Overview of a Boiler System
The second case study used to demonstrate the proposed methodology was based
on the sugar processing industry which not only manufactures sugar but also produces
electricity and methanol. Five nations (including Australia) account for 40% of the
world’s total sugar production [142]. The boiler is the essential component in sugar
mills and failure of such a critical component causes huge production losses. A boiler is a
closed vessel in which water or another fluid is heated under pressure. The steam or hot fluid
is then circulated out of the boiler for use in various processes. The main input fuel used in
the boiler is bagasse, which is a by-product of the sugar extraction process (see Figure
5-6).
Figure 5-6. Functional layout of sugar processing system (adapted from [142])
5.2.2 Data Description and Text Cleaning
Maintenance data for a series of boilers were collected over a 26 year period.
Similarly to Case Study 1, there are two distinct databases, the WOs and the DD, which
once again are incomplete in the manner discussed in the introduction. Table 5-11
shows a few examples of WOs and DD from the data and illustrates that information
from both the databases is essential to interpret the failure events.
Table 5-11. Five randomly selected examples from (a) WO and (b) DD (data is slightly edited to protect proprietary information)

(a) Work Order
Maintenance Type | Work Priority | Work Description
Corrective Maintenance | Urgent | Broken conduit at Ash system
Preventive Maintenance | Scheduled | Annual Washdown - Boiler 2
Corrective Maintenance | Immediate | Repair worn areas of bagasse chute
Preventive Maintenance | Planned | Minor Overhaul Ash System Pumps
Corrective Maintenance | Immediate | Reweld broken deflector Bagcon. 2

(b) Downtime Data
Work Descriptions
No 3 bagasse belt tripped - fire hose on trip cable
Low steam
Removing bolt from no.3 bagasse belt
Esj tank full - low steam pressure
Elect. & mech. isolation of mill & feeder
Text cleaning techniques were applied to remove sparseness, punctuation and
unwanted features from the data as detailed in the methodology and demonstrated in
Case Study 1. In this case, keywords such as “mill”, “bagasse” and other asset
identification texts which are common but non-discriminating were removed. Figure
5-7 displays the word cloud of WO data showing most frequent keywords. In this word
cloud, some keywords clearly indicate maintenance: “replace”, “repair”, etc.; some
suggest possible failure terms: “leak” etc. and some indicate planned or routine
inspections: “overhaul”, “clean”, “inspect”, “test” etc.
Figure 5-7. Word cloud representing the keywords appearing in WO
5.2.3 Work Order Labelling and Feature Extraction
The data labelling techniques for unplanned and urgent maintenance work
(mentioned in Section 4.2.2) were applied to the WO database (total of 2,350 records)
and thus they were classified into urgent (713 records) and planned maintenance
(1,637 records). The keyword dictionary was then formulated by using different
features mentioned in Section 4.2.3. Table 5-12 and Table 5-13 show the keyword
dictionaries constructed from the features 𝑡𝑡𝑡𝑡1 and 𝜒𝜒2500 respectively. The rest of the
keyword dictionaries constructed from other features are presented in Appendix C.
Table 5-12. A portion of keyword dictionary (tf_1) for Case Study 2
Record No. | Keywords
[1000-1009] | "second" "secondary" "sect" "section" "send" "sense" "sensing" "sensor" "sensors"
[1010-1018] | "sensorsin" "sequ" "servic" "service" "services" "set" "sets" "setup" "sewing"
[1019-1027] | "sfty" "sfy" "shaft" "shafts" "sharp" "shear" "sheet" "shield" "shift"
[1028-1031] | "shower" "shrouds" "shtr" "shute"
Table 5-13. A portion of keyword dictionary (χ²_500) for Case Study 2
Record No. | Keywords
[26-30] | "rol" "blr" "access" "actu" "actuat"
[31-35] | "actuator" "actuators" "add" "added" "adjust"
[36-40] | "adjustable" "adjustcheck" "adjustment" "aerofoil" "afan"
[41-45] | "air" "airheat" "airheater" "airheatertubes" "alarm"
[46-50] | "align" "alignment" "allow" "alter" "ammet"
5.2.4 Training and Testing Text Classifiers
Like Case Study 1, performances of SVM and NB classifiers were measured on
testing data and these are shown in Table 5-14 and Table 5-15. Both the classifiers
were separately trained on keyword dictionaries (discussed in Section 5.2.3). Table
5-14 shows the model performances using different keyword dictionaries constructed
from various features. As illustrated in the table, the χ²_500 based SVM classifier is
superior to all other methods. The performances of the TF and TF-IDF based SVM
classifiers appear to be quite similar, and better overall compared to the NB classifiers.
However, the recall values for the SVM are below satisfactory (see Column 4 in Table
5-14). Among the TF-IDF methods, (tf-idf)_1 is superior to the other three
approaches (i.e., (tf-idf)_2, (tf-idf)_5, (tf-idf)_10), which implies that the
classifier whose keyword dictionary contains all keywords appearing at least once in
the training data, weighted by the inverse frequency of each keyword over the entire
training data, shows the best performance.
Table 5-14. Performances between SVM and NB using different keyword dictionaries

Keyword Dictionary | SVM Accuracy | SVM Precision | SVM Recall | NB Accuracy | NB Precision | NB Recall
tf_1 | 71.45 | 67.74 | 12.50 | 70.18 | 51.33 | 45.83
tf_2 | 71.82 | 64.44 | 17.26 | 69.09 | 49.32 | 42.86
tf_5 | 71.45 | 60.00 | 19.64 | 69.45 | 50.00 | 32.14
tf_10 | 70.36 | 56.10 | 13.69 | 69.27 | 49.37 | 23.21
(tf-idf)_1 | 72.91 | 85.19 | 13.69 | 70.18 | 51.33 | 45.83
(tf-idf)_2 | 71.82 | 68.57 | 14.29 | 69.09 | 51.33 | 45.83
(tf-idf)_5 | 71.82 | 66.67 | 15.48 | 69.45 | 50.00 | 32.14
(tf-idf)_10 | 70.91 | 61.76 | 12.50 | 69.27 | 49.37 | 23.21
χ²_1297 | 71.91 | 53.13 | 12.69 | 69.57 | 46.62 | 46.27
χ²_1000 | 72.13 | 53.33 | 17.91 | 70.21 | 47.54 | 43.28
χ²_700 | 72.97 | 56.36 | 23.13 | 71.06 | 49.04 | 38.06
χ²_500 | 74.04 | 60.00 | 26.87 | 71.70 | 50.56 | 33.58
χ²_300 | 74.04 | 65.79 | 18.66 | 71.28 | 49.32 | 26.87
IG_1297 | 71.91 | 53.13 | 12.69 | 69.57 | 46.62 | 46.27
IG_1000 | 72.13 | 56.36 | 23.13 | 70.00 | 47.06 | 41.79
IG_700 | 73.40 | 58.18 | 23.88 | 70.43 | 47.62 | 37.31
IG_500 | 74.04 | 60.00 | 26.87 | 71.49 | 50.00 | 32.84
IG_300 | 74.04 | 65.79 | 18.66 | 72.34 | 52.63 | 29.85
NG_2 | 72.55 | 100 | 3.73 | 69.15 | 41.27 | 19.40
NG_3 | 71.70 | 66.67 | 1.49 | 70.64 | 41.67 | 7.46
NG_{1,2} | 72.98 | 88.89 | 5.97 | 67.02 | 43.71 | 54.48
NG_{2,3} | 72.13 | 80.00 | 2.99 | 67.23 | 37.18 | 21.64
NG_{1,2,3} | 71.91 | 53.57 | 11.19 | 64.47 | 42.47 | 68.66
Table 5-15. Comparison of Accuracy and F-Measures between SVM and NB using Different Keyword Dictionaries for Case Study 2 (Boilers)

Keyword Dictionary | SVM Accuracy | SVM F-Measure | NB Accuracy | NB F-Measure
tf_1 | 71.45 | 21.11 | 70.18 | 48.42
tf_2 | 71.82 | 27.23 | 69.09 | 45.86
tf_5 | 71.45 | 29.59 | 69.45 | 39.13
tf_10 | 70.36 | 22.01 | 69.27 | 31.58
(tf-idf)_1 | 72.91 | 23.59 | 70.18 | 48.42
(tf-idf)_2 | 71.82 | 23.65 | 69.09 | 48.42
(tf-idf)_5 | 71.82 | 25.13 | 69.45 | 39.13
(tf-idf)_10 | 70.91 | 20.79 | 69.27 | 31.58
χ²_1297 | 71.91 | 20.49 | 69.57 | 46.44
χ²_1000 | 72.13 | 26.81 | 70.21 | 45.31
χ²_700 | 72.97 | 32.80 | 71.06 | 42.86
χ²_500 | 74.04 | 37.12 | 71.70 | 40.36
χ²_300 | 74.04 | 29.07 | 71.28 | 34.79
IG_1297 | 71.91 | 20.49 | 69.57 | 46.44
IG_1000 | 72.13 | 32.80 | 70.00 | 44.27
IG_700 | 73.40 | 33.86 | 70.43 | 41.84
IG_500 | 74.04 | 37.12 | 71.49 | 39.64
IG_300 | 74.04 | 29.07 | 72.34 | 38.09
NG_2 | 72.55 | 7.19 | 69.15 | 26.39
NG_3 | 71.70 | 2.91 | 70.64 | 12.65
NG_{1,2} | 72.98 | 11.19 | 67.02 | 48.50
NG_{2,3} | 72.13 | 5.76 | 67.23 | 27.36
NG_{1,2,3} | 71.91 | 18.51 | 64.47 | 52.48
Keyword dictionaries based on both the CS statistic and IG perform best among
all the methods. For instance, a keyword dictionary constructed from the
500 most informative features achieves the highest accuracy (i.e., 74.04%) for the
SVM classifier. In terms of measuring the completeness (i.e., the recall value) of the
classification result, the NB performs better than the SVM. Using the NB classifier,
the recall values of the Mixed-Gram approaches are superior to those of the Bi-Gram
or Tri-Gram methods.
In summary, it is evident from Table 5-14 and Table 5-15 that the SVM based
classifier outperforms the NB model and performs best (i.e., in terms of accuracy and
precision) using a keyword dictionary constructed from χ²_500 features. However, the
recall values in Table 5-14 and the F-Measures in Table 5-15 suggest that the NB
classifier using Mixed-Gram features is the best choice for the extraction. The
confusion matrices for the NB and SVM based text classifiers are shown in Appendix D.
5.2.5 Comparison between Failure and Non-Failure Work Orders
As a part of visual validation of the prediction, one may examine Figure 5-8 (a)
and (b) for boiler WO word clouds. One can clearly note failure words in Figure 5-8
(a) (e.g. “choked” and “failed”). Work order descriptions containing such keywords
indicate asset malfunction due to failure. On the other hand, in Figure 5-8 (b), certain
words (i.e., “overhaul”, “inspect” and “checks”) indicate obvious planned maintenance
events or routine inspection tasks on boilers.
(a)
(b)
Figure 5-8. Word clouds for (a) failure and (b) non-failure WO’s for boilers
5.2.6 Failure Time Extraction
The χ²_500 based SVM classifier is applied to the DD to classify the data items
as failure or non-failure. Table 5-16 presents the outcome of the predictions. The table
clearly indicates that the text mining approach classifies a significant number of the
DD as non-failure, where the vast majority of the maintenance jobs appear to have text
descriptions that indicate non-failure tasks. Table 5-16 (see the values in the BD and
BAG columns) shows that a large number of non-failure jobs were carried out on the
boiler body and bagasse system. Frequent planned maintenance work and continuous
monitoring of the boiler body and bagasse system are vital for safe functioning.
Table 5-16. Predicted instances of failure and non-failure downtimes (boilers)

Instances | ASH | BD | BAG | STM | TA | BF | OT | Total
Failure | 02 | 16 | 20 | 02 | 02 | 02 | 12 | 56
Non-failure | 17 | 299 | 391 | 21 | 64 | 14 | 188 | 994
Total Instances | 19 | 315 | 411 | 23 | 66 | 16 | 200 | 1050

ASH: Ash; BD: Boiler Body; BAG: Bagasse; STM: Steam; TA: Turbo Alternator; BF: Boiler Feed Water; OT: Other
5.2.7 Validation of the Text Classifier
As with Case Study 1, the predicted labels of the DD were compared with the
estimated “actual” ones using the existing data fields. The data fields (e.g. cause code,
cause descriptions and maintenance work descriptions) were used to estimate the
actual labels of the DD. First, cause codes and cause descriptions of the DD events
were utilized to classify them into failure and non-failure. Cause descriptions of the
DD which contain the keywords “leak”, “block”, “jam”, “fail”, “break” etc. were
chosen to denote failure while the keywords “adjustment”, “cleaning”, “incorrect-
setting” were chosen to classify non-failure events (see Table 5-17). However, some
cause codes are hard to classify in this way and contain non-discriminating
characteristics (for instance, “control linkages” or “circuit breaker”). To overcome
such difficulty, an additional data field (i.e., work descriptions) was chosen. Such text
descriptions were manually examined along with the cause codes to finally classify the
DD into one of the two labels: failure and non-failure. Following this approach, a total
of 1047 DD records were classified into failure (242) and non-failure (805) labels.
Table 5-17. Estimating actual DD labels using the existing data fields

Cause Code | Cause Description | Work Description | Actual Labels of DD
27, 28, 16, 19, 26, 25, 15, 32, 35 | Driven Machine Failure, Leak, Choked / Blocked / Jammed, Underspeed / Pressure Trip, Drive / Transmission Fail, Structural Failure, Overload / Overfull, Motor Failure, Fuse / Circuit Breaker | Manual Examination | Failure
22, 20, 29, 99, 36, 30, 34, 70, 3, 23, 72 | Adjustment / Cleaning, Safety Manual Trip, Incorrect Setting / Adjustment, Cane Quality, Software Error, Control / Linkages, Electrical Power Supply, Low Steam - Poor Fuel, Scheduled Mid Week Stop, Derailment, High Juice Levels | Manual Examination | Non-Failure
The estimated labels of the DD were then compared with those predicted
by the text classifier. In this case, the χ²_500 based SVM classifier was used to predict
the DD labels. Table 5-18 shows the cross tabulation outcomes of the prediction.
Table 5-18. Cross tabulation of the DD comparing predicted labels with the estimated ones

                          Actual
Prediction | Failure | Non-Failure
Failure | 111 | 38
Non-Failure | 131 | 767
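As a check, these figures follow directly from the counts in Table 5-18 via Eqs. 2-42 to 2-44:

Accuracy = (111 + 767) / (111 + 38 + 131 + 767) = 878/1047 ≈ 83.86%
Precision = 111 / (111 + 38) = 111/149 ≈ 74.5%
Recall = 111 / (111 + 131) = 111/242 ≈ 45.9%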
The χ²_500 based SVM classifier constructed from the WO's shows good
accuracy (83.86%) and precision (74.5%) when applied to the DD. However, the
recall value (45.88%) indicates that the classifier is still a weak tool for identifying the
true positives. Overall, it is evident that the methodology can identify the failure
events reasonably efficiently.
5.2.8 Application of the Methodology
A new classifier (as described in Section 4.4) was constructed using the DD (with
the predicted DD labels shown in Table 5-16) augmented with the initial training WO
data. A χ²_500 based SVM classifier was formulated in this case and subsequently
applied to the testing WO data to classify them as failures or non-failures. A total of
2930 training data items (1880 training WO data and 1050 DD) were used to construct the
χ²_500 based keyword dictionary. The new classifier classified the testing WO into
failures (60 data items) and non-failures (410 data items). Table 5-19 compares the
predicted and actual labels of the testing WO data. The cross tabulation result implies
that the accuracy of the new classifier (74.26%) is better than that of the previous classifier
(74.04%).
Table 5-19. Cross tabulation of testing work orders comparing predicted labels with the actual ones

                          Actual
Prediction | Failure | Non-Failure
Failure | 38 | 22
Non-Failure | 99 | 311
5.2.9 Comparison between Failure and Non-Failure DD using Work Descriptions
In the absence of independent validation of the accuracy of the results, one can examine
the text descriptions of the classified DD (shown in Table 5-20 for five randomly
selected samples from each of the two classes). The obvious failure keywords,
“stop”, “broken”, “jammed”, etc. are clearly evident in the failure column of Table
5-20. On the other hand, the non-failure column contains some keywords (e.g.,
“checking”) that indicate a planned or routine inspection task to continuously monitor
the asset condition. Otherwise, the non-failure column does not contain obvious
indicators of non-failure maintenance. Nevertheless, the non-failures in the DD clearly
indicate some events that are not a function of the asset condition (e.g. “poor fuel”),
which provides evidence that these events were correctly classified.
Table 5-20. Randomly selected predicted downtime data for boilers

Failure:
• BAGASSE SYSTEM STOP - BROKEN RECLAIMER CHAIN.
• NO. 2 ID FAN HAD A BROKEN SPEED RING.
• REPAIRS TO NO. 1 RECLAIMER CHAIN.
• REPAIRS TO NO. 3 BAGASSE BELT - MAGNET CLEAN.
• REMOVED BROKEN ROLLER FROM NO. 2 BAGASSE BELT.

Non-Failure:
• POWER BLACKOUT CAUSED BY LOW STEAM PRESSURE
• POOR FUEL - LOW BACK PRESSURE
• LOW STEAM PRESSURE
• LOW ON STEAM
• CHECKING LOOSE BOLTS
5.2.10 Cumulative Number of Failures before and after Text Mining
Figure 5-9 displays the cumulative number of failures before and after the text
mining approach. For comparison, the raw WO and DD cumulative number of events
were plotted (in green and blue, respectively). These can be considered to be naïve
estimates of the failure times. The text-mined failure times (Figure 5-9) were plotted
in brown and clearly indicate that the number of cumulative failures is less than the
raw number of DD events. The picture here is quite different from Case Study 1: one
can observe that the cumulative numbers of events for both the WO database and the
DD are much higher than the text-mined failure time estimates. Clearly, both the DD
and WO events appear to overestimate the failure intensity.
Figure 5-9. Cumulative number of failures for boilers in sugar processing industry
5.3 SUMMARY AND DISCUSSION
As shown in Table 5-5, Table 5-6, Table 5-14, and Table 5-15 (considering
accuracy and F-Measures), it is clear that the SVM classifier performs better than
the NB one on maintenance data from both coal mills and sugar boilers. Other than the
recall metrics in the boiler case, the SVM outperforms the NB and shows better accuracy
and precision. On the coal mill data, the Mixed-Gram feature outperforms the other
methods and gives the best performance when used with the SVM text
classifier. On the other hand, the SVM achieves the best accuracy and precision with the
Chi-square feature on the sugar boiler data. In the case of the recall measure, the
Mixed-Gram based NB classifier achieves the best value of all.
The validation method establishes that the text classifier performs well on the DD in
determining failure time information. On the coal mill data, the performance (88.87%
accuracy, 88.92% precision and 99.71% recall) implies that the classifier can
effectively identify the failure and non-failure maintenance events. On the boiler data,
the performance (83.86% accuracy, 74.5% precision and 45.88% recall) of the
classifier is still satisfactory.
It is evident from Table 5-5 and Table 5-6 that the Mixed-Gram feature is effective
for constructing a keyword dictionary and performs best with the SVM classifier on the
coal mill data. Such superior performance suggests that an order-of-words feature
applied with a hyperplane based text classifier (i.e., SVM) performs best on data
containing tens of thousands of maintenance records. The SVM still performs well
with a relatively small amount of maintenance data (the sugar boiler data). In that case,
the Chi-square or Information Gain based features achieve better accuracy and
precision compared to the N-Gram method. This implies that features built from the
most informative keywords are more suitable for the boiler case than the N-Gram
method. This would likely hold true for maintenance data sets in other industries.
However, the recall value is more critical in this text classification task due to the
imbalanced distribution of positive and negative samples. For the boiler case, Table 5-14
and Table 5-15 indicate that the SVM classifier has a poor recall value (18.66%) as
well as F-Measure (29.07%) in spite of its higher accuracy (74.04%). The classifier
shows such a poor recall value because of the large number of false negative (FN)
samples (see Table 5-18). Thus, there is a margin for improvement in recall,
particularly for the boilers. The next chapter of this thesis uses a semi-
supervised approach in which the text classifier is developed using a minimal number
of expert-labelled maintenance data. This new information extraction methodology is
expected to improve the performance of the classifier, especially the recall value for
the boiler case.
Chapter 6: Advanced Information Extraction Methodology Using Text Mining and Active Learning
Although WO data contain tags that can be used to label them as relating
to failure or non-failure, a number of challenges remain. These
include, among others, uncertainty in the dates, noisy labels and irregular recording
[3]. One way to overcome this is to label the WO data (and/or DD) using expert
interpretation of the free texts. However, in large data systems, expert interpretation is
expensive and requires a significant amount of time.
In this regard, many prefer the active learning method and construct text
classifiers using as little labelled data as possible. Compared to standard machine
learning methods, which use only labelled training data, active learning employs
unlabelled data along with a minimum amount of labelled data to train a classifier.
Ideally, active learning could also be applied to either WO’s or the DD to identify
failure time information by using experts to interpret the free texts. However, if expert
judgement is to be used, experts may need information from both databases to form a
reliable opinion about the maintenance in question. This study has thus adopted a
method that mitigates the cost of constructing a text classifier from a limited number
of expert-labelled samples from WO's. The constructed classifier is then applied to
attribute each DD record to a failure or non-failure event.
This chapter starts with the motivation for using an advanced text mining
method. Shortcomings of automatic labelling are mentioned with reference to the
research problem, followed by a discussion of reasons to use a novel method to
mitigate such obstacles. After that, an innovative method is proposed for testing the
feasibility of the active learning concept as applied to maintenance data. The proposed
method is finally demonstrated in maintenance data sets from Australian power
generation and sugar processing industries. The outcome and results have been
presented at the end the chapter.
6.1 MOTIVATION
Text classification methods presented in Chapters 4 and 5 perform well in
analysing free texts and are able to determine failure events by classifying the
maintenance events into failure and non-failure classes. Such methods usually rely on
supervised machine learning algorithms which require that the training data are
labelled. Yet, the availability of labelled training data is problematic, particularly for
historical maintenance records over a considerable period where strict tagging
standards may have been put into place only recently (perhaps as a part of a new IT
system). Thus, it is often preferable to label maintenance data using expert assessment
of each entry, but such a process is time intensive and laborious. Therefore,
training a classifier using as few manual labels as possible is proposed as a
necessary alternative.
One method to increase the efficiency of classifier training is through the use of
semi-supervised learning (SSL) methods which select training samples as a part of the
learning process [116-119]. Using SSL methods, a classifier can be constructed from
one maintenance database and then applied to a different (but related) database to
determine the failure events. As yet, no such SSL methods have been developed to
identify failure events in industrial maintenance databases.
In this chapter, an active learning-based text classification method is proposed
to identify failure time data (FTD) from multiple maintenance databases (WO and
DD), which represents an extension of the information extraction methodology
(proposed in Section 4.2) to incorporate the feedback of experts. The initial classifier
is constructed by manually labelling a small number of free texts of the maintenance
work descriptions from WO data using an SSL approach. New informative samples
are identified using the current classifier and these are added to the initial training data.
To identify failure events, the trained classifier is applied to the free text of DD to
associate each downtime event with a failure (or non-failure) time. The developed
method is tested on two real case studies in Australia: one in power generation and one
in the sugar industry.
6.2 METHODOLOGY
This research seeks to develop a method to identify FTD by linking typically
available maintenance data using text mining and expert feedback and opinion. In
particular, text mining is used to link two databases (WO and DD) containing different
components of data essential to identifying FTD, while expert feedback is used to train
the text classifier. Due to the expense of labelling, an efficient method is developed
for requesting expert opinion based on the text examples that will likely be most
informative to the classifier (i.e. active learning) (see Figure 6-1).
Figure 6-1. Active learning techniques used in the methodology
The overall methodology proposed in this research is illustrated in Figure 6-2.
Since WOs are more plentiful and it is typically easier for experts to establish if a WO
is a failure, the WOs will be used to train a text classifier, which will be applied to the
text field of the DD. First, the free text descriptions from both WOs and the DD are
pre-processed, which includes text cleaning. Subsequently, a base classifier
is trained on a small set of initial WO data ℒ labelled by an expert (Split 1), and this
classifier is then used to label the remaining unlabelled WO training data 𝒰 from the pool
(Split 2). From the newly classified unlabelled data, the most uncertain samples are
selected for expert labelling (i.e. an uncertainty sampling strategy is pursued). The
selected samples are then added to ℒ for the next learning cycle. In the next cycle, the
model is retrained on the augmented ℒ, and this process of uncertainty sampling and
re-training is repeated until a termination condition is satisfied. Finally, the improved
classifier is subsequently applied to the free text of the DD to label each stoppage as
either a failure or non-failure. The details of each step will be discussed in the
following subsections.
Figure 6-2. Uncertainty-based Active Learning text classifier
6.2.1 Text Cleaning and Initial Training Data Formulation
The free texts used in maintenance databases contain a large proportion of non-
informative content, which has been cleaned by removing unwanted spaces, numbers,
punctuation and non-discriminating words (i.e. stop words). A series of typical text
cleaning methods has been used (as mentioned in Section 2.10). The raw texts are then
represented by simple features known as a bag of words. This representation ignores
the order in which the terms appear, providing only a variable indicating whether the
term appears or not. This can be done by splitting the cleaned text data into individual
words, which is called tokenization. The text classifier generally requires text data in
the form of a matrix, where each row contains a maintenance record of text data and
each column presents a keyword.
The pre-processed WOs were divided into training and testing data. The testing
data are only used to evaluate the performance of the classifier at the end of the
experiment and are not utilized for any other purpose. The training data are further
split into two groups: ℒ and 𝒰 (as shown in Figure 6-2). To construct the initial
classifier, a small percentage of the training data is randomly chosen, labelled by the
expert, and placed into ℒ. After selecting the initial labelled data to train the initial
SVM classifier, the remainder is taken as the unlabelled training data 𝒰.
Suppose the initial labelled training data ℒ contains d-dimensional feature vectors
x_i ∈ ℝ^d with class labels y_i ∈ {−1 (non-failure), 1 (failure)}. Each x_i represents
an item of text containing a maintenance work description. A binary classifier f(x)
can be used to predict the label (failure or non-failure) of each description x_i:

ŷ = +1 if f(x_i) ≥ 0,  −1 if f(x_i) < 0                                    6-1
In this work, support vector machines (SVMs) will be utilized to describe the
discriminant function f(x_i) (using Eq. 2-37) with the well-known RBF kernel
(following Eq. 2-41). The values of w and b are determined by maximizing the
classification margin (e.g. Eq. 2-38) in the high-dimensional feature space for a given
misclassification penalty C and RBF width parameter γ. The solution to this
optimisation problem yields the resulting classifier formulated in Eq. 2-40. Here α_i,
i = 1, 2, …, N, are Lagrange multipliers from the optimisation problem; only a relatively
small subset of the data have α_i ≠ 0 (i.e. the “support vectors”) [93].
The parameters C and γ are typically determined through cross-validation. The
learned decision function can then be used to predict the class label of a given text
description x_i.
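A sketch of this training step, assuming Python/scikit-learn; X_L and y_L stand for the vectorized texts and labels of the seed set ℒ, and the parameter grids are illustrative values only:

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    grid = GridSearchCV(SVC(kernel="rbf"),
                        param_grid={"C": [0.1, 1, 10, 100],
                                    "gamma": [1e-3, 1e-2, 1e-1, 1.0]},
                        cv=5)
    grid.fit(X_L, y_L)                          # cross-validated choice of C and gamma
    f = grid.best_estimator_.decision_function  # the learned f(x) of Eq. 6-1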
6.2.2 Active Learning via Uncertainty Sampling
As shown in Figure 6-2, by training on ℒ, the unlabelled training data 𝒰 are
labelled into two classes, failure and non-failure, by the initial classifier:

Y_f = { x_i | x_i ∈ 𝒰 and f(x_i) ≥ 0 }                                     6-2

Y_nf = { x_i | x_i ∈ 𝒰 and f(x_i) < 0 }                                    6-3
During each iteration of the AL process, a classifier is trained on the labelled
samples in ℒ and it queries the unlabelled training sample in 𝒰 that is closest to the
current classification boundary:

X_AL = arg min_{x_i ∈ 𝒰} |f(x_i)|                                          6-4

where |f(x_i)| represents the distance of x_i to the classification boundary. Such
queried samples are the most uncertain ones (due to their proximity to the decision
boundary [119]) and are thus selected for expert labelling and added to ℒ, i.e.
ℒ ← ℒ ∪ X_AL. The classifier is subsequently re-trained using ℒ and the process is
repeated until a termination condition is satisfied (e.g. a number of iterations, or a
minimum accuracy/recall/precision).

The trained classifier is finally used to separate the DD (denoted S_T below) into
failure and non-failure classes:

Z_f = { x_i | x_i ∈ S_T and f(x_i) ≥ 0 }                                   6-5

Z_nf = { x_i | x_i ∈ S_T and f(x_i) < 0 }                                  6-6
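One cycle of this uncertainty-sampling loop, followed by the final DD labelling of Eqs. 6-5 and 6-6, can be sketched as below, assuming Python with scikit-learn and scipy; ask_expert is a hypothetical callback standing in for the expert query, and n_iterations, C, gamma, X_pool (the vectorized pool 𝒰), X_L, y_L and X_dd are assumed to be defined:

    import numpy as np
    import scipy.sparse
    from sklearn.svm import SVC

    for _ in range(n_iterations):                       # termination condition (assumed)
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_L, y_L)
        dist = np.abs(clf.decision_function(X_pool))    # distance to the boundary
        i = int(np.argmin(dist))                        # Eq. 6-4: most uncertain sample
        X_L = scipy.sparse.vstack([X_L, X_pool[i]])     # L <- L U X_AL
        y_L = np.append(y_L, ask_expert(i))             # hypothetical expert label
        keep = np.arange(X_pool.shape[0]) != i
        X_pool = X_pool[keep]                           # remove the queried sample

    scores = clf.decision_function(X_dd)                # apply to the DD texts
    Z_f, Z_nf = X_dd[scores >= 0], X_dd[scores < 0]     # Eqs. 6-5 / 6-6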
The algorithm is summarized in Figure 6-3.
Figure 6-3. The Uncertainty-based AL algorithm
6.3 CASE STUDIES
The above methodology will be demonstrated on two case studies discussed in
Chapter 5. In both, there are two distinct databases: work order/notifications (WO) and
downtime data (DD). As mentioned earlier, the WOs and DD are individually
incomplete, in the sense that both are required to identify failure times.
Table 6-1 shows randomly selected data entries from the WOs for the two case
studies. The table shows that WOs contain information useful for expert
interpretation of the maintenance event (e.g. maintenance type, work priority).
However, when examining the WO text descriptions, one can see that the maintenance
work descriptions are not straightforward to interpret and there are few “obvious”
descriptions of failure events to the non-expert. In many cases, the work descriptions
are misleading with respect to the maintenance type and priority when determining
“failure” events. For example, there is a clear absence of unambiguous failure terms in
the text descriptions of both the 2nd and 9th entries in Table 6-1 (a), even though these
WOs are tagged as defect and high-priority maintenance work.
Table 6-1. A few randomly selected data entries from WO for (a) coal mills and (b) boilers

(a) Coal Mills

Entry No.  Maintenance Type        Work Priority  Work Description
1          Defect                  Immediate      Had trips to manual
2          Defect                  Urgent         bottom pyrities door local hydraulic plu
3          Preventive Maintenance  Planned        Weld overlay of mill table
4          Modification            Planned        XD Bunker Trial Nozzle
5          Defect                  Immediate      Xf mill pyrites sluiceway is blocking wi
6          Preventive Maintenance  Planned        thermocouple pocket
7          Defect                  Immediate      Missing switchboard label
8          Non-Maintenance         Planned        Cantilever PF Spades
9          Defect                  Urgent         YD Mill hot air gate.
10         Preventive Maintenance  Planned        Mill Air Blasters

(b) Boilers

Entry No.  Maintenance Type        Work Priority        Work Description
1          Corrective Maintenance  Urgent               lights on no3 boiler
2          Inspection              Planned & Scheduled  Replace V-Belts on Bag.Conveyor Drives
3          Corrective Maintenance  Immediate            Collect oil samples from G-boxs
4          Corrective Maintenance  Urgent               Modifications to FD Fan inlet
5          Non-Maintenance         Planned & Scheduled  Bagasse Bin- Fab. 2 Sets Sprockets
6          Corrective Maintenance  Urgent               Boiler 3 ID fan - clean
7          Non-Maintenance         Planned & Scheduled  Cyclone Repairs Replace 5 bagasse tarps
8          Corrective Maintenance  Planned & Scheduled  Replace no3 lift pump suction valve
9          Urgent Maintenance      Immediate            Clean around bagasse conveyor 3
10         Inspection              Planned & Scheduled  Remove feeder chains - Boiler 2
On the other hand, the maintenance type and work priority of the 7th entry might
suggest that it is a “failure”, but the description clearly indicates an issue that has little
to do with the operation of the asset (“missing switchboard label”). Thus, the
descriptions and tags need to be interpreted together by an expert to reliably identify
failure events in the work orders.
Meanwhile, in the DD entries, as seen in Table 5-1 (b) for coal mills and Table 5-11 (b) for boilers, "failure" is even harder to recognize by inspecting the work descriptions, particularly without any tags denoting the urgency or type of event. Thus, both databases contain useful information about failure events and should be interpreted together to ascertain the failure times of the asset.
Next, the text descriptions of the WOs and the DD from both case studies were cleaned using well-known techniques. The detailed cleaning process and outcomes are discussed in Section 5.1.2 for coal mills and Section 5.2.2 for boilers.
6.3.1 Classifier Formulation and Benchmark Algorithm
The cleaned WOs are divided into two sets: 80% for training and 20% for testing. An initial labelled set ℒ, comprising 5% of the training WOs chosen at random and labelled by the expert, is used to construct the initial classifier. This classifier is then applied to the unlabelled set 𝒰 and the procedure is repeated as described in Section 6.2.2.
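A minimal sketch of this split and seed-set construction in R (wo stands for the matrix of cleaned WO feature vectors; the proportions follow the text, and all object names are illustrative):

```r
set.seed(1)                                      # for reproducibility
n        <- nrow(wo)
train_id <- sample(n, size = round(0.80 * n))    # 80% of WOs for training
wo_test  <- wo[-train_id, , drop = FALSE]        # remaining 20% for testing
wo_train <- wo[train_id, , drop = FALSE]

# 5% of the training WOs, chosen at random, are labelled by the expert
seed_id <- sample(nrow(wo_train), size = round(0.05 * nrow(wo_train)))
X_L <- wo_train[seed_id, , drop = FALSE]         # initial labelled set L
X_U <- wo_train[-seed_id, , drop = FALSE]        # unlabelled pool U
```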
The performance of the proposed AL-based text classifier is compared with three
other models:
• The first model is the standard SVM classifier, which is treated as the baseline classifier and denoted SVM_100% expert labelled. For this model, the WOs are used as training data for an SVM classifier in which all the data samples (ℒ and 𝒰) are manually labelled.
• The second model is the hybrid model combining AL and SSL proposed by Leng and colleagues [119], which is denoted AL-SSST. While standard machine learning methods use only labelled training data, this model employs unlabelled data along with some labelled data to train classifiers with improved accuracy [120]. This model not only selects the most reliable samples but also exploits the most informative ones, with human annotations, to improve text classifier performance. Logically, such reliable samples are likely to be in the non-queried unlabelled data (i.e., the data not queried by the classifier in the active learning process):

\[ \mathcal{U}_{nq} = \mathcal{U} \setminus \{X_{al}\} \tag{6-7} \]
During the process of selecting reliable samples, the most certain samples are chosen from 𝒰_nq, namely those at the furthest distance from the classification boundary:

\[ X_{st} = \arg\max_{\mathbf{x}_i \in \mathcal{U}_{nq}} |t(\mathbf{x}_i)| \tag{6-8} \]

Thus, in each training cycle AL-SSST queries both X_al and X_st and adds them to ℒ, i.e. ℒ ← ℒ ∪ X_al ∪ X_st, until a termination condition is satisfied.
• The last classifier, denoted SSST, trains primarily on ℒ, queries the samples X_st furthest from the classification boundary, adds them to ℒ (i.e. ℒ ← ℒ ∪ X_st) and repeats until a termination condition is met. The selection rules of the three strategies are contrasted in the sketch following this list.
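The three benchmark strategies differ only in which samples are queried each cycle and in who supplies the label. A minimal R sketch of the selection rules, assuming dv holds the signed decision values t(x_i) of the current SVM over 𝒰 (variable names illustrative):

```r
# AL (Eq. 6-4): most uncertain sample, closest to the boundary; expert-labelled
i_al <- which.min(abs(dv))

# For AL-SSST, X_st is drawn from U_nq = U \ {X_al} (Eq. 6-7) ...
dv_nq <- dv[-i_al]
# ... as the most certain sample, furthest from the boundary (Eq. 6-8);
# it keeps the label predicted by the classifier itself
i_st <- which.max(abs(dv_nq))

# Per training cycle:
#   AL      : L <- L U {X_al}          (expert label only)
#   AL-SSST : L <- L U {X_al, X_st}    (expert + classifier labels)
#   SSST    : L <- L U {X_st}          (classifier label only)
```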
To comprehensively evaluate the performance of the different models mentioned above, classification accuracy has been chosen as the evaluation criterion [3]:

\[ \text{Accuracy} = \frac{\text{number of correctly classified data}}{\text{number of all testing data}} \tag{6-9} \]
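This criterion (and the precision and recall reported later in Section 6.4) can be computed directly from a 2x2 confusion matrix. A small R sketch, using the tf_1 confusion matrix of Appendix B as the worked example (rows = predicted, columns = actual):

```r
cm <- matrix(c(970, 32, 38, 847), nrow = 2,
             dimnames = list(predicted = c("failure", "non-failure"),
                             actual    = c("failure", "non-failure")))

accuracy  <- sum(diag(cm)) / sum(cm)   # Eq. 6-9: (970 + 847) / 1887 ~ 0.963
precision <- cm[1, 1] / sum(cm[1, ])   # TP / (TP + FP) = 970 / 1008
recall    <- cm[1, 1] / sum(cm[, 1])   # TP / (TP + FN) = 970 / 1002
```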
6.3.2 Accuracy of the Text Classifier
In this analysis, RStudio¹ is used to analyse the free texts from the WOs in Case Study 1 and Case Study 2, as well as to train and test the SVM-based classifiers. After constructing the different classifiers based on the models mentioned in Section 6.3.1, their accuracies are tested by comparing the predicted failure and non-failure labels with the actual ones held in the testing data. Although active learning and semi-supervised learning methods are established approaches in the literature and have been tested in different application domains, such methods are relatively new in industrial applications for determining failure time events from maintenance databases.

¹ RStudio: Integrated Development Environment for R. RStudio, Inc., Boston, MA. URL http://www.rstudio.com/.
In this study, the proposed model (i.e., the AL based text classifier) is constructed using the training data from the WO maintenance databases in each case study. The accuracy of the proposed model is then compared with the three other models used in text classification: SVM_100% expert labelled, AL-SSST and SSST. Figure 6-4 shows how the classification accuracy of each model increases with the percentage of labelled data when the different SSL models are applied to the training data from the two case studies.
On coal mill data, AL outperforms AL-SSST and SSST and matches the accuracy of SVM_100% expert labelled when the percentage of labelled data is equal to or above 35%. In principle, SSST-based models query X_st (i.e., samples selected by the initial classifier itself) and use the classifier-assigned labels accordingly. The poor accuracies of the SSST-based models imply that the initial classifier itself is not good enough to categorise 𝒰 automatically without expert assessment.
Figure 6-4. Classification accuracies of each model as the percentage of labelled data increases: (a) coal mills in power generation; (b) boilers in sugar processing industry
On boiler data, AL is also superior to AL-SSST and SSST, achieving comparable accuracy to SVM_100% expert labelled when the percentage of labelled data is between 55% and 80%, and similar accuracy when the percentage of labelled data is above 80%. Moreover, AL-SSST reaches comparable accuracy to AL once the percentage of labelled data exceeds 80%. This implies that a classifier constructed from X_st and its labels is also effective once the majority of the data is labelled. The performance measures of the active learning based text classifiers for both case studies are shown in Appendix E.
A further analysis identifies the maximum accuracy achieved by each model and the total number of training samples required to achieve it (see Table 6-2). The models are based on both AL (data labelled by the expert) and SSST (data labelled by the classifier) methods; Column 4 of Table 6-2 reports these percentages of labelled data.
Table 6-2. Classification accuracies of different models over two case studies

                                                                Data Labelled (%)
Case Study              Model                     Total No.   By the       By the   Maximum
                                                  of Data     classifier   expert   Accuracy (%)
Power Generation        AL                        964         -            39.5     95.56
Company (Coal Mills)    SVM_100% expert labelled  2439        -            100      95.21
                        AL-SSST                   2375        67.61        32.39    88.56
                        SSST                      1154        100          -        78.30
Sugar Processing        AL                        974         -            54.08    78.72
Industry (Boilers)      SVM_100% expert labelled  1801        -            100      80.43
                        AL-SSST                   1658        65.56        34.44    77.02
                        SSST                      567         100          -        70.44

Numbers in bold: maximum accuracy achieved and the corresponding number of data items required to achieve it.
On coal mill data, AL achieves the highest accuracy (95.56%) using only 39.5% labelled data, which is even higher than the 95.21% obtained with 100% labelled data using the SVM_100% expert labelled method. Instead of manually labelling all the data, one may use AL-SSST to achieve a maximum accuracy of 88.56% with a labelling ratio of 67.61% (classifier) to 32.39% (expert). Table 6-2 also shows that a maximum accuracy of 78.30% can be achieved by the SSST classifier using data labelled entirely by the classifier.
In the case of boiler data, the accuracies achieved by the three models (AL, AL-SSST and SSST) are comparatively close to each other. However, the accuracy obtained with the SVM_100% expert labelled method (80.43%) is higher than that of the other three models: AL (78.72%), AL-SSST (77.02%) and SSST (70.44%). As in Case Study 1, AL achieves its maximum accuracy of 78.72% using around 54% labelled data. It is worth noting that the accuracy of AL converges toward, and finally equals, that of SVM_100% expert labelled as 100% of the data becomes labelled [119].
In sum, to test the feasibility of active learning with expert assessment, all the WO data were labelled manually and the feasibility of AL and the other SSST methods was tested, with their accuracies compared against the 100% expert labelled case. In a real scenario, manually labelling all the data is impractical. The results imply that AL can achieve the maximum accuracy with only 40% of the coal mill data and 54% of the boiler data labelled by the expert. Although AL-SSST requires less manually labelled data than AL, the saving is very limited compared to the sacrifice in accuracy.
6.3.3 Validation of the Classifier
To validate the accuracy of the text classifier constructed from the WOs, the predicted labels of the DD are compared with the estimated "actual" labels discussed in Section 5.1.7 for coal mills and Section 5.2.7 for boilers.

Table 6-3 shows the accuracies of the AL classifiers (trained with different percentages of labelled WO data) when applied to the DD. First, AL classifiers are constructed over various percentages of labelled WO data (Column 2 of Table 6-3). Each classifier is then applied to the DD to label the entries as failure or non-failure. The predicted labels are finally compared with the estimated actual ones, and the resulting accuracies are shown in Column 3 of Table 6-3.
For coal mills, the accuracies of the different classifiers are quite similar to each other. Across labelled WO percentages from 10% to 100%, the accuracies range from 87.64% to 88.55%. Notably, the classifier trained with 40% labelled WO data shows an accuracy (88.15%) very close to that of the classifier trained with 100% labelled WO data (88.55%). This implies that one may manually label only 40% of the WO data and achieve almost the same outcome as constructing the classifier from 100% labelled WOs.
Table 6-3. Accuracies of the AL-based classifiers (WO-trained classifiers) on DD

Case Study                               % of Labelled WO   Accuracy on DD (%)
Power Generation Company (Coal Mills)    10                 87.64
                                         20                 87.97
                                         30                 87.77
                                         40                 88.15
                                         50                 88.20
                                         100                88.55
Sugar Processing Industry (Boilers)      30                 60.84
                                         35                 68.09
                                         40                 68.00
                                         45                 80.61
                                         50                 80.42
                                         55                 83.40
                                         100                85.58
For the boiler case, the classifier constructed with 30% labelled WO data shows poor accuracy (60.84%) compared with the 85.58% accuracy obtained with 100% labelled WO data. However, the accuracy starts to increase significantly once 45% of the WO data is labelled. At 55% labelled WO data, the accuracy (83.40%) is very close to the maximum accuracy achieved at 100% labelled WO data. As in the previous case, one may manually label only 55% of the WO data and achieve almost the same outcome as constructing the classifier from 100% labelled WOs.
It is clear from Table 6-3 that only a minimum number of labelled WO data items is required to construct the text classifier. The detailed performance measures for both case studies are given in Appendix F. In short, the proposed method requires less manually labelled data with only a limited sacrifice in accuracy.
6.3.4 Failure Time Identification Using DD
According to the outcomes of Section 6.3.3, the AL_40% (using 40% labelled WO data) and AL_55% (using 55% labelled WO data) classifiers are applied to the DD to label each entry as failure or non-failure for coal mills and boilers, respectively. Table 6-4 and Table 6-5 present the prediction outcomes. The tables clearly indicate that the AL based text mining approach categorises a significant number of DD items as non-failure, particularly in the boiler case, where the vast majority of the maintenance actions have text descriptions indicating non-failure actions.
Table 6-4. Predicted instances of failure and non-failure downtimes (coal mills)

                             Unit X                           Unit Y
Mill               A    B    C    D    E    F      A    B    C    D    E    F    Total Instances
Failure          103  130  135   97  141  100    111   99  140  134  146  107   1443
Non-failure       10   11    8   13    8    6      2   10   14   10   11    8    111
Total Instances  113  141  143  110  149  106    113  109  154  144  157  115   1554
Table 6-5. Predicted instances of failure and non-failure downtimes (boilers)

Functional Locations   ASH   BD    BAG   STM   TA   BF   OT   Total Instances
Failure                  4    28   176     5    2    6    4    225
Non-failure             15   380   240    23   66   11   87    822
Total Instances         19   408   416    28   68   17   91   1047

ASH: Ash; BD: Boiler Body; BAG: Bagasse; STM: Steam; TA: Turbo Alternator; BF: Boiler Feed Water; OT: Other
6.4 BENEFITS OF INCLUDING EXPERT LABELLING
This study uses the most informative samples from the WOs to construct the text classifier. In each training cycle, the informative examples are labelled by the expert and added to the training set, and the classifier is updated through this iterative process. To evaluate the benefit of expert labelling, this study constructs a mixed classifier using both expert and automatically labelled WOs. Automatic labelling uses the existing data fields in the WOs without any expert assessment. Based on our working definition of failure, if a WO is unplanned (e.g. "defect") and urgent (i.e. "high priority"), the work order is considered to describe a potential failure event; its free text is therefore likely to use the keywords that the organisation would use to describe a failure. Using the urgency and the source of the maintenance request, all the data samples (ℒ and 𝒰) are labelled automatically (a sketch of such a rule is given below).
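A minimal R sketch of such an automatic labelling rule, assuming the WO table carries the maintenance-type and work-priority fields shown in Table 6-1 (the field names and level strings are illustrative, not the organisation's actual schema):

```r
# Working definition: unplanned ("Defect") AND urgent/high-priority work
# orders are treated as potential failure events; everything else is not.
is_failure <- wo$maintenance_type == "Defect" &
              wo$work_priority %in% c("Immediate", "Urgent")

wo$auto_label <- factor(ifelse(is_failure, "failure", "non-failure"))
```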
To compare performance, the first classifier is constructed using automatic labelling of all the WO data samples, with no expert labelled samples. The next classifier is formulated with 20% expert labelled data, the remainder being automatically labelled. Following the same procedure, the percentage of expert labelled data is increased in steps of 20%, with a corresponding reduction in the percentage of automatically labelled data. The last classifier is constructed entirely from expert labelled data, with no automatically labelled samples.
Table 6-6. Performance of the mixed classifier over percentages of automatic and expert labelled (uncertainty-based) WO

Mixed classifier performance on the expert labelled test set (%):

Case Study              % of WO using        % of WO using expert            Accuracy  Precision  Recall
                        automatic labelling  labelling (uncertainty-based)
Power Generation        100                  0                               94.72     94.57      95.00
Company (Coal Mills)    80                   20                              94.89     94.95      95.00
                        60                   40                              95.18     96.28      94.10
                        40                   60                              95.39     96.39      94.02
                        20                   80                              95.41     96.73      94.10
                        0                    100                             95.41     96.73      94.10
Sugar Processing        100                  0                               69.57     55.17      10.96
Industry (Boilers)      80                   20                              70.85     61.54      16.44
                        60                   40                              73.62     72.00      24.66
                        40                   60                              75.11     71.01      33.56
                        20                   80                              76.81     73.42      39.73
                        0                    100                             78.30     75.58      44.52
An uncertainty-based query selection strategy has been followed to construct the expert labelled classifier. Table 6-6 shows the performance of the mixed classifiers for both case studies. On coal mill data, the improvement of the expert labelled classifier over the automatically labelled one is marginal: although the expert labelled classifier achieves better accuracy and precision, its recall decreases slightly. On boiler data, the improvement is significant: the expert labelled classifier shows consistent improvements in accuracy, precision and recall over the automatically labelled classifier. It is worth noting that the classifier trained on 100% expert labelled data improves on the 100% automatically labelled classifier by 12.55% in accuracy, 37% in precision and 300% in recall.
In a further experiment, another mixed classifier is constructed using different percentages of automatically and expert labelled WOs, this time with the expert labelled samples chosen randomly rather than through the uncertainty-based query selection strategy. The performance of this mixed classifier is presented in Table 6-7.
Table 6-7. Performance of the mixed classifier over percentages of automatic and expert labelled (randomly selected) WO

Mixed classifier performance on the expert labelled test set (%):

Case Study              % of WO using        % of WO using expert          Accuracy  Precision  Recall
                        automatic labelling  labelling (randomly selected)
Power Generation        100                  0                             94.72     94.57      95.00
Company (Coal Mills)    80                   20                            94.83     94.91      94.39
                        60                   40                            94.49     94.14      95.00
                        40                   60                            94.95     95.18      94.89
                        20                   80                            94.97     95.39      94.77
                        0                    100                           95.11     95.77      94.80
Sugar Processing        100                  0                             69.57     55.17      10.96
Industry (Boilers)      80                   20                            70.42     68.00      22.13
                        60                   40                            71.32     70.85      22.89
                        40                   60                            73.55     70.55      28.13
                        20                   80                            74.99     71.66      31.00
                        0                    100                           77.93     73.39      37.71
The performance of the uncertainty-based expert labelled classifier is superior to that of the randomly selected expert labelled classifier (compare Table 6-6 and Table 6-7).
Figure 6-5. Classification accuracies of each mixed classifier over the percentages of automatic and expert labelled data: (a) coal mills in power generation; (b) boilers in sugar processing industry
Selecting the expert labelled samples through the uncertainty-based strategy (as proposed in active learning) clearly yields a steeper increase in accuracy than random selection of expert labelled samples (see Figure 6-5). The active learning based method achieves a significant level of accuracy using only 40% and 60% of expert labelled data for coal mills and boilers, respectively.
6.5 SUMMARY
To assess the importance of unlabelled data in situations where labelled data are rare and costly, the accuracies of AL, AL-SSST, SSST and SVM_100% expert labelled have been evaluated by varying the percentage of labelled data items from 5% to 100%. The AL based text classifier achieves accuracy similar to SVM_100% expert labelled when the percentage of labelled data is above 35% for coal mills and above 80% for boilers. Figure 6-4 shows that AL outperforms AL-SSST and SSST in both cases. Even when the number of labelled data items is small (39.5% for coal mills and 54.08% for boilers, as shown in Table 6-2), the classification accuracy of AL is comparable with SVM_100% expert labelled and superior to AL-SSST and SSST.
Before identifying the FTD from the DD, the performance of the AL classifier is validated by comparing the predicted labels produced by AL classifiers (constructed from different percentages of labelled WO data) on the DD with the estimated "actual" labels. Table 6-3 shows that one may manually label 40% and 55% of the WO data for coal mills and boilers, respectively, and still achieve comparable accuracy to the case where 100% of the WO data is manually labelled. Finally, the AL_40% and AL_55% classifiers for coal mills and boilers, respectively, are applied to the DD to identify the historical FTD.
Table 6-6 suggests significant benefits of the expert labelled text classifier over the automatically labelled one. Although the improvement is only marginal on coal mill data, the expert labelled classifier performs well on boiler data, showing significant performance improvements (12.55% in accuracy, 37% in precision and 300% in recall). Moreover, compared with randomly selected expert labelled samples, the proposed active learning method (using uncertainty sampling) achieves significant accuracy from a small percentage of labelled data (as shown in Figure 6-5).
On boiler data, the performance of the expert labelled text classifier is significantly improved over the automatically labelled one. In particular, the expert labelled classifier achieves a much higher recall (44.52%) than the automatically labelled classifier (10.96%), thereby mitigating the difficulty of the large false negative counts mentioned in Section 5.3.
Chapter 7: Conclusion and Future Research Directions
7.1 CONCLUSION
In practice, an essential input to reliability and maintenance optimisation analysis is the historical failure times of an asset. However, this information is not always readily available in industry maintenance databases due to incorrect recording and historically poor data management [15]. In other words, there is a gap between the information required for reliability and optimization modelling and the actual data typically available in industry databases. This study attempts to bridge this significant gap by identifying information requirement specifications for reliability and maintenance optimization models, as well as by developing novel methods to analyse the typically collected maintenance data (i.e., data available in both numerical and text formats) so as to meet the requirements of the models. In addition to analysing the maintenance data, this study developed a new information extraction methodology to increase the accuracy of estimating historical failure times for reliability analysis using real world maintenance data. The key idea of this methodology was the use of work orders (WOs) and downtime data (DD) to jointly determine when an asset has "failed" historically.
In the first method, the WOs were automatically labelled using data fields and were linked with the DD using the free texts shared across both data systems. To overcome the shortcoming of automatic labelling (due to unreliable data fields), a modified method based on semi-supervised text mining was proposed. This method utilised the active learning concept to minimise the number of data items to be labelled by an expert. Both methods were demonstrated on two real world case studies, and the results showed that they are promising.
The main contributions of this thesis are summarised as follows:
• Information Requirement Specification. Reliability and maintenance optimisation models often require accurate "failure times" of an asset, and these models are frequently complex. Due to a lack of the required information, using such models often leads to poor reliability estimation and inaccurate maintenance decisions. Moreover, the existing literature has rarely investigated the link between information requirements and the asset and maintenance data typically available across multiple databases. This research surveyed the existing literature to identify the information necessary for the models, as well as the availability of that information in existing maintenance databases, and proposed a framework establishing information requirement specifications for reliability and maintenance optimization models. These requirement specifications provide a guideline for recording asset and maintenance data correctly (Chapter 3).
• Failure and Non-Failure Maintenance Times Identification. Asset intensive organisations have different databases, each of which may contain part of the information needed to define a failure event. Some common databases (e.g. WOs) contain a record of every maintenance activity (i.e., a repair, check or routine inspection) performed on an asset. However, these databases rarely contain information on the motivation for the activity (i.e. was the issue raised to fix a "failure" or due to planned maintenance, and did this event cause any downtime?) or on the impact of the activity on the system. Conversely, some DD systems contain detailed information on when the asset was not operating but lack an unambiguous indication of the reason for the stoppage. Thus, it is often the case that no individual database possesses the complete information needed to identify a "failure" event, for which one needs to know both when the asset is down and whether this downtime was unplanned. This thesis developed a novel method to identify failure and non-failure maintenance times using multiple maintenance databases: WO and DD. Using the urgent and high priority maintenance work descriptions in the WOs, a text classifier was constructed and applied to assign each DD event to one of two classes: failure and non-failure. The proposed method thus identified DD events whose work descriptions are consistent with urgent and high priority WOs. Validation of the text classifier and analysis of the identified failure events confirmed the accurate identification of failures in the DD. Using this method, reliability and optimization models can be applied in existing industrial settings.
• Improvement in Classification Accuracy. This thesis tested the failure and non-failure maintenance time methodology and the advanced text classification methodology on maintenance data from two real world industrial case studies. The methodologies proved effective in accurately identifying failure and non-failure times. Although SVM based text classifiers were found to outperform NB based classifiers for both the coal mill and boiler cases, the performance on the boilers was modest (maximum accuracy 74.04% and precision 60%) or even poor (maximum recall 26.87%). In this regard, the active learning based text classifier showed significant improvement in classification performance, especially for the boilers. By querying the most informative samples from the WOs, which were then labelled by an expert, this thesis constructed a text classifier whose performance improvements for the boilers were significant in terms of recall (66%), accuracy (6%) and precision (26%). The improved recall allowed the correct labels to be identified and thus reduced the large false negative count predicted by the text classifier.
• Advanced Information Extraction Methodology. Training classifiers to recognize failure/non-failure descriptions requires a set of labelled data, which is often provided by an expert. A key challenge is thus the "expense" of labelling: an expert must assess each text description individually, so labelling all of the data is infeasible. Furthermore, if expert judgement is to be used, experts may need information from both the WOs and DD to form a reliable opinion about the maintenance in question; it is therefore important that the WOs and the DD are interpreted jointly by the expert. This thesis thus developed an advanced method to identify historical failure times by linking the WOs and DD using both text mining and expert judgement. A text classifier was constructed using expert judgement: the active learning concept allowed the maintenance data most informative to the classifier to be labelled manually, in the least possible amount. Active learning thus played a crucial role in mitigating the cost of constructing a text classifier from a limited number of expert-labelled samples. The constructed classifier was then applied to label each DD item as failure or non-failure. Results from the case study demonstrations imply that active learning can decrease the number of labelled samples by approximately 50% while achieving the same classification accuracy. The outcomes of this study can be used to develop statistical models of failure times from historical maintenance databases and maintenance records where the only consistently available data is a free text description.
Chapter 1 provided an overview of this thesis. Chapter 2 broadly discussed previous studies on reliability and maintenance optimization models, data recording techniques and text mining, and identified and described in detail a potential gap between the information required for reliability and optimization models and the actual data available in maintenance databases. Chapter 3 investigated reliability and maintenance optimization models and constructed an information requirement framework for those models, as well as for identifying the required information in existing maintenance databases.
Chapter 4 proposed a novel information extraction methodology that identifies the "failure" events in historical maintenance databases whose text descriptions are consistent with the definition of failure. Chapter 5 demonstrated the proposed methodology on two real world case studies, with results showing that the methodology is promising. Performance validation of the text classifiers (Table 5-8 and Table 5-18), the randomly selected text mined maintenance data (Table 5-10 and Table 5-20) and the word clouds (Figure 5-4 and Figure 5-8) largely validate that the proposed approach is capable of identifying failure and non-failure text descriptions well.
Chapter 6 presented a novel approach to extract failure time information using a minimum number of maintenance data items labelled by an expert. An active learning based text classifier was constructed using the work orders (WOs) and subsequently applied to the downtime data (DD) to jointly determine when an asset has "failed". As in Chapter 5, the method was demonstrated on two real world case studies, with results showing that the methodology is effective in such situations. Figure 6-4 indicates that active learning based text classifiers have superior accuracy over the other semi-supervised methods. Table 6-2 suggests that the maximum accuracy of the text classifiers can be achieved by manually labelling around 50% of the data. The findings from the validation of the text classifier (Table 6-3) and the benefits of the expert labelled classifier (Table 6-6 and Table 6-7) reveal that the active learning concept can be applied to maintenance data once an expert manually labels a minimum number of unlabelled data items. With such minimal effort from an expert, the method can effectively identify failure time information from a large set of historical maintenance databases.
7.2 FUTURE RESEARCH
Through the approaches discussed in this thesis, substantial progress and several contributions have been made; however, some deficiencies remain. Accordingly, future research directions are outlined as follows:
• Future research might focus on developing methods to recognise new vocabulary and to update the text classifier with the new keywords that may arise as different personnel fill out the maintenance logs. Although a data fusion approach has been proposed in this study (a text mining method to link the WOs and the DD), the classifier could be constructed from both the WOs and the DD; such a classifier might be more effective at labelling future data entries from either the WOs or the DD.
• This thesis constructed the active learning based text classifier using a minimum number of expert-labelled WOs. To overcome shortcomings due to variations in how different individuals fill in the maintenance logs, the classifier might be constructed using both the WOs and the DD. However, given the lack of information beyond work descriptions and work entry dates, an expert might find it difficult to label the DD. In this regard, the expert could utilise other available information (e.g. cause codes and cause descriptions) or another related maintenance database (e.g. the plan to work, generally denoted PTW).
• Regarding the active learning query selection strategy, this thesis used the most general strategy, querying the data sample with the least confident label prediction. However, this does not consider the remaining label distribution, which could be addressed by using margin sampling or an entropy measure. Future work could be directed towards testing more sophisticated query strategies, e.g., query by committee (QBC), expected model change and expected error reduction. Furthermore, the query strategies could also be employed with probabilistic classification models (e.g., naïve Bayes) or non-probabilistic ones (e.g., k-nearest neighbour).
• Selecting the best features is a crucial part of a text classification model. More sophisticated features could be explored (e.g. via frequent and sequential pattern mining) with the goal of improving classification accuracy. Such classifiers would be constructed from complex maintenance data (i.e., features consisting of frequent and sequential patterns) to extract failure time information more precisely. Another effective way of constructing a text classifier is to train multiple levels of text features using deep learning algorithms, e.g. deep neural networks (DNN), convolutional neural networks (CNN) and recurrent neural networks (RNN). Such a network would consist of input, output and hidden layers and would be able to handle text documents with high dimensional features. In future, different features (e.g. TF-IDF, CS, IG and N-Gram) may be utilised and incorporated into the expert labelled classifier (e.g. the active learning based text classifier); the best extracted features could then be used in a deep learning or expert-based text classifier to improve classification performance.
Appendices
Appendix A
Keyword Dictionaries Constructed from Different Text Features for Case Study 1 (Coal Mills)

A portion of the keyword dictionary (χ²_1608) using the top 35 CS features

Record No.   Keywords
[1-7]        "air" "fan" "seal" "leak" "chang" "filter" "blast"
[8-14]       "unit" "tapping" "point" "pocket" "bunker" "pyrites" "thermocouple"
[15-21]      "flow" "greas" "filters" "gbox" "top" "bottom" "micron"
[22-28]      "inspect" "lube" "limit" "please" "gate" "repair" "replace"
[29-35]      "feeder" "outlet" "open" "routin" "cold" "faulti" "temp"

A portion of the keyword dictionary (NG_1,2,3) using Mixed-Gram (mixed Uni-, Bi- and Tri-Gram) features

Record No.   Keywords
[100-104]    "coal feed" "coal feeder" "coal flow" "coal leak" "cold"
[105-109]    "cold air" "cold air damp" "cold air damper" "cold air g" "cold air gate"
[110-114]    "coming" "comp" "computer" "computer point" "continually"
[115-119]    "control" "control damper" "control valv" "conveyor" "cooler"
[120-124]    "cooler bypass" "corner" "corner pf" "corners" "cost"
[125-129]    "coupling" "current" "damp" "damper" "damper gland"
[130]        "damper gland check"
Appendix B
Confusion Matrix for SVM Text Classifier using Different Text Features for Case Study 1 (Coal Mills)

Term frequency feature using keyword dictionary (tf_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure            970                 38
Predicted Non-Failure         32                847

Term frequency feature using keyword dictionary (tf_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure            978                 36
Predicted Non-Failure         24                849

Term frequency feature using keyword dictionary (tf_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure            977                 36
Predicted Non-Failure         25                849

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure            968                 39
Predicted Non-Failure         34                846
Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure            978                 38
Predicted Non-Failure         24                847

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure            977                 37
Predicted Non-Failure         25                848

Chi-square feature using keyword dictionary (χ²_1608)
                        Actual Failure   Actual Non-Failure
Predicted Failure            999                 45
Predicted Non-Failure         18                825

Chi-square feature using keyword dictionary (χ²_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure            998                 47
Predicted Non-Failure         19                823

Chi-square feature using keyword dictionary (χ²_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure            998                 83
Predicted Non-Failure         19                787
Information Gain feature using keyword dictionary (IG_1608)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1000                 45
Predicted Non-Failure         17                825

Information Gain feature using keyword dictionary (IG_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure            998                 47
Predicted Non-Failure         19                823

Information Gain feature using keyword dictionary (IG_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure            998                 83
Predicted Non-Failure         19                787

Bi-Gram feature using keyword dictionary (NG_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1007                 48
Predicted Non-Failure         10                822

Tri-Gram feature using keyword dictionary (NG_3)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1005                 42
Predicted Non-Failure         12                828
Uni-Bi-Gram feature using keyword dictionary (NG_1,2)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1005                 42
Predicted Non-Failure         12                828

Bi-Tri-Gram feature using keyword dictionary (NG_2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1003                 67
Predicted Non-Failure         14                803

Uni-Bi-Tri-Gram feature using keyword dictionary (NG_1,2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1005                 42
Predicted Non-Failure         12                828
Confusion Matrix for NB Text Classifier using Different Text Features for Case Study 1 (Coal Mills)

Term frequency feature using keyword dictionary (tf_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure            920                 63
Predicted Non-Failure         82                822

Term frequency feature using keyword dictionary (tf_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure            944                 78
Predicted Non-Failure         58                807

Term frequency feature using keyword dictionary (tf_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure            951                 94
Predicted Non-Failure         51                791

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure            920                 63
Predicted Non-Failure         82                822
Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure            944                 78
Predicted Non-Failure         58                807

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure            951                 94
Predicted Non-Failure         51                791

Chi-square feature using keyword dictionary (χ²_1608)
                        Actual Failure   Actual Non-Failure
Predicted Failure            920                 63
Predicted Non-Failure         82                822

Chi-square feature using keyword dictionary (χ²_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure            918                 62
Predicted Non-Failure         84                823

Chi-square feature using keyword dictionary (χ²_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure            944                 82
Predicted Non-Failure         58                803
Information Gain feature using keyword dictionary (IG_1608)
                        Actual Failure   Actual Non-Failure
Predicted Failure            920                 63
Predicted Non-Failure         82                822

Information Gain feature using keyword dictionary (IG_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure            917                 62
Predicted Non-Failure         85                823

Information Gain feature using keyword dictionary (IG_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure            941                 81
Predicted Non-Failure         61                804

Bi-Gram feature using keyword dictionary (NG_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure            966                 95
Predicted Non-Failure         36                790

Tri-Gram feature using keyword dictionary (NG_3)
                        Actual Failure   Actual Non-Failure
Predicted Failure            992                229
Predicted Non-Failure         10                656
Uni-Bi-Gram feature using keyword dictionary (NG_1,2)
                        Actual Failure   Actual Non-Failure
Predicted Failure            964                 85
Predicted Non-Failure         38                800

Bi-Tri-Gram feature using keyword dictionary (NG_2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure            986                112
Predicted Non-Failure         16                773

Uni-Bi-Tri-Gram feature using keyword dictionary (NG_1,2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure            973                 94
Predicted Non-Failure         29                791
Appendix C
Keyword Dictionaries Constructed from Different Text Features for Case Study 2 (Boilers)

A portion of the keyword dictionary (NG_2) using Bi-Gram features

Record No.    Keywords
[2000-2005]   "makeup valv" "man coolers" "man start" "manifold leak" "manifold submerged"
[2006-2010]   "manual door" "manufacture cover" "manufacture install" "manufacture rol" "manufacture spare"
[2011-2015]   "mark drill" "mc spare" "mech elec" "mech seal" "mech service"
[2016-2020]   "mecseal leak" "mesh floor" "meter clean" "min flow" "minimum flow"
[2021-2025]   "minor overhaul" "missing refractory" "mixing chamb" "mm airheater" "mm long"

A portion of the keyword dictionary (NG_3) using Tri-Gram features

Record No.    Keywords
[3000-3003]   "spray nozzles rod" "spray valves limit" "sprays main steam"
[3004-3006]   "sprays pipes blr" "spreader air duct" "sprhtr control valve"
[3007-3009]   "sprhtr loops outsourc" "sprhtr sfty lh" "sprockets fabricate sprocket"
[3010-3012]   "sprockets reclaim bag" "sprockets slat con" "square hole steel"
[3013-3015]   "ss pipe ton" "ss tube nitrogen" "stack drain sid"
Appendix D
Confusion Matrix for SVM Text Classifier using Different Text Features for Case Study 2 (Boilers)

Term frequency feature using keyword dictionary (tf_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure             21                 10
Predicted Non-Failure        147                372

Term frequency feature using keyword dictionary (tf_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             29                 16
Predicted Non-Failure        139                366

Term frequency feature using keyword dictionary (tf_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure             33                 22
Predicted Non-Failure        135                360

Term frequency feature using keyword dictionary (tf_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure             23                 18
Predicted Non-Failure        145                364
Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure             23                  4
Predicted Non-Failure        145                378

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             24                 11
Predicted Non-Failure        144                371

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure             26                 13
Predicted Non-Failure        142                369

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure             21                 13
Predicted Non-Failure        147                369

Chi-square feature using keyword dictionary (χ²_1297)
                        Actual Failure   Actual Non-Failure
Predicted Failure             17                 15
Predicted Non-Failure        117                321
Chi-square feature using keyword dictionary (χ²_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure             24                 21
Predicted Non-Failure        110                315

Chi-square feature using keyword dictionary (χ²_700)
                        Actual Failure   Actual Non-Failure
Predicted Failure             31                 24
Predicted Non-Failure        103                312

Chi-square feature using keyword dictionary (χ²_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure             36                 24
Predicted Non-Failure         98                312

Chi-square feature using keyword dictionary (χ²_300)
                        Actual Failure   Actual Non-Failure
Predicted Failure             25                 13
Predicted Non-Failure        109                323

Information Gain feature using keyword dictionary (IG_1297)
                        Actual Failure   Actual Non-Failure
Predicted Failure             17                 15
Predicted Non-Failure        117                321
Information Gain feature using keyword dictionary (IG_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure             24                 21
Predicted Non-Failure        110                315

Information Gain feature using keyword dictionary (IG_700)
                        Actual Failure   Actual Non-Failure
Predicted Failure             32                 23
Predicted Non-Failure        102                313

Information Gain feature using keyword dictionary (IG_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure             36                 24
Predicted Non-Failure         98                312

Information Gain feature using keyword dictionary (IG_300)
                        Actual Failure   Actual Non-Failure
Predicted Failure             25                 13
Predicted Non-Failure        109                323

Bi-Gram feature using keyword dictionary (NG_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure              5                  0
Predicted Non-Failure        129                336
Tri-Gram feature using keyword dictionary (NG_3)
                        Actual Failure   Actual Non-Failure
Predicted Failure              2                  1
Predicted Non-Failure        132                336

Uni-Bi-Gram feature using keyword dictionary (NG_1,2)
                        Actual Failure   Actual Non-Failure
Predicted Failure              8                  1
Predicted Non-Failure        126                335

Bi-Tri-Gram feature using keyword dictionary (NG_2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure              4                  1
Predicted Non-Failure        130                335

Uni-Bi-Tri-Gram feature using keyword dictionary (NG_1,2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure             15                 13
Predicted Non-Failure        119                323
Confusion Matrix for NB Text Classifier using Different Text Features for Case Study 2 (Boilers)

Term frequency feature using keyword dictionary (tf_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure             77                 73
Predicted Non-Failure         91                309

Term frequency feature using keyword dictionary (tf_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             72                 74
Predicted Non-Failure         96                308

Term frequency feature using keyword dictionary (tf_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure             54                 54
Predicted Non-Failure        114                328

Term frequency feature using keyword dictionary (tf_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure             39                 40
Predicted Non-Failure        129                342
Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_1)
                        Actual Failure   Actual Non-Failure
Predicted Failure             77                 73
Predicted Non-Failure         91                309

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             91                 73
Predicted Non-Failure         91                309

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_5)
                        Actual Failure   Actual Non-Failure
Predicted Failure             54                 54
Predicted Non-Failure        114                328

Term frequency-inverse document frequency feature using keyword dictionary ((tf-idf)_10)
                        Actual Failure   Actual Non-Failure
Predicted Failure             39                 40
Predicted Non-Failure        129                342

Chi-square feature using keyword dictionary (χ²_1297)
                        Actual Failure   Actual Non-Failure
Predicted Failure             62                 71
Predicted Non-Failure         72                265
Chi-square feature using keyword dictionary (χ²_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure             58                 64
Predicted Non-Failure         76                272

Chi-square feature using keyword dictionary (χ²_700)
                        Actual Failure   Actual Non-Failure
Predicted Failure             51                 53
Predicted Non-Failure         83                283

Chi-square feature using keyword dictionary (χ²_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure             45                 44
Predicted Non-Failure         89                292

Chi-square feature using keyword dictionary (χ²_300)
                        Actual Failure   Actual Non-Failure
Predicted Failure             36                 37
Predicted Non-Failure         98                299

Information Gain feature using keyword dictionary (IG_1297)
                        Actual Failure   Actual Non-Failure
Predicted Failure             62                 71
Predicted Non-Failure         72                265
Information Gain feature using keyword dictionary (IG_1000)
                        Actual Failure   Actual Non-Failure
Predicted Failure             56                 63
Predicted Non-Failure         78                273

Information Gain feature using keyword dictionary (IG_700)
                        Actual Failure   Actual Non-Failure
Predicted Failure             50                 55
Predicted Non-Failure         84                281

Information Gain feature using keyword dictionary (IG_500)
                        Actual Failure   Actual Non-Failure
Predicted Failure             44                 44
Predicted Non-Failure         90                292

Information Gain feature using keyword dictionary (IG_300)
                        Actual Failure   Actual Non-Failure
Predicted Failure             40                 36
Predicted Non-Failure         94                300

Bi-Gram feature using keyword dictionary (NG_2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             26                 37
Predicted Non-Failure        108                299
Tri-Gram feature using keyword dictionary (NG_3)
                        Actual Failure   Actual Non-Failure
Predicted Failure             10                 14
Predicted Non-Failure        124                322

Uni-Bi-Gram feature using keyword dictionary (NG_1,2)
                        Actual Failure   Actual Non-Failure
Predicted Failure             73                 94
Predicted Non-Failure         61                242

Bi-Tri-Gram feature using keyword dictionary (NG_2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure             29                 49
Predicted Non-Failure        105                287

Uni-Bi-Tri-Gram feature using keyword dictionary (NG_1,2,3)
                        Actual Failure   Actual Non-Failure
Predicted Failure             92                125
Predicted Non-Failure         42                211
Appendix E
Confusion Matrix for SVM Text Classifier in Active Learning using Different Percentages of Labelled WO for Case Study 1 (Coal Mills)

Comparing the predicted WO test labels with the actual values (using 5% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            175                  3
Predicted Non-Failure        154                253

Comparing the predicted WO test labels with the actual values (using 8.2% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            301                 14
Predicted Non-Failure         28                242

Comparing the predicted WO test labels with the actual values (using 11.11% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            311                 23
Predicted Non-Failure         18                233

Comparing the predicted WO test labels with the actual values (using 18.57% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            314                 43
Predicted Non-Failure         15                213
Comparing the predicted WO test labels with the actual values (using 21.94% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            314                 47
Predicted Non-Failure         15                209

Comparing the predicted WO test labels with the actual values (using 34.77% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            312                 13
Predicted Non-Failure         17                243

Comparing the predicted WO test labels with the actual values (using 39.52% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            312                 11
Predicted Non-Failure         17                245

Comparing the predicted WO test labels with the actual values (using 51.05% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            313                 12
Predicted Non-Failure         16                244

Comparing the predicted WO test labels with the actual values (using 63.96% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            312                  9
Predicted Non-Failure         17                247
Comparing the predicted WO test labels with the actual values (using 66.79% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            312                  9
Predicted Non-Failure         17                247

Comparing the predicted WO test labels with the actual values (using 91.47% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            311                  9
Predicted Non-Failure         18                247

Comparing the predicted WO test labels with the actual values (using 100% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            310                  9
Predicted Non-Failure         19                247
Confusion Matrix for SVM Text Classifier in Active Learning using Different Percentages of Labelled WO for Case Study 2 (Boilers)

Comparing the predicted WO test labels with the actual values (using 5% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             33                 65
Predicted Non-Failure         99                253

Comparing the predicted WO test labels with the actual values (using 18.88% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             22                  8
Predicted Non-Failure        122                318

Comparing the predicted WO test labels with the actual values (using 30.6% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             41                 10
Predicted Non-Failure        103                316

Comparing the predicted WO test labels with the actual values (using 41.37% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             52                 16
Predicted Non-Failure         92                310
Comparing the predicted WO test labels with the actual values (using 46.31% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             59                 21
Predicted Non-Failure         85                305

Comparing the predicted WO test labels with the actual values (using 54.08% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             61                 17
Predicted Non-Failure         83                309

Comparing the predicted WO test labels with the actual values (using 65.29% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             63                 18
Predicted Non-Failure         81                308

Comparing the predicted WO test labels with the actual values (using 67.85% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             62                 20
Predicted Non-Failure         82                306

Comparing the predicted WO test labels with the actual values (using 80.79% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             67                 17
Predicted Non-Failure         77                309
Comparing the predicted WO test labels with the actual values (using 84.34% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             69                 18
Predicted Non-Failure         75                308

Comparing the predicted WO test labels with the actual values (using 85.29% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             68                 18
Predicted Non-Failure         76                308

Comparing the predicted WO test labels with the actual values (using 100% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             70                 18
Predicted Non-Failure         74                308
Appendix F
Confusion Matrix for Active Learning-based Text Classifier Applied over DD for Case Study 1 (Coal Mills)

Comparing the predicted DD labels with the actual ones (using 10% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1335                162
Predicted Non-Failure         25                 32

Comparing the predicted DD labels with the actual ones (using 20% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1324                156
Predicted Non-Failure         36                 38

Comparing the predicted DD labels with the actual ones (using 30% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1326                156
Predicted Non-Failure         34                 38

Comparing the predicted DD labels with the actual ones (using 40% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1335                156
Predicted Non-Failure         25                 38
Comparing the predicted DD labels with the actual ones (using 50% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1337                179
Predicted Non-Failure         23                 15

Comparing the predicted DD labels with the actual ones (using 100% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure           1356                174
Predicted Non-Failure          4                 20
Confusion Matrix for Active Learning-based Text Classifier Applied over DD for Case Study 2 (Boilers)

Comparing the predicted DD labels with the actual ones (using 30% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             72                240
Predicted Non-Failure        170                565

Comparing the predicted DD labels with the actual ones (using 35% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             75                167
Predicted Non-Failure        167                638

Comparing the predicted DD labels with the actual ones (using 40% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             69                162
Predicted Non-Failure        173                643

Comparing the predicted DD labels with the actual ones (using 45% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             69                 30
Predicted Non-Failure        173                775
Comparing the predicted DD labels with the actual ones (using 50% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure             71                 34
Predicted Non-Failure        171                771

Comparing the predicted DD labels with the actual ones (using 55% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            101                 33
Predicted Non-Failure        141                772

Comparing the predicted DD labels with the actual ones (using 100% of expert labelled data from WO training)
                        Actual Failure   Actual Non-Failure
Predicted Failure            112                 21
Predicted Non-Failure        130                784
Bibliography
1. Latino, K., Understanding event data collection: Part 1. Plant Engineering,
2004: p. 31-32.
2. Louit, D.M., R. Pascual, and A.K.S. Jardine, A practical procedure for the
selection of time-to-failure models based on the assessment of trends in
maintenance data. Reliability Engineering & System Safety, 2009. 94(10): p.
1618-1628.
3. Prytz, R., S. Nowaczyk, T. Rognvaldsson, and S. Byttner, Predicting the need
for vehicle compressor repairs using maintenance records and logged vehicle
data. Engineering Applications of Artificial Intelligence, 2015. 41: p. 139-150.
4. Devaney, M., A. Ram, H. Qiu, and J. Lee. Preventing failures by mining
maintenance logs with case-based reasoning. in 59th Meeting of the Society
for Machinery Failure Prevention Technology (MFPT-59). 2005.
5. Hodkiewicz, M. and M.T.W. Ho, Cleaning historical maintenance work order
data for reliability analysis. Journal of Quality in Maintenance Engineering,
2016. 22(2): p. 146-163.
6. Sipos, R., D. Fradkin, F. Moerchen, and Z. Wang, Log-based predictive
maintenance, in Proceedings of the 20th ACM SIGKDD international
conference on Knowledge discovery and data mining. 2014, ACM: New York,
New York, USA. p. 1867-1876.
7. Edwards, B., M. Zatorsky, and R. Nayak, Clustering and classification of
maintenance logs using text data mining, in Proceedings of the 7th
Australasian Data Mining Conference. 2008, Australian Computer Society,
Inc.: Glenelg, Australia. p. 193-199.
8. Moreira, R.d.P. and C.L.N. Junior. Prognostics of aircraft bleed valves using a
SVM classification algorithm. in Aerospace Conference, 2012 IEEE. 2012.
9. Ruiz, P.P., B.K. Foguem, and B. Grabot, Improving Maintenance Strategies
from Experience Feedback. IFAC Proceedings Volumes, 2013. 46(9): p. 625-
630.
10. Waeyenbergh, G. and L. Pintelon, A framework for maintenance concept
development. International Journal of Production Economics, 2002. 77(3): p.
299-313.
11. Duffuaa, S.O., A. Raouf, and J.D. Campbell, Planning and control of
maintenance systems: Modelling & Analysis. Second ed. 2010: Springer.
12. Muchiri, P., L. Pintelon, L. Gelders, and H. Martin, Development of
maintenance function performance measurement framework and indicators.
International Journal of Production Economics, 2011. 131(1): p. 295-302.
13. Manzini, R., A. Regattieri, H. Pham, and E. Ferrari, Maintenance for industrial
systems. 2009: Springer.
14. Wienker, M., K. Henderson, and J. Volkerts, The Computerized Maintenance
Management System: An Essential Tool for World Class Maintenance.
SYMPHOS 2015 - 3rd International Symposium on Innovation and Technology
in the Phosphate Industry, 2016. 138: p. 413-420.
15. Ahmad, R. and S. Kamaruddin, An overview of time-based and condition-
based maintenance in industrial application. Computers & Industrial
Engineering, 2012. 63(1): p. 135-149.
16. Guillén, A.J., A. Crespo, J.F. Gómez, and M.D. Sanz, A framework for
effective management of condition based maintenance programs in the context
of industrial development of E-Maintenance strategies. Computers in Industry,
2016. 82: p. 170-185.
17. Alrabghi, A. and A. Tiwari, State of the art in simulation-based optimisation
for maintenance systems. Computers & Industrial Engineering, 2015. 82: p.
167-182.
18. Ding, S.H. and S. Kamaruddin, Maintenance policy optimization-literature
review and directions. International Journal of Advanced Manufacturing
Technology, 2015. 76(5-8): p. 1263-1283.
19. Dekker, R., Applications of maintenance optimization models: a review and
analysis. Reliability Engineering & System Safety, 1996. 51(3): p. 229-240.
20. Sharma, A., G.S. Yadava, and S.G. Deshmukh, A literature review and future
perspectives on maintenance optimization. Journal of Quality in Maintenance
Engineering, 2011. 17(1): p. 5-25.
21. Horenbeek, A.V., L. Pintelon, and P. Muchiri, Maintenance optimization
models and criteria. International Journal of System Assurance Engineering
and Management, 2011. 1(3): p. 189-200.
22. Alaswad, S. and Y. Xiang, A review on condition-based maintenance
optimization models for stochastically deteriorating system. Reliability
Engineering & System Safety, 2017. 157: p. 54-63.
23. Gabbar, H.A., H. Yamashita, K. Suzuki, and Y. Shimada, Computer-aided
RCM-based plant maintenance management system. Robotics and Computer-
Integrated Manufacturing, 2003. 19(5): p. 449-458.
24. Prajapati, A., J. Bechtel, and S. Ganesan, Condition based maintenance: a
survey. Journal of Quality in Maintenance Engineering, 2012. 18(4): p. 384-
400.
25. Barabadi, A., J. Barabady, and T. Markeset, Maintainability analysis
considering time-dependent and time-independent covariates. Reliability
Engineering & System Safety, 2011. 96(1): p. 210-217.
26. Chen, G. and T.T. Pham, Introduction to Fuzzy Systems. CRC Applied
Mathematics and Nonlinear Science Series. 2005: Taylor & Francis Group.
27. Mccall, J.J., Maintenance Policies for Stochastically Failing Equipment - a
Survey. Management Science, 1965. 11(5): p. 493-524.
28. Perakis, A.N. and B. Inozu, Optimal Maintenance, Repair, and Replacement
for Great-Lakes Marine Diesels. European Journal of Operational Research,
1991. 55(2): p. 165-182.
29. Sherif, Y.S., Reliability Analysis - Optimal Inspection and Maintenance
Schedules of Failing Systems. Microelectronics and Reliability, 1982. 22(1):
p. 59-115.
30. Kijima, M., Some results for repairable systems with general repair.
Journal of Applied Probability, 1989. 26: p. 89-102.
31. Doyen, L. and O. Gaudoin, Imperfect repair models with planned preventive
maintenance. 2009.
32. Doyen, L. and O. Gaudoin, Classes of imperfect repair models based on
reduction of failure intensity or virtual age. Reliability Engineering & System
Safety, 2004. 84(1): p. 45-56.
33. Shin, I., T.J. Lim, and C.H. Lie, Estimating parameters of intensity function
and maintenance effect for repairable unit. Reliability Engineering & System
Safety, 1996. 54(1): p. 1-10.
34. Ramírez, P.A.P. and I.B. Utne, Decision support for life extension of technical
systems through virtual age modelling. Reliability Engineering & System
Safety, 2013. 115: p. 55-69.
35. Wang, H. and H. Pham, Reliability and optimal maintenance. 2006: Springer
Science & Business Media.
36. Pulcini, G., On the prediction of future failures for a repairable equipment
subject to overhauls. Communications in Statistics-Theory and Methods, 2001.
30(4): p. 691-706.
37. Altun, M. and S.V. Comert, A change-point based reliability prediction model
using field return data. Reliability Engineering & System Safety, 2016. 156: p.
175-184.
38. Guo, H.R.R., H.T. Liao, W.B. Zhao, and A. Mettas, A new stochastic model
for systems under general repairs. IEEE Transactions on Reliability, 2007.
56(1): p. 40-49.
39. Muhammad, M., A.A. Mokhtar, and H. Hussin. Reliability assessment
framework for repairable system. in Business, Engineering and Industrial
Applications (ISBEIA), 2012 IEEE Symposium on. 2012.
40. Regattieri, A., R. Manzini, and D. Battini, Estimating reliability characteristics
in the presence of censored data: A case study in a light commercial vehicle
manufacturing system. Reliability Engineering & System Safety, 2010. 95(10):
p. 1093-1102.
41. Si, X.S., W.B. Wang, C.H. Hu, and D.H. Zhou, Remaining useful life
estimation - A review on the statistical data driven approaches. European
Journal of Operational Research, 2011. 213(1): p. 1-14.
42. de Jonge, B., R. Teunter, and T. Tinga, The influence of practical factors on
the benefits of condition-based maintenance over time-based maintenance.
Reliability Engineering & System Safety, 2017. 158: p. 21-30.
43. Zhang, Q., C. Hua, and G.H. Xu, A mixture Weibull proportional hazard model
for mechanical system failure prediction utilising lifetime and monitoring data.
Mechanical Systems and Signal Processing, 2014. 43(1-2): p. 103-112.
44. Raouf, A., S. Duffuaa, M. Ben-Daya, A.H.C. Tsang, W.K. Yeung, A.K.S.
Jardine, and B.P.K. Leung, Data management for CBM optimization. Journal
of Quality in Maintenance Engineering, 2006. 12(1): p. 37-51.
45. Bastos, P., I. Lopes, and L.C.M. Pires. Application of data mining in a
maintenance system for failure prediction. in Safety, Reliability and Risk
Analysis: Beyond the Horizon: 22nd European Safety and Reliability. 2014.
Taylor & Francis Group.
46. Márquez, A.C., The maintenance management framework: models and
methods for complex systems maintenance. 2007: Springer Science &
Business Media.
47. Jardine, A.K.S. and A.H.C. Tsang, Maintenance, replacement, and reliability:
theory and applications. 2013: CRC press.
48. Tian, Z.G. and H.T. Liao, Condition based maintenance optimization for multi-
component systems using proportional hazards model. Reliability Engineering
& System Safety, 2011. 96(5): p. 581-589.
49. Jardine, A., V. Makis, D. Banjevic, D. Braticevic, and M. Ennis, A decision
optimization model for condition-based maintenance. Journal of Quality in
Maintenance Engineering, 1998. 4(2): p. 115-121.
50. Lin, J., J. Pulido, and M. Asplund, Reliability analysis for preventive
maintenance based on classical and Bayesian semi-parametric degradation
approaches using locomotive wheel-sets as a case study. Reliability
Engineering & System Safety, 2015. 134: p. 143-156.
51. Jardine, A.K.S., D.M. Lin, and D. Banjevic, A review on machinery
diagnostics and prognostics implementing condition-based maintenance.
Mechanical Systems and Signal Processing, 2006. 20(7): p. 1483-1510.
52. Kelly, A., Maintenance Systems and Documentation. 2006, Burlington, MA,
USA: Elsevier.
53. Hong, Y., Reliability prediction based on complicated data and dynamic data.
2009, Iowa State University. p. 1-130.
54. Meeker, W.Q. and Y. Hong, Reliability Meets Big Data: Opportunities and
Challenges. Quality Engineering, 2013. 26(1): p. 102-116.
55. Skoogh, A., T. Perera, and B. Johansson, Input data management in simulation
- Industrial practices and future trends. Simulation Modelling Practice and
Theory, 2012. 29: p. 181-192.
56. Moore, W.J. and A.G. Starr, An intelligent maintenance system for continuous
cost-based prioritisation of maintenance activities. Computers in Industry,
2006. 57(6): p. 595-606.
57. Madhikermi, M., S. Kubler, J. Robert, A. Buda, and K. Främling, Data quality
assessment of maintenance reporting procedures. Expert Systems with
Applications, 2016. 63: p. 145-164.
58. Alkali, B.M., T. Bedford, J. Quigley, and J. Gaw, Failure and maintenance data
extraction from power plant maintenance management databases. Journal of
Statistical Planning and Inference, 2009. 139(5): p. 1766-1776.
59. Ittoo, A., L.M. Nguyen, and A. van den Bosch, Text analytics in industry:
Challenges, desiderata and trends. Computers in Industry, 2016. 78: p. 96-107.
60. Hogenboom, F., F. Frasincar, U. Kaymak, F. de Jong, and E. Caron, A Survey
of event extraction methods from text for decision support systems. Decision
Support Systems, 2016. 85: p. 12-22.
61. Wu, X.D., X.Q. Zhu, G.Q. Wu, and W. Ding, Data Mining with Big Data. IEEE
Transactions on Knowledge and Data Engineering, 2014. 26(1): p. 97-107.
62. Gurbuz, F., L. Ozbakir, and H. Yapici, Data mining and preprocessing
application on component reports of an airline company in Turkey. Expert
Systems with Applications, 2011. 38(6): p. 6618-6626.
63. Han, J., Data mining: Concepts and techniques. 2001, Simon Fraser
University.
64. Fayyad, U., G.P. Shapiro, and P. Smyth, From Data Mining to Knowledge
Discovery in Databases, in Artificial Intelligence. 1996, American Association
for Artificial Intelligence. p. 37-54.
65. Alkharboush, N.A., A Data Mining Approach to Improve the Automated
Quality of Data, in School of Electrical Engineering and Computer Science.
2013, Queensland University of Technology: Brisbane. p. 1-193.
66. Kotu, V. and B. Deshpande, Predictive analytics and data mining: concepts and
practice with RapidMiner. 2014, Burlington: Elsevier Science.
67. Sharma, S., K.M. Osei-Bryson, and G.M. Kasper, Evaluation of an integrated
Knowledge Discovery and Data Mining process model. Expert Systems with
Applications, 2012. 39(13): p. 11335-11348.
68. Moro, S., P. Cortez, and P. Rita, Business intelligence in banking: A literature
analysis from 2002 to 2013 using text mining and latent Dirichlet allocation.
Expert Systems with Applications, 2015. 42(3): p. 1314-1324.
69. Low, W.L., M.L. Lee, and T.W. Ling, A knowledge-based approach for
duplicate elimination in data cleaning. Information Systems, 2001. 26(8): p.
585-606.
70. Ur-Rahman, N. and J.A. Harding, Textual data mining for industrial
knowledge management and text classification: A business oriented approach.
Expert Systems with Applications, 2012. 39(5): p. 4729-4739.
71. Munkova, D., M. Munk, and M. Vozar, Data Pre-Processing Evaluation for
Text Mining: Transaction/Sequence Model. 2013 International Conference on
Computational Science, 2013. 18: p. 1198-1207.
72. Sriurai, W., Improving text categorization by using a topic model. Advanced
Computing, 2011. 2(6): p. 21.
73. Borrajo, L., A.S. Vieira, and E.L. Iglesias, TCBR-HMM: An HMM-based text
classifier with a CBR system. Applied Soft Computing, 2015. 26: p. 463-473.
74. Mathew, T. Text categorization using N-grams and Hidden-Markov-Models.
2006.
75. Yang, J.M., Y.N. Liu, Z. Liu, X.D. Zhu, and X.X. Zhang, A new feature
selection algorithm based on binomial hypothesis testing for spam filtering.
Knowledge-Based Systems, 2011. 24(6): p. 904-914.
76. Yang, Y. and J.O. Pedersen. A comparative study on feature selection in text
categorization. in ICML. 1997.
77. Rogati, M. and Y. Yang. High-performing feature selection for text
classification. in Proceedings of the eleventh international conference on
Information and knowledge management. 2002. ACM.
78. Zhang, L., J. Zhu, and T. Yao, An evaluation of statistical spam filtering
techniques. ACM Transactions on Asian Language Information Processing
(TALIP), 2004. 3(4): p. 243-269.
79. Liu, Y.N., J.-W. Bi, and Z.-P. Fan, Multi-class sentiment classification: The
experimental comparisons of feature selection and machine learning
algorithms. Expert Systems with Applications, 2017. 80: p. 323-339.
80. Wang, W.B., F. Zhao, and R. Peng, A preventive maintenance model with a
two-level inspection policy based on a three-stage failure process. Reliability
Engineering & System Safety, 2014. 121: p. 207-220.
81. Chen, J., H. Huang, S. Tian, and Y. Qu, Feature selection for text classification
with Naïve Bayes. Expert Systems with Applications, 2009. 36(3): p. 5432-
5435.
82. Chen, K., Z. Zhang, J. Long, and H. Zhang, Turning from TF-IDF to TF-IGM
for term weighting in text classification. Expert Systems with Applications,
2016. 66: p. 245-260.
83. Trstenjak, B., S. Mikac, and D. Donko, KNN with TF-IDF Based Framework
for Text Categorization. 24th DAAAM International Symposium on Intelligent
Manufacturing and Automation, 2013, 2014. 69: p. 1356-1364.
84. Escalante, H.J., M.A. Garcia-Limon, A. Morales-Reyes, M. Graff, M. Montes-
y-Gomez, E.F. Morales, and J. Martinez-Carranza, Term-weighting learning
via genetic programming for text classification. Knowledge-Based Systems,
2015. 83: p. 176-189.
85. Wang, D.Q., H. Zhang, R. Liu, W.F. Lv, and D.T. Wang, t-Test feature
selection approach based on term frequency for text categorization. Pattern
Recognition Letters, 2014. 45: p. 1-10.
86. Tripathy, A., A. Agrawal, and S.K. Rath, Classification of sentiment reviews
using n-gram machine learning approach. Expert Systems with Applications,
2016. 57: p. 117-126.
87. Ogada, K., W. Mwangi, and W. Cheruiyot, N-gram based text categorization
method for improved data mining. Journal of Information Engineering and
Applications, 2015. 5(8): p. 35-43.
88. Pang, B. and L. Lee. A sentimental education: Sentiment analysis using
subjectivity summarization based on minimum cuts. in Proceedings of the
42nd annual meeting on Association for Computational Linguistics. 2004.
Association for Computational Linguistics.
89. Saleh, M.R., M.T. Martin-Valdivia, A. Montejo-Raez, and L.A. Urena-Lopez,
Experiments with SVM to classify opinions in different domains. Expert
Systems with Applications, 2011. 38(12): p. 14799-14804.
90. Lantz, B., Machine Learning with R. Vol. 1. 2013, GB: Packt Publishing.
Bibliography 167
91. Vo, D.T. and C.Y. Ock, Learning to classify short text from scientific
documents using topic models with various types of knowledge. Expert
Systems with Applications, 2015. 42(3): p. 1684-1698.
92. Fragos, K., P. Belsis, and C. Skourlas, Combining Probabilistic Classifiers for
Text Classification. 3rd International Conference on Integrated Information
(IC-ININFO), 2014. 147: p. 307-312.
93. Cortes, C. and V. Vapnik, Support-Vector Networks. Machine Learning, 1995.
20(3): p. 273-297.
94. Joachims, T. Text categorization with support vector machines: Learning with
many relevant features. in European conference on machine learning. 1998.
Springer.
95. Basu, A., C. Walters, and M. Shepherd. Support vector machines for text
categorization. in System Sciences, 2003. Proceedings of the 36th Annual
Hawaii International Conference on. 2003. IEEE.
96. Colas, F. and P. Brazdil. Comparison of SVM and some older classification
algorithms in text classification tasks. in IFIP International Conference on
Artificial Intelligence in Theory and Practice. 2006. Springer.
97. Ozgur, L., T. Gungor, and F. Gurgen, Adaptive anti-spam filtering for
agglutinative languages: a special case for Turkish. Pattern Recognition
Letters, 2004. 25(16): p. 1819-1831.
98. Yu, B. and Z.B. Xu, A comparative study for content-based dynamic spam
classification using four machine learning algorithms. Knowledge-Based
Systems, 2008. 21(4): p. 355-362.
99. Lai, C.C., An empirical study of three machine learning methods for spam
filtering. Knowledge-Based Systems, 2007. 20(3): p. 249-254.
100. Webb, S., S. Chitti, and C. Pu. An experimental evaluation of spam filter
performance and robustness against attack. in Collaborative Computing:
Networking, Applications and Worksharing, 2005 International Conference
on. 2005. IEEE.
101. Androutsopoulos, I., G. Paliouras, and E. Michelakis, Learning to filter
unsolicited commercial e-mail. 2004.
102. Yadav, S.K., Sentiment analysis and classification: a survey. International
Journal of Advance Research in Computer Science and Management Studies,
2015. 3(3): p. 113-121.
103. Agarwal, B. and N. Mittal, Prominent feature extraction for review analysis:
an empirical study. Journal of Experimental & Theoretical Artificial
Intelligence, 2016. 28(3): p. 485-498.
104. Omar, N., M. Albared, T. Al-Moslmi, and A. Al-Shabi. A comparative study
of feature selection and machine learning algorithms for Arabic sentiment
classification. in Asia Information Retrieval Symposium. 2014. Springer.
105. Sharma, A. and S. Dey. A comparative study of feature selection and machine
learning techniques for sentiment analysis. in Proceedings of the 2012 ACM
Research in Applied Computation Symposium. 2012. ACM.
106. Tan, S. and J. Zhang, An empirical study of sentiment analysis for Chinese
documents. Expert Systems with Applications, 2008. 34(4): p. 2622-2629.
107. Liu, K. and N. El-Gohary, Ontology-based semi-supervised conditional
random fields for automated information extraction from bridge inspection
reports. Automation in Construction, 2017.
108. Lavergne, T., J.M. Crego, A. Allauzen, and F. Yvon. From n-gram-based to
crf-based translation models. in Proceedings of the Sixth Workshop on
Statistical Machine Translation. 2011. Association for Computational
Linguistics.
109. Druck, G., B. Settles, and A. McCallum. Active learning by labeling features.
in Proceedings of the 2009 Conference on Empirical Methods in Natural
Language Processing: Volume 1-Volume 1. 2009. Association for
Computational Linguistics.
110. Gupta, R., Conditional random fields. Unpublished report, IIT Bombay, 2006.
111. Khan, A., B. Baharudin, L.H. Lee, and K. Khan, A review of machine learning
algorithms for text-documents classification. Journal of Advances in
Information Technology, 2010. 1(1): p. 4-20.
112. Mitra, V., C.J. Wang, and S. Banerjee, Text classification: A least square
support vector machine approach. Applied Soft Computing, 2007. 7(3): p. 908-
914.
113. Drucker, H., D. Wu, and V.N. Vapnik, Support vector machines for spam
categorization. IEEE Transactions on Neural Networks, 1999. 10(5): p. 1048-1054.
114. Rouhani, M. and D.S. Javan, Two fast and accurate heuristic RBF learning
rules for data classification. Neural Networks, 2016. 75: p. 150-161.
115. Maulik, U. and D. Chakraborty, A self-trained ensemble with semisupervised
SVM: An application to pixel classification of remote sensing imagery. Pattern
Recognition, 2011. 44(3): p. 615-623.
116. Zhang, Y.H., J.H. Wen, X.B. Wang, and Z. Jiang, Semi-supervised learning
combining co-training with active learning. Expert Systems with Applications,
2014. 41(5): p. 2372-2378.
117. Hu, R., B. Mac Namee, and S.J. Delany, Active learning for text classification
with reusability. Expert Systems with Applications, 2016. 45: p. 438-449.
118. Silva, C. and B. Ribeiro, On text-based mining with active learning and
background knowledge using SVM. Soft Computing, 2007. 11(6): p. 519-530.
119. Leng, Y., X.Y. Xu, and G.H. Qi, Combining active learning and semi-
supervised learning to construct SVM classifier. Knowledge-Based Systems,
2013. 44: p. 121-131.
120. Wang, X.B., J.H. Wen, S. Alam, Z. Jiang, and Y.B. Wu, Semi-supervised
learning combining transductive support vector machine with active learning.
Neurocomputing, 2016. 173: p. 1288-1298.
121. Settles, B., Active learning literature survey. Computer Sciences Technical
Report 1648, University of Wisconsin-Madison, 2010.
122. Tong, S., Active learning: theory and applications. 2001, Citeseer.
123. Hajmohammadi, M.S., R. Ibrahim, A. Selamat, and H. Fujita, Combination of
active learning and self-training for cross-lingual sentiment classification with
density analysis of unlabelled samples. Information Sciences, 2015. 317: p. 67-
77.
124. Saito, P.T.M., P.J. de Rezende, A.X. Falcao, C.T.N. Suzuki, and J.F. Gomes,
An active learning paradigm based on a priori data reduction and organization.
Expert Systems with Applications, 2014. 41(14): p. 6086-6097.
125. Calma, A., J.M. Leimeister, P. Lukowicz, S. Oeste-Reiß, T. Reitmaier, A.
Schmidt, B. Sick, G. Stumme, and K.A. Zweig. From active learning to
dedicated collaborative interactive learning. in ARCS 2016, 29th International
Conference on Architecture of Computing Systems, Proceedings. 2016. VDE.
126. Cholette, M.E., P. Borghesani, E. Di Gialleonardo, and F. Braghin, Using
support vector machines for the computationally efficient identification of
acceptable design parameters in computer-aided engineering applications.
Expert Systems with Applications, 2017. 81: p. 39-52.
127. Vlachos, A., A stopping criterion for active learning. Computer Speech and
Language, 2008. 22(3): p. 295-312.
128. Kremer, J., K.S. Pedersen, and C. Igel, Active learning with support vector
machines. Wiley Interdisciplinary Reviews-Data Mining and Knowledge
Discovery, 2014. 4(4): p. 313-326.
129. Olsson, F., A literature survey of active machine learning in the context of
natural language processing. 2009.
130. Lewis, D.D. and W.A. Gale. A sequential algorithm for training text classifiers.
in Proceedings of the 17th annual international ACM SIGIR conference on
Research and development in information retrieval. 1994. Springer-Verlag
New York, Inc.
131. Fu, Y.F., X.Q. Zhu, and B. Li, A survey on instance selection for active
learning. Knowledge and Information Systems, 2013. 35(2): p. 249-283.
132. Angluin, D., Queries and concept learning. Machine Learning, 1988. 2(4): p.
319-342.
133. Moon, S., C. McCarter, and Y.-H. Kuo, Active learning with partially featured
data, in Proceedings of the 23rd International Conference on World Wide Web.
2014, ACM: Seoul, Korea. p. 1143-1148.
134. Li, L., X. Jin, S.J. Pan, and J.T. Sun. Multi-domain active learning for text
classification. in Proceedings of the 18th ACM SIGKDD international
conference on Knowledge discovery and data mining. 2012. ACM.
135. Novak, B., D. Mladenič, and M. Grobelnik, Text classification with active
learning, in From Data and Information Analysis to Knowledge Engineering.
2006, Springer. p. 398-405.
136. Brinker, K., Active learning with kernel machines. 2004, Citeseer.
137. Goudjil, M., M. Koudil, M. Bedda, and N. Ghoggali, A novel active learning
method using SVM for text classification. International Journal of Automation
and Computing: p. 1-9.
138. Zhu, X., Semi-supervised learning literature survey. 2005.
139. Pavlinek, M. and V. Podgorelec, Text classification method based on self-
training and LDA topic models. Expert Systems with Applications, 2017. 80:
p. 83-93.
Bibliography 171
140. Hodkiewicz, M., P. Kelly, J. Sikorska, and L. Gouws. A framework to assess
data quality for reliability variables. in 1st World Congress of Engineering
Asset Management. 2006. Gold Coast, Queensland, Australia.
141. Agrawal, V., B.K. Panigrahi, and P.M.V. Subbarao, Review of control and
fault diagnosis methods applied to coal mills. Journal of Process Control,
2015. 32: p. 138-153.
142. Chauhan, M.K., Varun, S. Chaudhary, S. Kumar, and Samar, Life cycle
assessment of sugar industry: A review. Renewable & Sustainable Energy
Reviews, 2011. 15(7): p. 3445-3453.