
ADAPTIVE IMAGE QUALITY IMPROVEMENT

WITH BAYESIAN CLASSIFICATION FOR IN-LINE MONITORING

By

Shuo Yan

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Graduate Department of Chemical Engineering and Applied Chemistry

University of Toronto

© Copyright by Shuo Yan 2008


ADAPTIVE IMAGE QUALITY IMPROVEMENT

WITH BAYESIAN CLASSIFICATION

FOR IN-LINE MONITORING

Shuo Yan

Doctor of Philosophy

Department of Chemical Engineering and Applied Chemistry

University of Toronto

2008

ABSTRACT

Development of an automated method for classifying digital images using a combination

of image quality modification and Bayesian classification is the subject of this thesis.

The specific example is classification of images obtained by monitoring molten plastic in

an extruder. These images were to be classified into two groups: the “with particle”

(WP) group which showed contaminant particles and the “without particle” (WO) group

which did not. Previous work effected the classification using only an adaptive Bayesian

model. This work combines adaptive image quality modification with the adaptive

Bayesian model. The first objective was to develop an off-line automated method for

determining how to modify each individual raw image to obtain the quality required for

improved classification results. This was done in a novel way, by defining image


quality in terms of probability using a Bayesian classification model. The Nelder-Mead

Simplex method was then used to optimize the quality. The result was a “Reference

Image Database” which was used as a basis for accomplishing the second objective. The

second objective was to develop an in-line method for modifying the quality of new

images to improve classification over that which could be obtained previously. Case

Based Reasoning used the Reference Image Database to locate reference images similar

to each new image. The database supplied instructions on how to modify the new image

to obtain a better quality image. Experimental verification of the method used a variety

of images from the extruder monitor including images purposefully produced to be of

wide diversity. Image quality modification was made adaptive by adding new images to

the Reference Image Database. When combined with adaptive classification previously

employed, error rates decreased from about 10% to less than 1% for most images. For

one unusually difficult set of images that exhibited very low local contrast of particles in

the image against their background, it was necessary to split the Reference Image

Database into two parts on the basis of a critical value for local contrast. The end result

of this work is a very powerful, flexible and general method for improving classification

of digital images that utilizes both image quality modification and classification

modeling.


ACKNOWLEDGMENT

I would like to express my gratitude to all those who made this research an enjoyable and

exciting journey.

First, my deepest gratitude goes to Professor Stephen T. Balke for his persistent guidance,

continuous encouragement, and patience during this research. His committed supervision

and unreserved support are greatly appreciated.

Next, I am very much indebted to Dr. Saed Sayad for his invaluable guidance and

participation in this project. In particular, I am very grateful for his excellent advice

regarding applying data mining techniques, as well as his assistance in improving the

programming involved in this research.

I would also like to acknowledge my Ph.D. committee members for their important

guidance and constructive comments throughout the research: Professor G. J. Evans,

Professor M. T. Kortschot, and Professor R. Mahadevan. In addition, I thank my fellow

students who helped me with my research: Dr. Keivan Torabi, Ms. Forouz Farahani, and

all of my other friends in the Department of Chemical Engineering and Applied

Chemistry.

Finally, I would like to express my deepest gratitude for the constant encouragement and

support of my parents, Guangguo Tan and Xiufang Wei, and of my very patient wife,

Yingjing Fu, during these many years of intensive work.


TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
NOMENCLATURE
1 INTRODUCTION
2 LITERATURE REVIEW AND STRATEGY DEVELOPMENT
  2.1 In-line Image Monitoring
  2.2 Classification Methods
    2.2.1 Overview of Classification
    2.2.2 The Torabi Bayesian Classification Model
      2.2.2.1 Thresholding
      2.2.2.2 Bayesian Classification
  2.3 Off-line Image Quality Modification
    2.3.1 Proposed Strategy for Accomplishing the First Objective
    2.3.2 Defining Image Quality: Image Quality Metrics
    2.3.3 Image Quality Operators (IQ Operators)
    2.3.4 Optimizing Image Quality
  2.4 In-line Image Quality Modification
    2.4.1 Proposed Strategy for Accomplishing the Second Objective
    2.4.2 Use of the Reference Image Database: Case-based Reasoning
      2.4.2.1 Advantages of Case-Based Reasoning
      2.4.2.2 Case-Based Reasoning in Image Interpretation
      2.4.2.3 The Case-Based Reasoning Process
      2.4.2.4 Case Retrieval in Image Interpretation
      2.4.2.5 Case Retention in Case-Based Reasoning
      2.4.2.6 Computation Efficiency of Case-Based Reasoning
  2.5 Evaluation of Classification Methods
3 EXPERIMENTAL PROCEDURE
4 COMPUTATIONAL PROCEDURE
  4.1 Software Development for the First Objective: Off-line Image Quality Modification
    4.1.1 The Simplex Optimization Component
    4.1.2 The Image Processing Shared Component
    4.1.3 The Image Measurement Shared Component
    4.1.4 The Image Thresholding Shared Component
    4.1.5 The Image Classification Shared Component
    4.1.6 The Database Shared Component
  4.2 Software Development for the Second Objective: In-line Image Quality Modification
5 RESULTS AND DISCUSSION
  5.1 Off-line Image Modification
    5.1.1 Selection of Image Quality Operators and their Order of Application
      5.1.1.1 Screen 1: Constraining Selection of IQ Operators by Selecting the Image Analysis Software
      5.1.1.2 Screen 2: Selection of IQ Operators by Image Characteristics
      5.1.1.3 The Application of Screen 2 to Images Obtained by the Scanning Particle Monitor
      5.1.1.4 Screen 3: Dimensionality Reduction: Selection of IQ Operators by Task Specific Criteria
    5.1.2 Image Quality Definition
      5.1.2.1 Least Squares as Objective Function
      5.1.2.2 Weighted Least Squares (WLS) as Objective Function
      5.1.2.3 The Desirability Function as Objective Function
      5.1.2.4 Probability Density Difference as Objective Function
    5.1.3 Comparison of Classification Results for Different Objective Functions
  5.2 In-line Image Quality Modification
    5.2.1 The Reference Image Database
    5.2.2 In-line Image Quality Modification for Classification: Use of a Static Classification Model
    5.2.3 In-line Adaptive Image Quality Modification with Adaptive Classification
      5.2.3.1 Test Trial 1: the Use of a New Set of Images Produced by Torabi
      5.2.3.2 Test Trial 2: the Use of Microgel Image Set Produced by Ing
      5.2.3.3 Test Trial 3: the Use of Images from New Extruder Runs Utilizing Injection of Particles with Low Additive Polyethylene Pelletized Feed
      5.2.3.4 Test Trial 4: the Use of Images from New Extrusion Runs Utilizing Injection of Particles with High Additive Polyethylene Pelletized Feed
      5.2.3.5 The Application of Decision Rule in Case-based Reasoning (CBR)
      5.2.3.6 Classification Results after the Application of Decision Rule in Case-Based Reasoning
    5.2.4 Summary of the Method Developed for the Second Objective
6 CONCLUSIONS
7 RECOMENDATIONS
8 APPENDICES
  Appendix I  An Overview of Objective Image Quality Metrics (IQ Metrics)
  Appendix II  Image Quality Operators
    Radiometric Operators
    Arithmetic-based Operations
    Geometric Operators
    Mathematical Morphological Operators
    Non-uniform Illumination Correction
  Appendix III  The Nelder-Mead Simplex Method
    Basic Simplex Method
    Modified Simplex Method (Nelder-Mead)
    Transformation of Constraints in the Utilization of Simplex Optimization
    Simplex Optimization Stopping Criteria
    Objective Function
    Other Considerations on the Utilization of Simplex Method
  Appendix IV  Modified MaxMin Thresholding
  Appendix V  Adaptive Machine Learning Methods
    The Intelligent Learning Machine (ILM)
    Incremental Support Vector Machine (ISVM)
    Incremental Neural Networks (INN)
  Appendix VI  Screen 3: Selection of IQ Operators by Task Specific Criteria
    Screen 3: Trial 1
    Screen 3: Trial 2
    Screen 3: Trial 3
    Screen 3: Trial 4
  Appendix VII  Case-Based Reasoning Classification
  Appendix VIII  Computation Time for Different Classification Methods
  Appendix IX  Statistics on the Estimation of Proportions
9 REFERENCES


LIST OF TABLES

Table 2-1 Confusion Matrix for a Binary Classification Problem
Table 2-2 Confusion Matrix for a Binary Classification Problem Showing Data Mining Measures
Table 2-3 Definition of Terms Related to Classification
Table 3-1 Summary of Image Data Sets Used in This Work
Table 3-2 Image Data Sets Produced from Experimental Extrusion Runs in this Research
Table 5-1 Image Quality Operators
Table 5-2 Classification Confusion Matrix for the Training Set of Raw Images
Table 5-3 Classification Confusion Matrix for the Training Set of Images Optimized Using the Least Squares Objective Function
Table 5-4 Relationship between Least Squares as Objective Function and Classification Accuracy
Table 5-5 Image Quality Distribution for Least Squares Optimized Images
Table 5-6 Weight Factors for Weighted Least Squares Image Quality Definition
Table 5-7 Confusion Matrix for Weighted Least Squares Optimized Images
Table 5-8 Comparison of AUC for Least Squares and Weighted Least Squares Optimized Images
Table 5-9 Relationship between Weighted Least Squares as Objective Function and Classification Accuracy
Table 5-10 Image Quality Distribution for Weighted Least Squares Optimized Images
Table 5-11 Confusion Matrix for Desirability Function Optimized Images
Table 5-12 Image Quality Distribution of Desirability Function Optimized Images
Table 5-13 Parameter Values in Classification Models
Table 5-14 Confusion Matrix for Training Image Set Using Probability Density Difference as Objective Function
Table 5-15 Comparison of AUC among Different Objective Functions
Table 5-16 Image Quality Metrics for the Training Image Set After Blanket Processing
Table 5-17 A Portion of the Reference Image Database
Table 5-18 Similarity Attributes
Table 5-19 Confusion Matrix for a Subset of Image Set 1 Using IQMod Classification
Table 5-20 Confusion Matrix for a Subset of Image Set 1 Using Bayesian Classification
Table 5-21 Confusion Matrix of Test Trial 1 Image Set Using In-line Adaptive Bayesian Classification
Table 5-22 Confusion Matrix of Test Trial 1 Image Set Using Static IQMod Classification
Table 5-23 Confusion Matrix of Test Trial 1 Image Set Using Adaptive IQMod Classification
Table 5-24 Image Quality Metrics of Test Trial 2 Microgel Image Set
Table 5-25 Confusion Matrix of Test Trial 2 Image Set Using In-line Adaptive Classification
Table 5-26 Confusion Matrix of Test Trial 2 Image Set Using Static IQMod Classification
Table 5-27 Confusion Matrix of Test Trial 2 Image Set Using Adaptive IQMod Classification
Table 5-28 Image Quality Metrics of Test Trial 3 Image Set
Table 5-29 Confusion Matrix for Test Trial 3 Image Subset (Run 3-1 in Table 3-2) Using Adaptive IQMod Classification
Table 5-30 Confusion Matrix for Test Trial 3 Image Subset (Run 3-2 in Table 3-2) Using Adaptive IQMod Classification
Table 5-31 Confusion Matrix for Test Trial 3 Image Subset (Run 3-3 in Table 3-2) Using Adaptive IQMod Classification
Table 5-32 Confusion Matrix for Test Trial 3 Image Subset (Run 3-4 in Table 3-2) Using Adaptive IQMod Classification
Table 5-33 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-5 in Table 3-2) Using Adaptive IQMod Classification
Table 5-34 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-6 in Table 3-2) Using Adaptive IQMod Classification
Table 5-35 Image Quality Metrics for Test Trial 4 Image Set
Table 5-36 The Effect of Local Contrast Threshold on Classification Error Rate for Test Trial 4
Table 5-37 Confusion Matrix of Test Trial 4 Using Adaptive IQMod Classification with Decision Rule
Table 5-38 Confusion Matrix of Test Trial 4 Using In-line Adaptive Bayesian Classification
Table 8-1 Two-way ANOVA Table for Quantifying the Illumination Uniformity
Table 8-2 Comparison of Illumination Uniformity for Raw Images and Their Illumination Corrected Images
Table 8-3 Threshold Test of Modified MaxMin Thresholding
Table 8-4 Classification Confusion Matrix for MaxMin Thresholding
Table 8-5 Classification Confusion Matrix for Modified MaxMin Thresholding
Table 8-6 The General Structure of the ILM Weight Table
Table 8-7 The Basic Unit of the ILM Weight Table
Table 8-8 ILM Knowledge Table for N Images
Table 8-9 Statistical Experimental Designs
Table 8-10 Two Level Factorial Design for an Image Operator Sequence
Table 8-11 Confusion Matrix of Test Trial 1 Image Set Using CBR Classification
Table 8-12 Confusion Matrix of Test Trial 2 Image Set Using CBR Classification
Table 8-13 Confusion Matrix for Test Trial 3 Image Subset (Run 3-1 to Run 3-4 in Table 3-2) Using Torabi Adaptive Classification
Table 8-14 Confusion Matrix for Test Trial 3 Image Subset (Run 3-1 to Run 3-4 in Table 3-2) Using CBR Classification
Table 8-15 Confusion Matrix for Test Trial 3 Image Subset (Run 3-1 to Run 3-4 in Table 3-2) Using Static IQMod Classification
Table 8-16 Confusion Matrix for Test Trial 3 Image Subset (Run 3-1 to Run 3-4 in Table 3-2) Using Adaptive IQMod Classification
Table 8-17 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-6 in Table 3-2) Using Torabi Adaptive Classification
Table 8-18 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-6 in Table 3-2) Using CBR Classification
Table 8-19 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-6 in Table 3-2) Using Static IQMod Classification
Table 8-20 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-6 in Table 3-2) Using Adaptive IQMod Classification
Table 8-21 Computation Time for Different Classification Methods
Table 8-22 Errors from 10-Fold Cross Validation for Classification Models Created Based on Images Optimized Using Different Image Quality Definitions
Table 8-23 A Sample Classification Confusion Matrix


LIST OF FIGURES

Figure 2-1 Torabi Bayesian Classification Method
Figure 2-2 Microgel Image with No Additive in the Background
Figure 2-3 Glass Microsphere (GMS) Images with Talc Additive in the Background
Figure 2-4 Incremental Adaptation of Bayesian Classification Model
Figure 2-5 Case-Based Reasoning Process
Figure 2-6 Example of a Receiver Operating Characteristic (ROC) Curve
Figure 3-1 In-Line Image Monitoring System for Plastics Film Extrusion
Figure 4-1 Computational Software Components Associated with Off-line Image Quality Modification
Figure 4-2 Simplex Optimization Component in Off-line Image Quality Modification
Figure 4-3 Image Processing Shared Component
Figure 4-4 Image Measurement Shared Component
Figure 4-5 Image Thresholding Shared Component
Figure 4-6 Functionalities of Image Classification Shared Components
Figure 4-7 Reference Image Database Shared Components
Figure 4-8 Computational Software Components Associated with In-line Image Quality Modification
Figure 4-9 Case-based Reasoning Component in In-line Image Quality Modification
Figure 5-1 Off-line Image Quality Modification Framework
Figure 5-2 Layered Screening of IQ Operators
Figure 5-3 Pre-selection of Image Operators for Noise Removal
Figure 5-4 Pre-selection of Image Operators for Blur Removal
Figure 5-5 Pre-selection of Image Operators for Contrast Enhancement
Figure 5-6 Pre-selection of Image Operators for Illumination Correction
Figure 5-7 Pre-selection of Image Operators for Brightness Adjustment
Figure 5-8 Example Real Image 1
Figure 5-9 Example Real Image 2
Figure 5-10 Classification Error Rate for Least Squares as Objective Function
Figure 5-11 ROC Curve for Least Squares as Objective Function
Figure 5-12 Image Quality Distribution for Least Squares Optimized Images
Figure 5-13 Image Quality Histogram for Least Squares Optimized Images
Figure 5-14 Classification Error Rates for Weighted Least Squares Optimized Images
Figure 5-15 ROC Curve for Weighted Least Squares Optimized Images
Figure 5-16 Image Quality Distribution for Weighted Least Squares Optimized Images
Figure 5-17 Image Quality Histogram for Weighted Least Squares Optimized Images
Figure 5-18 Classification Error Rate for Desirability Function Optimized Images
Figure 5-19 ROC Curve for Desirability Function Optimized Images
Figure 5-20 Image Quality Distribution of Desirability Function Optimized Images
Figure 5-21 Image Quality Histogram for Desirability Function Optimized Images
Figure 5-22 Classification Error Rate for Probability Density Difference Optimized Images
Figure 5-23 ROC Curve for Probability Density Difference Optimized Images
Figure 5-24 Comparison of Classification Error Rates among Different Objective Functions
Figure 5-25 Comparison of False Negative Rates among Different Objective Functions
Figure 5-26 Comparison of False Positive Rates among Different Objective Functions
Figure 5-27 ROC Analysis of the Classification Performance of Different Objective Functions
Figure 5-28 In-line Image Quality Modification Framework
Figure 5-29 Comparison of Classification Error Rates between Bayesian Classification and IQMod Classification
Figure 5-30 Comparison of Classification Error Rates for Test Trial 1 among Different Models
Figure 5-31 Comparison of Classification Error Rates for Test Trial 2 among Different Models
Figure 5-32 Classification Error Rates for Test Trial 3 Using Adaptive IQMod Classification
Figure 5-33 Classification Results for Test Trial 4 Image Set Using Adaptive IQMod Classification
Figure 5-34 The Application of Decision Rule in Case-Based Reasoning
Figure 5-35 The Effect of Local Contrast Threshold on Classification Error Rate for Test Trial 4
Figure 5-36 The Effect of Local Contrast Threshold on Classification Accuracy for Test Trial 4
Figure 5-37 Comparison of Classification Error Rates for Test Trial 4 among Different Models
Figure 8-1 Quantification of Illumination Uniformity Using ANOVA Analysis
Figure 8-2 Examples of Kernels
Figure 8-3 Normalized Gaussian Kernel with σ = 1.4
Figure 8-4 Schematic Diagram of Basic Simplex Method
Figure 8-5 Schematic Diagram of Modified Simplex Method
Figure 8-6 Flow Chart of Modified Simplex Algorithm
Figure 8-7 An Example Image
Figure 8-8 The Effect of Thresholding Step Size on Minimum Particle Size
Figure 8-9 Scheme of Modified Search for Threshold
Figure 8-10 Comparison of Classification Error Rates for Two Different Thresholding Methods
Figure 8-11 ROC Curves for MaxMin Thresholding and Modified MaxMin Thresholding
Figure 8-12 The Intelligent Learning Machine (ILM)
Figure 8-13 Test Pattern Image 1
Figure 8-14 Graphical Illustration of the Execution of Two-Level Factorial Design on a Test Pattern
Figure 8-15 Plot of Two-Way Interaction Effect of Grayscale Erosion and Grayscale Dilation
Figure 8-16 Color-coded Scatterplot Matrix for Sequence of Grayscale Dilation, Grayscale Erosion and Median Filter
Figure 8-17 Test Pattern Image 2
Figure 8-18 Color-coded Scatterplot Matrix for Sequence of Median and Gaussian Filters
Figure 8-19 Test Pattern Image 3
Figure 8-20 Color-coded Scatterplot Matrix for Sequence of Brightness Shift, Median and Gaussian Filters
Figure 8-21 Test Pattern Image 4
Figure 8-22 Color-coded Scatterplot Matrix for the Sequence of Brightness Shift, Median Filter, Gaussian Filter and the Unsharp Mask
Figure 8-23 The Schema of Case-Based Reasoning Classification
Figure 8-24 The Case-Based Reasoning Classification Component
Figure 8-25 Comparison of Classification Error Rates for Test Trial 1 Using Different Classification Methods
Figure 8-26 Comparison of Classification Error Rates for Test Trial 2 Using Different Classification Methods
Figure 8-27 Classification Error Rates for Test Trial 3 Using Different Classification Methods
Figure 8-28 Classification Results for Test Trial 4 Image Set Using Different Classification Methods
Figure 8-29 Comparison of Confidence Intervals


NOMENCLATURE

SCALARS

Ai,j    the area of the ith particle visible in an image using threshold j
CiA    the ith attribute of image A
CiB    the ith attribute of image B
Cimin    the minimum value of the ith attribute of an image
Cimax    the maximum value of the ith attribute of an image
D    similarity measurement between a newly acquired image and an image in the Reference Image Database
Si    the value of the ith similarity attribute for a new image
SDB,i    the value of the ith similarity attribute for an image in the Reference Image Database
distAB    distance between images A and B
f(X=x|C=Y)    the probability per dx increment, termed the probability density; the probability density of attribute vector X with value x given an image belonging to class Y
f(X=x|C=WO)    the probability density of attribute vector X with value x given an image belonging to class WO (without particle)
f(Xi=xij|C=WO)    the probability density of the ith attribute Xi with value xij given an image belonging to class WO (without particle), where xij is the jth value of attribute Xi
f(X=x|C=WP)    the probability density of attribute vector X with value x given an image belonging to class WP (with particle)
f(Xi=xij|C=WP)    the probability density of the ith attribute Xi with value xij given an image belonging to class WP (with particle), where xij is the jth value of attribute Xi
FN    number of positives incorrectly classified as negatives
FP    number of negatives incorrectly classified as positives
GB    mean grey level of the immediate background of particles in an image
Gp    mean grey level of the particles in an image
Lmax    maximum grey level of an image
Lmin    minimum grey level of an image
N    total number of negative cases (number of images labeled by a human observer as "Without Particles")
P    number of positive cases (number of images labeled by a human observer as "With Particles")
P(WP)    the prior probability of an image being classified as WP (with particle)
P(WO)    the prior probability of an image being classified as WO (without particle)
P(C=A)    the prior probability of an event belonging to class A
P(C=A|attributes)    the posterior probability of an event belonging to class A given these attributes
P(attributes|A)    the posterior probability of these attributes for class A
P(attributes)    the probability of the attribute vector
P(Atti|A)    the posterior probability of the ith attribute for class A
P(C=WO)    the prior probability of an image belonging to class WO (without particle)
P(C=WO|X=x)    the posterior probability of an image belonging to class WO (without particle) given attribute vector X with value x
P(C=WP)    the prior probability of an image belonging to class WP (with particle)
P(C=WP|X=x)    the posterior probability of an image belonging to class WP (with particle) given attribute vector X with value x
P(X=x)    the probability of attribute vector X with value x
P(X=x|C=WO)    the probability of attribute vector X with value x given an image belonging to class WO (without particle)
P(Xi=xij|C=WO)    the probability of the ith attribute Xi with value xij given an image belonging to class WO (without particle), where xij is the jth value of attribute Xi
P(X=x|C=WP)    the probability of attribute vector X with value x given an image belonging to class WP (with particle)
P(Xi=xij|C=WP)    the probability of the ith attribute Xi with value xij given an image belonging to class WP (with particle), where xij is the jth value of attribute Xi
Qi    the value of the ith Image Quality metric
Qi,d    the desired value of the ith Image Quality metric
Qi,r    the value of the ith Image Quality metric of the raw image
Qi,LS    the desired value of the ith Image Quality metric of Least Squares optimized images
T    threshold value
TP    number of positives correctly classified as positives
TN    number of negatives correctly classified as negatives
Wi    the weighting factor of the ith Image Quality metric
Xi    the ith attribute of attribute vector X
xij    the jth value of attribute Xi

GREEK LETTERS

µi    the average of the ith attribute Xi
σi    the standard deviation of the ith attribute Xi

ABBREVIATIONS

AI    Artificial Intelligence
AUC    Area Under the Receiver Operating Characteristic Curve
BR    Brightness Linear Shift
CBR    Case-based Reasoning
CON    Contrast Stretch
EQL    Histogram Equalization
GB    Gaussian Blur
GDIL    Grayscale Dilation
GET    Grayscale Erosion
IIIS    Intelligent Image Interpretation System
ILM    Intelligent Learning Machine
IQ    Image Quality
IQMod Classification    In-line Image Quality Modification for Classification
ISVM    Incremental Support Vector Machine
INN    Incremental Neural Network
IWT    Intelligent Learning Machine Weight Table
KT    Knowledge Table
LOOCV    Leave One Out Cross Validation
LS    Least Squares
MD    Median Filter
MN    Mean Filter
ROC    Receiver Operating Characteristic
SHP    Sharpen Operator
SPM    Scanning Particle Monitor
SUM    Subtract Background
UNSHP    Unsharp Mask
WLS    Weighted Least Squares
WO    Without Particle
WP    With Particle


1 INTRODUCTION

Digital images often contain large amounts of very useful information. However,

hundreds, or even thousands of such images are produced by automated camera systems.

Also, even when only a few images are to be examined, objective and rapid analysis is

often desired. Thus, methods to enable a computer to automatically and rapidly extract

required information from images are needed. This thesis focuses on the problem of

automatically classifying images obtained from monitoring molten plastic in an extruder

into two groups: those images that show at least one undesirable contaminant particle to

be present in the image (i.e. “With Particle” (WP) images) and those that do not

(“Without particle” (WO) images). This is a very important problem in the plastics

industry because such particles in the melt can cause holes and other defects in the plastic

film produced. A significant complication is variable image quality due to changes in the

extrusion process or feed material. A previous attempt to accomplish such automated

classification by Torabi [1, 2] utilized an adaptive classification model approach and was

quite successful: about 90% of the images examined could be correctly classified.

However, even a 10% error in classification would often be prohibitively large in

controlling extruders. This led to the idea of improving the performance of Torabi’s

adaptive classification model by improving the quality of each image prior to

classification.

The hypothesis underlying this work was:


Adaptive, real-time, image quality improvement is now practical by using adaptive

machine learning methods and will significantly improve automated image classification

accuracy and robustness.

Also, it was realized from the outset that, although the focus of the thesis was on

detecting particles in images from an extruder monitor, a method that combined adaptive image quality improvement with adaptive classification modeling could be advantageously applied to many other situations, and increasingly so the more flexible the method. The work was therefore directed at obtaining a solution to the combination problem that was as generic as possible.

Considering the above motivations for the work, the following two objectives were

defined:

1. To develop an off-line automated method for determining how to modify

each individual raw image to the image quality required for improved

classification results.

The raw images to be used are those where the presence or absence of one or

more particles in the image is already known to the software. These images are

the reference images to be used in the "image database". In in-line analysis, the image in this database that most closely resembles a new image (one for which the presence or absence of particles is unknown to the software) is used to provide the needed image improvement information.


2. To develop an in-line method for modifying the quality of acquired images to

permit improved classification by using the results of the first objective as a

database.

In this case the software does not know a priori whether or not an image shows a

particle. The quality of each image must be individually modified so as to

improve the software’s ability to determine the correct class (WO or WP). In

comparison with previous methods that do not involve such customized image

quality modification, classification may be improved by showing superior

accuracy and/or superior adaptability to variations in raw image quality.


2 LITERATURE REVIEW AND STRATEGY DEVELOPMENT

2.1 In-line Image Monitoring

In-line monitoring has many advantages over offline inspection: elimination of

significant time lags, more comprehensive sampling and enabling of automatic process

control. Monitoring images appears particularly attractive because of the information

content in an image. Digital imaging for in-line monitoring applications [3-18] has

therefore recently become popular due to the availability of inexpensive sensors and

increased computer power. In-line image monitoring is currently applied in various

industries, from chemical unit operations such as polymer extrusion [7, 8, 13] to

biological processes such as cell growth and fermentation [9, 17], to electronic

manufacturing of printed circuit boards [6, 10] and wafers [16].

The focus of this thesis is image processing for in-line monitoring. Most in-line image

monitoring systems are used for pattern detection. However, in-line image monitoring

systems differ in many ways. From the image processing standpoint, there are two kinds

of systems. One type of system [6, 9, 10, 13-17] involves a great deal of low-level image

processing techniques for image enhancement such as de-noising before image

interpretation. Other systems [4, 5, 7, 8, 11, 12, 18] do not face such image enhancement issues; instead, the emphasis is on extracting information from the images using data mining techniques, with image quality being reasonably constant because of the nature of the process being monitored.


A real-time defect inspection system for textured surfaces was developed by Baykut [3]. In his system, low-level image processing was trivial; instead, data mining techniques

based on a Markov Random Field model played the most important role in automatic

inspection of surfaces. This is also true for monitoring systems developed by Bharati [4,

5] and Yu [18]. In their systems, multivariate principal component analysis was used to

detect patterns of interest. All of these systems are descriptive in the sense that only

qualitative pattern information is extracted.

An in-situ microscope system was developed by Joeris et al. [9] to acquire images of

mammalian cells directly inside a bioreactor during a fermentation process. Process

relevant quantitative measures such as cell density were extracted from the images by

digital image processing procedures. Watano et al. [15] developed an image system to

continuously monitor granule growth in a high-shear granulation. Granule size and

distribution were continuously measured. These systems obtained quantitative

measurements from the processes and low-level image processing was important. Unlike

monitoring systems for pattern detection, these systems did not use any high level data

mining techniques.

Early work in Professor Balke's group at the University of Toronto involved the use of fiber-optic-assisted cameras to monitor recycled plastic waste during extrusion. Further

developments led to elimination of the fiber optics by inserting a window in the wall of

the extruder and direct camera monitoring through the window. Eventually, a specialized

camera, termed the "Scanning Particle Monitor", was developed [19]. It enabled particles

to be monitored in the polymer melt at different distances from the extruder wall. In his


Ph.D. research, Torabi [13] applied various data mining techniques to interpret the

images in-line and in real time. He developed and applied a particular data mining

method: adaptive Bayesian classification to classify images into those with particles and

those without. The work reported here uses Torabi’s model as a base. Therefore,

following a brief general introduction to the subject of classification, Torabi’s work will

be summarized in the following sections.

2.2 Classification Methods

2.2.1 Overview of Classification

Machine learning methods are to be used to customize the selection and application of IQ

Operators to individual images being acquired in-line. Machine learning is directed at

four primary tasks: supervised learning, unsupervised learning, reinforcement learning

and rule learning. Supervised learning is of primary interest here. The goal of supervised

learning is to predict outputs on future inputs given samples of inputs and corresponding

desired outputs. There are three commonly used supervised learning methods:

regression, classification and time series prediction. Regression and classification are most relevant

here.

Classification methods are normally considered as batch machine learning methods. In

this work they need to adapt to changes in image quality. That is, they need to be able to

accept and use new data to update the models immediately without extensive

recalibration using all of the data (old and new) at once. Such “incremental machine

learning” has attracted tremendous attention in the past decade [20-32]. Another

weakness of classical “batch” machine learning systems is their lack of stability and


plasticity. When new data comes in, batch learning methods are often unable to

accommodate the new data, demonstrating a lack of plasticity; or the predictive

performance is poor with high error rate, displaying a lack of stability. These weaknesses,

however, can be overcome by adaptive machine learning methods with their incremental,

in-line and real-time characteristics. Of these methods, the Intelligent Learning Machine

(ILM) is by far the most promising because of its power, flexibility and ease of

implementation. We have the advantage of having the inventor of this method as a co-

supervisor of the work (Saed Sayad).

Torabi utilized the Bayesian Classification method with the ILM to create an adaptive

Bayesian classification model. This will be more fully described below in Section 2.2.2.2.

As can be seen from the proposed strategy in Section 2.4, in this work adaptive image

quality modification will be combined with this adaptive classification model.
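To make the incremental character of such adaptive methods concrete, the sketch below shows how class-conditional attribute statistics can be updated one labeled image at a time, so that a Gaussian Bayesian model never has to be re-fitted on all of the old and new data together. This is only an illustrative Python sketch, not the ILM itself (the ILM is described in Appendix V); the six-attribute vector and the Welford-style update are assumptions made here purely for illustration.

```python
import numpy as np

class IncrementalClassStats:
    """Running per-class mean and variance of image attributes (Welford's method)."""

    def __init__(self, n_attributes):
        self.n = 0
        self.mean = np.zeros(n_attributes)
        self.m2 = np.zeros(n_attributes)   # running sum of squared deviations

    def update(self, x):
        """Absorb the attribute vector x of one newly labeled image."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / max(self.n - 1, 1)

# One statistics object per class; each labeled image updates only its own class,
# so the classification model adapts immediately without recalibration on old data.
stats = {"WP": IncrementalClassStats(6), "WO": IncrementalClassStats(6)}
stats["WP"].update([0.31, 12.0, 0.8, 140.2, 5.1, 0.02])   # hypothetical attribute vector
```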

2.2.2 The Torabi Bayesian Classification Model

Two major contributions of Torabi’s research were the development of a novel

thresholding method termed “adaptive MaxMin thresholding” and the application of

a Bayesian model for classification in an adaptive form using the Intelligent Learning

Machine (ILM). These topics are described in turn below.

2.2.2.1 Thresholding

Many image thresholding techniques are available. However, these techniques are suited to different kinds of images, and they were found not to perform well, in a real-time mode, on fine contaminant particle images of variable quality from polymer melt monitoring. Therefore, in Torabi's research, MaxMin thresholding was developed to


meet real-time particle image thresholding needs. The method notes the size of the smallest detected particle in an image as the threshold value is progressively changed from black to white. The selected threshold value is the one providing the largest such size [13]. The

method was shown to have the capacity to adapt to images of different background noise

levels and provided particle counts as accurate as those of a human observer in less than 3

seconds per image. In addition, the error in particle size measurement was within 3% for 50-micron particles, using a CCD camera with a 2× lens. This margin of error is considered

very small and acceptable.

The method is already computationally efficient in comparison with the other techniques attempted in Torabi's research, including histogram thresholding and attribute-based thresholding. The mathematical expression of the MaxMin thresholding

is shown in Equation 2-1.

$T = \max_{j=0:k}\left(\min_{i=1:n}\left(A_{i,j}\right)\right)$    (2-1)

where T is the selected threshold value, Ai,j is the area of the ith particle visible in the

image using the jth value of the threshold. For each jth threshold value, the minimum particle size is found. The threshold T is then the value within [0, k] that gives the maximum of these minimum particle sizes. k is set to 220 in Torabi's research.
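For illustration, Equation 2-1 can be sketched in Python as below. The sketch is not Torabi's implementation: the assumptions that particles are darker than the background, that an 8-bit grayscale image is supplied, and that scipy.ndimage performs the connected-component labelling are made here only to show the structure of the search.

```python
import numpy as np
from scipy import ndimage

def maxmin_threshold(image, k=220):
    """MaxMin thresholding (Equation 2-1): for each candidate threshold j in [0, k],
    binarize the image, label the connected particles, and record the area of the
    smallest particle; return the threshold that maximizes that minimum area."""
    best_t, best_min_area = 0, -1
    for j in range(k + 1):
        binary = image <= j                        # particles assumed darker than background
        labels, n_particles = ndimage.label(binary)
        if n_particles == 0:
            continue
        areas = np.bincount(labels.ravel())[1:]    # pixel area of each labeled particle
        if areas.min() > best_min_area:
            best_min_area, best_t = areas.min(), j
    return best_t
```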

However, MaxMin thresholding still faces high computational cost. It needs 3 seconds to

threshold one image, excluding the time for other image processing and interpretation tasks. For a real-time image monitoring system, this speed is still considered slow. The reason for the relatively long thresholding time is that 220 iterations are required to complete the MaxMin thresholding. In this research, three major modifications to the MaxMin


thresholding were made. The first modification constrains the starting and ending values of the threshold in the above equation: the starting value is the minimum grey value of the image, and the ending value is the median grey value of the image. The second modification increases the step size of the iteration, which reduces the search time. The last modification is based on the assumption that the threshold will be located at one of the first two peaks in a plot of minimum particle size versus threshold, as shown in Figure 8-8 in Appendix IV. Based on this assumption, a modified search, depicted in Appendix IV, is carried out to find the threshold that gives the maximum minimum particle size. The assumption was found to be valid, and these modifications are reliable and greatly reduce the thresholding time to less than 1 second. The results of the modified MaxMin thresholding are very consistent with those of the original MaxMin thresholding. The details of these modifications, and their effectiveness in terms of improved classification accuracy over the previous MaxMin thresholding method, are explained in Appendix IV.

2.2.2.2 Bayesian Classification

In Torabi's research, an adaptive Bayesian classification model was developed using the Intelligent Learning Machine (ILM) of Sayad [20]. The model is used to classify particle images captured from inside the molten plastic in the extruder using the Scanning Particle Monitor (SPM). The images belong to two categories: With Particle (WP) or Without Particle (WO). The adaptive model was demonstrated to adapt to changing image quality and achieved desirable classification results. In this section, the development of the Bayesian classification model and its integration with the Intelligent Learning Machine will be introduced. The details of the ILM are explained in Appendix V.

The creation of a Bayesian model from input training images in Torabi’s research is

illustrated in Figure 2-1. The input training image was first pre-processed with brightness

adjustment and background flattening image operators. Relevant features for creating

classification models were then extracted from the pre-processed image. These features

from all training images were then used to create the Bayesian classification model using

the ILM method.

Figure 2-1 Torabi Bayesian Classification Method


The Bayesian method is a well-known probabilistic model to calculate the probability of

an event belonging to a particular class given the attributes X of the event as expressed in

Equation 2-2:

$$P(C=A \mid X=x) = \frac{P(C=A)\,P(X=x \mid C=A)}{P(X=x)} \qquad \text{(2-2)}$$

where A is the class label and P(C=A) is the prior probability of an event belonging to class A. X is the attribute vector, and X=x means that attribute vector X takes the value x. P(X=x) is the probability of attribute vector X taking value x. P(X=x|C=A) is the conditional probability (likelihood) of X taking the value x given class A. The event is classified into the class which gives the highest probability. If all the attributes used are statistically independent, then the above equation reduces to the "Naïve Bayesian" equation (Equation 2-3), in which the class-conditional probability is the product of the conditional probabilities of the individual attributes Xi of attribute vector X for that class.

$$P(C=A \mid X=x) = \frac{P(C=A)\,\prod_{i=1}^{k} P(X_i=x_{ij} \mid C=A)}{P(X=x)} \qquad \text{(2-3)}$$

In Equation 2-3, P(Xi=xij|C=A) is the conditional probability of the ith attribute Xi (of attribute vector X) taking its jth value xij given class A.

As with other supervised classification methods, Bayesian classification uses relevant

attributes. In the previous research by Torabi [33], these attributes are extracted from an

image after MaxMin thresholding. The number of attributes used in his research is six. It

is assumed that those attributes are independent, thus the Naïve Bayesian model is

adopted. The model is given in the two equations below:


$$P(C=WP \mid X=x) = \frac{P(C=WP)\,P(X=x \mid C=WP)}{P(X=x)} = \frac{P(C=WP)\,\prod_{i=1}^{k} P(X_i=x_{ij} \mid C=WP)}{P(X=x)} \qquad \text{(2-4)}$$

$$P(C=WO \mid X=x) = \frac{P(C=WO)\,P(X=x \mid C=WO)}{P(X=x)} = \frac{P(C=WO)\,\prod_{i=1}^{k} P(X_i=x_{ij} \mid C=WO)}{P(X=x)} \qquad \text{(2-5)}$$

where P(C=WP|X=x) is the probability that the image should be classified as WP given

that the attribute values are given by x; similarly P(C=WO|X=x) is the probability that the

image should be classified as WO given that the attribute values are given by x. The

classification model will classify the image as WP if P(C=WP|X=x) is larger than

P(C=WO|X=x) and as WO otherwise. In practice the actual probability values are not

calculated or compared; rather probability densities are used. This will be explained in

more detail below. Also, the quantity P(X=x) in the denominator is omitted from Equations 2-4 and 2-5 because it makes no difference to the classification (i.e. to the relative values of P(C=WP|X=x) and P(C=WO|X=x)). P(C=WP) and P(C=WO) are the prior probabilities for a new image to be with or without a contaminant particle, respectively. They are calculated from the relative frequencies of the WP and WO images used to build the model. Xi denotes the ith image attribute, and xij is the jth value of attribute Xi in any given image. P(Xi=xij|C=WP) is the probability of an individual attribute Xi of attribute vector X taking the value xij (the jth value of Xi) given that the image belongs to class WP. There are "k" attributes in each image (i = 1 to k), and in Torabi's work k was 6 [2]. Typical attributes were mean pixel density, pixel density standard deviation, particle percentage area and its standard deviation, etc.

In Torabi’s research, as mentioned above, the attributes are independent from each other

and their values follow a normal distribution. Therefore, the calculation of posterior

probability for the attribute values obtained given that the image is in the WP class is

obtained from

$$P(X_i=x_{ij} \mid C=WP) = f(X_i=x_{ij} \mid C=WP)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left\{-\frac{(x_{ij}-\mu_i)^2}{2\sigma_i^2}\right\}dx \qquad \text{(2-6)}$$

where f(Xi=xij|C=WP) is the probability per dx increment and is termed the probability density of attribute Xi with value xij given that the image belongs to class WP; µi is the mean of the ith attribute and σi is its standard deviation. As mentioned above, in practice dx is not included in the calculation, and it is the probability density rather than the actual probability that is examined by the classification model. An exactly analogous equation to Equation 2-6 is used for the WO images.

$$P(X_i=x_{ij} \mid C=WO) = f(X_i=x_{ij} \mid C=WO)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left\{-\frac{(x_{ij}-\mu_i)^2}{2\sigma_i^2}\right\}dx \qquad \text{(2-7)}$$

Thus, the class of an image is determined by the following equation:

$$\mathrm{class} \leftarrow \arg\max_{Y}\, f(X=x \mid C=Y)\,P(C=Y) \qquad \text{(2-8)}$$

Equation 2-8 means that the class of an image being classified is the class for which the

image has the maximum product of probability density multiplied by prior probability,

i.e., f(X=x|C=Y)P(C=Y).


In Torabi’s research, the captured images from the Scanning Particle Monitor (SPM) can

be divided into two categories in terms of image quality: low background images (Figure

2-2 ) and high background noise images (Figure 2-3). Figure 2-2 shows an image with

microgel contaminants in polymer melt with no visible additive in the background. In

Figure 2-3, glass microsphere is present in polymer met with high background noise due

to the presence of talc additive.

To integrate the Intelligent Learning Machine (ILM) with the Bayesian model, two knowledge tables (KT) are created, one for calculating the WP probability and the other for calculating the WO probability. The WP knowledge table was formed with WP images, and the WO knowledge table with WO images. During the incremental learning period, when a new image arrives, a human observer determines whether the image is WO or WP; the measured attributes of the image are then added to the corresponding knowledge table, and at the same time the model built upon that knowledge table is updated accordingly.

Figure 2-2 Microgel image with no additive in the background


Figure 2-3 Glass Microsphere (GMS) Images with talc additive in the background

In Torabi’s research, a Bayesian model was created off-line using 2000 training images

of microgel contaminant particles with low background noise such as in Figure 2-2. The

classification accuracy reached 95%. However, when this model was used to predict the

presence of glass-microsphere particles in the images with high background noise such as

image in Figure 2-3, the misclassification rate is as high as 50%. The reason for the poor

prediction performance is that the model was developed based on microgel images with

low background noise level but glass-microsphere images had a high background noise

level. Thus model built on microgel images detected the additives (i.e. the background

noise) of the glass-microsphere images as real particles. At the beginning of the test, the

error rate exceeded 50%. As the images of glass-microsphere particles with high

background noise were being captured, the human observer added 300 new images of

glass-microsphere particles with high background noise and known class label (WP or

WO) were added to the respective WP or WO knowledge table and the model was

updated through the ILM. The error rate in particle detection was measured after each

update. The software processing time for each image was less than 3 seconds. Therefore,

16

processing a set of 300 images required about 15 minutes. This process was repeated ten

times, and the classification error rate gradually dropped to 12% at the end of model

update. The change in classification error rate is shown in Figure 2-4 [2].

Torabi’s research developed a very practical real-time method for in-line image

classification by adapting the Bayesian model with a new invention of the Intelligent

Learning Machine (ILM). It is proved that the approach is able to rapidly adapt to a

variety of images captured in different imaging environment. The classification model is

updated in real-time, and there is no interruption of the process monitoring resulted.

Furthermore, the method proved to be extremely efficient and flexible without

compromising the quality or accuracy of the adapted model. However, the classification

accuracy by the adapted model is 92%, which is not very high. That means that this

[Figure 2-4 Incremental Adaptation of Bayesian Classification Model: error rate in image classification (particle detection) plotted against time, where each interval is equivalent to 15 min or 300 images]

This means that the approach needs improvement: it is not sufficiently powerful in adapting to changing image quality. This assessment is further supported by the fact that a stand-alone static model built only with images of the high background noise level achieved a higher classification accuracy of 92%. Thus, there is still a need to improve image quality, because the model alone cannot provide high classification accuracy across a wide variety of images of different quality.

Image quality improvement entails image modification. Image modification is a complex task because it involves a large number of operators, each with its own purpose. Improving the quality of an image involves first the selection of image operators from many candidates and then the determination of their order of application. Not only do different operators have different effects on image quality, but their order of application also matters. This already presents a daunting task of high dimensionality; therefore, part of the task of image quality improvement is to reduce the dimensionality of image modification. Adding to the complexity, each image operator has its own parameter settings, which need to be adjusted in order to achieve the desired effect on image quality. More discussion of the complexity of image modification follows in the Results and Discussion chapter.

In this research, the goal is to improve image classification accuracy. Thus there is a need to understand the relationship between image quality and classification, that is, what image quality is required for improved classification.


Since the objectives of this thesis combine image processing and data mining, there is a very large amount of relevant literature. The emphasis in this section is on describing the proposed strategy for accomplishing the objectives in the context of this literature.

2.3 Off-line Image Quality Modification

The first objective of this work is to develop an off-line automated method to determine how to modify the image quality of reference images. Reference images are typical of the images to be encountered, but ones for which the software is informed whether the image is in the WO or WP class. The sought modification is not to obtain an image that

appears most appealing to the human eye but rather to obtain an image that results in

better classification.

Thus, this section examines the literature with respect to how image quality has been

defined and modified.

2.3.1 Proposed Strategy for Accomplishing the First Objective

The proposed strategy for accomplishing this objective is:

i. Develop a general method for reducing the dimensionality of the task of

image quality improvement by selecting image quality operators (IQ

Operators) and their order of application. Image quality needs to be defined

using property measurements of an image: “Image Quality Metrics” (IQ Metrics).

A method of selecting the best ways of altering the IQ Metrics, Image Quality

Operators (IQ Operators), is then needed. This method should be applicable to a

wide variety of images and sufficiently broad in scope to initially include many

different IQ Operators.


ii. Given the IQ Operators and their sequence of application for a specific raw

image, develop a computer implemented method for determining the values

of parameters in these operators. Once image quality is defined and the IQ

Operators are selected, the parameters in these IQ Operators will be

systematically varied to obtain the optimal image quality.

iii. Formulate a “Reference Image Database”. Each optimized image will provide

two types of information: (a) a description of the raw image using IQ Metrics and

other necessary attributes related to objects of interest (such as particles) and (b)

instructions on how to transform the raw image into an optimized image by

specifying the image quality operators, their order of application and values of

their parameters. This information for all images provides the “Reference Image

Database” to be used to accomplish the second objective.

The following sections review the published literature relevant to this strategy.

2.3.2 Defining Image Quality: Image Quality Metrics

Image quality metrics (IQ Metrics) are the quantitative values which define image quality. Images are used for diverse purposes. Thus, in order to define the concept of image quality in a reasonable manner, the underlying task of using the image should generally be specified: an image can be defined to be of good quality if it fulfills its intended task well. Image quality then becomes a task-dependent quantity. This is evident in much of the literature [34-44]. For example, medical images are used as a means to obtain information on the health status of the patient, and ultimately, clinical image quality should be defined by the impact of the image on correct diagnosis or on the


outcome of the treatment of the patient. It could be thought that such a definition of

image quality would obscure matters: image quality is then not solely dependent on

image characteristics, but also on the specific task, the observer’s a-priori information on

the task and the observer’s ability to use both the prior information and the image

information for his decisions. These factors make image quality definition a challenging

problem. The choice of IQ Metrics is also application dependent. In radiological imaging, image noise is the most important quality-limiting factor, because it sets limits on the detectability of details and also restricts the possibilities for making details visible by image enhancement (e.g., image sharpening and contrast increase) [45]. However, image noise is not very critical if the object to be detected in an image is much larger in size than the noise.

Given the above situation, however, the formulation of IQ Metrics has continued to be

pursued because the explosive growth of digital imaging has led quality metrics to be

applied in various fields. Image properties such as noise, sharpness and contrast lead to

many objective image quality definitions [37, 38, 40-42, 46-48]. The understanding of the

human visual system leads to many subjective image quality definitions [35, 37, 40-44,

47, 49]. In the subjective definitions, human perception becomes paramount, and IQ

Metrics are correlated with the preference of an observer. The dilemma is that many

objective IQ Metrics do not correlate well with subjective IQ Metrics. As a result, a great

deal of effort has been made in recent years to develop objective IQ Metrics that correlate

well with subjective quality metrics [38, 40, 42, 44, 50] by incorporating the human

visual system into the objective IQ Metrics. Unfortunately, only limited success has been


achieved. In this review, the emphasis will be on objective IQ Metrics since they are

more suitable than subjective IQ Metrics for real-time process monitoring.

Generally speaking, IQ Metrics have two main uses: quality control [3, 6, 10, 13, 16, 51]

and benchmarking image processing methods [40-42, 44].

Objective IQ Metrics, in general, can be divided into two groups: reference and no-

reference IQ Metrics. Reference objective IQ Metrics need a reference image (often the

original image) to calculate the metrics. Because of this, they are also called bivariate

metrics. Most of the reference objective IQ Metrics exploit the deviations between the

corresponding pixels in the reference image and the processed or degraded images. No-

reference objective IQ Metrics do not require a reference. Thus they are also called blind

IQ Metrics. Reference objective IQ Metrics have gained much attention [34, 35, 40, 41,

44] in image processing because they are suitable for evaluating the performance of

image processing algorithms, particularly compression and filtering methods. Appendix I

provides an overview of Objective IQ Metrics that do and do not require a reference.

It is evident that IQ Metrics that do not require a reference are preferred for in-line

monitoring applications. Reference images may not always be available for real-world,

and, in particular, real-time applications. Also, in-line monitoring requires high-speed

processing that synchronizes well with the changes in the monitored process. This means

that the selected IQ Metrics should be simple computationally as well as efficient and

reliable.

For this purpose, no-reference IQ Metrics are more suitable than reference IQ Metrics.


Changing the values of IQ Metrics requires Image Quality Operators (IQ Operators).

These are the subject of the next section.

2.3.3 Image Quality Operators (IQ Operators)

There are a large number of well-established image quality operators (IQ Operators)

available. Image quality operators are usually divided into two major categories:

radiometric and geometric [52]. Radiometric operators, also called pointwise operators,

act on the original image by changing its brightness distribution. With geometric

operators, the grey value in each pixel of the image is changed according to its

neighborhood. Details on the most important image quality operators are summarized in

Appendix II.

ImageJ, the software used in this work, for example, has 54 different operators. When it

is realized that most operators contain their own adjustable parameters, that they can be

applied more than once, and that their order of application affects the results on the

image, it can be seen that the dimension of the operator selection problem is staggering.

A strategy for their application is needed.

Selection of the appropriate IQ Operators, the correct values of their individual

parameters and their order of application represents a sizable data screening problem.

Identification of irrelevant IQ Operators and elucidation of inter-dependence amongst the

various IQ Operators are particularly important.

Over the years, many systems and methods have been developed for image analysis [53-

63]. Many of these systems have been reported to work well on specific types of image


analysis tasks. The way in which image expertise and knowledge are incorporated into a

system varies widely from one system to another, with little systematic generalization

reported. It has been common practice for researchers and system designers to design new systems or to modify existing systems to solve their own image analysis problems through a process of trial-and-error experimentation. Despite many efforts to develop a general vision system [64], such systems have failed to compete with domain- or problem-specific systems in solving practical image analysis problems.

A compromise between a problem-specific system and a general vision system is a flexible specific system, which is not universal but can adapt to quite a wide range of practical problems. In the past two decades, there have been many attempts to construct such systems [65-72]. Most of these systems are knowledge-based, with the support of artificial intelligence (AI) planning techniques. There are two types of knowledge: knowledge independent of the content of a given image, and knowledge dependent on it. The former is knowledge of image data types and image processing algorithms (operators). The latter is based on the image processing expertise of experts. These two types of knowledge alone are not enough to generate an image processing procedure.

This literature offers no real guidance on how best to automatically use IQ Operators to improve the performance of an image classification model. The primary difficulty in this

work is defining image quality so that it is relevant to classifier performance. An image

that appears superior to the human eye may not be considered superior to the

classification model! The struggle to define image quality is a significant part of what

was done to accomplish the first objective of this thesis and will be detailed in the Results

and Discussion.


In this research the aim was to define image quality with a single number and to use a

numerical optimization program to adjust the parameters in the Image Quality Operators

in order to maximize image quality for each individual image. In this case the Nelder

Mead Simplex Optimization method was used.

2.3.4 Optimizing Image Quality

Once the IQ Operators and their order of application have been selected for a specific raw

image, the problem is to rapidly determine the best values of the parameters in the IQ

Operators so as to obtain the highest quality image. An optimization method will need to

systematically change the parameter values and evaluate the image quality following

each change. A wide variety of optimization methods are available. However, the

Nelder Mead Simplex method appears particularly attractive. The method tolerates

experimental error very well in obtaining the optimum. This is an important aspect in this

work because of between-image scatter. It has been used for both numerical optimization

and sequential experimental design but not for image analysis. It is essentially a “logical

guessing” program. Here it systematically guessed the values of the parameters,

modified the image, computed image quality and iterated until image quality was a

maximum. It accelerates towards the optimum and it does not require calculation of

derivatives. The algorithm depends only on the relative value of results obtained in each

trial. A disadvantage of the method is that it may fail, or at least become very inefficient,

if large numbers of IQ Operators are selected. The Simplex method is reviewed and

described in greater detail in Appendix III.


2.4 In-line Image Quality Modification

As mentioned above, the second objective is "to develop an in-line method for modifying

the quality of acquired images to permit improved classification by using the results of

the first objective as a database”.

2.4.1 Proposed Strategy for Accomplishing the Second Objective

The proposed strategy for accomplishing this objective is as follows:

i. Locate the reference image in the Reference Image Database most closely resembling a newly acquired image.

The IQ Metrics of the newly acquired image need to be compared with those of each image in the Reference Image Database to obtain the most similar reference image in each case. Combining and comparing these IQ Metrics involves defining raw image quality so that the most similar reference image can be located.

ii. Improve the quality of the newly acquired image and assess the classification performance.

In this case the newly acquired image is modified and classified using the Torabi Bayesian model trained on optimized images.

iii. Devise how to adapt image quality modification to deal with variable raw

images, combine this with Torabi’s adaptive Bayesian classification and assess

the performance.

This requires developing and implementing adaptive image quality modification combined with the adaptive Bayesian classification model and assessing the effect on classification. The assessment will employ a wide variety of images from the Scanning Particle Monitor.

2.4.2 Use of the Reference Image Database: Case-based Reasoning

Case Based Reasoning (CBR) is ideally suited to utilizing the Reference Image Database

in order to locate the reference image most similar to a newly acquired image along with

the accompanying instructions on how to use the IQ Operators to optimize the new

image.

Case-based Reasoning (CBR) is an approach in which new problems can be solved by

reusing solutions to past solved problems. Past cases or problems represent valuable

knowledge, especially in a weak-theory domain in which the relationships between causes and effects may not be well understood. In the fields of artificial intelligence and machine

learning, CBR has been described as a model for conducting Artificial Intelligence (AI)

research, and as a knowledge engineering methodology for deploying practical intelligent

systems [73]. In fact, CBR has been applied to a full spectrum of Artificial Intelligence

tasks, including classification, scheduling, planning, design and diagnosis.

In the past decade or so, CBR has gained popularity in many domains because of the

generality of the idea. The key areas that have seen its application are medicine [74], law

[75], education [76], knowledge management [77], and image processing and interpretation

[78-85]. While some artificial intelligence and machine learning techniques have stayed

in the laboratory for decades, producing research prototypes before commercialization,

CBR has provided solutions to many engineering applications from its infancy. CBR systems for engineering applications save millions of dollars in production costs and demonstrate their effectiveness and applicability in diverse domains.

The first successful milestone application of CBR was Lockheed’s CLAVIER system

[86, 87]. The system is used to advise engineers on the loading of composite material parts

in large autoclave curing ovens. It has been in daily use since 1994 and it is a very

important system because it possesses all four elements of a typical CBR system:

retrieval, reuse, revision and retention. It is interesting because it was put into use only

after the failure of mathematical modeling and expert systems to solve the problem.

Another fielded engineering application of CBR is an innovative project called Cassiopee

[88]. The project developed a software system to support fault diagnosis of the CFM 56-3 aircraft engine used in the Boeing 737. The system uses case-based reasoning to exploit failure descriptions that are stored in a case base. The system is used by airline maintenance staff and CFMI specialists as a troubleshooting support.

Among other notable engineering CBR systems are a plastic colour matching system for selecting a recipe of pigments that matches a customer's requested colour, which has been in use since 1996 at multiple General Electric plastics sites [89]; an on-line distributed case-based reasoning system for estimating the cost of residential air conditioning systems in Western Australia [90]; and a system called ICARUS (Intelligent Case-based Analysis for Railroad Uptime Support) [91], which diagnoses locomotive faults and determines probable causes for the faults and repair actions to correct them using

historical cases as reference. ICARUS was fielded in 1997 and has been in constant use

since that time. Recently, CBR has seen emerging applications in molecular biology,


such as the problem of analyzing genomic sequences and determining the structure of

proteins [92]. Another significant emerging area in bioinformatics and medical

informatics for CBR is the area of image analysis [80, 93].

The advantages of CBR over other machine learning techniques lie in a few aspects:

knowledge acquisition, knowledge maintenance, increasing problem-solving efficiency,

increasing quality of solution and extensibility. These are examined in the next section.

2.4.2.1 Advantages of Case-Based Reasoning

The classic method for knowledge acquisition for a traditional knowledge-based system

(KBS) is through rule elicitation and formalization. The rule acquisition process can be

laborious and unreliable, especially when a large set of rules is required to meet the needs

of a system. CBR does not generalize rules from cases. Therefore, the cost of knowledge

acquisition for CBR is very low [87]. However, CBR does require considerable initial

effort in gathering information and building up a case base. After the initial knowledge

base is created, it is often not difficult to augment and maintain the knowledge a CBR

system needs.

As with Artificial Intelligence (AI) applications, knowledge acquisition is just the initial

step towards a successful knowledge-based system (KBS). Without exception,

knowledge maintenance is required to refine and revise the previous knowledge and add

missing knowledge to a case base. In this sense, knowledge maintenance is an extension

of knowledge acquisition. CBR provides an important benefit for knowledge

maintenance because a user can add a new case to the case database without ruining its functionality, as can happen in a rule-based system. Because CBR, by nature, is an incremental learning system, it can be readily expanded if the current case base is insufficient to handle a new type of problem.

The reuse of past experience and solutions in CBR improves problem-solving efficiency because a new solution builds on prior solutions rather than repeating the effort from the beginning. Beyond improving problem-solving efficiency, CBR also increases the quality of solutions when a system is not well understood: the solution suggested by CBR can be more accurate than that of imperfect rules because cases reflect what actually happened in reality.

Another vital benefit that CBR offers is its extensibility (or scale-up capability) to larger problems. This extensibility has now been tested in applications. The Cassiopee system

mentioned above, a case-based diagnostic aid for jet engines, uses 16,000 cases for its

diagnosis process [88]; and ALFA, a case-based system for power plant load forecasting,

is in operation with a case library of 87,000 cases [94].

In this work, CBR is to be used to select image processing requirements for new images.

The use of CBR in image interpretation is examined in the next section.

2.4.2.2 Case-Based Reasoning in Image Interpretation

In general, image interpretation often requires many levels of sequential processing. These many levels of processing add complexity to the image interpretation system because the result of processing at one level strongly depends on the performance of the preceding level. Thus, an image interpretation system is often domain specific and works only under certain conditions and image quality. However, for a real-world image

monitoring system, environmental conditions change and noise can suddenly appear in

the images. Such difficulties present challenges and demand adaptability in image

interpretation.

Case-based reasoning potentially can meet such challenges because it is able to update

the case base and learn new knowledge incrementally. Examples include applications in computed tomography (CT) [80, 82, 85], in microscopy for diagnostic histopathology [95], in myocardial scintigrams for automated detection of coronary heart disease [96], in ultrasonic B-scans [97], in the development of image processing steps for image processing problems not yet solved [78] and in other ultrasonic images [84]. However, the

application of CBR in image interpretation is still in its early stage. The types of images

to which CBR has been applied are mainly limited to medical images. There is no

reported application of CBR to images of particles, and the application of CBR to image interpretation has not been used for in-line image monitoring or real-time applications.

Although, as can be seen from the preceding sections, CBR is potentially a very powerful method, there are some issues. These include: case representation of images,

selection of similarity attributes, similarity measures, case maintenance (addition and

deletion) and incremental learning. The following sections examine these issues with

emphasis on CBR application in this work.

2.4.2.3 The Case-Based Reasoning Process

CBR methodology is based upon two assumptions. The first assumption is that similar

problems have similar solutions. Accordingly, solutions for similar past problems are


useful for new problem-solving. The second assumption is that, in a specific context or an

application system, the same types of problems tend to recur. Therefore, future problems

are similar to prior problems [98]. If these two assumptions are true, remembering the

prior problems and their solutions and reusing or revising them for solving new problems can then be a very effective problem-solving method.

In the Case-based Reasoning (CBR) community, CBR tasks are divided into two classes:

interpretive CBR and problem-solving CBR [73, 98]. Interpretive CBR uses prior cases

as reference points for classifying or for characterizing new situations; problem-solving

CBR uses prior cases to infer solutions that could apply to new cases. In this literature review, the focus is on problem-solving CBR, since it is more relevant to one of the objectives of this research: to develop an automated method for modifying an image to the image quality required for improved classification results.

A classic CBR model suggested by [99] is shown in Figure 2-5. The search for a solution

to a new problem (i.e., a new case) involves obtaining a problem description, measuring

the similarity of the new problem to prior problems stored in a case database (often

termed a “knowledge base”), retrieving one or more similar cases and attempting to reuse

the solution of one of the retrieved cases (or if necessary adapting it to the situation

presented by the new problem). The solution proposed by the system is then evaluated. If

the evaluation requires, the proposed solution is then revised and becomes a confirmed

solution. Depending on the updating scheme adopted, the new problem and its solution

can then be retained as a learned case and added to the case database for future use.


Of the four different steps in CBR (i.e., retrieve, reuse, revise and retain) the retrieval and

retention steps have attracted most attention due to their pivotal roles. Case retrieval has a

direct impact on system performance because the retrieved cases are a reference or starting

point towards the final solution [100]. The more relevant the similar cases retrieved, the

better the chance of arriving at an appropriate solution. Retention, on the other hand, will,

to some extent, determine the potential and competence of the CBR system to solve a

variety of problems without computational sacrifices. In this review, the focus is on these

two aspects with emphasis on image analysis.

Figure 2-5 Case-Based Reasoning Process


2.4.2.4 Case Retrieval in Image Interpretation

As pointed out above, case retrieval is an important step in the CBR process. It involves

identifying attributes, searching the case database, calculating similarity and selecting the

similar cases using defined criteria [99]. Many CBR systems deploy a two-step retrieval

approach, first retrieving a set of promising candidate cases, and then implementing a

“finer-grained” evaluation of the similarity of the retrieved cases and the new case.

Identification of Similarity Attributes

The attributes to be used in CBR are domain specific. The determination of the correct

attributes to compare is sometimes not very clear. Decisions about which attributes are

important are often based on explanations of attribute relevance.

In image analysis and processing, attributes could consist of non-image information and

image information [101]. In general, attributes can be numerical, categorical or symbolic.

The attributes used to describe an image case depend on the type of image and the task of

image interpretation. For example, for CT images, non-image attributes of patient information such as age, sex, slice thickness and number of slices were used by Perner [80]. In a railway inspection application [83], the type of sensor used to capture the ultrasonic image was considered an important non-image attribute. Other image acquisition attributes, such as the illumination of the scene and information about the objects, could be considered non-image information as well. Image-related attributes adopted include the

pixel matrix itself [102], objects contained in the image and their attributes, as well as the

spatial relation between the objects. A four-level hierarchy representation of an image


case was developed [97]. At the lowest level, attributes are objects described by their own

attributes such as location, orientation and type (line, parabola or noise) parameters. At the highest level the attribute is the image scene. This hierarchical attribute-based representation of an image allows matching the case in the case base at different granularity levels. Other high-level abstraction attributes extracted from an image are also used to represent a case. Notable attributes are numerical properties of the image, such as

the statistical measures of the grey level, including mean, variance, skewness, kurtosis,

variation coefficient, energy, entropy, first-order histogram, and centroid.

Use of Similarity Attributes for Similarity Measurement

With the identification of attributes, CBR then relies on similarity assessment to calculate

the distance between previous cases and the current case. However, the attributes and

similarity measure are tightly coupled. For example, some attributes are symbolic and can

be used qualitatively but not quantitatively. Two major similarity assessments are

proposed in image interpretation: surface similarity and structural similarity. The selection between these two assessments depends on the case representation of a specific application. In some applications, surface similarity is adequate for

measuring the similarity of the stored case to the target problem. In other applications,

cases are represented by complex structures such as graphs and consequently require

assessment of the structural similarity to retrieve cases.

Surface similarity uses attributes. These attributes are often attribute-value pairs. In

surface similarity, the similarity of a case in a case base to the current case is computed

using the selected attributes and the similarity measure is given as a real number.


In image analysis, depending on the image representation, similarity measures can be

divided into three categories. First, pixel-matrix based similarity measures; second,

attribute-based surface similarity measures (either numerical [80], symbolic or mixed

type); and last, structural similarity measures [96, 102-104].

Perner (1999) calculated the similarity measure by combining the non-image similarity

and image similarity for automatic CT image segmentation. She defined a similarity

measure for image information in the following equation:

$$\mathit{dist}_{AB} = \frac{1}{k}\sum_{i=1}^{k} w_i\left|\frac{C_{iA}-C_{i\min}}{C_{i\max}-C_{i\min}} - \frac{C_{iB}-C_{i\min}}{C_{i\max}-C_{i\min}}\right| \qquad \text{(2-9)}$$

where CiA and CiB are the ith attribute values of image A and B, respectively. Cimin and

Cimax are the minimum and maximum values of the ith attribute, respectively. wi is the

weight assigned to the ith attribute with the sum of wis equal to unity. k is the number of

attributes. The attributes used are statistical measures of the grey level including mean,

variance, skewness, kurtosis, variation coefficient, energy, entropy, and centroid.

Euclidean distance, as shown in Equation 2-10, is used to calculate similarity in other

application domains, such as the manufacturing of printed circuit boards [105], but not

in image interpretation.

$$\mathit{dist}_{AB} = \sqrt{\sum_{i=1}^{k}\left(C_{iA}-C_{iB}\right)^2} \qquad \text{(2-10)}$$

where CiA and CiB are the ith attribute values of image A and B, respectively, and k is the number of attributes.

A structural similarity measure introduced by [102] was applied in image segmentation

[80]. The similarity measure takes the image matrix itself and calculates the similarity between two image matrices. The similarity is calculated as an average of the distances from all pixels in one image to the corresponding pixels at the same locations in the other image. The measure reflects the structural similarity of two images. It requires the

storage of images in the case base.

2.4.2.5 Case Retention in Case-Based Reasoning

Retention involves updating the case base with newly learned cases; thus it is the learning part of the CBR process. Retention enables the system to learn new cases, which in turn enhances the competence of the system in dealing with new problems. There are two aspects of retention: success-driven learning and failure-driven learning [98].

For success-driven learning, the resulting solution for a successful CBR process is stored

in the case base for future reuse. In this way, success-driven learning favours cases that are more likely to lead to success. The stored successful cases thus increase the capacity of the system to solve problems which it might not previously have been able to solve. However, success-driven learning can cause redundancy in the case base; that is, some very similar cases with very similar solutions could be retained in the case base. This in

turn could increase the search time in case retrieval.

For failure-driven learning, CBR values the failed cases as well as the successful cases.

The rationale behind this is that failures indicate that learning is needed. In addition, failures reveal what needs to be learned to avoid future failure. In failure-driven learning, the initial knowledge acquisition and filling of the case base is success-driven, with successful cases added to the case base. Once the case base is large enough, the

CBR system starts to learn from failure, in which the solution obtained through the CBR


process does not work. This failure is called task failure by Leake [98]. The failed case is

then subject to repair outside the CBR process until a successful solution is found. This

case, with its new solution, is then retained and added to the case base. In essence, this is

still learning from successes but it employs a repaired solution.

In contrast to task failure, expectation failure addresses the scenario in which the system expects that a solution to a new problem will work but it does not. Expectation failure can help the system to avoid a similar problem in the future by suggesting that it learn to predict the failure in advance.

For an image interpretation system, case base maintenance has not yet been extensively

studied. Jarmulak [97] adds new cases to the case base, splits the case base into clusters of fixed size and uses a prototype to represent each cluster in the hope of speeding up the search process. Perner also updated the case base with newly learned cases and adopted a way of learning case classes and prototypes [80, 106]. It has been suggested that the deletion or forgetting of cases is not preferable in image interpretation. In addition, Perner recommends that distorted and very noisy images and images with illumination defects should not be added to the case base if the image analysis or reasoning process cannot handle them.

2.4.2.6 Computation Efficiency of Case-Based Reasoning

Computation efficiency in CBR is closely related to the organization of the case base, which has a direct effect on retrieval time. If a case base is flat and very large, it will take time to calculate the similarity measure between the current problem case and each case


in the case base. Consequently, a flat case base is to be avoided in order to speed up the

case retrieval process. An alternative to a flat case base is to create a hierarchical case base, such as the decision-tree-like structure proposed by McSherry [107] or the hierarchical

clustering by Perner [106]. In this fashion, similar cases are grouped together under the

same branch. This organization allows separation of a group of similar cases from that of

non-similar cases at the earliest stage of the retrieval process.

In the case of a flat case base, parallel computing is considered a viable approach but

obviously the requirement of expensive hardware is a drawback.

For CBR applications in image analysis, it is generally considered that the computation

workload is not a critical issue if the case is represented by high-level numerical

attributes instead of the image matrix itself [93].


2.5 Evaluation of Classification Methods

This work aims at improving classification of images by modifying image quality. As

will be seen in subsequent sections, comparison of different classification results is

necessary when comparing different definitions of image quality and the same

classification model or when comparing different classification models. There are many

measures that can be used to accomplish such a comparison. The most basic summary of

classifier performance is a “confusion matrix”. As shown in Table 2-1 for the binary

class situation (WO or WP) of this work, each element of the matrix shows the number of

images for which the actual class is given by the label for the row and the predicted class

is given by the label for the column. The elements of this matrix are used to provide a

variety of calculated measures of classification. These are summarized in Table 2-2 and

Table 2-3. Table 2-2 shows the confusion matrix using the data mining terminology for

the basic measures of the table. Table 2-3 shows definitions and the many measures

obtainable from the confusion matrix.

When classification results are to be examined in this work the first result presented is the

confusion matrix. Attention is then drawn to the following measures defined in detail in

Table 2-3:

1. Classification Error Rate: the fraction of cases (either actual WO or actual WP)

not correctly classified is an extremely important measure. It provides an overall

view of the classification. It will be seen that most of the classifications can be

done with better than 90% accuracy. It is the last 10% of classification accuracy,

i.e. the classification error rate, that is extremely important in many process


control situations. Under some circumstances it can mean the difference between

erroneously shutting down an extrusion line 10% of the monitoring time versus

less than 1% of the monitoring time. Therefore, reduction of classification error

is of utmost importance.

2. False Positive Rate: erroneously identifying images that have no particle as

having a particle means causing a false alarm in a process control situation. The

false positive rate expresses the number of incorrectly classified WO as a fraction

of the number of WO images.

3. False Negative Rate: erroneously identifying images that have a particle as

having no particle means allowing off-specification product to be produced. This

error would generally be expected to have worse consequences than the false

alarm of a false positive rate.

In this work classification error, false positive rate and false negative rate are presented in

a bar graph and are compared against another classification in the same graph.

Table 2-1 Confusion Matrix for a Binary Classification Problem

                     | Predicted Class is WP                        | Predicted Class is WO
Actual Class is WP   | # of images actually WP and predicted as WP  | # of images actually WP but predicted as WO
Actual Class is WO   | # of images actually WO but predicted as WP  | # of images actually WO and predicted as WO

Table 2-2 Confusion Matrix for a Binary Classification Problem Showing Data Mining Measures

                      | Predicted Positives  | Predicted Negatives
Actual Positives (P)  | True Positives (TP)  | False Negatives (FN)
Actual Negatives (N)  | False Positives (FP) | True Negatives (TN)


Table 2-3 Definition of Terms Related to Classification

Positive, P: total number of positive cases = number of "with particle" (WP) images labelled by the human observer.
Negative, N: total number of negative cases = number of "without particle" (WO) images labelled by the human observer.
Number of cases, P + N: total number of cases in the sample.
True positive, TP: number of positives correctly classified as positives.
False positive, FP: number of negatives incorrectly classified as positives.
True negative, TN: number of negatives correctly classified as negatives.
False negative, FN: number of positives incorrectly classified as negatives.
True positive rate (sensitivity), TP(%) = TP/P: fraction of positives correctly classified as positives.
False positive rate, FP(%) = FP/N: fraction of negatives incorrectly classified as positives.
True negative rate (specificity), TN(%) = TN/N: fraction of negatives correctly classified as negatives.
False negative rate, FN(%) = FN/P: fraction of positives incorrectly classified as negatives.
Precision (%) = 100 TP/(TP + FP): fraction of cases identified as positives that are really positives.
Classification Accuracy (%) = 100 (TP + TN)/(P + N): fraction of cases correctly classified in the total sample.
Classification Error Rate (%) = 100 (1 - (TP + TN)/(P + N)): fraction of cases not correctly classified in the total sample.

For the purpose of evaluating the performance of a classifier, the results in the confusion matrix are obtained using the technique of cross-validation, which prevents overfitting of a classifier. The basic idea of cross-validation is to split the data into a training set and a test set. The training set is used to create the classifier while the test set is used to test

the predictability of the classifier in terms of accuracy or error rate. There are three main

types of cross-validation: test set (also called “hold-out estimation set”), leave-one-out

and k-fold cross validation. These three methods differ in how the data set is divided

between training set and test set and how error is evaluated. The test set method randomly

chooses a certain percent of the data and retains the remainder of the data as a training

set. The classification results are attained by applying the classifier to the test set data.

The classification errors for the test set are reported in the confusion matrix. The leave-

one-out cross validation (LOOCV) method leaves one data record out at a time for testing

and the rest of the data records are used for training. All data records in the data set are

used exactly once for testing. When all data records have been used for testing, the errors

are summed and reported in the confusion matrix for the classifier. The test-set method is

simple and computationally inexpensive but it is vulnerable to the variance of the test set selection, which results in unreliable future predictability. In contrast, LOOCV

uses all data points both for training and testing but it is very computationally expensive.

The k-fold cross validation technique takes advantage of the benefits of both the test set

method and LOOCV by randomly breaking the data into k partitions. Each partition is

successively selected for testing once while the remainder of the partitions are used for

training. The classification errors for each partition when it is used as a testing set are

summed and reported in the confusion matrix. The number of partitions (the “k value”)

will have an effect on the accuracy (proximity to the true value) and variance (variability)

of the estimated error rates. With a large k value, the accuracy tends to be high but the

variance is large. For a small k value, the accuracy tends to be low while the variance is

small [108-110]. In practice, the choice of the number of folds depends on the dataset size. For a medium-size dataset, the most popular k-fold cross validation is 10-fold [108-110]. This value will be used throughout this work for evaluating classifiers, since we have medium-size training sets of fewer than 800 training data records.

Finally, for examining classifier performance, in addition to the confusion matrix,

classification error, false positive rate and false negative rate, this work shows a “receiver

operating characteristic” curve (commonly referred to as a ROC curve).

Figure 2-6 illustrates the receiver operating characteristics (ROC) curve using results

from one of the classifiers used in this work. The ROC curve is a plot of true positive rate

(on the y axis) versus the false positive rate (on the x axis). This type of curve was

originally used in signal theory to select an operating point to distinguish the absence or

presence of signal. In a ROC figure, the upper left point (0,1) represents the best possible

classification since it represents a 100% true positive rate and a zero false positive rate.

The point (1,0) at the bottom right corner of the ROC figure represents the worst

classification, with a zero true positive rate and a 100% false positive rate. The diagonal line represents a random classifier, since along it the true positive rate equals the false positive rate over the whole range of the axes. The important characteristic to

note is that the left top region of the figure is where the best classifier performance lies.


[Figure: true positive rate (0 to 1) plotted against false positive rate (0 to 1); the two points labelled "1" and "2" mark the maximum accuracy for the model of raw images]

Figure 2-6 Example of a Receiver Operating Characteristic (ROC) Curve

The Naïve Bayes classifier used in this work is a probabilistic classifier, i.e., the classifier

assigns every image a probability of belonging to the “WP” class. To draw an ROC

curve the images are ranked from highest to lowest probability of having a particle (i.e. of

belonging to the WP class). Now, a threshold probability value can be specified.

Images with a probability above the threshold are classified as being WP by the model

and images below the threshold are classified as being WO by the model.

The calculation of the probability of an image belonging to a class is described in Section

2.2.2.2. Many different threshold values can be specified. Each threshold value will give

a data point of true positive rate and false positive rate on the ROC curve. If we vary the

threshold value between 0 and 1, it will produce a series of data points which will result

in a continuous ROC curve. For different classifiers, different ROC curves can be created

in the same way.


When only classifier accuracy (or error rate) is used to measure the performance of a classifier, it does not take into account the fact that the numbers of false positives and false negatives may differ. Also, the costs of false positive errors (false alarms) and false negative errors are not the same in many applications. In addition, two data points on a ROC curve can have the same accuracy, such as the data points marked "1" and "2" in Figure 2-6. For these reasons, the area under the ROC curve (AUC) has recently been considered by many to be a statistically consistent measure of how well a classifier separates the classes, because it reflects the probability (ranking) information of the training cases, which a single accuracy value (one data point on a ROC curve) does not. Overall, the larger the AUC, the more discriminating the classifier.
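As a concrete illustration of how a ROC curve and its AUC can be computed from per-image WP probabilities, the following is a minimal sketch; it assumes only that each image has a classifier-assigned probability of belonging to the WP class and a human-assigned label, and the class and method names are illustrative rather than part of the thesis software.

```java
// Minimal sketch: ROC points and AUC from per-image WP probabilities.
// Assumes isWP[i] is true for images the human observer labelled WP.
public class RocSketch {

    // Returns {falsePositiveRate, truePositiveRate} for one threshold.
    static double[] rocPoint(double[] probWP, boolean[] isWP, double threshold) {
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < probWP.length; i++) {
            boolean predictedWP = probWP[i] >= threshold;
            if (isWP[i]) { if (predictedWP) tp++; else fn++; }
            else         { if (predictedWP) fp++; else tn++; }
        }
        return new double[] { (double) fp / (fp + tn), (double) tp / (tp + fn) };
    }

    // Sweeps the threshold from 1 down to 0 and accumulates the area under
    // the resulting curve with the trapezoid rule.
    static double auc(double[] probWP, boolean[] isWP, int steps) {
        double area = 0.0;
        double[] prev = rocPoint(probWP, isWP, 1.0 + 1e-9);   // starts at (0,0)
        for (int s = steps; s >= 0; s--) {
            double[] cur = rocPoint(probWP, isWP, (double) s / steps);
            area += (cur[0] - prev[0]) * (cur[1] + prev[1]) / 2.0;
            prev = cur;
        }
        return area;
    }

    public static void main(String[] args) {
        double[] p  = { 0.95, 0.80, 0.60, 0.40, 0.20, 0.05 };   // toy probabilities
        boolean[] y = { true, true, false, true, false, false };
        System.out.println("AUC = " + auc(p, y, 100));
    }
}
```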


3 EXPERIMENTAL PROCEDURE

All of the images used in this work were from operation of the "Scanning Particle

Monitor” to view molten polyethylene passing through a single screw extruder. This

research utilized the in-line image monitoring for plastics film extrusion system

illustrated in Figure 3-1. The system was first designed by Lianne Ing and later modified

by Forouz Farahani. A single screw extruder (L/D=24/1) (currently equipped with a film

die) was used. Two sapphire windows were installed opposite each other just upstream of the die. A white light source was mounted outside one window; at the other window was a specialized CCD camera (the "Scanning Particle Monitor"). The captured images were

sent to a Windows computer for processing and analysis.

Figure 3-1 In-Line Image Monitoring System for Plastics Film Extrusion

[Diagram labels: plastics extruder, film die, plastic film, CCD camera, optical window, light source.]

Table 3-1 summarizes the digital images used in this work. Image Set 1 is that used by

Torabi in his research. Image Set 2 was produced by Ing in her research, and Image Set 3

was generated in this work from extrusion runs. Material feed (the polymer pellets feed)

to the extruder and operating conditions (such as lighting, extruder running speed and

heating temperatures) were different for each image set.

Table 3-1 Summary of Image Data Sets Used in This Work

Image Set 1 (4294 images; origin: Keivan Torabi): Images generated with grade 530A polyethylene pellets feed made by Dow Chemical, but with the injection of different kinds and concentrations of particles into the feed.

Image Set 2 (1344 images; origin: Lianne Ing): Images generated with polyethylene pellets feed of high microgel concentration made by Exxon.

Image Set 3 (2010 images; origin: this work): Images generated with grade 530A polyethylene pellets feed made by Dow Chemical with injection of different kinds, sizes and concentrations of particles into the feed; also images generated with polyethylene pellets feed of high additive concentration made by Exxon.

The final set of images, Image Set 3, were images produced by extrusion experiments run

especially for this work with the objective of obtaining images of diverse image quality.

A summary of extruder running conditions used for these experiments is shown in Table

3-2. The experimental runs utilized two different grades of polyethylene pelletized feed.

The first (Runs 3-1 to 3-4) was grade 530A polyethylene produced by Dow Chemical.

For this grade, two different sizes of glass microspheres (GMS) (30 micron in run 3-2 and

100 micron in run 3-3) and variable sized glass bubbles (as in run 3-4) were added as a

"pulse input" to the feeds. The weight concentration of the 30 µm GMS was 200 ppm while that of the 100 µm GMS was 40 ppm. The second kind of feed was the high-additive-content

polyethylene pellets feed from Exxon (runs 3-5 and 3-6). Glass beads were injected into

the feed in run 3-6.

Table 3-2 Image Data Sets Produced from Experimental Extrusion Runs in this Research

Run Number | Number of Images | Extruder Feed Pellets | Particles Injected
3-1 | 269 | 530A polyethylene from Dow Chemical | No particles injected
3-2 | 403 | 530A polyethylene from Dow Chemical | 30 µm glass microspheres, 200 ppm
3-3 | 400 | 530A polyethylene from Dow Chemical | 100 µm glass microspheres, 40 ppm
3-4 | 400 | 530A polyethylene from Dow Chemical | Variable-size glass bubbles, 20 ppm
3-5 | 195 | High-additive polyethylene from Exxon | No particles injected
3-6 | 343 | High-additive polyethylene from Exxon | 75 µm glass beads, 20 ppm

4 COMPUTATIONAL PROCEDURE

Accomplishing the objectives of this thesis required a very intensive software

development effort. The new software, named the Intelligent Image Interpretation

System (IIIS), divides into two types of programs:

(a) “components”- software specific to the requirements of an objective

(b) "shared components" - software that satisfies requirements common to more than one objective and could therefore be used in accomplishing more than one of them.

The programming language throughout was Java 2 standard edition 5.0. As will be

described in more detail below, an open source image processing program, ImageJ, was

integrated into the software developed here.

This part of the thesis is organized by thesis objective. Each of the following sections

shows a flow diagram of an objective together with the name of the associated software

components and shared components. To avoid confusion, not all shared components are

shown in the flow diagram. Also, since the first objective requires all of the shared

components, their detailed description is located in that section.

4.1 Software Development for the First Objective: Off-line

Image Quality Modification

The strategy and associated software components for accomplishing the first objective of

this thesis are shown in Figure 4-1. Three components (the Simplex optimization, image classification and database components, color-coded in light blue) were

associated with off-line image quality modification strategies. The Simplex optimization

component determined the parameter values of selected image quality operators. Once

the image was optimized, the image classification component was used to create a classification model. The database component dealt with formulating and modifying a Reference Image Database: adding, deleting and retrieving image cases from the database.

Figure 4-1 Computational Software Components Associated with Off-line Image Quality Modification


All software components involved in off-line image quality modification are discussed in

detail in the following subsections.

4.1.1 The Simplex Optimization Component

The Simplex optimization component is illustrated in Figure 4-2. The component is

enclosed by a green dotted line. The grey boxes contain the steps required in the Simplex

optimization to determine the optimum parameter values of the selected image operators.

The blue boxes are software components associated with these steps. The steps required

in Simplex Optimization are image processing and objective function evaluation. The

image processing step modified the image with selected image quality operators whose

parameter values were “guessed” by the Simplex algorithm. The objective function

evaluation step assessed the performance of the image processing step. When the

optimization criterion was reached, the optimization stopped. At the same time, the

relevant information regarding the input image, including the image quality metric values and the necessary particle measurement attribute values, together with the IQ modification instructions obtained in optimizing the image, formed the image case, which was added to the Reference Image Database by the database component.

In implementing the Simplex algorithm the parameter values guessed by the method were

constrained to physically realistic values by using a transformation from infinite space

(where the search was conducted) to parameter value space (where the parameter values

could possibly be). Further details on the Simplex method are provided in Appendix III.
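One common way to realize such a transformation is a logistic mapping from the unconstrained search variable to the bounded parameter range, as in the minimal sketch below; the logistic form is an assumption made here for illustration, since the exact transform used in this work is described in Appendix III.

```java
// Minimal sketch of constraining Simplex search variables to a physical range.
// The logistic mapping is an assumed form, not necessarily the thesis's transform.
public class ParameterTransform {

    // Maps an unconstrained search value x in (-inf, +inf) into [low, high].
    static double toParameterSpace(double x, double low, double high) {
        double sigmoid = 1.0 / (1.0 + Math.exp(-x));
        return low + (high - low) * sigmoid;
    }

    // Inverse mapping, useful for seeding the search from an initial guess.
    static double toSearchSpace(double p, double low, double high) {
        double t = (p - low) / (high - low);
        return Math.log(t / (1.0 - t));
    }

    public static void main(String[] args) {
        // Example: a filter radius constrained to the range [1, 4].
        double radius = toParameterSpace(0.7, 1.0, 4.0);
        System.out.println("radius = " + radius);
    }
}
```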


As shown in the Figure 4-2, four shared components were invoked by the Simplex

optimization component: the image processing shared component, the image

measurement shared component, the database shared component and the image

classification shared component. These shared components are individually discussed in

the following subsections.

Figure 4-2 Simplex Optimization Component in Off-line Image Quality Modification


4.1.2 The Image Processing Shared Component

In this work, the task of image processing was carried out by a free, open-source software

package (“ImageJ” available from http://rsb.info.nih.gov/ij/). All image quality

modification was done using Image J. Thus, as shown in Figure 4-3, the image

processing shared component is actually an interface to ImageJ. This component

combined image quality operators with their parameter values for processing and called

ImageJ to carry out the image modification task.

ImageJ contains 54 IQ operators for de-noising, blur removal, thresholding,

bright/contrast adjustment, edge sharpening, background flattening and other image

modification tasks. It also can provide a histogram of number of pixels versus their grey

level along with various statistics and particle related attributes.

[Diagram: for an input image, the image processing shared component assembles the image quality operators and their parameters and calls the corresponding ImageJ processing components to produce the output image.]

Figure 4-3 Image Processing Shared Component


In this work, “blanket preprocessing” of each raw image was performed previous to any

other modification. It included two operations: brightness adjustment to adjust the

average brightness to 128 and background flattening which corrected the non-uniformity

of illumination caused by uneven back lighting. Both operations were executed in

ImageJ. Similar blanket operations were carried out by Torabi in his work; they are a way of reducing non-uniformity in image quality. After the raw image was blanket processed,

it was saved in an image folder for further processing.
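The following is a minimal sketch, not the thesis code, of how this blanket preprocessing step could be expressed with the standard ImageJ API. The target mean of 128 follows the text; the rolling-ball radius, the option string and the file paths are illustrative assumptions.

```java
import ij.IJ;
import ij.ImagePlus;
import ij.process.ImageProcessor;
import ij.process.ImageStatistics;

// Minimal sketch of the "blanket preprocessing" step using the ImageJ API.
public class BlanketPreprocess {

    static void preprocess(String inputPath, String outputPath) {
        ImagePlus imp = IJ.openImage(inputPath);

        // 1) Brightness adjustment: shift grey levels so the mean becomes 128.
        ImageProcessor ip = imp.getProcessor();
        ImageStatistics stats = imp.getStatistics();
        ip.add((int) Math.round(128.0 - stats.mean));

        // 2) Background flattening: rolling-ball background subtraction corrects
        //    non-uniform back illumination (radius and options are assumptions).
        IJ.run(imp, "Subtract Background...", "rolling=50 light");

        IJ.save(imp, outputPath);
    }

    public static void main(String[] args) {
        // Hypothetical file locations, shown only for illustration.
        preprocess("raw/image001.tif", "preprocessed/image001.tif");
    }
}
```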

The image processing shared component is a standalone component which was invoked

in both off-line and in-line image quality modification. The implementation of the image

processing component required a full understanding of ImageJ in order to access some of its functions which would otherwise be inaccessible. The major challenge

was to make such functions callable through macros from the image processing

component.

4.1.3 The Image Measurement Shared Component

The image measurement shared component (Figure 4-4) developed in this work served as

an interface to measurement (as opposed to image modification) functionalities in

ImageJ. This research involved many image measurement tasks, associated with image

quality metric measurement and particle related attribute measurement. The image

measurement shared component was frequently invoked by the simplex optimization

component and the case-based reasoning component.

ImageJ has 18 different measurements of particle attributes from an image. These

include particle area, mean grey value, standard deviation of grey value, max and min


grey value, etc. In this research, the measurements of interest were particle area and

particle mean grey value. The remainder of the measurements were not used.

Figure 4-4 Image Measurement Shared Component

As shown in Figure 4-4, the image measurement shared component first assembled the

measurement requirement. This could include an image quality metric for brightness,

contrast, noise, blur, illumination uniformity, and local contrast as well as attributes of

interested objects (particles). Depending on the type of measurement, different actions

would be taken. If the measurement was an IQ metric, the measurement tasks were

simply sent to ImageJ. In the case that the measurement was object (i.e. a particle)

related, first the image was thresholded using the image thresholding shared component;

then the measurement was performed using ImageJ.


However, ImageJ itself did not possess the capability to measure the six image quality

metrics: the brightness, contrast, noise, blur, illumination uniformity and local contrast.

Equations for these metrics are described in detail in Appendix II. These equations were evaluated by employing ImageJ to provide the required variable values. The equations for

measuring illumination uniformity and local contrast were developed in this research

while the other four measurements were developed by other researchers.

4.1.4 The Image Thresholding Shared Component

The task of image thresholding is to separate the foreground objects of interest from the

background in order to extract relevant information about the objects of interest (particles

in this work). The image thresholding shared component (Figure 4-5) included a

modified version (termed Modified MaxMin thresholding) of Torabi’s MaxMin

thresholding (described in Section 2.2.2.1). The search for the required threshold value

started with setting an initial threshold value and then thresholding the image. The

thresholded image was then measured for particle attributes using ImageJ. If the

measurement reached the thresholding objective, that is, if the maximum minimum

particle size of the image was found, the thresholded image was output; otherwise the

threshold was systematically set at another value as described in Appendix IV and the

selection repeated until the desired threshold was located.

The modified version of MaxMin thresholding proved to be much more computationally

efficient (Appendix IV).
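To make the iterative search concrete, a schematic sketch follows; it is not the Modified MaxMin algorithm of Appendix IV, and the particle measurement is only a placeholder standing in for the ImageJ measurement call. The simple scan over candidate grey levels is an assumption made for illustration.

```java
// Schematic sketch of the iterative threshold search loop described above.
public class ThresholdSearchSketch {

    interface ParticleMeasure {
        // Placeholder: thresholds the image at the given grey level with ImageJ
        // and returns the value of the thresholding criterion.
        double maxMinParticleSize(int threshold);
    }

    // Scans candidate thresholds and keeps the one maximizing the criterion.
    static int searchThreshold(ParticleMeasure measure, int lo, int hi, int step) {
        int best = lo;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int t = lo; t <= hi; t += step) {
            double value = measure.maxMinParticleSize(t);
            if (value > bestValue) {
                bestValue = value;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy criterion peaking at grey level 110, used only to exercise the loop.
        ParticleMeasure dummy = new ParticleMeasure() {
            public double maxMinParticleSize(int threshold) {
                return -Math.abs(threshold - 110);
            }
        };
        System.out.println("best threshold = " + searchThreshold(dummy, 0, 255, 5));
    }
}
```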

[Diagram: the input image is thresholded at a candidate value, particles are measured with the ImageJ object/particle measurement component, and the threshold is adjusted until the thresholding objective is reached, at which point the thresholded image is output.]

Figure 4-5 Image Thresholding Shared Component

Like the image processing shared component, image thresholding shared component is a

standalone component which was used in both off-line and in-line image quality

modification. This shared component accessed the built-in ImageJ thresholding method

which sets the image background to white. Therefore, it had an interface to ImageJ.

4.1.5 The Image Classification Shared Component

The Bayesian classification model adopted in the work was implemented by the image

classification component (Figure 4-6). The model used here is essentially the same as that

developed in Torabi’s research as described in Section 2.2.2.2. The Bayesian model was

made adaptive by using the method described as the Intelligent Learning Machine (ILM)

(Appendix V).


The image classification component was responsible for creating and adapting the

Bayesian classification model. It had three major functions:

i. To create a pre-optimization model using thresholded raw images as

illustrated in Figure 2-1. It also adapted the pre-optimization model to changes

in image quality by inserting relevant information from a newly acquired raw

image into the model using methods described in Appendix V.

ii. To create a post-optimization model using thresholded optimized images as

illustrated in Figure 5-1. It adapted the post-optimization model to changes in

image quality by adding relevant information from an optimized image into

the model using methods in Intelligent Learning Machine.

iii. To classify a thresholded image.

As illustrated in Figure 4-6, the image classification component invoked the image

thresholding component to extract classification attribute values from the image before

performing classification. It was frequently called to assess classification performance in

work regarding both objectives.

4.1.6 The Database Shared Component

As described in the third step of the strategy for achieving the first objective, the

Reference Image Database stored the results of off-line image modification. The cases

stored in the database were used in the in-line image modification phase as solutions to

processing newly acquired images through case-based reasoning. The database shared

component was therefore developed to manage the Reference Image Database. It played

a very important role in both in-line image quality modification and off-line image


modification. It provides five major functionalities, as illustrated in Figure 4-7: creating the Reference Image Database; adding, retrieving and deleting image cases; and modifying the database structure. For an example of the Reference Image

Database, please refer to Table 5-17 in Section 5.2.1.

Figure 4-6 Functionalities of Image Classification Shared Components

The functionalities of the database shared component were carried out through an Internet

connection. That meant that addition, deletion and retrieval of image cases to and from

the database were through a network connection. This required much trial-and-error

combined with testing so that it would function correctly.

The Reference Image Database included two kinds of data: similarity attributes which

were used to retrieve an image case, and image processing instructions which included IQ

operators and their parameter settings. Similarity attributes included five image quality metrics and two classification parameters: the mean density and percentage area of potential particles. As illustrated later in Figure 5-1 in Section 5.1, the image processing instructions

obtained in Simplex image optimization and similarity attributes formed an image case

which was added to the Reference Image Database.

The database was created using the Structured Query Language (SQL), an open relational

database standard. A MySQL server (open-source software with an extended implementation of SQL) was installed on a Windows XP machine. The database shared component was implemented in Java, but querying and modifying the database uses SQL. Hence, a Java Database Connectivity (JDBC) driver was required to connect the Java component to the MySQL server. In this work, the JDBC driver was downloaded online

from the MySQL project (available at http://www.mysql.com/).
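A minimal sketch of how the retrieval path of the database shared component could look over JDBC is given below; the connection URL, credentials, table name, column names and the simplified two-attribute distance are all illustrative assumptions, not the actual schema of the Reference Image Database.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Minimal sketch of retrieving an image case over JDBC from a MySQL server.
public class ReferenceImageDao {

    static String findProcessingInstructions(double brightness, double contrast)
            throws Exception {
        Class.forName("com.mysql.jdbc.Driver");   // MySQL Connector/J driver
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/reference_image_db", "user", "password");
        try {
            // Retrieve the stored IQ-operator instructions of the image case whose
            // similarity attributes are closest to the query values (simplified to
            // two attributes here purely for brevity).
            PreparedStatement ps = con.prepareStatement(
                    "SELECT iq_instructions FROM image_cases " +
                    "ORDER BY ABS(brightness - ?) + ABS(contrast - ?) LIMIT 1");
            ps.setDouble(1, brightness);
            ps.setDouble(2, contrast);
            ResultSet rs = ps.executeQuery();
            return rs.next() ? rs.getString(1) : null;
        } finally {
            con.close();
        }
    }
}
```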

Figure 4-7 Reference Image Database Shared Components


4.2 Software Development for the Second Objective: In-line

Image Quality Modification

As described earlier, the objective of in-line image quality modification is to develop an

in-line method for modifying the quality of acquired images to permit improved

classification. The results of the first objective provide the required Reference Image

Database. The software components required to implement the strategies for achieving

this objective are shown in Figure 4-8. The left-hand side shows the strategy and the

right-hand side the software. Three major software components were necessary: the case-

based reasoning component, the image processing shared component and the image

classification shared component. The case-based reasoning component was developed to

locate the reference image in the Reference Image Database which most closely

resembled the newly acquired image. The associated image quality modification

instructions were then used to improve the quality of the newly acquired image by

employing the image processing component. The modified image was then classified by

the classification component.

The shared components mentioned were described above. The case-based reasoning

component is described below.


Figure 4-8 Computational Software Components Associated with In-line Image Quality Modification

The Case-based Reasoning Component

The case-based reasoning component (Figure 4-9) first measured the image quality

metrics and necessary attributes of particles of an image by invoking the image

measurement shared component. The measurement was then used by the database shared

component to locate the most similar image case in the Reference Image Database to the

current image case at hand. The IQ modification instructions associated with the retrieved image case were then applied to process the current image using the image processing

shared component. The processed image was finally classified by the image classification

shared component. The classification result was compared to the class label previously


assigned by a human observer to determine whether it was correctly classified. If not, the

image was subjected to off-line image quality modification.
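The retrieval step can be pictured as a nearest-neighbour search over the similarity attributes of the stored cases. The sketch below assumes an unweighted Euclidean distance and in-memory cases purely for illustration; in the thesis this lookup is performed through the database shared component, and the attribute and field names here are hypothetical.

```java
// Minimal sketch of "locate the reference image most closely resembling the input":
// a nearest-neighbour search over the similarity attributes of stored cases.
public class CaseRetrievalSketch {

    static class ImageCase {
        double[] similarityAttributes;   // e.g. IQ metrics plus particle attributes
        String iqInstructions;           // IQ operators and their parameter settings
    }

    static ImageCase mostSimilar(double[] query, java.util.List<ImageCase> cases) {
        ImageCase best = null;
        double bestDist = Double.MAX_VALUE;
        for (ImageCase c : cases) {
            double d = 0.0;
            for (int i = 0; i < query.length; i++) {
                double diff = query[i] - c.similarityAttributes[i];
                d += diff * diff;               // squared Euclidean distance
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ImageCase c = new ImageCase();
        c.similarityAttributes = new double[] { 0.6, 0.2, 0.9 };
        c.iqInstructions = "BR;SUB";            // hypothetical instruction string
        ImageCase hit = mostSimilar(new double[] { 0.55, 0.25, 0.95 },
                                    java.util.Collections.singletonList(c));
        System.out.println(hit.iqInstructions);
    }
}
```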

As can be seen, the computation involved in the case-based reasoning component of the

in-line image quality modification (the second objective) was not as complicated as that

in the Simplex optimization component which is the core component in off-line image

quality modification (the first objective). The reason is that it does not include the image

processing iterations and objective function evaluations that the Simplex optimization component required.

[Diagram: the input image is measured (image measurement shared component); the database shared component locates the reference image most closely resembling it; the input image is processed with the IQ operators of that reference image (image processing shared component) and classified (image classification shared component); if the classification result is incorrect, the image is sent to off-line image quality modification.]

Figure 4-9 Case-based Reasoning Component in In-line Image Quality Modification


5 RESULTS AND DISCUSSION

5.1 Off-line Image Modification

This section shows results pertaining to the first objective of the thesis. Thus, it focuses

upon the development of an off-line automated method for obtaining the needed

improvement in image quality.

The work reported in this section uses "known" images. By that it is meant that the class of each image used has been determined in advance by a human observer and, most importantly, the software is informed of the observer's decision. It will be seen that this is important in allowing a measure of image quality to be assigned. Also, when images

are used to create (i.e. train) a classification model then “known” images must be used.

In the usual training method, known as “cross-validation”, 90% of the images are known

and 10% are used for a testing step (the software is unaware of the human observer’s

classification for that 10%). The cross-validation training method uses ten trials wherein

it sequentially takes different images to constitute the 90% known images. Finally, the

results from the ten trials are averaged. In this thesis, “unknown images” are those for

which we know the result of the human observer classification but the software does not.

Thus, in the above description of cross-validation, the 10% of the data reserved for

testing in each trial would be considered as “unknown images” for that particular trial.
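A minimal sketch of the index handling behind 10-fold cross-validation is shown below; the FoldEvaluator interface is only a placeholder for the model-building and classification step and is not part of the thesis software.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Minimal sketch of 10-fold cross-validation: in each trial one fold is held out
// as "unknown" test images and the rest serve as "known" training images; the
// error estimates from the ten trials are then averaged.
public class CrossValidationSketch {

    interface FoldEvaluator {
        // Placeholder: builds a model on the training indices and returns the
        // classification error measured on the held-out test indices.
        double errorRate(List<Integer> trainIdx, List<Integer> testIdx);
    }

    static double tenFold(int numImages, FoldEvaluator evaluator) {
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = 0; i < numImages; i++) indices.add(i);
        Collections.shuffle(indices);

        int folds = 10;
        double errorSum = 0.0;
        for (int f = 0; f < folds; f++) {
            List<Integer> test = new ArrayList<Integer>();
            List<Integer> train = new ArrayList<Integer>();
            for (int i = 0; i < numImages; i++) {
                if (i % folds == f) test.add(indices.get(i));
                else train.add(indices.get(i));
            }
            errorSum += evaluator.errorRate(train, test);
        }
        return errorSum / folds;   // averaged classification error over ten trials
    }
}
```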

Figure 5-1 shows an overview of the system developed to accomplish the first objective

of this thesis: determining how to modify each individual raw image to the image quality


required for improved classification results. Beginning with the introduction of a raw

"known" image, a pre-set blanket image processing operation is first carried out:

brightness is adjusted to a value of 128 and the background is flattened. This

“standardization” step is the same as that carried out by Torabi [33]. The objective

function is now used to assign an image quality value to the preprocessed image of an

input raw image. As compared to the approach in Torabi’s research as illustrated in

Figure 2-1, there is a crucial customized image quality modification step (shown in the dotted box in Figure 5-1) developed prior to the creation of the classification model or

evaluation of the classification model.

Next the image is optimized using the Nelder-Mead Simplex optimization in order to

determine how to apply IQ Operators to transform it from a raw image to an image that

will be more successfully classified by the classification model. This involves:

i. Selecting the IQ Operators using some or all of the screens described in

Section 5.1.1.

ii. Systematically “guessing” the values of parameters in these operators.

iii. Applying the operators to the raw image

iv. Using the defined objective function to obtain the image quality.

v. Returning to step ii above until the image quality has been maximized.

Thus, at the conclusion of this procedure, for each raw image there is also an optimized

image. As will be seen later in this section, the most important results of this operation

are:

a. For the raw image:


i. values of the usual image characteristics (noise, blur, contrast, illumination

uniformity and brightness)

ii. values of mean particle density and percentage area.

iii. the value of image quality for the raw image.

iv. the identity of the image quality operators along with their parameter

values used to transform the raw image quality to the optimized image

quality.

b. For the optimized image:

i. the value of image quality for the optimized image.

ii. values of mean particle density and percentage area.

From the optimized images, values of mean particle density and percentage area were

extracted to create a post-optimization model, which could be later used for evaluating

the model and classifying test images in in-line image modification (“unknown images”).

[Diagram: an input training image undergoes brightness adjustment and background flattening; IQ modifiers are selected and their order of application determined; the Simplex optimization guesses IQ modifier parameter values, processes the image in ImageJ and evaluates the objective function until the optimization criterion is reached; the optimized image and its optimum IQ modifiers are added to the Reference Image Database; features are then extracted to build and evaluate the post-optimization classification model.]

Figure 5-1 Off-line Image Quality Modification Framework


5.1.1 Selection of Image Quality Operators and their Order of Application

Reducing the dimensionality of the image quality modification problem is the first step in

accomplishing Objective 1. The dimension of the problem is extremely high due to the

fact that the number of possible IQ Operators is large, the number of possible orders of application is large and, additionally, each IQ Operator has its own parameters.

A screening approach for dimension reduction was developed and is depicted in Figure 5-2.

A major advantage of this approach is its generality. That is, it provides flexibility by

selecting IQ Operators, first based upon primary image characteristics and then upon task

specific criteria.

A sequence of screens is used, beginning with Screen 1 and concluding with Screen 3.

Each of these is described in turn in the following paragraphs along with the results

obtained.

5.1.1.1 Screen 1: Constraining Selection of IQ Operators by Selecting the Image Analysis Software

Screen 1 is image analysis software selection. Selection of the image analysis software

immediately introduces the first constraint by defining a set of IQ Operators which are

readily available. However, in addition to providing a suitable variety of IQ Operators

the software must be able to interface with other software to be used in the work. It also

must be economical and preferably open source (so that the actual workings of the image

analysis methods can be examined).


Figure 5-2 Layered Screening of IQ Operators

A free, open-source image analysis program, Image J, was selected for the work. IQ

Operators were therefore limited to the ten present in that software and are listed in Table

5-1. Also shown in this table are the parameters associated with each operator and the

low and high levels of these parameters used later in the statistical experimental design as

described in Appendix VI.

Image J also contains some binary image operators, including erosion, dilation, open and

close for object shape analysis in a binary image. In this research, we dealt with grayscale

images; therefore these binary image operators were not used.

Table 5-1 Image Quality Operators

Identifier | Method | Purpose | Parameters | Low Level | High Level
BR | Brightness linear shift | Change the brightness | Bias shift | -σ | +σ
CON | Contrast stretch | Adjust the contrast | Contrast gain | 0.5 | 2
MN | Mean filter | Blur the active image | Filter radius | 1 | 4
SHP | Sharpen | Remove motion-induced or out-of-focus blur, yet accentuates noise | Not applicable | 0 | 1
EQL | Histogram equalization | Enhance image contrast | Not applicable | 0 | 1
MD | Median filter | Remove noise | Filter radius | 1 | 4
GB | Gaussian blur | Smooth the image | Filter radius | 1 | 4
GDIL | Minimum grayscale dilation | | Filter radius | 1 | 4
GER | Maximum grayscale erosion | | Filter radius | 1 | 4
UNSHP | Unsharp mask | Sharpen and enhance edges | Mask weight | 0.2 | 0.9
SUB | Subtract background | Correct non-uniform illumination | Filter radius | 20 | 50

Notes: 1) For image operators with a low level of 0 and a high level of 1, 0 represents the absence of the operator while 1 means the presence of the operator. 2) For the brightness linear shift method, σ is the standard deviation of the image grey values.

5.1.1.2 Screen 2: Selection of IQ Operators by Image Characteristics

From the literature review, it was evident that the vast majority of IQ Operators are

directed at changing only five image characteristics. These characteristics are: noise,

blur, contrast, illumination uniformity and brightness.

Screen 2 consists of qualitatively determining which of these characteristics are

present in the image. The published literature can then provide guidance as to whether or

not a particular IQ Operator should be selected.

Dimension reduction is achieved by filtering out some unnecessary operators based on

the examination of the characteristics of a given image. Image characteristics are used to

determine what image processing tasks are needed to improve image quality. Image

processing tasks can involve using a single IQ Operator or a combination of IQ

Operators. These tasks include noise removal, edge sharpening, blur removal, contrast

enhancement, brightness adjustment, illumination correction and others. If there is a very

low noise level in the image, the task of noise removal is unnecessary. If noise presence

is strong, an appropriate operator for noise removal must be chosen. The choice of

operators for noise removal depends on the noise characteristics. If the noise is impulsive

(salt-and-pepper), then a median filter is an appropriate choice. If the noise is Gaussian,

then a Gaussian filter or mean filter is suitable. An overall strategy for qualitative pre-

selection of IQ Operators is illustrated in Figure 5-3 to Figure 5-7. Each of these figures

shows a “decision tree” for selection of IQ Operators to remedy noise, blur, contrast,

illumination and brightness.


Figure 5-3 Pre-selection of Image Operators for Noise Removal

Figure 5-4 Pre-selection of Image Operators for Blur Removal

Figure 5-5 Pre-selection of Image Operators for Contrast Enhancement


As shown in these figures, qualitative image characteristic understanding leads to

selection of mutually exclusive IQ Operators. IQ Operators are mutually exclusive when

they are directed at the same image characteristic. The leaves of the decision tree in each

figure represent IQ Operators. If two or more leaves share an immediate parent node then the choice of image quality operator is not deterministic. That is, any leaf (IQ

Operator) under the same branch is suitable for a specific image processing task. For

example, in Figure 5-4, for a blurred image, removing blur is required. However, both the Unsharp Mask and Sharpen operators are capable of performing this task. In this case, the IQ Operator is selected at random.

Figure 5-6 Pre-selection of Image Operators for Illumination Correction

Figure 5-7 Pre-selection of Image Operators for Brightness Adjustment


In applying the decision trees of Figure 5-3 to 5-7 the following guidelines must be taken

into account:

i. Noise removal operators should precede sharpness operators. This can be justified

due to the fact that noise removal often causes blurring and sharpness operators

have a very strong tendency to magnify noise.

ii. The sharpening operator (SHP) is preferred over the unsharp mask (UNSHP) for

applications where small features (particles) are of interest because UNSHP in

general magnifies noise more than SHP.

iii. Contrast operators including contrast stretching (CON) and histogram equalization

(EQL) magnify noise. Thus noise removal operators should precede the contrast

operators.

iv. EQL tends to create artifacts which are difficult to remove by other image operators

particularly for images where the noise is in the same grey level range as the features.

Thus contrast stretching is preferred over EQL in this situation.

With the conclusion of Screen 2, the number of IQ Operators is reduced sufficiently to

allow closer examination of the remaining operators using task specific criteria.

The next section shows two examples of how Screen 2 was applied to images obtained by

the Scanning Particle Monitor.

5.1.1.3 The Application of Screen 2 to Images Obtained by the Scanning Particle Monitor

Screen 2: Example 1


Figure 5-8 Example Real Image 1

Figure 5-8 is a typical example of a particle image from Torabi's data. To select IQ

operators, first we examine the image against five image characteristics, i.e., brightness,

contrast, noise, blur and illumination uniformity and apply the rules in the decision trees

of Figure 5-3 to Figure 5-7. In the image of Figure 5-8, there is random noise present

(small dots indicated by the red arrow). Therefore a noise removal operator such as a mean filter

is necessary (Figure 5-3). In addition, the grey level of the area circled by the blue line is

quite different from the remaining image background, indicating the presence of

illumination non-uniformity. Therefore, in accord with Figure 5-6, an illumination

correcting operator such as background flattening is necessary. Furthermore, the particle


(circled in green) has a shiny center and also the image is quite bright, both of which

could be corrected by brightness adjustment (Figure 5-7). According to the guidelines

specified above, the appropriate IQ operator sequence for processing the image would be a brightness shift and background flattening followed by a mean filter.

Screen 2: Example 2

Figure 5-9 Example Real Image 2

The image of Figure 5-9 shows noise (specified by the red arrows) and blur around

particles (specified by the blue circles). Therefore, it is necessary to process the image

with the noise removal operator and a blur removal operator, i.e., mean filter and Sharpen


operator (Figure 5-3 and 5-4). In addition, the particles in the image have shining centers

and the image is too bright, both of which could be corrected by brightness adjustment to

make the image darker (Figure 5-7). Thus, in accord with the guidelines for Screen 2, this

image is to be processed by a brightness shift, mean filter and Sharpen operators in that

order.

5.1.1.4 Screen 3: Dimensionality Reduction: Selection of IQ Operators by Task Specific Criteria

In Screen 2, the IQ operators to improve the quality of an image are selected based on a set of decision rules for five image quality characteristics. However, for computational

speed it is very desirable to further reduce the number of selected IQ operators if

possible. In this research, particle images are of interest. The goal is to identify

particles in an image. Therefore, if the IQ operator has little or no effect on this goal then

it is reasonable to omit it. Initially a method involving test patterns and statistical

experimental designs was developed to systematically examine images. The method does

include some significant and novel contributions and so is described in Appendix VI.

However, in this work it served as a learning experience. From working to develop the

method a set of rules evolved to guide dimension reduction that provided a practical,

rapid solution. These rules are as follows:

i. If the noise present in the image is observed to be very small compared to the

particle image and has a much lower contrast to the background compared to

particle image, then the noise removal operator is unnecessary and could be

eliminated.


ii. An IQ Operator should not be eliminated if it is required by another operator.

That is, interaction effects amongst operators can be important. For example, in

the case of a noise removal operator such as a mean filter followed by blur

removal operator such as an Unsharp Mask, if the mean filter is eliminated, then

the Unsharp Mask will amplify the noise so much that the noise can be

misidentified as a particle. Therefore the mean filter should not be eliminated in

this situation.

To show how Screen 3 is applied we can return to the examples described for Screen 2.

For the image in Figure 5-8, in Screen 2 it was decided that a sequence of brightness shift and background flattening followed by a mean filter be chosen. However, according to the decision rules

specified in Screen 3, the mean filter can be eliminated. That is because the size of the

particle present in the image is much greater than the noise and the grey level of the

particle is much lower than that of the noise (the noise grey level is very close to the

background grey level). Thus, the presence of noise would be unlikely to affect the

identification of the particle. By eliminating the mean filter, the sequence of IQ operators

could be abbreviated to brightness shift followed by background flattening.

In the second example, the image of Figure 5-9, it was concluded for Screen 2 that the image was to be subjected to the image processing sequence of brightness shift, mean filter and Sharpen operators. According to the second rule for Screen 3, this sequence cannot be shortened because eliminating the mean filter would cause noise to be amplified by the Sharpen operator; therefore the sequence determined in Screen 2 remains after the application of Screen 3.


Each of the IQ operators selected by the screening process has variable parameters. The

values of the parameters are to be determined by optimizing image quality. However, first

a definition for image quality is necessary.

5.1.2 Image Quality Definition

Once the IQ Operators and their order of application were identified according to the

method described above in Section 5.1.1, the next question was how best to use them for

each image in order to improve image quality for improved classification accuracy. A

method for determining the optimum parameter settings for each IQ Operator to achieve

image quality improvement was required. Therefore, a quantitative definition of image

quality was needed. The requirement actually was to characterize image quality with a

single number that would change as IQ Operators are applied to the image and that would

indicate how accurately the image can be classified as being WO (without particle) or

WP (with particle).

The four definitions of image quality examined in this work are now described in turn.

Following these descriptions, the improvement in classification results attained by using

each of them is compared.

The same set of images was used to evaluate each of the four definitions. These images

were randomly selected from images obtained in several different runs of in-line image

monitoring by Torabi. This set of images is a subset of Image Set 1 as listed in Table

3-1. In total, there are 745 images, of which 240 are "Without Particle" (WO) and 505

“With Particle” (WP). These images are termed training images since they are labeled by


the human observer as WO or WP and the software is “told” of this classification. These

training images are used to create classification models.

5.1.2.1 Least Squares as Objective Function

Image quality (IQ) can be summarized in a single number by using the same type of

"objective functions" as are minimized in non-linear regression. Here a least

squares objective function compares the IQ Metric values of the image (Qi) with the

desired IQ Metric values (Qi,d):

$$ IQ = \sum_{i=1}^{5} (Q_i - Q_{i,d})^2 \tag{5-1} $$

where Q1 = noise, Q2 = blur, Q3 = contrast, Q4 = illumination uniformity and Q5 = brightness.

The Qi values are measured from the image being considered. Calculation of the Qi

values is detailed in Appendix I. These quantities were all normalized so as to range

within [0,1] inclusive, with zero being the worst and unity being the best for each

individual metric. Qi,d is the value of Qi desired for the image, and the Qi,d values by

definition are unity for all five IQ metrics.
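For illustration, the least squares objective of Equation 5-1 can be computed as in the following minimal sketch, which assumes the five metrics have already been normalized to [0, 1]; the example values are the averaged raw-image metrics later reported in Table 5-6.

```java
// Minimal sketch of the least squares image quality objective (Equation 5-1),
// with the desired metric values Q_i,d taken as unity, as stated in the text.
public class LeastSquaresObjective {

    // q = {noise, blur, contrast, illumination uniformity, brightness}, each in [0, 1].
    static double imageQuality(double[] q) {
        double sum = 0.0;
        for (double qi : q) {
            double diff = qi - 1.0;      // deviation from the desired value of unity
            sum += diff * diff;
        }
        return sum;   // smaller values mean the metrics are closer to their targets
    }

    public static void main(String[] args) {
        // Averaged raw-image metric values reported in Table 5-6, used as an example.
        double[] metrics = { 0.99, 0.60, 0.21, 0.0056, 0.42 };
        System.out.println("LS objective = " + imageQuality(metrics));
    }
}
```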

In this subsection, the application of Least Squares (LS) as the objective function and its impact on classification is discussed in detail. Furthermore, its relationship with classification results is examined to determine whether there is a positive correlation between Least Squares image quality and classification accuracy.

Classification Results Using Least Squares as Objective Function

To provide “baseline values” for classifier performance in this section of the thesis, raw

(not optimized) images were classified using the Torabi Bayesian model. Actually, the


model used has a modified thresholding method described in Appendix IV. The

confusion matrix for the classification is shown in Table 5-2, where of the 505 images

that actually contained particles (were in class WP), 489 of them were correctly predicted

(true positives) and 16 of them were incorrectly predicted as having no particles (false

negatives). For the evaluation of classification methods and all related terminology,

please refer to Section 2.5 in the literature review. Of the 240 images that actually did

not contain any particles, 25 of them were incorrectly predicted as having particles (false

positives) and 215 of them were correctly predicted as having no particles. This number

of false positives is 10.4% of the number of images that actually have no particles (i.e. a

false positive rate of 10.4%) and would be responsible for many false alarms in a

classification operation. The number of false negatives is 3.2% of the total number of

images that actually do have a particle and could allow the production of considerable

off-spec product. The overall classification accuracy calculated based on Table 5-2 is a

respectable 94.5 %. However, this can also be viewed as a classification error of 5.5% as

shown in Figure 5-10. That is, if no customized image processing is done then about one

in every twenty images is misclassified.

Table 5-3 shows the classification confusion matrix of images optimized using the

unweighted least squares (LS) objective function and Figure 5-10 compares the

classification error rate, false positive rate and false negative rate with the classification

of the raw images mentioned above. Error rates are compared using bar charts as in

Figure 5-10. The 95% confidence interval for each error rate is marked at the top of each

bar. Appendix IX shows an example calculation for the 95% confidence interval. The

data labels at the top of each bar are the error rates calculated from the confusion matrix.
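As one standard way of obtaining such an interval (the thesis's own calculation is given in Appendix IX), the normal-approximation binomial interval can be computed as in the sketch below; it is shown only to illustrate the idea.

```java
// Minimal sketch of a 95% confidence interval for a classification error rate,
// using the normal approximation to the binomial proportion.
public class ErrorRateInterval {

    static double[] confidenceInterval95(int errors, int total) {
        double p = (double) errors / total;
        double halfWidth = 1.96 * Math.sqrt(p * (1.0 - p) / total);
        return new double[] { p - halfWidth, p + halfWidth };
    }

    public static void main(String[] args) {
        // Example: 25 false positives out of 240 images without particles (about 10.4%).
        double[] ci = confidenceInterval95(25, 240);
        System.out.println("95% CI: [" + ci[0] + ", " + ci[1] + "]");
    }
}
```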


As shown in Figure 5-10, the classification of raw images yields an overall 94.5%

accuracy while that of Least Squares optimized images yields an overall 98.2% accuracy.

The 3.7% rise in accuracy between the raw images and the Least Squares optimized images is considered a significant improvement. The false positive rate of 10.4% for the model based on raw images is unacceptable. In contrast, the model based on Least Squares optimized images achieves a balanced true positive rate (98.6%) and false positive rate (2.5%, i.e. a true negative rate of 97.5%).

Table 5-2 Classification Confusion Matrix for the Training Set of Raw Images

                     | Predicted Class is WP | Predicted Class is WO
Actual Class is WP   | 489                   | 16
Actual Class is WO   | 25                    | 215

Table 5-3 Classification Confusion Matrix for the Training Set of Images Optimized Using the Least Squares Objective Function

                     | Predicted Class is WP | Predicted Class is WO
Actual Class is WP   | 498                   | 7
Actual Class is WO   | 6                     | 234

[Bar chart: overall error rate, false negative rate and false positive rate for the Torabi Bayesian model of raw images (5.5%, 3.2%, 10.4%) versus the Least Squares optimized images (1.8%, 1.4%, 2.5%).]

Figure 5-10 Classification Error Rate for Least Squares as Objective Function

Comparison of the confusion matrices for classification of the raw images and

classification of the Least Squares optimized images shows a large drop in both false

negatives (16 to 7 images) and false positives (25 to 6). Classification error rate

decreased from 5.5% to 1.8%. Thus, when Least Squares optimization was done prior to classification, only about 2 images in 100 were misclassified rather than about 5 or 6.

Figure 5-10 shows these improvements. False positive rate drops from 10.4% to 2.5%

and false negative rate from 3.2% to 1.4%.

As described earlier in Section 2.5, the receiver operating characteristic curve (ROC)

provides yet another measure of classification performance. Its advantage is that it

shows how the classification results vary as the threshold probability used to assign the

WP class to an image is varied from zero to unity.


In Figure 5-11 there are two ROC curves, one for the Least Squares optimized images and one for the model of raw images (note that parts of the ROC curve for the model of raw images are overlaid by the ROC curve for the Least Squares optimized images). The classification accuracies for raw and Least Squares optimized images shown in Figure 5-10 are two data points on the corresponding ROC curves (as shown in Figure 5-11). The ROC curve for the Least Squares optimized images is far superior to that for the raw images because it always lies to the left of and above the ROC curve for classification of raw images.

[Plot: ROC curves for the Bayesian model of raw images and the Bayesian model of Least Squares optimized images, with the maximum-accuracy operating points of each model marked.]

Figure 5-11 ROC Curve for Least Squares as Objective Function


As mentioned previously, the Area Under a ROC curve (AUC) provides a useful measure

of the general superiority of one classifier over another because it includes consideration

of more than the overall classification error rate. The AUCs are 0.974 and 0.998 for the classifiers of raw images and Least Squares optimized images, respectively. As expected, from the

appearance of the curves in Figure 5-11, the AUC for classification of images optimized

using a Least Squares objective function is significantly larger than that of the raw

images.

The training image set used here has a skewed class distribution of 1:2 (WO:WP). In this situation, ROC analysis suggests that the optimal operating point on the ROC curve is the one at which the slope (tangent) of the curve is 1/2.

Relationship between Least Squares as Objective Function and Classification Results

These results were very promising. However, since even better results were desired, an

effort was initiated to determine how best to improve the least squares objective function

definition of image quality. Table 5-4 shows the average Least Squares image quality

values for the raw images and for the Least Squares optimized images (correctly and incorrectly classified images separately) along with their respective 95% confidence intervals.

As expected, the average image quality of the Least Squares optimized images is

significantly greater than the raw images (0.616 versus 0.335). However, surprisingly,

there was no significant difference between the average image quality of the correctly

and incorrectly classified Least Squares optimized images (0.617 and 0.584 respectively).

To further assess the IQ definition, the IQ values of each individual image were examined (rather than only the average values). Figure 5-12 is a plot of Image Quality (i.e. the value of the Least Squares objective function of the optimized image, the input to the

Bayesian classifier) versus Image number. The images were numbered from 1 to 745 and

this number was plotted on the abscissa with the misclassified images purposefully being

assigned the highest numbers to allow them to be grouped on the plot. The IQ values for

the misclassified images color-coded in red are indistinguishable from the majority of

correctly classified images color-coded in blue.

Table 5-4 Relationship between Least Squares as Objective Function and Classification Accuracy

Number of Images: 745
Raw Images: Average IQ (95% confidence level): 0.335 ±0.0035
Optimized Images: Average IQ (95% confidence level): 0.616 ±0.0027
Number of Misclassified Optimized Images: 13
Average Image Quality of Correctly Classified Optimized Images (95% confidence level): 0.617 ±0.0027
Average Image Quality of Misclassified Optimized Images (95% confidence level): 0.584 ±0.022

Table 5-5 and the accompanying histogram, Figure 5-13, show how the Least Squares

image quality values of the optimized images are distributed across the correctly

classified and misclassified images. These data reveal that 695 correctly classified

images (93.3% of the total images) have a value of Least Squares above 0.55. However,

10 out of 13 misclassified images, accounting for 1.3% of total images used, also have

values above 0.55. Some of the misclassification could be due to the fact that a

classification model, like any model fit to experimental data, is not expected to be a

perfect predictor. However, it was the contribution to the misclassification errors by the

inadequacy of the Least Squares objective function as a definition of image quality that


was of interest. This could be due to the function not including the best image quality

measures or to some aspect of the mathematical form of the equation. The next step was

to try using weighting factors in the least squares objective function to see if that would

improve results.

[Plot: Least Squares image quality of each optimized image versus image case number, with correctly classified and misclassified images distinguished.]

Figure 5-12 Image Quality Distribution for Least Squares Optimized Images

Table 5-5 Image Quality Distribution for Least Squares Optimized Images

Image Quality | Correctly Classified Images | Misclassified Images
0.45~0.50 | 4 | 0
0.50~0.55 | 32 | 3
0.55~0.60 | 185 | 5
0.60~0.65 | 353 | 5
0.65~0.70 | 157 | 0
0.70~0.75 | 1 | 0

[Histogram: number of correctly classified and misclassified images in each Least Squares image quality interval, corresponding to Table 5-5.]

Figure 5-13 Image Quality Histogram for Least Squares Optimized Images

5.1.2.2 Weighted Least Squares (WLS) as Objective Function

With the introduction of weighting factors the least squares objective function (Equation

5.1) becomes a weighted least squares objective function:

$$ IQ = \sum_{i=1}^{5} w_i \, (Q_i - Q_{i,d})^2 \tag{5-2} $$

where Q1 = noise, Q2 = blur, Q3 = contrast, Q4 = illumination uniformity and Q5 = brightness,

and wi are “weighting factors” that serve to provide the relative importance of a

particular metric difference to the IQ value.

The weighting factors were obtained from:

$$ w_i = \frac{|Q_{i,r} - Q_{i,LS}|}{Q_{i,r}} \tag{5-3} $$

where Qi,r is the ith IQ Metric of the raw image and Qi,LS is the ith IQ Metric of the image

obtained after IQ Operators were used to minimize the least squares objective function.

The hypothesis underlying this approach was that the IQ Metrics which changed the most

in the least squares optimization were most likely those that were most important to

improvement of the image quality. Therefore, they deserved higher weighting. Since only

the relative values of the wi in Equation 5-3 are important, like the Qi, they were

normalized to range from zero to unity inclusive. Values of the wi are shown in Table

5-6.
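A minimal sketch of Equations 5-2 and 5-3 follows; normalizing the weights so that they sum to one is an assumption made here for illustration, although it is broadly consistent with the magnitudes reported in Table 5-6.

```java
// Minimal sketch of the weighting factors (Equation 5-3) and the weighted least
// squares objective (Equation 5-2), with desired metric values Q_i,d of unity.
public class WeightedLeastSquares {

    // Equation 5-3: w_i = |Q_i,r - Q_i,LS| / Q_i,r, followed by normalization
    // (here assumed to make the weights sum to one).
    static double[] weights(double[] qRaw, double[] qLsOptimized) {
        double[] w = new double[qRaw.length];
        double total = 0.0;
        for (int i = 0; i < w.length; i++) {
            w[i] = Math.abs(qRaw[i] - qLsOptimized[i]) / qRaw[i];
            total += w[i];
        }
        for (int i = 0; i < w.length; i++) w[i] /= total;
        return w;
    }

    // Equation 5-2 with all desired metric values equal to unity.
    static double imageQuality(double[] q, double[] w) {
        double sum = 0.0;
        for (int i = 0; i < q.length; i++) {
            double diff = q[i] - 1.0;
            sum += w[i] * diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Averaged metric values from Table 5-6 (brightness, contrast, blur,
        // noise, illumination uniformity) for raw and LS-optimized images.
        double[] raw = { 0.42, 0.21, 0.60, 0.99, 0.0056 };
        double[] ls  = { 0.63, 0.064, 0.70, 0.67, 0.70 };
        double[] w = weights(raw, ls);
        System.out.println("illumination uniformity weight = " + w[4]);
    }
}
```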

Table 5-6 Weight Factors for Weighted Least Squares Image Quality Definition

IQ Metric | Raw Image | Simple Least Squares Optimized Image | Normalized Weighting Factor Used in Weighted Least Squares Objective Function
Brightness | 0.42 | 0.63 | 0.0017
Contrast | 0.21 | 0.064 | 0.0055
Blur | 0.60 | 0.70 | 0.0013
Noise | 0.99 | 0.67 | 0.0025
Illumination Uniformity | 0.0056 | 0.70 | 0.99

The column of Table 5-6 labeled "Raw Image" contains the averaged and normalized IQ Metric

values for 745 randomly selected training (“known”) images. These IQ Metric values are

normalized to range between zero and unity, again with zero being the worst and unity

being the best quality for each individual metric. The next column (labeled “Simple

Least Squares Optimized Image”) shows the same averaged quantities for the images

after IQ Operators were used to minimize the unweighted least squares objective

function (Equation 5-1). The final column shows the values of the normalized weighting

factors obtained using Equation 5-3 with the IQ Metric values. Raw images have very

poor illumination uniformity (0.0056), and a relatively low contrast of 0.21. However,

their noise level is low, with a value of 0.99 (for the noise, blur and illumination uniformity metrics, a larger value means less noise, less blur and higher illumination uniformity, and vice versa). With a weighting of 0.99, illumination uniformity was of overwhelming importance to image quality compared with the other image characteristics. Surprisingly, the contrast and noise of the images optimized using simple least squares became worse than those of the raw images.

Classification Results Using Weighted Least Squares as the Objective Function

Table 5-7 shows the confusion matrix for classification of the images using the weighted

least squares objective function as the definition of image quality. Figure 5-14 shows the

classification errors in comparison to the classification of raw images. In comparison

with the results using raw images (Table 5-2) there was a significant improvement as

shown in Figure 5-14. However, in comparison with the unweighted least squares

objective function (Table 5-3 and Figure 5-10) there is no significant improvement.

Table 5-7 Confusion Matrix for Weighted Least Squares Optimized Images

                     | Predicted Class is WP | Predicted Class is WO
Actual Class is WP   | 499                   | 6
Actual Class is WO   | 8                     | 232

[Bar chart: overall error rate, false negative rate and false positive rate for the Torabi Bayesian model of raw images (5.5%, 3.2%, 10.4%) versus the Weighted Least Squares optimized images (1.9%, 1.2%, 3.3%).]

Figure 5-14 Classification Error Rates for Weighted Least Squares Optimized Images

Figure 5-15 shows the ROC curves for classification of images optimized using the

Weighted Least Squares and for classification of raw images, respectively (note that the

two curves superimpose in some regions). The classifier accuracies are shown as two data

points in the respective ROC curves. The ROC curves in Figure 5-15 are similar to those

of Figure 5-11 and show the superiority of classification of the optimized images over the

classification of raw images. They also show little difference between Least Squares and

Weighted Least Squares image quality definitions. The area under these curves delivers

the same message (Table 5-8).

[Plot: ROC curves for the Bayesian model of raw images and the Bayesian model of Weighted Least Squares optimized images, with the maximum-accuracy operating points of each model marked.]

Figure 5-15 ROC Curve for Weighted Least Squares Optimized Images

Table 5-8 Comparison of AUC for Least Squares and Weighted Least Squares Optimized Images

Objective Function | Area Under the ROC Curve (AUC)
No objective function (Bayesian model of raw images) | 0.974
Least Squares | 0.998
Weighted Least Squares | 0.991

Relationship between Weighted Least Squares as Objective Function and Classification Results

To further examine the relationship between the WLS image quality definition and

classification accuracy the Weighted Least Squares data was treated in the same way as


the Least Squares data. Table 5-9 shows the average values of Weighted Least Squares

image quality and Figure 5-16 shows a plot of image quality versus image number. The

overall classification error rate shown in Figure 5-14 was very similar to that shown in Figure 5-10 for the simple least squares definition of image quality. Thus,

the Weighted Least Squares image quality definition did not really provide an

improvement in classification performance over unweighted least squares.

Table 5-9 Relationship between Weighted Least Squares as Objective Function and Classification Accuracy

Number of Images: 745
Raw Images: Average IQ (95% confidence level): 0.335 ±0.0035
Optimized Images: Average IQ (95% confidence level): 0.766 ±0.0073
Number of Misclassified Optimized Images: 14
Average Image Quality of Correctly Classified Optimized Images (95% confidence level): 0.765 ±0.0074
Average Image Quality of Misclassified Optimized Images (95% confidence level): 0.824 ±0.027


Figure 5-16 Image Quality Distribution for Weighted Least Squares Optimized Images

Table 5-10 Image Quality Distribution for Weighted Least Squares Optimized Images

Image Quality    Correctly Classified Images    Misclassified Images
0.20~0.25        1                              0
0.25~0.30        1                              0
0.30~0.35        1                              0
0.35~0.40        2                              0
0.40~0.45        9                              0
0.45~0.50        7                              0
0.50~0.55        15                             0
0.55~0.60        27                             0
0.60~0.65        22                             0
0.65~0.70        43                             0
0.70~0.75        105                            0
0.75~0.80        167                            1
0.80~0.85        193                            9
0.85~0.90        139                            3


Figure 5-17 Image Quality Histogram for Weighted Least Squares Optimized Images

Thus, at this point it was evident that using a simple or a weighted least squares objective

function to define image quality provided improvement in image classification over

classification of raw images. However, to improve classification results still more it was

thought that perhaps a better objective function would be the one that prevented any

individual IQ Metric value from being forced into extremely low values. That led to

consideration of the “desirability function”.

5.1.2.3 The Desirability Function as Objective Function

The desirability function is defined as:

$IQ = (Q_1 \times Q_2 \times Q_3 \times Q_4 \times Q_5)^{1/n}$    (5-4)

where Q1, Q2, Q3, Q4 and Q5 are the five IQ Metrics defined earlier and the parameter n is a constant power. In this research, n is set to 1, so the definition becomes simply the product of the five IQ Metrics. The value of IQ is then influenced by all of the metrics, so no one metric will be disproportionately decreased.
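A tiny numeric sketch of Equation 5-4 (with n = 1, as used here) illustrates this behaviour; it is an illustration only, not the thesis software:

    def desirability_iq(q1, q2, q3, q4, q5, n=1):
        # Equation 5-4: product of the five IQ Metrics raised to the power 1/n.
        return (q1 * q2 * q3 * q4 * q5) ** (1.0 / n)

    print(desirability_iq(0.8, 0.8, 0.8, 0.8, 0.8))   # about 0.33
    print(desirability_iq(0.9, 0.9, 0.9, 0.9, 0.05))  # about 0.03: one very low metric drags the product down

Because the Simplex search maximizes this product, parameter settings that force any single metric toward zero are penalized.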

Classification Results Using the Desirability Function as Objective Function

Table 5-11 shows the confusion matrix for the classification of images that were

optimized using the desirability function. Figure 5-18 shows a comparison of

classification errors with the classification of raw images. Figure 5-19 shows ROC

curves of the classification of raw images and the classification of images optimized

using the desirability function. Results are not significantly different from those obtained

with both the Least Squares and Weighted Least Squares objective functions as image

quality definitions.

Table 5-11 Confusion Matrix for Desirability Function Optimized Images

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    499                      6
Actual Class is WO    7                        233

Figure 5-18 Classification Error Rate for Desirability Function Optimized Images (bar chart; Torabi Bayesian Model: overall error rate 5.5%, false negative rate 3.2%, false positive rate 10.4%; Desirability Function: 1.8%, 1.2%, 2.9%)


Figure 5-19 ROC Curve for Desirability Function Optimized Images


Relationship between Desirability Function as Objective Function and Classification Results


Figure 5-20 Image Quality Distribution of Desirability Function Optimized Images

Table 5-12 Image Quality Distribution of Desirability Function Optimized Images

Image Quality     Correctly Classified Images    Misclassified Images
0.000-0.005       1                              1
0.005-0.010       217                            7
0.010-0.015       188                            5
0.015-0.020       147                            0
0.020-0.025       91                             0
0.025-0.030       36                             0
0.030-0.035       22                             0
0.035-0.040       23                             0
0.040-0.045       3                              0
0.045-0.050       3                              0
0.050-0.055       1                              0


Figure 5-21 Image Quality Histogram for Desirability Function Optimized Images

Experience with the weighted least squares objective function and the desirability

function led to the hypothesis that to obtain further image quality improvement for

classification the definition of image quality had to be more directly linked with the

classification model. This was done by using the classification model itself to formulate

a definition for image quality and is described in the next section.

5.1.2.4 Probability Density Difference as Objective Function

Classification of the images into WO and WP in this work is carried out using a Bayesian

classification model (explained in Section 2.2.2.2 in the thesis). The model was

previously developed by Torabi. In Section 2.2.2.2 it was mentioned that the input to this

model is percentage area of the particles and mean density of the particles. In contrast,

the image quality functions described above use image characteristics more commonly


found in the image processing literature (noise, blur, contrast, illumination uniformity and

brightness). Thus, one way of improving the link between the image quality definition

and the classification accuracy would be to replace the literature image quality

characteristics with those actually used for classification in Equation 2-8. However, the

uncertainties associated with how best to assign the values of weighting factors and even

the exact form of the objective function would then still remain. Instead, an image

quality definition that was based upon using the same type of classification model to

obtain a measure of image quality was devised. That meant dealing with two

classification models. One (termed the “pre-optimization model”) is used to characterize

image quality of images that have previously been assigned to a WO or WP class by a

human observer (i.e. it is used only to assign an image quality number to a “known”

image). This model is created the same way as in Torabi’s research (Figure 2-1). The

second, and formerly the only model used, was termed the “post-optimization model” and

was used to classify “unknown” images as WO or WP. Its creation is illustrated in Figure

5-1.

As described earlier (Section 2.2.2.2), in conventional data mining practice the Bayesian

model utilizes the calculation of probability densities for the attribute values. More

specifically, the value of f(X=x|C=WP)P(C=WP) is compared to the value of

f(X=x|C=WO)P(C=WO) for an image and the image is classified as WP or WO

depending upon whether the former or latter quantity is the greater. Probability densities

instead of probabilities are used because exactly the same classification results would be

obtained if the actual probabilities (P(X=x|C=WP) and P(X=x|C=WO)) were used in place

of probability densities. It has been well documented [111, 112] that, although this


approach is implemented using the Naïve Bayesian equation (Equation 2-3), and

therefore assumes that attributes are statistically independent, classification results are

often very satisfactory even when this assumption is violated. That is, the classification

tolerates inaccuracies in the probability densities very well. Furthermore, it was found

[111] that the greater the difference between values of f(X=x|C=WP)P(C=WP) and

f(X=x|C=WO)P(C=WO) the less likely that random error associated with sample size

would adversely affect the accuracy of the classification. Thus, considering these

aspects, image quality (IQ) was defined as follows:

$IQ = f(X=x|C=WP)\,P(C=WP) - f(X=x|C=WO)\,P(C=WO)$  if the image is WP    (5-5A)

$IQ = f(X=x|C=WO)\,P(C=WO) - f(X=x|C=WP)\,P(C=WP)$  if the image is WO    (5-5B)

Since Eqns. (5-5A) and (5-5B) are applied only to training images, as mentioned above, it

is already known whether the image has been assigned as WP or WO by an independent

observer and the appropriate equation (5-5A or 5-5B) can be selected.

Note that f(X=x|C=WP) and f(X=x|C=WO) are the probability densities of the attributes

given the class and do not add to unity. IQ is computed from the pre-optimization

Bayesian model using the attribute values of the raw image. Then the Simplex search is

used to maximize IQ by systematically varying the adjustable parameters of the IQ

operators (e.g. brightness shift, noise filters, unsharp mask and background flattening,

etc.). Since the prior probabilities (P(C=WO) and P(C=WP)) are constants for a particular

training data set it is really the probability densities that are “optimized” by the search.

As can be seen from the above description, this approach utilizes the same calculated

quantities as are used in Naïve Bayesian classification and obtains the parameter values


in the IQ Modifiers that will transform the raw image into one that has the largest

attainable difference in the two critical classification quantities (i.e. the largest IQ value).
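A minimal sketch of this IQ computation for a "known" training image is given below, assuming the Gaussian Naïve Bayesian form of the model and using the pre-optimization parameter values that appear later in Table 5-13; the code is illustrative only and is not the software developed in this work:

    import math

    def gaussian_pdf(x, mean, stdev):
        # Gaussian probability density for one attribute value.
        return math.exp(-0.5 * ((x - mean) / stdev) ** 2) / (stdev * math.sqrt(2.0 * math.pi))

    # Pre-optimization model parameters (Table 5-13): prior probability, mean density (MD)
    # and percentage area (PA) with their standard deviations, for each class.
    PARAMS = {
        "WP": {"prior": 0.678, "MD": (63.5, 28.0),  "PA": (0.000116, 0.000110)},
        "WO": {"prior": 0.322, "MD": (155.3, 19.0), "PA": (0.000113, 0.000126)},
    }

    def weighted_density(md, pa, cls):
        # Naive Bayes: product of the per-attribute densities times the class prior.
        p = PARAMS[cls]
        return gaussian_pdf(md, *p["MD"]) * gaussian_pdf(pa, *p["PA"]) * p["prior"]

    def image_quality(md, pa, known_class):
        # Eqns. 5-5A / 5-5B: density of the known class minus density of the other class.
        other = "WO" if known_class == "WP" else "WP"
        return weighted_density(md, pa, known_class) - weighted_density(md, pa, other)

The Simplex search then varies the adjustable parameters of the IQ Modifiers, recomputes the two attributes from the processed image, and keeps the parameter set that maximizes this IQ value.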

An alternate approach would be to deviate from normal practice and to calculate the

actual Naïve Bayesian posterior probabilities (i.e. P(C=WP|X=x) and P(C=WO|X=x)).

This could readily be done by calculating the denominator, P(X=x), in Eqns. 2-4 and 2-5.

As mentioned earlier, P(X=x) is the probability of the specific attribute values occurring.

It is conventionally calculated as a normalizing constant (i.e. as the value which would

cause P(C=WP|X=x) and P(C=WO|X=x) to add to unity as they should because they are

exhaustive, mutually exclusive probabilities). Since these quantities are thus normalized

(i.e. add to unity), image quality could then be defined as one or the other of them.

Maximizing one of them would automatically minimize the other. Alternatively, the

difference (e.g. P(C=WP|X=x)-P(C=WO|X=x)) could be used as IQ. These alternatives

were not tested in this work but in both cases and in the definition of IQ used (Eqns. 5-5),

the maximum difference between the quantities used to effect the classification would be

found by the Simplex search. Since the relative value of these quantities is unaffected by converting probability density to probability, we would expect the

same optimized image to result from application of the Simplex search and the same

optimized parameter values for the IQ modifiers to be obtained.

Classification Results Using Probability Density Difference

The parameters in both the pre-optimization model and the post-optimization model were

determined by first training each model with images which have been previously

classified by a human observer and randomly selected from the sample of such images


available. The creation of the pre-optimization model and post-optimization model is

illustrated in Figure 2-1 and Figure 5-1, respectively. The parameter values for each

model are shown in Table 5-13. (Note: these parameters should not be confused with the

parameters mentioned above for the IQ Modifiers. The latter are parameters found by the

Simplex search and are adjustable “constants” in IQ Modifiers that change image quality.

The parameters discussed in this section are constants in both the pre-optimization and

post-optimization Bayesian models that need to be set before the model can be used for

classification.) Each Bayesian model contains five parameters: prior probability, mean

density of particles, the standard deviation of mean density, percentage area of particles,

the standard deviation of percentage area. Prior probability is the relative frequency

(percentage) of the images in the training image set belonging to either WO or WP. Mean

density and percentage area are measurement obtained after thresholding the image using

a slightly modified version of Torabi’s MaxMin Thresholding method (Please refer to

Appendix IV for details). Mean density is the average grey value of a feature suspected to

be a particle. Percentage area is the ratio of particle size to the whole image size. Details

on how the quantities of Table 5-13 are calculated are shown in Appendix V. The values

of mean density and percentage area are averages of all particles from all images in the

training image set. For the pre-optimization model, the standard deviations of both mean

density and percentage area are greater than those for the post-optimization model. The

mean density and percentage area values are not significantly different in the two models.


Table 5-13 Parameter Values in Classification Models

Model Parameter      Pre-Optimization Model          Post-Optimization Model
                     WO           WP                 WO           WP
PP                   0.322        0.678              0.322        0.678
MD                   155.3        63.5               155.4        64.5
STDEV-MD             19.0         28.0               5.6          9.8
PA                   0.000113     0.000116           0.000114     0.000116
STDEV-PA             0.000126     0.000110           0.000154     0.000109

Legend: PP – prior probability of an image being without particle or with particle
        MD – mean density of particles
        STDEV-MD – standard deviation of mean density
        PA – percentage area of particles
        STDEV-PA – standard deviation of percentage area

The law serving as the basis for the Naïve Bayesian models contains two assumptions:

the parameters for classification follow a Gaussian distribution and they are statistically

independent. However, because the Bayesian model is being used for classification (or for the probability density difference) rather than for absolute probability values, the need to satisfy these assumptions is less stringent. Normally, in data mining, Naïve Bayesian

models are used if they provide acceptable classifications and there is little concern about

these two assumptions. That said however, it was expected that the Naïve Bayesian

model had an excellent chance of working because the assumptions were verified in

Torabi’s research for the same type of images (and mostly for the same actual images) as

were used in this work.

Classification Results Using Probability Density Difference as Objective Function

The Simplex algorithm, like all other numerical optimization methods for

problems non-linear in the parameters, can locate a false “local optimum” instead of the

desired “global optimum”. To assess the extent of this problem here using the


“probability density difference” definition of image quality as objective function, the

parameters were systematically varied over a wide range and the value of the objective

function evaluated. Mapping the search area in this way revealed that the initialization of

the Simplex has a strong effect on the optimum point reached. A single random initialization missed the global optimum for 457 of the 745 images. However, much better results were achieved when 4 random discrete initializations were applied: the "local optimum" obtained missed the "global optimum" for only 9% of the 745 images, and all of these were very close to the global value, with less than a 2% difference. The

initialization achieving the best image quality was adopted and the resulting optimized

image was then used to extract classification attributes. All of the data in the study was

re-calculated using the improved method of applying the Simplex.
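The multi-start strategy can be sketched as follows, assuming a hypothetical objective iq_of_image that applies the IQ Modifiers with a given parameter vector to the raw image and evaluates Eq. 5-5; the snippet uses the SciPy Nelder-Mead implementation for illustration and is not the thesis software:

    import numpy as np
    from scipy.optimize import minimize

    def optimize_iq(iq_of_image, n_params, n_starts=4, seed=0):
        # Run the Simplex (Nelder-Mead) search from several random initializations
        # and keep the best optimum found; IQ is maximized by minimizing its negative.
        rng = np.random.default_rng(seed)
        best = None
        for _ in range(n_starts):
            x0 = rng.uniform(0.0, 1.0, size=n_params)   # random start within assumed [0, 1] bounds
            res = minimize(lambda p: -iq_of_image(p), x0, method="Nelder-Mead")
            if best is None or res.fun < best.fun:
                best = res
        return best.x, -best.fun                        # best parameters and the IQ they achieve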

The use of the probability density difference as an objective function provided the

required breakthrough in the definition of image quality. Classification results are shown

in Table 5-14 and Figure 5-22. We see that, for WO images no images are incorrectly

classified (a false positive rate of zero), while for WP images only 1 (0.2%) out of 505 is incorrectly classified, corresponding to the 0.2% false negative rate shown in Figure 5-22.

The overall classification error rate is 0.1%. The ROC curve shows that the classification

is almost perfect, with an AUC value of unity, the largest value a classifier can attain.

Table 5-14 Confusion Matrix for Training Image Set Using Probability Density Difference as

Objective Function

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    504                      1
Actual Class is WO    0                        240

Figure 5-22 Classification Error Rate for Probability Density Difference Optimized Images (bar chart; Torabi Bayesian Model: overall error rate 5.5%, false negative rate 3.2%, false positive rate 10.4%; Probability Density Difference: 0.1%, 0.2%, 0.0%)

Relationship between Probability Density Difference as Objective Function and Classification Results

Unlike the previous IQ definitions examined, use of the probability density difference

objective function as an image quality definition automatically provided a perfect

correlation between the image quality definition and the classification performance

because the classifier uses the difference in probabilities in order to assign the class to an

image. Consequently, there will be no IQ value overlap between correctly classified

images and misclassified images.


Figure 5-23 ROC Curve for Probability Density Difference Optimized Images

5.1.3 Comparison of Classification Results for Different Objective Functions

Figures 5-24 to 5-27 and Table 5-15 show the same data as presented in the previous sections but arranged to allow direct comparison of classification results for all of the various

definitions of image quality. Figure 5-24 shows the classification error rate, Figure 5-25

the false negative rate, Figure 5-26 the false positive rate and Figure 5-27 the ROC

curves. In every case the use of the optimized images is superior to the use of raw

images for classification and the probability density difference objective function

provided by far the best definition of image quality for image quality improvement.

The five image quality metrics and local contrast of potential particles of the training

image set are listed in Table 5-16. The measurements are based on the blanket-processed images, that is, on the raw images after they have received the standard brightness adjustment and background flattening illustrated in Figure 5-1. Because all raw images are subject to these two standard operations, the baseline image quality should be based not on the raw image but on its blanket-processed version. The two standard operations are needed because the images acquired from the scanning particle monitor all have a non-uniform background due to lighting, and their brightness varies because the backlight intensity changes from run to run. A blanket operation is therefore necessary to bring all images to the same brightness level and to correct the background non-uniformity.

Figure 5-24 Comparison of Classification Error Rates among Different Objective Functions (Torabi Bayesian Model 5.5%, Least Squares 1.8%, Weighted Least Squares 1.9%, Desirability Function 1.8%, Probability Density Difference 0.1%)

Figure 5-25 Comparison of False Negative Rates among Different Objective Functions (Torabi Bayesian Model 3.2%, Least Squares 1.4%, Weighted Least Squares 1.2%, Desirability Function 1.2%, Probability Density Difference 0.2%)

Figure 5-26 Comparison of False Positive Rates among Different Objective Functions (Torabi Bayesian Model 10.4%, Least Squares 2.5%, Weighted Least Squares 3.3%, Desirability Function 2.9%, Probability Density Difference 0.0%)


Figure 5-27 ROC Analysis of the Classification Performance of Different Objective Functions

Table 5-15 Comparison of AUC among Different Objective Functions

Objective Function                                            Area Under the ROC Curve (AUC)
No Objective Function (Torabi Bayesian Model of Raw Images)   0.974
Least Squares                                                 0.998
Weighted Least Squares                                        0.991
Desirability Function                                         0.996
Probability Density Difference                                1

Table 5-16 Image Quality Metrics for the Training Image Set After Blanket Processing

                                     Brightness   Contrast   Blur    Noise   Illumination uniformity   Local contrast
Average                              0.42         0.075      0.47    0.69    0.89                      0.49
95% Confidence Interval on Average   ±0.0004      ±0.004     ±0.01   ±0.01   ±0.014                    ±0.02
Minimum                              0.39         0.019      0       0.38    0.18                      0.03
Maximum                              0.43         0.43       1       0.94    1                         1


Table 5-16 shows the average, confidence interval, minimum and maximum for each of the five individual image quality metrics. It should be noted that the values of these IQ Metrics lie between 0 and unity. In general, the training set images have a rather low contrast, with an average of 0.075. Most images are blurry, with an average blur metric of 0.47 over a range from 0 to 1 (worst to best). The noise level of the training images is moderate, with an average of 0.69, but it varies from 0.38 to 0.94. The training images have high illumination uniformity, although again the range is from 0.18 to 1. The average local contrast of potential particles is moderate at 0.49. The table shows that the training image set contains a variety of images. During the off-line image modification, the selection of image operators is based on the image quality metrics of each individual image, not on the average values listed in the table.

In the next section the data obtained from off-line image modification is used to provide

the basis for a method of in-line image modification to improve classification of

“unknown” images. Images from new extrusion monitoring experiments are added to

images previously provided by Torabi and a method for adapting the in-line image

processing to large change in the quality of a raw image is developed.

The average image quality optimization time per image was 176 seconds, using a

Pentium 4 Windows NT station.


5.2 In-line Image Quality Modification

This section of the thesis shows how the second objective was accomplished. That is, it

shows the development of an in-line method for modifying image quality to improve

classification. As outlined in Section 2.4.1, the strategy is to utilize the results of the first

objective to formulate the needed “Reference Image Database”. Subsequent steps in the

strategy involve using this database to provide the means for correctly modifying new

“unknown” images and finally adding adaptability to the system with experimental

verification to demonstrate its utility. In the next section the Reference Image Database

is described. It is the link between the off-line image modification classification results

reported in Section 5.1 and the new classification results to be reported in this section

obtained by in-line classifying “unknown” images. Following that, this section shows

how this database is used with static (not adaptive) classification models to obtain the

in-line classification results. Finally, in Section 5.2.3 the results of making the model

adaptive are shown.

5.2.1 The Reference Image Database

The whole purpose of the first objective was to determine how best to modify a variety of

images so that their classification as WO or WP would be improved. “Known” images

were used (i.e. the software was informed of the actual class, WO or WP). Another

way of stating this is to say that accomplishing Objective 1 provides the Reference Image

Database. In-line image quality improvement will be done by (a) locating the image in

this database that most closely resembles a newly received “unknown” image and (b)


using the IQ Operator details associated with the similar image to improve the quality of

the newly received image.

A small portion (the first ten records) of the Reference Image Database is shown in Table

5-17. The number of IQ operators specified varied from two to four depending upon the

quality of the image involved.

Table 5-17 A Portion of Reference Image Database

Column:  1    2           3         4           5      6    7             8             9            10                 11           12
         id   Brightness  Contrast  Blurriness  Noise  IU   Mean Density  Percent Area  IQ operator  parameter          IQ operator  parameter

1    0.62  0.04  0.40  0.53  0.25  155.90  6.24E-05  MD   radius=126.0      UNSHP  weight=0.3
2    0.41  0.04  0.54  0.64  0.34  139.68  8.21E-05  B/C  brightness=144.0  BF     radius=44.0
3    0.42  0.05  0.65  0.68  0.31  159.61  8.05E-05  B/C  brightness=121.0  BF     radius=46.0
4    0.41  0.04  0.51  0.66  0.26  113.92  3.94E-05  B/C  brightness=159.0  BF     radius=40.0
5    0.41  0.04  0.66  0.65  0.29  158.33  0.000167  B/C  brightness=124.0  BF     radius=46.0
6    0.41  0.03  0.63  0.38  0.68  168.29  6.57E-05  B/C  brightness=146.0  MD     radius=4.0
7    0.42  0.04  0.60  0.64  0.18  143.21  4.60E-05  B/C  brightness=142.0  BF     radius=44.0
8    0.41  0.03  0.52  0.63  0.29  150.10  0.000122  B/C  brightness=133.0  BF     radius=45.0
9    0.41  0.17  0.06  0.90  0.91  83.87   0.000256  B/C  brightness=106.0  SHP    radius=2.0
10   0.41  0.07  0.58  0.70  0.36  148.05  0.000297  B/C  brightness=136.0  BF     radius=45.0
…

Legend: IU – Illumination Uniformity; B/C – Brightness Contrast Operator; BF – Background Flattening; SHP – Sharpening; MD – Mean Filter; UNSHP – Unsharp Mask

As shown in Table 5-17, the data consists of two main parts: data that enables the most

similar image to a newly received image to be identified (“Similarity Attributes” in

columns 1 through 8 inclusive) and data that describes how the “most similar image” was

converted from a raw image to an image of better quality for classification (“IQ Operator

Instructions”, Columns 9 through 12). The IQ Operator instructions include the identity

of each operator (Columns 9 and 11), the value of operator parameters (Columns 10 and


12) and the order of application of the operators (same as the order presented in the table

read from left to right). Considering record 6 in Table 5-17 for example, the "IQ Operator instructions" are the brightness/contrast shift operator ("B/C" for short) with the parameter brightness set at 146.0, followed by a mean filter operator with the parameter radius set at 4.0.
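A minimal sketch of how one such record could be represented in software is shown below; the field and type names are assumptions made for illustration and are not taken from the thesis code:

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class ReferenceCase:
        # Similarity attributes (columns 2 to 8 of Table 5-17).
        brightness: float
        contrast: float
        blurriness: float
        noise: float
        illumination_uniformity: float
        mean_density: float
        percent_area: float
        # IQ Operator instructions (columns 9 to 12), applied in the order listed.
        operators: List[Tuple[str, str, float]]   # (operator, parameter name, parameter value)

    # Record 6 of Table 5-17: a brightness/contrast shift followed by a mean filter.
    record_6 = ReferenceCase(0.41, 0.03, 0.63, 0.38, 0.68, 168.29, 6.57e-05,
                             operators=[("B/C", "brightness", 146.0), ("MD", "radius", 4.0)])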

The similarity attributes include the five image quality metrics (mostly statistical image

features): brightness, contrast, blur, noise and illumination uniformity. In addition, the

similarity attributes include the two attributes that are actually used in the Torabi

classification model in this work: percentage area and mean density of potential

particles. As shown by Torabi [2], as well as by preliminary testing of his model for this

work, the other attributes were ineffective because they were highly inter-correlated.

The database is organized in a flat structure; that is, one case following another case

without ordering. However, it does have two blocks: one block of cases is for images

with high local contrast and the other for images with low local contrast. This aspect will

be described more fully in Section 5.2.3.5. Within each block, the records are placed

randomly.

The Reference Image Database can grow to accommodate new image cases when these

new image cases are initially misclassified in in-line image modification. The

misclassified images are subjected to customized, off-line image modification and the results are added to the Reference Image Database. For the static classification model

predictions (reported in the next section) the database contained 745 records.


5.2.2 In-line Image Quality Modification for Classification: Use of a Static Classification Model

A classification model is termed “static” when it is not updated with new cases. This

section will report on the results of using static pre-optimization and static post-

optimization classification models.

Earlier, Figure 5-1 described how a training image was processed to provide the

information necessary to compose the Reference Image Database and create pre-

optimization and post-optimization classification models. Figure 5-28 shows how the

Reference Image Database along with the pre-optimization and post-optimization

classification models is used to modify and classify "unknown" test images in-line, based on the strategy presented in Section 2.4. (A test image is not used to compose the Reference Image Database, and its information is not used to create the pre-optimization and post-optimization models in cross-validation. Because the test image is "unknown", the software is unaware of its actual class, although the actual class is known to us.) This method is termed the "Image Quality Modification for Classification" model ("IQMod Classification" for short). As shown in Figure 5-28, the raw test image is first processed with two baseline operations: brightness adjustment and background flattening.

The processed image is then measured for the seven similarity attributes mentioned in

Table 5-18. Next, case-based reasoning is used to find the most similar case in the

Reference Image Database. The most similar case is the one in the Reference Image

Database with the shortest Euclidean distance to the new image. The Euclidean distance


is calculated from Equation 5-6. Specifying the actual similarity attributes used, this

equation becomes:

$D = \sum_{i=1}^{7} (S_{DB,i} - S_i)^2$    (5-6)

where Si is the value of the ith similarity attribute for the new image and SDB,i is the

corresponding value for an image in the Reference Image Database. The similarity

attribute corresponding to each value of i is shown in Table 5-18.

Table 5-18 Similarity Attributes

i    Similarity Attribute
1    Brightness
2    Contrast
3    Blur
4    Noise
5    Illumination Uniformity
6    Mean Density
7    Percentage Area
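A brief sketch of this retrieval step is given below, assuming the seven similarity attributes are stored as rows of a NumPy array in the order of Table 5-18; the snippet is illustrative only and is not the software developed in this work:

    import numpy as np

    def retrieve_most_similar(new_attributes, database_attributes):
        # new_attributes: shape (7,); database_attributes: shape (n_cases, 7).
        # Equation 5-6: sum of squared differences over the seven similarity attributes.
        d = np.sum((np.asarray(database_attributes) - np.asarray(new_attributes)) ** 2, axis=1)
        return int(np.argmin(d)), float(d.min())   # index of the most similar case and its distance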

The image operators and their parameters associated with the image in the Reference Image Database giving the lowest value of D from Equation 5-6 are then applied to process the new image. The processed new image is then classified using the post-optimization classification model developed with the training image set during off-line image quality modification (as shown in Figure 5-1). The static classification models (both the pre-optimization model and the post-optimization model) were created earlier during off-line image modification (see Section 5.1.2.4) using the set of training images (Table 5-2) that was used repeatedly for all four image quality definitions. The training images are from 8 experimental extrusion runs carried out in previous research by Torabi.

Using these models described in Section 5.1 (as shown in Table 5-13), a set of 2888

“unknown” test images (a subset of Image Set 1 in Table 3-1 from Torabi’s research) is

used to test the performance of IQMod Classification. The classification results are

provided in Table 5-19.


Figure 5-28 In-line Image Quality Modification Framework


Table 5-19 Confusion Matrix for a Subset of Image Set 1 Using IQMod Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    2112                     28
Actual Class is WO    13                       735

Of the 2888 test images (a subset of Image Set 1 in Table 3-1), 748 were WO images and 2140 were WP images. From Table 5-19, it can be calculated that an overall error rate of 1.5% (or an overall classification accuracy of 98.5%) was achieved, along with a very low false negative rate of 1.3% (or true positive rate of 98.7%) and a false positive rate of 1.7% (true negative rate of 98.3%). These results are shown in Figure 5-29. Given that the number of training images used to create the two static models was only 745, while the number of test images was almost four times that number, these results are very encouraging.
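As a simple sketch of how these rates follow from the confusion matrix of Table 5-19 (treating WP as the positive class), consider:

    def rates(tp, fn, fp, tn):
        # tp, fn: WP images predicted WP / WO; fp, tn: WO images predicted WP / WO.
        overall_error = (fn + fp) / (tp + fn + fp + tn)
        false_negative_rate = fn / (tp + fn)     # WP images incorrectly called WO
        false_positive_rate = fp / (fp + tn)     # WO images incorrectly called WP
        return overall_error, false_negative_rate, false_positive_rate

    print(rates(tp=2112, fn=28, fp=13, tn=735))  # overall, false negative and false positive rates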

In order to assess the effectiveness of the IQMod Classification model, the same set of test images was classified using the Adaptive Bayesian classification model developed by Torabi.

Recall that this is the same as the original Bayesian classification model of Torabi except

that the software permits the model to learn to accept a change in image quality. That is,

when image quality changes and model predictions worsen, the human observer tells the

classification model the correct class (WO or WP) for many new images until it is able to

again predict satisfactorily.

The classification confusion matrix using the Bayesian model by Torabi is shown in Table

5-20. To compare the Bayesian model by Torabi and IQMod Classification model, the

two confusion matrices in Table 5-19 and Table 5-20 were used to provide the overall

classification error rates, false negative rates and false positive rates shown in Figure


5-29: the results from the Torabi’s Bayesian classification model are significantly inferior

to those from the IQMod Classification model developed in this work. The overall error

rate with the Torabi’s approach was 4.3% versus 1.5% IQMod Classification. The

difference of 2.8% may not look like much. However, it meant a reduction in the number

of reported errors by 65%. Achieving improved error rate in this region (between 0% and

5%) is notoriously difficult and at the same time often very worthwhile from an economic

and practical viewpoint.

Table 5-20 Confusion Matrix for a Subset of Image Set 1 Using Bayesian Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    2038                     102
Actual Class is WO    22                       726

Figure 5-29 Comparison of Classification Error Rates between Bayesian Classification and IQMod Classification (Bayesian Classification: overall error rate 4.3%, false negative rate 4.9%, false positive rate 3%; IQMod Classification: 1.5%, 1.3%, 1.7%)

The above analysis shows that in-line image quality improvement using the static

classification model (i.e. without attempting to update any part of the system) is able to


decrease classification error rate very effectively. To see if further improvement could be

obtained and to show how the developed method could be adapted to a different image

quality, an extensive body of other images was obtained and In-line Adaptive Image

Quality Modification with Adaptive Classification model (Adaptive IQMod

Classification) was developed. The development and extensive testing of this

comprehensive model is described in the next section.

5.2.3 In-line Adaptive Image Quality Modification with Adaptive Classification

At this point the work is concerned with the following four models:

In-line Bayesian Classification is the conventional Naïve Bayesian model originally

applied by Torabi to classify images as WO or WP. It accepts as input the percentage

area and the average particle image density of an image.

In-line Adaptive Bayesian Classification is the same as the Bayesian classification

model except that, as mentioned in the previous section, the software permits the

model to learn to accept a change in image quality. That is, when image quality

changes and model predictions worsen, the human observer teaches the software

using many new images to enable it once again to predict satisfactorily.

In-line Image Quality Modification for Bayesian Classification (“IQMod

Classification”) is the use of the Reference Image Database and Case Based

Reasoning along with the pre-optimization Bayesian classification model to

determine how best to modify a new image to improve classification. It also includes

the use of a post-optimization Bayesian classification model to classify new


“unknown” images as WO or WP. Figure 5-28 summarizes this approach and it was

explained in detail in Section 5.2.2. The assessment of the previous section showed

that this approach provided superior results to the In-line Adaptive Bayesian

classification model for the same test images used by Torabi.

In-line Adaptive Image Quality Modification for Adaptive Bayesian

Classification (termed “Adaptive IQMod Classification”), the fourth model, is the subject of this section of the thesis.

This model adapts IQMod Classification to changes in image quality and combines it

with the Adaptive Bayesian model. This required adapting three parts of the system: the

Reference Image Database and each of the two Bayesian classification models (the pre-

optimization and post-optimization classification models). The actual adaptation was

done by using misclassified images as they were obtained. That is, if an image was misclassified, this image was subjected to the process of off-line image quality improvement illustrated in Figure 5-1. After image quality optimization, the off-line procedure added the relevant information for that image (its IQ metrics and its IQ operator instructions) to the Reference Image Database. At the same time, this image was used to update the pre-optimization model

and its optimized image was used to update the post-optimization model. Accordingly,

the adaptability of this model resulted from two actions: a) incrementing the Reference

Image Database with IQ operator instructions acquired from customized optimization of

the misclassified image; b) adapting the pre-optimization and post-optimization


classification models with information (extracted attributes) from the image and its

optimized image, respectively.

The adaptive IQMod Classification is failure-driven, i.e., the model adapts with the

information of misclassified images. Correctly classified images do not participate in

updating the Reference Image Database and the pre-optimization and post-optimization

models. This model has the advantage that it will not overly expand the reference

database, which results in better efficiency in retrieving cases from the database.
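A conceptual sketch of this failure-driven loop is given below; the injected callables (classify, optimize_offline and the two update functions) are placeholders standing in for the components described above, not the thesis code:

    def adapt_on_failure(image, true_class, classify, optimize_offline,
                         reference_db, update_pre_model, update_post_model):
        predicted = classify(image)
        if predicted == true_class:
            return predicted                                   # correct: nothing is updated
        # Misclassified: run the customized off-line optimization of Figure 5-1.
        optimized_image, iq_metrics, iq_instructions = optimize_offline(image, true_class)
        reference_db.append((iq_metrics, iq_instructions))     # grow the Reference Image Database
        update_pre_model(image, true_class)                    # attributes of the raw image
        update_post_model(optimized_image, true_class)         # attributes of the optimized image
        return predicted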

The impact of using this method was assessed in four test trials:

5.2.3.1 Test Trial 1: the Use of a New Set of Images Produced by Torabi

The images used in this test trial are a subset of Image Set 1 in Table 3-1, taken from experimental runs by Torabi, which had not previously been used for any purpose in this research; they were treated as "unknown" images. The classification results of these images using Torabi's

In-line Adaptive Bayesian Classification, IQMod Classification and Adaptive IQMod

Classification are shown in Table 5-21, Table 5-22 and Table 5-23, respectively.

Table 5-21 Confusion Matrix of Test Trial 1 Image Set Using In-line Adaptive Bayesian

Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    201                      35
Actual Class is WO    6                        419

Table 5-22 Confusion Matrix of Test Trial 1 Image Set Using Static IQMod Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    227                      9
Actual Class is WO    2                        423


Table 5-23 Confusion Matrix of Test Trial 1 Image Set Using Adaptive IQMod Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    235                      1
Actual Class is WO    3                        422

As shown in Figure 5-30, the overall classification error rate of Adaptive IQMod

Classification was 0.6%. This is 8.8 and 1.1 percentage points lower than the rates for the Adaptive Bayesian Classification and IQMod Classification, respectively.

The analysis above demonstrates that the classification error rate was significantly

reduced for Adaptive IQMod Classification over the other two models. It shows that the

combination of a “failure-driven”, incremental, case-based reasoning method and

Bayesian adaptive classification models is very powerful in terms of dealing with image

quality variability.

Figure 5-30 Comparison of Classification Error Rates for Test Trial 1 among Different Models (In-line Adaptive Classification: overall error rate 9.4%, false negative rate 14.8%, false positive rate 1.4%; IQMod Classification: 1.7%, 3.8%, 0.5%; Adaptive IQMod Classification: 0.6%, 0.4%, 0.7%)

5.2.3.2 Test Trial 2: the Use of Microgel Image Set Produced by Ing

This test involved 1344 images (Image Set 2 in Table 3-1) produced by Lianne Ing in her

research and not yet applied in any previous training and testing. This set of images is of

microgel with relatively low local contrast and illumination uniformity (Table 5-24)

compared to those of the training images (Table 5-16). It was also evident that this set of

images was not as blurry as those in the training images and they were much more

uniform in terms of blurriness and noise level with variation coefficients of 0.1% and

0.7%. The contrast of this image set was 0.11. This was marginally higher than that of the

training images (0.07). The average values of the IQ Metrics and local contrast of the

images are listed in Table 5-24.


Table 5-24 Image Quality Metrics of Test Trial 2 Microgel Image Set

                                     Brightness   Contrast   Blur     Noise    Illumination uniformity   Local contrast
Average                              0.55         0.11       0.70     0.64     0.51                      0.21
95% Confidence Interval on Average   ±0.0006      ±0.0002    ±0.0001  ±0.0002  ±0.01                     ±0.006
Minimum                              0.53         0.10       0.69     0.65     0.15                      0.145
Maximum                              0.57         0.20       0.70     0.72     1                         1

The classification results from applying In-line Adaptive Bayesian Classification, IQMod

Classification and Adaptive IQMod Classification to these images are tabulated in Table

5-25, Table 5-26 and Table 5-27, respectively.

Table 5-25 Confusion Matrix of Test Trial 2 Image Set Using In-line Adaptive Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    78                       10
Actual Class is WO    0                        1256

Table 5-26 Confusion Matrix of Test Trial 2 Image Set Using Static IQMod Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    85                       3
Actual Class is WO    0                        1256

Table 5-27 Confusion Matrix of Test Trial 2 Image Set Using Adaptive IQMod Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    87                       1
Actual Class is WO    1                        1255

The results in Figure 5-31 show a significantly improved overall classification error rate

of 0.15% using the Adaptive IQMod Classification compared to the value of 0.7%

achieved with In-line Adaptive Bayesian Classification. In addition, this model shows


true positive and true negative rates of 98.9% and 99.9%, respectively. The

performance again shows the strength of combining adaptive image quality modification

with adaptive Bayesian classification. Some dependence on variability of the images was

evident for such results. For example, as shown in Figure 5-31, for microgel images,

there is no significant difference in terms of classification error rate between IQMod

Classification and Adaptive IQMod Classification.

Figure 5-31 Comparison of Classification Error Rates for Test Trial 2 among Different Models (In-line Adaptive Bayesian Classification: overall error rate 0.7%, false negative rate 11.8%, false positive rate 0.0%; IQMod Classification: 0.2%, 3.5%, 0.0%; Adaptive IQMod Classification: 0.15%, 1.1%, 0.1%)

The tests above target images produced by previous researchers (Torabi and Ing). In all

cases the Adaptive IQMod Classification model developed here provided superior results

to previous models. The next two trials involved new extrusion runs to purposefully

create images with extremely wide diversity in appearance (see Chapter 3 for

experimental details).


5.2.3.3 Test Trial 3: the Use of Images from New Extruder Runs Utilizing Injection of Particles with Low Additive Polyethylene Pelletized Feed

In general, the images generated from 530A polyethylene pellets (a subset of 1472 images of Image Set 3, from runs 3-1 to 3-4 in Table 3-2) had consistently lower values of the average image quality metrics (Table 5-28) than those of the training images (Table 5-16).

Table 5-28 Image Quality Metrics of Test Trial 3 Image Set

                                     Brightness   Contrast   Blur    Noise    Illumination uniformity   Local contrast
Average                              0.40         0.034      0.37    0.57     0.71                      0.24
95% Confidence Interval on Average   ±0.0003      ±0.001     ±0.039  ±0.007   ±0.025                    ±0.007
Minimum                              0.39         0.019      0       0.47     0.27                      0.094
Maximum                              0.41         0.067      1       0.77     1                         0.48

The classification confusion matrices for this set of images are tabulated in Tables 5-29 to 5-32 and the classification error rates are illustrated in Figure 5-32. From these tables and

figure, we see that the highest overall classification error rate is 1.5% for pure 530A

polyethylene pellets. Even when no particles were added, particles were present in

extrusion of pure 530A because of residuals remaining from previous extrusion runs

where particles were intentionally added. For images when 200ppm 30µm GMS is added,

the classification accuracy reaches 100%, meaning an error rate of zero. For images

captured with the addition of 40ppm 100µm-GMS to the 530A polymer, the overall error

rate is 1.0% (or an accuracy of 99.0%) with a false positive rate of 0%. The classification

error rate for images with the addition of 20ppm glass bubbles to feed is 1.2% with a

“balanced” false negative rate of 0.8% and false positive rate of 2.0% (or true negative


rate of 98.0%). These excellent values demonstrated that the new method adapted very

effectively to this wide diversity of images. The classification error rates using the Torabi

Adaptive Classification method and the static IQMod Classification method are 46.6%

and 8.6%, respectively [see Table 8-13 and Table 8-15, Appendix VII]. It is thus evident

that Adaptive IQMod Classification also significantly outperformed these two

classification methods.

Table 5-29 Confusion Matrix for Test Trial 3 Image Subset (Run 3-1 in Table 3-2) Using Adaptive

IQMod Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    119                      2
Actual Class is WO    2                        146

Note: 530A – a grade of polyethylene pellets produced by Dow Chemical

Table 5-30 Confusion Matrix for Test Trial 3 Image Subset (Run 3-2 in Table 3-2) Using Adaptive IQMod Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    401                      0
Actual Class is WO    0                        2

Table 5-31 Confusion Matrix for Test Trial 3 Image Subset (Run 3-3 in Table 3-2) Using Adaptive IQMod Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    336                      4
Actual Class is WO    0                        60

Table 5-32 Confusion Matrix for Trial 3 Image Subset (Run 3-4 in Table 3-2) Using Adaptive IQMod

Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    245                      2
Actual Class is WO    3                        150

Figure 5-32 Classification Error Rates for Test Trial 3 Using Adaptive IQMod Classification (530A: overall error rate 1.5%, false negative rate 1.6%, false positive rate 1.3%; 530A + 30µm GMS 200ppm: 0.0%, 0.0%, 0.0%; 530A + 100µm GMS 40ppm: 1.0%, 1.2%, 0.0%; 530A + glass bubbles 20ppm: 1.2%, 0.8%, 2.0%)

5.2.3.4 Test Trial 4: the Use of Images from New Extrusion Runs Utilizing Injection of Particles with High Additive Polyethylene Pelletized Feed

These images (a subset of 538 images of Image Set 3 in Table 3-1 and the same set of

images as in Run 3-5 and 3-6 in Table 3-2, and hereinafter called Batch-A) initially

proved the most difficult of all. The classification results are shown in Table 5-33 and Table 5-34 and displayed in Figure 5-33. The results show that the classification error rates for Batch-A without the addition of particles and Batch-A with the addition of 75µm glass beads are 17.1% and 29.7%, respectively. The false positive rate for images from Batch-A with the addition of 75µm glass beads was also disappointing at 56.6%, though a reasonable false negative rate of 8.2% (or a true positive rate of 91.8%) was obtained.

This result exposed a weakness that the system had not previously encountered. To

define this weakness, an analysis of image quality was carried out. Table 5-35 shows the


measured image quality metrics. For brightness, global contrast, blurriness and noise

metrics, no significant differences were evident compared to those of the training image

set. Blur and noise levels were even better than those of the training set images. However, Batch-A images had a very low local contrast of 0.17 and an illumination uniformity of only 0.35. These values were much lower than the values observed in the training image

sets (which exhibited values of 0.49 and 0.89 respectively). Therefore, it was

hypothesized that these two low values were the cause for the high misclassification rate.

The classification error rates for the Batch-A image set with injection of 75µm glass

beads using the Torabi Adaptive Classification method and the static IQMod

Classification method are 36.7% and 30.6%, respectively [see Table 8-17 and Table 8-19

in Appendix VII]. As in the previous section, it is thus evident that Adaptive IQMod Classification, with its 29.7% classification error rate, again outperformed the Torabi Adaptive and the Static IQMod Classification methods.

Table 5-33 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-5 in Table 3-2) Using

Adaptive IQMod Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    6                        1
Actual Class is WO    32                       156

Note: Batch-A – a grade of polyethylene pellets produced by Exxon

Table 5-34 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-6 in Table 3-2) Using Adaptive IQMod Classification

                      Predicted Class is WP    Predicted Class is WO
Actual Class is WP    175                      16
Actual Class is WO    86                       66

Figure 5-33 Classification Results for Test Trial 4 Image Set Using Adaptive IQMod Classification (Batch-A: overall error rate 17.1%, false negative rate 14.3%, false positive rate 17.2%; Batch-A + 75µm glass beads: 29.7%, 8.2%, 56.6%)

Table 5-35 Image Quality Metrics for Test Trial 4 Image Set

                                     Brightness   Contrast   Blur    Noise   Illumination uniformity   Local contrast
Average                              0.42         0.064      0.49    0.72    0.35                      0.17
95% Confidence Interval on Average   ±0.0003      ±0.002     ±0.04   ±0.005  ±0.02                     ±0.005
Minimum                              0.41         0.042      0.0000  0.62    0.098                     0.086
Maximum                              0.42         0.13       1.0000  0.82    1.0000                    0.32

Furthermore, since local contrast and illumination uniformity are independent, it is

reasonable to examine the consequence of these two quality metrics on classification

separately. Illumination uniformity, unlike blurriness, noise and contrast, is a global

metric which takes little account of the presence of particle images (see Appendix I for

details of its computation). Therefore its effect on classification would likely be limited.

Additional evidence for this conclusion was the fact that illumination uniformity ranged

from 0.15 to 1 in previous image sets in Table 5-24 and Table 5-28 but the classification

accuracy for those sets of images was very high. Because of this, the examination of the


failure was centered on local contrast. Local contrast measures the contrast of potential

particles against their immediate background.

5.2.3.5 The Application of Decision Rule in Case-based Reasoning (CBR)

Examining the misclassified cases in the Batch-A image set (Table 5-34) showed that the case-based reasoning retrieved "most similar" image cases from the Reference Image Database having a much higher local contrast (defined below in Equation 5-7) than that of their corresponding misclassified Batch-A images. Therefore a large number of

misclassified images with low local contrast were processed with IQ operator instructions

suited to higher local contrast images. This led to the misclassification. It also meant that

the Euclidean distance function used to retrieve the image case similar to the current

image failed in this situation.

Local contrast is defined by:

$\text{Local Contrast} = \dfrac{G_B - G_P}{G_B}$    (5-7)

where $G_P$ is the mean density (mean grey level) of all particles present, and $G_B$ is the mean grey level of the immediate background of the particles.

Recall that local contrast is not one of the similarity attributes (Table 5-18) used in the Euclidean distance function for similarity measurement between two images. Instead,

global contrast (defined below in Equation 5-8) is used. Global contrast is defined by:

$\text{Global Contrast} = \dfrac{L_{Max} - L_{Min}}{L_{Max} + L_{Min}}$    (5-8)

where $L_{Max}$ and $L_{Min}$ are the maximum and minimum grey levels of the entire image (both particles and background considered together).
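As a brief sketch of these two measures (assuming an 8-bit grey-scale image stored as a NumPy array together with boolean masks for the particle pixels and their immediate background; this is an illustration, not the thesis implementation):

    import numpy as np

    def global_contrast(image):
        # Equation 5-8: based on the extreme grey levels of the whole image.
        l_max, l_min = float(image.max()), float(image.min())
        return (l_max - l_min) / (l_max + l_min)

    def local_contrast(image, particle_mask, background_mask):
        # Equation 5-7: contrast of the particles against their immediate background.
        g_p = float(image[particle_mask].mean())
        g_b = float(image[background_mask].mean())
        return (g_b - g_p) / g_b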

It was hypothesized that the existing Euclidean distance measure would still be suitable if

it could be applied to images with similar local contrast values. Splitting the database

into low and high local contrast image blocks is the simplest approach and could work

depending on the nature of the variability of local contrast in the images and the tolerance

for variability in local contrast of the case-based reasoning approach using the Euclidean

distance measure. The approach used was to specify a threshold local contrast value and

divide the Reference Image Database into two parts above and below that value. Figure

5-34 shows a flow chart illustrating the method.

Figure 5-34 The Application of Decision Rule in Case-Based Reasoning


For each image, after the usual blanket operations, the local contrast and other IQ Metrics

were measured. If the local contrast of an image was higher than the threshold value,

Case Based Reasoning would find the most similar image case in the high local contrast

block in the Reference Image Database; otherwise, the case retrieval was executed in the

low local contrast block. This process only affected which block of images was selected

to be the Reference Image Database for a particular image. The remainder of the method

was the same as specified in Figure 5-28.
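A small sketch of this decision rule combined with the Euclidean retrieval of Equation 5-6 is shown below (the threshold value and data layout are illustrative, not the thesis code):

    import numpy as np

    def retrieve_with_rule(new_attributes, local_contrast_value, high_block, low_block, threshold=0.171):
        # Decision rule: choose the database block whose local contrast matches the new image.
        block = high_block if local_contrast_value > threshold else low_block
        # Equation 5-6, evaluated within the selected block only.
        d = np.sum((np.asarray(block) - np.asarray(new_attributes)) ** 2, axis=1)
        return block, int(np.argmin(d))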

5.2.3.6 Classification Results after the Application of Decision Rule in Case-Based Reasoning

To implement the method of dividing the Reference Image Database into two parts it was

necessary to determine the best threshold value of local contrast. This was done by trial

and error.

The effect of the threshold of local contrast on classification error rate for Batch-A

images (shown in Table 5-34) is listed in Table 5-36. For each threshold dividing the

reference database, classification on the Batch-A images was performed and the

classification accuracy was recorded. The results show a general trend: as the threshold of local contrast increased, the true positive rate decreased while the false positive rate also decreased; the overall classification accuracy consequently first increased, reached a peak and then decreased. The classification error rate reached its lowest value (5.25%) at a local contrast threshold of 0.171. At that point the false negative rate and the false positive rate were closely balanced (5.24% and 5.26%, respectively).


Table 5-36 The Effect of Local Contrast Threshold on Classification Error Rate for Test Trial 4

Local Contrast Threshold    False Negative Rate    False Positive Rate    Classification Error Rate
0.15                        3.14%                  32.89%                 16.33%
0.16                        4.71%                  17.11%                 10.20%
0.165                       4.71%                  13.82%                 8.75%
0.17                        5.24%                  5.92%                  5.54%
0.171                       5.24%                  5.26%                  5.25%
0.172                       6.28%                  5.26%                  5.83%
0.175                       9.42%                  5.26%                  7.58%
0.18                        12.04%                 3.95%                  8.45%

These results are shown in Table 5-36, Figure 5-35 and Figure 5-36. The threshold value

of 0.171 was clearly the best one to use.
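A sketch of this trial-and-error selection is given below; the evaluate callable is a placeholder that would split the Reference Image Database at a candidate threshold, classify the Batch-A images and return the overall error rate:

    def best_threshold(candidates, evaluate):
        # Try each candidate threshold and keep the one with the lowest error rate.
        errors = {t: evaluate(t) for t in candidates}
        return min(errors, key=errors.get), errors

    # Candidate values corresponding to Table 5-36:
    # best_threshold([0.15, 0.16, 0.165, 0.17, 0.171, 0.172, 0.175, 0.18], evaluate)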


Figure 5-35 The Effect of Local Contrast Threshold on Classification Error Rate for Test Trial 4


Figure 5-36 The Effect of Local Contrast Threshold on Classification Accuracy for Test Trial 4

The classification results after the application of the decision rule to CBR and the division of the reference database are tabulated in Table 5-37. For purposes of

comparison, the classification result based on the In-line Adaptive Bayesian

Classification model is also listed in Table 5-38. The tabulated results are graphically

presented in Figure 5-37. Figure 5-37 shows that use of the threshold value to divide the

Reference Image Database provided a remarkable improvement in classification error

rate from 29.8% prior to the application of the decision rule to 5.3% after the application.

It also shows that case-based reasoning with decision rules outperformed the In-line

Adaptive Bayesian Classification model developed by Torabi (36.7% classification error

rate). In addition, Case Based Reasoning with the application of the decision rule

possesses a very balanced false negative rate of 5.2% and false positive rate of 5.3% (or

94.7% true negative rate) compared to the In-line Adaptive Bayesian Classification false


negative rate of 61.3% and false positive rate of 5.9%. There is an improvement of approximately 51.3 percentage points in the true positive rate after the application of the decision rule to case-based reasoning (relative to Adaptive IQMod Classification without the decision rule).

Table 5-37 Confusion Matrix of Test Trial 4 Using Adaptive IQMod Classification with Decision Rule

                        Predicted Class is WP   Predicted Class is WO
Actual Class is WP              181                      10
Actual Class is WO                8                     144

Table 5-38 Confusion Matrix of Test Trial 4 Using In-line Adaptive Bayesian Classification

                        Predicted Class is WP   Predicted Class is WO
Actual Class is WP               74                     117
Actual Class is WO                9                     143

[Figure: bar chart of overall error rate, false negative rate and false positive rate (0%–70%) for the In-line Adaptive Bayesian Classification, Adaptive IQMod Classification and Adaptive IQMod Classification with Decision-Rule models.]

Figure 5-37 Comparison for Classification Error Rates for Test Trial 4 among Different Models

In addition to the improvement in classification accuracy, another advantage of the

application of the decision rule to divide the database for Case Based Reasoning is that it

greatly improves computational efficiency. The Euclidean distance only needs to be


computed for all the images in one part of the Reference Image Database rather than all

of the images for the entire database. However, it must also be noted that the

classification accuracy for images with low local contrast is not as high as was achieved

for images with high local contrast. Possible improvements not examined here include

the introduction of weighting factors into the Euclidean distance calculation, the addition

of new attributes to the Euclidean distance equation and additional rules dividing the

Reference Image Database into more parts.

5.2.4 Summary of the Method Developed for the Second Objective

This section detailed the development of an in-line method for modifying image quality

to improve classification. The final model obtained, termed Adaptive IQMod

Classification, combined adaptive image quality improvement with adaptive Bayesian

Classification. Adapting to changes in image quality was accomplished by a combination

of defining image quality utilizing a second Bayesian classification model (termed the

pre-optimization model), the use of case-based reasoning and the employment of a single

decision rule to divide the Reference Image Database to allow for two different

“families” of images based upon their local contrast values. Testing was done with

images used in Torabi’s study as well as with images from new extruder runs where

image quality was purposefully varied by doping the feed with various particles and

using different batches of polyethylene. The new model was a great improvement over

previous work and conclusively demonstrated the advantage of being able to adapt both

image quality and the classification model itself to improve classification performance in

dynamic environments. Also, the method underlying development of this model is


intrinsically very flexible and is expected to find use in a wide variety of applications

beyond the specific topic of in-line particle monitoring examined here.

In response to a suggestion by Professor A. An (Department of Computer Science and

Engineering, York University) the possibility of employing Case Based Reasoning for

classification of the images into WO and WP classes and thus avoiding the use of the

Bayesian Classifier, was investigated. A summary of this investigation is presented in

Appendix VII and shows that, although Case-Based Reasoning Classification is generally

an improvement over the Torabi Adaptive Classification model, the Adaptive IQMod

method developed in this research provides superior results for the datasets examined.

As shown in Appendix VIII, Adaptive IQMod Classification not unexpectedly requires more computation time (about 4 seconds in total) than the simpler classification methods (about 2 seconds). However, this is not a significant limitation for

conventional plastics extrusion (or for many other uses of the method). Also, as

mentioned in Appendix VIII, if computation time is an issue for some applications there

is room for improvement in speed by focusing on the intermediate image storage.


6 CONCLUSIONS

Adaptive in-line image quality improvement was successfully combined with adaptive

Bayesian classification to provide a very general and powerful approach for automated

image classification. The solution consisted of accomplishing two objectives. Each

required development of new software and experimental verification. Data from previous

workers and data from new extrusion runs were used.

The first objective, development of an off-line automated method for improving raw

image quality to improve classification, was accomplished by utilizing the classification

method itself to provide a measure of image quality and then using an optimization

method to obtain image quality modification instructions on how best to improve the

quality of each image. This very novel “task based” definition of image quality provided

the needed link between image quality and classification performance. The outcome of

this part of the work was information needed to form a Reference Image Database. For

each reference image this database provided (a) similarity attributes that described the

raw image and (b) instructions for obtaining the optimized image from the raw image.

The second objective was development of an in-line automated method for achieving

classification by combining adaptive image quality modification with adaptive Bayesian

classification. From the similarity attributes in the Reference Image Database, Case

Based Reasoning was successfully used to identify the reference image most closely

resembling a new “unknown” image. The accompanying instructions were then used to


improve the image quality and the resulting image was classified with the Bayesian

model.

Image quality improvement was made adaptive by adding new images to the Reference

Image Database. The Bayesian classification model was made adaptive in the same way

that was done by Torabi. A minor modification in how Torabi accomplished

thresholding an image was implemented. It involved restricting attention to the most relevant part of the image's grey level histogram (a plot of the number of pixels versus their grey level). It was initially done to decrease computational time but was later found, surprisingly, to significantly decrease error rates as well.

Experimental verification of the combined model (termed the Adaptive IQMod

Classification) revealed it to be far superior to any previous model. Testing involved four

test trials using different data sets. For the most difficult dataset it was found

advantageous to subdivide the Reference Image Database into two parts: low local

contrast images and high local contrast images. This demonstrated the extremely high

flexibility inherent in the approach developed.


7 RECOMMENDATIONS

• The Intelligent Image Interpretation System (IIIS) developed here should be established as a website on the Internet that allows its use world-wide.

• The IIIS should be applied to diverse images obtained from sources other than

particle monitoring. Images in the medical field appear particularly suitable.

• The use of other classification methods beyond the Bayesian method should be

investigated.

• The use of additional image similarity attributes should be explored for the more

difficult-to-classify images to compare results to those obtained by adding a rule

segmenting the Reference Image Database.

• Other similarity measures beyond the Euclidean distance used here should be

explored. As a first step, weighting factors could be used in the Euclidean

distance equation.


8 APPENDICES

Appendix I An Overview of Objective Image Quality Metrics (IQ Metrics)

As mentioned earlier, objective IQ Metrics (IQMs) can be divided into those requiring a

reference image and those not requiring a reference image. In this literature review, only

no-reference objective IQMs will be discussed.

There are five primary no-reference objective IQMs that may be measured physically:

brightness, contrast, blur, noise and illumination uniformity. Other no-reference IQMs, including image distortion metrics and artifact metrics, are not relevant here.

Brightness Metric

Brightness (Br) is a measurement of the deviation of the average grey-value of an image

from the pre-determined desired grey-value and is given by:

Br = (I_avg − I_desired) / I_desired        (8-1)

where Iavg is the average grey-value of the image and Idesired is the desired grey-value for

an image.

Contrast Metrics

The contrast of simple images with a uniform background is well defined and agrees with

perceived contrast. Most definitions of contrast are measures of grey value difference

relative to background. Two metrics, namely, “Michelson contrast” and “King-Smith and


Kulikowski contrast”, have been commonly used for quantifying contrast. Michelson

contrast is given by

C_M = (L_max − L_min) / (L_max + L_min)        (8-2)

where Lmax and Lmin are the maximum and minimum grey values, respectively.

Michelson contrast is a global contrast. It is sometimes not appropriate because one or

two points of extreme brightness or darkness can determine the contrast of the whole

image. In this work, this definition will be used because of its generality without any

assumptions attached and its effectiveness for describing the global contrast of an image.
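Purely as an illustration, the two global metrics introduced so far (Equations 8-1 and 8-2) reduce to a few lines of array arithmetic; the desired grey value is an input assumed to be supplied by the user, and the zero-denominator guard is an addition of this sketch.

import numpy as np

def brightness_metric(image, i_desired):
    # Equation 8-1: deviation of the average grey value from the desired grey value.
    return (float(image.mean()) - i_desired) / i_desired

def michelson_contrast(image):
    # Equation 8-2: global contrast from the extreme grey values.
    l_max, l_min = float(image.max()), float(image.min())
    return (l_max - l_min) / (l_max + l_min) if (l_max + l_min) > 0 else 0.0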

The King-Smith and Kulikowski contrast is given by

C_kk = ΔL / L        (8-3)

where L is the background grey value, and ΔL is the increment or decrement in the object

grey value from the uniform background. One usually assumes a large background with a

small object, in which case the average grey value will be close to the background grey

value. This definition of global contrast requires a uniform background, which is not the

case for most images in this research. However, this definition as a local contrast metric

was used in this research.

The two metrics above assign a single value to the whole image, but contrast may vary

greatly across the image. This is particularly true for complex images with a non-uniform

background. Peli [113] suggested a contrast metric based on a Fourier transformation. For

each frequency band, the contrast is defined as the ratio of the bandpass-filtered image at

that frequency to the low-pass image filtered to an octave below the same frequency


(local grey value mean). This definition gives each pixel a contrast measurement and is

given by

c(x,y) = a(x,y) / l(x,y)        (8-4)

where c(x,y) is the contrast at pixel (x,y), a(x,y) is the grey level of pixel (x,y) on an x,y

co-ordinate scale in a band-pass filtered image, and l(x,y) is the grey level of pixel (x,y)

in a low-pass filtered image. This definition provides a local contrast measure for every

spatial frequency that depends not only on the local grey value at that frequency but also

on the local background grey value as it varies from place to place in the images.

However, this definition is computationally expensive even though it does provide a reliable way of calculating local contrast.

VeldKamp recommended [39] a local contrast metric given by

c_i = (1/N) Σ_{j ∈ Nbd(i)} (y_i − y_j)        (8-5)

where Nbd(i) represents a neighborhood or window at i of size N, and y is the grey value

of a pixel.

VeldKamp’s local contrast metric is used to estimate the local contrast standard deviation

for different grey levels. However, this metric is only applicable when noise depends

strongly on the intensity. It proved to be successful for mammogram images [114].

For a real-time imaging system, a local contrast metric is more reliable than a global contrast metric because the background of an image often changes from place to place in the image.


Noise Metrics

Noise is random fluctuation in the grey value of a pixel as opposed to deterministic

distortions, such as shading or lack of focus. It is a defect that can take many forms and

arise from various sources. In most cases it shows up as a sharp variation in grey value in

a uniform region of the scene. Noise can be in the form of photon noise, thermal noise,

readout noise, amplifier noise and quantization noise. Among these, photon noise is a

Poisson noise which is significant in most cases. Other forms of including readout noise,

amplifier noise and quantization noise, are additive (Gaussian) and can be reduced to

manageable levels through image processing. Noise is of high spatial frequency in the

frequency domain.

In general, a pixel is declared to be noisy if it violates the local smoothness constraint

(around an edge area). The most commonly used metric of noise is signal-to-noise ratio

(SNR). SNR can have several definitions. The noise is characterized by its standard

deviation, σn.

If the image (signal) is known to lie between two boundaries, L_max (the maximum grey value of an image) and L_min (the minimum grey value of an image), then the SNR in decibels (dB) is defined as:

SNR = 20 log10( (L_max − L_min) / σ_n )  dB        (8-6)

If the grey value is not bounded but has a statistical distribution then two other definitions

are used. If signal and noise are interdependent, then the SNR is given by

SNR = 20 log10( m_a / σ_n )  dB        (8-7)

where m_a is the sample mean of the pixel grey levels in a region of interest in an image;

Otherwise when signal and noise are independent, we have

SNR = 20 log10( σ_a / σ_n )  dB        (8-8)

where σa is the standard deviation of the grey levels of pixels in a region of interest in an

image.

As can be seen, SNR calculation for the entire image based on the above equations is not directly available because the standard deviation of the noise, σ_n, is, in general, unknown. The

normal estimate of the standard deviation of noise based on standard deviation of grey

levels in an image tends to be larger than the true σn because the variations in the image

grey value are not generally due to noise but to variation in local information. There is no

simple way to estimate σn. However, within a region of an image, there is a way to

estimate the SNR. We can use local σa as σn and the dynamic range (Lmax-Lmin) for the

image to calculate a global SNR. The underlying assumptions are that 1) the signal is

approximately constant in that region and the variation in the region is therefore due to

noise, and, 2) that the noise is the same over the entire image with a standard deviation

given by σa=σn.
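A sketch of this estimation procedure is shown below, under the two assumptions just stated (the signal is approximately constant within the chosen region, and the noise is uniform across the image); the choice of region is left to the caller and the function name is illustrative.

import numpy as np

def estimate_snr_db(image, region):
    """Equation 8-6 with sigma_n approximated by the grey-level standard
    deviation of a region assumed to contain noise only."""
    y0, y1, x0, x1 = region                                   # row and column bounds of the region
    sigma_n = image[y0:y1, x0:x1].astype(float).std()
    dynamic_range = float(image.max()) - float(image.min())   # L_max - L_min
    return 20.0 * np.log10(dynamic_range / sigma_n)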

A fuzzy noise metric was proposed by Zhang [115] for the measurement of impulse

noise, and it is given by

f(x,y) = 0                                         if |o(x,y) − m(x,y)| ≤ a
f(x,y) = ( |o(x,y) − m(x,y)| − a ) / (b − a)       if a ≤ |o(x,y) − m(x,y)| ≤ b
f(x,y) = 1                                         if |o(x,y) − m(x,y)| ≥ b        (8-9)

where o(x,y) is the grey level of pixel (x,y), a and b are the two pre-determined

parameters, and m(x,y) is the median grey level of pixel (x,y)'s neighborhood. If f(x,y) < Td (where Td is a given threshold value), pixel (x,y) is not impulse noise.

Mathematically, this metric is simple. However, the drawback of this metric is that its

calculation requires three predefined parameters, which need to be chosen.

Chen [116] proposed Moran I test for measuring the noise of an image. The Moran

coefficient I (Chuang and Huang 1992) for pixels in an r × c window is calculated as

I = [ Σ_{(x,y)} Σ_{(m,n)} δ[(x,y),(m,n)] (f(x,y) − f̄)(f(m,n) − f̄) / S_0 ] / [ Σ_{(x,y)} (f(x,y) − f̄)² / N ]        (8-10)

where f(x,y) is the grey value of pixel (x,y) and f̄ is the mean grey value inside the window. δ[(x,y),(m,n)] = 1 if pixels (x,y) and (m,n) are adjacent, and 0 otherwise; S_0 = ΣΣ δ[(x,y),(m,n)] is the number of contiguous pairs in an image and N = r × c is the total number of pixels. This I value measures the noise of the region under study. The

numerator is a measure of covariance among the pixels and the denominator is a measure

of variance. A higher value of I means more correlation between pixels and less

likelihood that the image is noisy. I = 1 when all pixels have the same grey levels. If the

pixels inside the window are randomly distributed, the random variable I can be


approximated by a normal distribution (when N is large enough) with mean m and

variance σ given in the following equations:

m = −1 / (N − 1)        (8-11)

And

σ² = { N[ (N² − 3N + 3)S_1 − N S_2 + 3S_0² ] − K[ N(N − 1)S_1 − 2N S_2 + 6S_0² ] } / [ (N − 1)(N − 2)(N − 3) S_0² ] − m²        (8-12)

where

[ ]2

),(

2

4

),(

)),((

),(

⎥⎦

⎤⎢⎣

⎡−

−=

∑cxr

yx

cxr

yx

fyxf

fyxfNK

And S1=2S0 and S2=8(8rc-7r-7c+4).

The standardized normal statistic is:

z = (I − m) / σ        (8-13)

We can use the standardized normal distribution to test the statistic. A higher z leads to rejection of the null hypothesis that the pixels are randomly distributed, that is, that the image is noisy. In this work, the Moran I test will be used because of its effectiveness in measuring noise for images with sparse objects.
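A compact sketch of the Moran coefficient for a single r x c window is given below; it assumes 4-neighbour (rook) adjacency as the contiguity definition, which the text above does not fix explicitly, so it is an outline rather than the thesis implementation. The z statistic then follows from Equations 8-11 to 8-13.

import numpy as np

def moran_i(window):
    # Moran coefficient I (Equation 8-10) for a 2-D grey-level window.
    f = window.astype(float)
    dev = f - f.mean()
    n_pixels = f.size
    # Covariance over horizontally and vertically adjacent pixel pairs.
    pair_sum = (dev[:, :-1] * dev[:, 1:]).sum() + (dev[:-1, :] * dev[1:, :]).sum()
    s0 = f.shape[0] * (f.shape[1] - 1) + (f.shape[0] - 1) * f.shape[1]  # contiguous pairs
    numerator = pair_sum / s0
    denominator = (dev ** 2).sum() / n_pixels
    return numerator / denominator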

Blur Metrics

Edges are one of the most important features in an image. It is very important to have a

sharp edge when measuring objects of interest in an image. The relative blur of an image

can be measured either in the spatial domain or in the Fourier domain.


Caviedes [50] suggested a blur metric based on local kurtosis in the Fourier domain. The

idea is similar to the sharpness metric of Zhang [117]. It is the average of all local measures of sharpness. This approach, however, is computationally complex due to image partitioning and Fourier transformation. Li [46] proposed a no-reference blur metric in the spatial domain. However, it is again computationally complex and thus not very practical.

Marziliano [118] introduced a content-independent no-reference perceptual blur metric.

The perceptual blur measurement is defined in the spatial domain as the spread of the

edges. This metric is of low computational complexity. It was successfully applied in

many real image and video applications. In this research, this perceptual blur metric will

be adopted.

The algorithm for measuring the perceptual blur metric first applies an edge detector (e.g.

vertical Sobel filter) in order to find vertical edges in the image. Then each row of the

image is scanned. For pixels corresponding to an edge location, the starting and ending

positions of the edge are defined as the local extremum locations closest to the edge. The

edge width is then given by the difference between the starting and ending positions, and

is identified as the local blur measure for this edge location. Finally, the global blur

measure for the whole image is obtained by averaging the local blur values over all edge

locations.

In this work, the no-reference perceptual blur metric will be used because it is suitable for

measuring the blurriness of particles which are of interest in this research.
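A simplified sketch of the edge-width measurement is given below. It uses a plain horizontal-gradient threshold in place of a full Sobel edge map, so it should be read as an outline of the idea rather than as Marziliano's exact algorithm; the threshold value is an arbitrary assumption.

import numpy as np

def perceptual_blur(image, edge_threshold=30.0):
    # Average edge width (in pixels): for each detected vertical edge pixel,
    # measure the distance between the local extrema that bracket it.
    img = image.astype(float)
    widths = []
    for row in img:
        grad = np.diff(row)
        for e in np.where(np.abs(grad) > edge_threshold)[0]:
            left = e
            while left > 0 and (row[left] - row[left - 1]) * grad[e] > 0:
                left -= 1                          # walk back to the starting extremum
            right = e + 1
            while right < len(row) - 1 and (row[right + 1] - row[right]) * grad[e] > 0:
                right += 1                         # walk forward to the ending extremum
            widths.append(right - left)
    return float(np.mean(widths)) if widths else 0.0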


Illumination Uniformity

The illumination uniformity metric measures the grey value change in an image. One

definition of the uniformity metric [119] is given by

HM = (1/N) Σ_{(x,y)} | median{ diff(x,y) } |        (8-14)

where N is the total number of pixels in the image, (x,y) defines pixel position on an x,y

co-ordinate scale, diff(x,y) is the set of absolute horizontal and vertical grey value

differences between any two horizontal pixels or any two vertical pixels in a 3 x 3

window around pixel (x,y), and median is the median operation, that is, obtain the median

value of a set of data. This metric is appropriate for measuring the uniformity of texture

or a pattern in an image.

Another uniformity metric is based on the spatial grey level dependence matrix

(SGLDM), which estimates the second-order joint conditional probability density

function. It is much more suited to calculation of the texture homogeneity of an image

since it measures the dominant neighbouring (local) grey level transitions rather than the

variation of grey levels from region to region in an image.

The two metrics explained above have a major drawback: they cannot accurately

measure illumination non-uniformity spreading across a large part of the image.

Thus, a new metric for quantifying non-uniformity was required. It was decided to base

the metric on the hypothesis that, for uniform illumination, the mean grey level of pixels

in different regions of an image should be equal except for the effect of random

fluctuations in illumination. Furthermore, initially, non-uniformity in the horizontal

direction was to be distinguished from non-uniformity in the vertical direction. The


approach used was essentially a two way analysis of variance (ANOVA). The image was

divided into equal regions (actually segments of 50 X 50 pixels as shown in Figure 8-1).

In Figure 8-1, the treatment variable (vertical direction) has 6 levels (T1–T6) and the block variable (horizontal direction) has 7 levels (B1–B7). The grey levels of pixels are

considered as replicates at specific treatment and block levels. Within each region the

grey level variability is assumed to follow a Normal distribution with the same standard

deviation in each.


Figure 8-1 Quantification of Illumination Uniformity Using ANOVA Analysis

The two-way ANOVA table (Table 8-1) shows how the usual ANOVA F statistics were

calculated where “a” is the number of horizontal levels, “b” is the number of vertical

levels, n is the sample size (number of pixels) for each segment, N (= abn) is the total

sample size (total number of pixels in an image).



Table 8-1 Two-way ANOVA Table for Quantifying the Illumination Uniformity

Source of Variance      SS      df            MSS                           Calculated F Value        Critical F Value
Horizontal Effect (H)   SS(H)   a-1           MSS(H) = SS(H)/(a-1)          F(H) = MSS(H)/MSS(E)      Fcritical(H)
Vertical Effect (V)     SS(V)   b-1           MSS(V) = SS(V)/(b-1)          F(V) = MSS(V)/MSS(E)      Fcritical(V)
Interaction Effect      SSI     (a-1)(b-1)    MSS(I) = SSI/((a-1)(b-1))     F(I) = MSS(I)/MSS(E)      Fcritical(HV)
Within (error)          SSE     ab(n-1)       MSS(E) = SSE/(ab(n-1))
Total                   TSS     abn-1

Note: Fcritical(H) is the critical F value with degrees of freedom a-1, ab(n-1) and significance level of 5%; Fcritical(V) is the critical F value with degrees of freedom b-1, ab(n-1) and significance level of 5%; Fcritical(HV) is the critical F value with degrees of freedom (a-1)(b-1), ab(n-1) and significance level of 5%.

The calculation of SS(H), SS(V), SSI, TSS and SSE in Table 8-1 is given by the following equations, in that order:

SS(H) = nb Σ_{i=1..a} (x̄_i − x̄)²        (8-15)

SS(V) = na Σ_{j=1..b} (x̄_j − x̄)²        (8-16)

SSI = n Σ_{j=1..b} Σ_{i=1..a} (x̄_ij − x̄_i − x̄_j + x̄)²        (8-17)

TSS = Σ_{k=1..n} Σ_{j=1..b} Σ_{i=1..a} (x_ijk − x̄)²        (8-18)

SSE = TSS − SS(H) − SS(V) − SSI        (8-19)

where x̄ is the mean grey value of the image, x̄_i is the mean grey value of the ith horizontal row in the segmented image, x̄_j is the mean grey value of the jth vertical column in the segmented image, x̄_ij is the mean grey value of the segment at the ith row and jth column in the segmented image, and x_ijk is the grey value of the kth pixel in the segment of the ith row and jth column.

The above analysis permits uniformity of illumination to be assessed in both a horizontal

and a vertical direction. However, it was found that assessment in only the horizontal

direction was sufficient. An illumination uniformity metric (IU) was defined as follows:

if F(H) ≤ Fcritical(H),  IU = 1
if F(H) > Fcritical(H),  IU = Fcritical(H) / F(H)        (8-20)

That is, if the calculated (observed) F value (F(H)) of the horizontal effect is less than or

equal to the critical F value (Fcritical(H)) at a 5% significance level, the illumination

uniformity is unity. This is equivalent to accepting the null hypothesis that the mean grey

values at different locations in the horizontal direction of an image are the same;

otherwise the illumination uniformity metric is calculated as the ratio of critical F value

to the calculated F value. Thus, the metric quantifies how closely the calculated F value is

to the critical F value: the worse is the illumination uniformity of an image, the lower the

measured value of this illumination uniformity metric. It is expected that the usual

ANOVA assumptions (Normal distribution of grey levels and identical standard

deviations in each region) could sometimes be invalid, especially when a large particle is

located in a region for example. However, results are quite robust to such violations

(http://www.statsoft.com/textbook/stanman.html#assumptions ) except if the variance for


each region is correlated with the mean value. This was not found to be the case here for

various images examined. However, the ultimate test is whether the use of the IU metric

agrees with observed uniformity, whether it shows improvement when uniformity is

improved and whether it enables image quality to be modified for improved

classification. Table 8-2 shows the values of IU for 745 “typical images” before and after

correction of uniformity. Values agreed with subjective evaluation of image uniformity.

Also, the average value of illumination uniformity for the raw images was 0.017; this was much lower than the average of 0.89 for the illumination corrected images. Furthermore, it was demonstrated that each illumination corrected image had a higher illumination uniformity value than did its raw image. Based upon these results the IU metric was used throughout the thesis for quantifying image uniformity, with the classification results providing additional justification for its use.
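The horizontal-effect part of the calculation can be sketched as follows. The segment size and 5% significance level follow the description above, the critical F value is taken from scipy, and the image dimensions are assumed to be exact multiples of the segment size; this is an illustrative outline rather than the thesis code itself.

import numpy as np
from scipy.stats import f as f_dist

def illumination_uniformity(image, seg=50, alpha=0.05):
    # IU metric (Equation 8-20) from the F test on the horizontal effect.
    img = image.astype(float)
    rows, cols = img.shape[0] // seg, img.shape[1] // seg
    img = img[:rows * seg, :cols * seg]
    segments = img.reshape(rows, seg, cols, seg)     # (treatment, pixel row, block, pixel col)
    n = seg * seg                                    # pixels per segment
    grand_mean = img.mean()
    col_means = segments.mean(axis=(0, 1, 3))        # mean of each horizontal level
    ss_h = n * rows * ((col_means - grand_mean) ** 2).sum()       # SS(H), Equation 8-15
    seg_means = segments.mean(axis=(1, 3))
    sse = ((segments - seg_means[:, None, :, None]) ** 2).sum()   # within-segment error SSE
    f_h = (ss_h / (cols - 1)) / (sse / (rows * cols * (n - 1)))   # F(H) as in Table 8-1
    f_crit = f_dist.ppf(1 - alpha, cols - 1, rows * cols * (n - 1))
    return 1.0 if f_h <= f_crit else f_crit / f_h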

Table 8-2 Comparison of Illumination Uniformity for Raw Images and Their Illumination Corrected Images

Illumination Uniformity   745 Raw Images   745 Illumination Corrected Images
Average Value                  0.017                 0.89
Minimum Value                  0.005                 0.18
Maximum Value                  0.076                 1


Appendix II Image Quality Operators

This appendix provides an overview of the most important image quality operators.

Radiometric Operators

Radiometric operators change each pixel value according to a predefined function, called

pixel value mapping (PVM) [120]. The radiometric operators include contrast and

brightness adjustment, binarization/thresholding, histogram-based adjustment, and

arithmetic-based operations.

Contrast and brightness adjustment

Mathematically, contrast and brightness adjustment can be linear or non-linear. Linear

adjustment can be expressed as

n(x,y) = G × o(x,y) + b
if n(x,y) > d_max, n(x,y) = d_max
if n(x,y) < d_min, n(x,y) = d_min        (8-21)

where o(x,y) is the original grey value of pixel (x,y),

n(x,y) is the adjusted grey value of pixel (x,y),

G is the gain for contrast control,

b is the bias for overall brightness,

dmax is the upper limit of dynamic range

dmin is the lower limit of dynamic range.

The notation for Equation 8-21 will be applicable for the remainder of this section.

If the gain G in Equation 8-21 is 1, then the pixel values simply shift towards dark or bright depending on the sign of the bias b. If the image is too dark, then b is chosen to be positive, and vice versa.


When constraints are put on o(x,y) in Equation 8-21, the contrast and brightness

adjustment can be piecewise linear, and this is given by

n(x,y) = G × o(x,y) + b,   L < o(x,y) < U
if n(x,y) > d_max, n(x,y) = d_max
if n(x,y) < d_min, n(x,y) = d_min        (8-22)

where L and U are constants between dmax and dmin.

In Equation 8-22, the choice of L and U has a strong effect on contrast enhancement. When L and U are in the lower part of the dynamic range, the contrast is stretched in the dark range, that is, the contrast at low grey levels is enhanced, which is useful if the contrast is poor in the dark range. When L and U are in the upper part of the dynamic range, the contrast is stretched in the light range.

Other than linear adjustments, there are non-linear adjustments such as logarithmic

adjustment useful for x-ray images, and exponential adjustment.
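A direct sketch of the linear adjustment of Equation 8-21, with clipping to the dynamic range, is shown below; the piecewise version of Equation 8-22 follows by applying the same expression only where L < o(x,y) < U. The default range values assume an 8-bit image.

import numpy as np

def adjust_contrast_brightness(image, gain, bias, d_min=0.0, d_max=255.0):
    # Equation 8-21: n(x,y) = G * o(x,y) + b, clipped to [d_min, d_max].
    return np.clip(gain * image.astype(float) + bias, d_min, d_max)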

Techniques based on Image Histogram

The histogram of an image counts the number of occurrences of each possible grey value. The sum of all values in the histogram gives the total number of pixels in the image, and this is given by (if the grey value is assumed continuous):

A = ∫_{d_min}^{d_max} H(D) dD        (8-23)

where H(D) is the count of pixels with grey value of D, and

A is the total number of pixels in an image;


The probability density of a grey value is calculated by normalizing the histogram, i.e.,

p(D) = H(D) / A        (8-24)

where p(D) is the probability density of the pixels with grey value of D.

The cumulative density of a grey value is computed by integrating probability density

ω(D) = ∫_{d_min}^{D} p(D) dD        (8-25)

In addition, a cumulative histogram for an image can be computed as follows:

C(D) = ∫_{d_min}^{D} H(a) da        (8-26)

where C(D) is the cumulative count of all pixels with grey value up to D (inclusive).

There are two commonly used histogram-based techniques: histogram matching and

histogram equalization.

Histogram Matching

Histogram matching is obtaining an image whose histogram has a specific shape.

Mathematically this is given by

C_target(n(x,y)) = C_original(o(x,y))        (8-27)

where Ctarget is the cumulative count of pixels with a new grey value of n(x,y), Coriginal is

the cumulative count of pixels with a grey value of o(x,y).

Histogram Equalization


The objective of histogram equalization is to enhance image contrast by flattening the image histogram and increasing the dynamic range dramatically without affecting the

structural information. Mathematically it is given by

n(x,y) = C_original(o(x,y))        (8-28)

Histogram equalization may not always produce desirable results, especially for images

with a very narrow histogram and relatively few grey levels. It can produce false edges

and regions. The equalized image may look unnatural, e.g., increased visual graininess. In

some cases, contrast stretching works better since it can work on a selected region of

dynamic range.
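For an 8-bit image, a minimal discrete version of Equation 8-28 can be written as below; rescaling the cumulative histogram to the 0–255 output range is the usual discrete normalization and is an assumption of this sketch, as is the unsigned 8-bit integer input.

import numpy as np

def equalize_histogram(image):
    # Histogram equalization through the cumulative histogram C(D) of Equation 8-28.
    hist = np.bincount(image.ravel(), minlength=256)        # H(D)
    cum = np.cumsum(hist)                                    # C(D)
    lut = np.round(255.0 * cum / cum[-1]).astype(np.uint8)   # grey-level mapping
    return lut[image]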

Binarization/thresholding

Binarization is an extremely important image processing operation. Its objective is to

separate image background from the foreground. Single threshold binarization is given by

n(x,y) = 0    if o(x,y) ≤ T
n(x,y) = 255  if o(x,y) > T        (8-29)

where T is the threshold value.

Double threshold binarization is given by

n(x,y) = 0    if T1 ≤ o(x,y) ≤ T2
n(x,y) = 255  if o(x,y) > T2 or o(x,y) < T1        (8-30)

where T1 and T2 are the lower and upper thresholds.

Notwithstanding its apparent simplicity, binarization is a very difficult problem because

of the choice of threshold. A number of conditions can make binarization difficult: poor


image contrast, spatial non-uniformity in background intensity, and the ambiguity of

image foreground and background due to multiple levels of different objects.

There is not a single optimally suited method of binarization for all images. Different

approaches for binarization should be taken for images with different characteristics. It is

often best to make the decision by experimentation.

Two categories of binarization techniques are used [121]: global thresholding and locally

adaptive thresholding. Global thresholding techniques use the histogram to identify a

threshold between foreground and background grey values. A single threshold is

determined by treating each pixel value independently of its neighborhood, or without

context. Locally adaptive techniques examine the relationship between grey values of neighboring pixels to adapt the threshold according to the prevailing grey value statistics

for different image regions. Adaptive techniques are applied in an attempt to counter the

effects of nonuniformities in image background.

For images with uniform background, the optimal global techniques ensure better

binarization than the locally adaptive techniques because for global techniques, threshold

selection is based on a larger data set of pixels. Also, for adaptive techniques, a size

parameter matching the size of uniform background regions must be chosen.

ISODATA

A popular global thresholding algorithm is ISODATA, which is based on the histogram. ISODATA is a simple single-value global thresholding method. It is given by

T − m_left = m_right − T        (8-31)

where mleft is the mean value of all grey values with value less than or equal to T in the

histogram and mright is the mean value of all grey values greater than T in the histogram.

Note that both m_left and m_right are functions of T.

The ISODATA performs well when there are two clearly resolved peaks in the

histogram. Where there are multiple peaks in the histogram, double value thresholding in

general works better than single value thresholding.
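The iteration implied by Equation 8-31 (repeatedly replacing T by the midpoint of the two class means until it stabilizes) can be sketched as below; it assumes the histogram contains two populated classes, and the convergence tolerance is an arbitrary choice.

import numpy as np

def isodata_threshold(image, tol=0.5):
    # Iterate until T lies midway between the two class means (Equation 8-31).
    img = image.astype(float)
    t = img.mean()
    while True:
        above = img[img > t]
        if above.size == 0:        # degenerate case: only one class present
            return t
        t_new = 0.5 * (img[img <= t].mean() + above.mean())
        if abs(t_new - t) < tol:
            return t_new
        t = t_new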

There are a set of optimal thresholding techniques based on criterion functions measuring

the separation between regions. For these methods, a criterion function is calculated for

each grey value and that which maximizes/minimizes this function is chosen as the

threshold. These methods include Ostu’s discriminant method, entropy maximization,

moment preservation and minimum error thresholding.

Discriminant method

The discriminant method is a classic global thresholding method. The criterion function

used in this method is

η = δ_b² / δ_t²        (8-32)

where δ_b² is the between-classes (between-peaks) variance and δ_t² is the total variance with respect to threshold T. The grey value maximizing the criterion function is the

optimal threshold. Between-peaks variance is defined as

δ_b² = ω_1 ω_2 (μ_1 − μ_2)²        (8-33)

where ω1, ω2, µ1, µ2 are the cumulative probabilities and mean values for each class,

respectively.


Entropy maximization

In entropy maximization, entropy is used to measure the separation of two classes. This

method entails separating the image data into two classes, above and below a threshold,

and measuring the entropies of each class. This separation is done for each grey value,

and that value for which the sum of entropies of two classes is maximum is the optimal

threshold. The criterion function is given by

E = −[ Σ_{i=0..T} p(i) log(p(i)) + Σ_{i=T+1..255} p(i) log(p(i)) ]        (8-34)

In Equation 8-34, the probability p(i) is calculated only within the class.

Moment Preservation

In moment preservation, moments are first calculated for the original image. Next, they

are calculated for images thresholded by every grey value. The threshold value at which

the original and the thresholded images have closest moments is the optimal threshold.

The total moment of the thresholded image is given by

Σ_{i=0..T} p(i) × i⁴ + Σ_{i=T+1..255} p(i) × i⁴        (8-35)

In Equation 8-35, the probability p(i) is calculated only within the class.

Minimum Error Thresholding

This method assumes that the histogram is composed of two normally distributed classes

of pixel grey values. Two normal distribution curves are determined by an iterative

process to fit the two classes of pixels in the histogram and minimize a specified

classification error. A prospective threshold value is tested on each iteration by


calculation of the means and the variances from the histogram for the two classes

separated by this threshold. The threshold minimizing the error is the optimal threshold.

Region Averaging

Region averaging is a locally adaptive technique. It is very useful for images with a varying background. It is implemented by first calculating a running region average, then by

comparing each pixel value to its local average, and setting that value to either the

foreground, if it is much above the average, or to the background, if it is much below. In

cases in which the difference between the pixel value and its local average is small, the

pixel value stays unchanged. The region size for calculating the local average should

reflect the size of expected foreground features; it should be large enough to enclose a

feature completely, but not so large as to average across background non-uniformity. The

choice of proper region size may be problematic when the sizes of foreground features in

the image vary dramatically.

Subimage Thresholding

Subimage thresholding is desirable for images with a high degree of background non-

uniformity. Depending on the degree of background non-uniformity, a N x M image is

partitioned into N/n x M/m subimages of size n x m. Optimal thresholds are determined within each subimage using global thresholding techniques. Any subimage with a small

measure of class separation is considered to contain only one class, and its threshold is

taken as the average of the thresholds in the neighboring subimages. Finally, the

subimage thresholds are interpolated among subimages for all pixels and each pixel is

binarized with respect to the threshold at the pixel.


Arithmetic-based operations

This is a group of operations that are not very widely used. Mathematically they are given

by

n(x,y) = φ(o(x,y))        (8-36)

where φ is an abstraction of one of the following arithmetic operations including: add a

constant, subtract a constant, multiply a constant, divide a constant, min, max, binary OR,

binary XOR, Gamma, log, square, square root and reciprocal and so on. For addition and

subtraction operation, the effect is the same as brightness adjustment; for multiplication

and division, the effect is similar to contrast stretching or shrinking. Other operations are

not used often and will not be discussed here.

Geometric Operators

Geometric operators change each pixel value according to a function which takes into

account the neighborhood of the pixel. The function, which could be linear or non-linear,

is called a filter. There are two types of filters: smoothing filters which remove additive,

impulsive and speckle noise, and sharpening filters which remove motion induced or

defocused blurs and sharpen edges. For most of the filters, a neighborhood window with

an appropriate size is chosen. The shape of window can be square, rectangular, plus ‘+’

or cross ‘x’.

Mean filter

The grey value of a pixel is assigned to the mean value of its neighbors. A mean filter is

used to smooth and remove noise. Mathematically it is given by


n(x,y) = (1/Nbd) Σ_{(i,j) ∈ Nbd(x,y)} m(i,j)        (8-37)

where Nbd is the number of neighborhood pixels of pixel (x,y),

Nbd(x,y) is the neighborhood of pixel (x,y),

m(i,j) is the grey value of pixel (i,j).

For a neighborhood of k x k, the filter length is k. The rule of thumb for practical mean

filters is to set filter length k=2D+1 pixels to blur objects with diameter D or smaller.

This filter of length k will reduce any feature of characteristic size smaller than D.

Conversely, if one wished to retain all image features of diameter D or above, a suitable

choice would be k <= 2D-1.

The fact that the mean filter is limited to the adjustment of its length to control

performance has some drawbacks. First, while speckle noise may be reduced

substantially, important image features such as edges and textures may be equally

affected and thus blurred if features and noise exhibit similar degrees of acuity. This is a

fundamental problem of all low-pass filtering, but it is particularly severe for the mean

filter. Closely related is a second problem of the mean filter, namely its propensity to

introduce a ringing distortion. These problems can be prevented by use of a filter whose

coefficients, rather than exhibiting a sharp drop, gradually decrease to zero at the edges as

the weighted mean filter and Gaussian filter do.

Weighted mean filter

The weighted mean filter is different from mean filter in that the grey value of a pixel is

assigned to the weighted mean value of its neighborhood pixels rather than the mean


value of its neighbors. A neighboring pixel of a pixel is weighted according to its spatial

distance from the pixel. The effect of the weighted mean filter is smoothing and noise removal, but the smoothing effect is generally weaker than that of the mean filter. The filter is given by

n(x,y) = (1/Nbd) Σ_{(i,j) ∈ Nbd(x,y)} w(i,j) × m(i,j)        (8-38)

where w(i,j) is the weight assigned to pixel (i,j).

Mode filter

The mode filter is a non-linear filter based on order statistics. The grey value of a pixel

is replaced by the grey value of its most common neighbor. This is given by

n(x,y) = mode( m(i,j) | (i,j) ∈ Nbd(x,y) )        (8-39)

The mode filter is suitable to remove isolated noise.

Median Filter

The median filter is a non-linear filter based on order statistics. The grey value of a

pixel is replaced by the median value of its neighbors. The filter is given by

n(x,y) = median( m(i,j) | (i,j) ∈ Nbd(x,y) )        (8-40)

The advantage of the median filter is that it removes impulsive noise while preserving

edges. The median is a more robust average than the mean filter because a single very

unrepresentative pixel in a neighborhood will not affect the median value significantly.

Since the median value must actually be the value of one of the pixels in the

neighborhood, the median filter does not create new unrealistic pixel values when the


filter straddles an edge. For this reason the median filter is much better at preserving

sharp edges than the mean filter.

It is computationally expensive since it requires sorting the neighbors. When the window size k is large, the sorting is even more time-consuming.
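A tiny illustration of this edge-preserving behaviour, using scipy's stock filters on a one-dimensional profile with a single impulse, is given below; the numbers are hypothetical.

import numpy as np
from scipy import ndimage

profile = np.array([10, 10, 10, 255, 10, 10, 200, 200, 200], dtype=float)
print(ndimage.median_filter(profile, size=3))   # impulse removed, step edge preserved
print(ndimage.uniform_filter(profile, size=3))  # mean filter smears both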

Closest of Minimum and Maximum

The grey value of a pixel is set to the minimum or maximum of its neighbors depending

on which one is closest to its grey value. This filter can sharpen boundaries between

textures. It is given by

n(x,y) = min(m(i,j))  if |o(x,y) − min(m(i,j))| ≤ |o(x,y) − max(m(i,j))|, otherwise max(m(i,j)),   (i,j) ∈ Nbd(x,y)        (8-41)

Minimum

This filter does grayscale dilation by replacing the grey value of each pixel in the image

with the smallest grey value in that pixel's neighborhood.

n(x,y) = min( m(i,j) | (i,j) ∈ Nbd(x,y) )        (8-42)

This filter increases the size of objects which are darker than background.

Maximum

This filter does grayscale erosion by replacing the grey value of each pixel in the image

with the largest grey value in that pixel's neighborhood.

n(x,y) = max( m(i,j) | (i,j) ∈ Nbd(x,y) )        (8-43)

This filter decreases the size of objects which are darker than the background.

K-Nearest Neighbors


The grey value of pixel (x,y) is set to the average of the k pixels in its neighbors whose

values are closest to o(x,y). A typical value of k =6 is chosen for 3 x 3 square filter.

n(x,y) = (1/k) Σ_{(i,j) ∈ KNN(x,y)} m(i,j)        (8-44)

where KNN(x,y) is the k nearest neighbors of pixel (x,y).

K-Nearest Neighbors filter can smooth image while preserving edges.

Unsharp Masking

Unsharp masking consists of a series of operations. First, the original image is blurred.

Next, a mask is obtained by subtracting blurred image from the original image. Finally

the mask is added to the original image forming the resulting image. Unsharp masking

can enhance small features while large features are suppressed. Mathematically unsharp

masking is given by

n(x,y) = o(x,y) + η (o(x,y) − blur(o(x,y)))        (8-45)

where η is a scaling factor.

There are two control parameters in unsharp masking: the window size for blurring

operation and the scaling factor of the mask. The larger the scaling factor, the sharper the

edge will be. The degree of blurring controls what sized edges the unsharp mask

functions on most effectively. Larger stronger blurs produce masks that alter larger edges

but miss smaller edges, while small subtle blurs capture tiny sudden edges but miss larger

edges.
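A minimal sketch of Equation 8-45 is shown below; a mean blur is used for the blurring step (the text leaves the choice of blur open), and clipping to the 8-bit range is an addition of this sketch.

import numpy as np
from scipy import ndimage

def unsharp_mask(image, blur_size=5, eta=1.0):
    # Equation 8-45: add the scaled mask (original minus blurred) back to the original.
    img = image.astype(float)
    mask = img - ndimage.uniform_filter(img, size=blur_size)   # blur window is the first control parameter
    return np.clip(img + eta * mask, 0, 255)                   # eta is the scaling factor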

Laplacian Edge Sharpening


In Laplacian edge sharpening, the resulting image is obtained by subtracting a multiple of the Laplacian from the original image. The Laplacian operation is given by

∇²o(x,y) = ∂²o(x,y)/∂x² + ∂²o(x,y)/∂y²        (8-46)

n(x,y) = o(x,y) − β ∇²o(x,y)        (8-47)

where β is a constant.

The advantage of Laplacian edge sharpening is that features in any direction are

sharpened equally.

Filling Operator

The filling operator is a noise filter for binary (black and white) images. Noise in binary

images takes the form of isolated ON pixels, or pixel regions, in a background of OFF

pixels, or vice versa. ON pixels are black with grey value equal to zero while OFF pixels are white with grey value 1. This type of noise is called salt-and-pepper noise, which can be removed by the filling operator. With the filling operator, each isolated noise pixel is filled in with the grey value of its neighbors.

The most common practice of binary noise reduction is to use a filling mask. A filling

mask of 3 x 3 is slid across the image. When the grey value of the center pixel is not

equal to the grey value of all its surrounding pixels, the center pixel grey value is

assigned with the grey value of its neighbors.


However, this simple technique only removes single pixel noise while larger noise

features will remain intact, particularly those noise features at the edge of a region such as bulges or concavities.

A more general filter, called kFill, is designed to reduce isolated noise and noise on

contours up to a selected limit in size. The k of kFill refers to a size adjustment

parameter. Other kFill parameters can be set to control rounding of the filtered features.

Many shapes display 90° corners: to preserve these, rounding must be minimized, and the default parameters of kFill are chosen accordingly to retain corners of 90° or greater.

Filling operations are performed for each image pixel using a k x k window. This window

is composed of an interior (k-2) x (k-2) region, the core, and the exterior 4(k-1)

neighborhood pixels. All values of the core are set to ON or OFF, depending on pixel

values in the neighborhood.

Whether to fill with ON or OFF requires that all core pixels be OFF or ON, respectively. It also depends on three variables, n, c and r, which are determined from the

neighborhood pixels. Parameter n equals the number of ON (OFF) pixels in the

neighborhood, c denotes the number of connected groups of ON pixels in the

neighborhood, and r represents the number of corner pixels that are ON (OFF). In the

default implementation, filling occurs when the following logic conditions are met:

(c = 1) ∧ [ (n > 3k − 4) ∨ ( (n = 3k − 4) ∧ (r = 2) ) ]

where

∧ is logic AND operator, and ∨ is logic OR operator;


n > 3k-4 controls the degree of smoothing;
n = 3k-4 ∧ r = 2 ensures that corners of < 90° are not rounded;
c = 1 ensures that filling does not change connectivity.

Convolution-based Filters

Convolution is widely used in digital image processing to perform a variety of filtering

tasks such as edge detection and Gaussian blurring. Mathematically, the convolution is

given by

n(x,y) = (o ∗ h)(x,y) = Σ_{(a,b): (x−a, y−b) ∈ o} o(x − a, y − b) h(a,b)        (8-48)

where h(a,b) is the convolution kernel. From the image processing standpoint, a kernel is

a matrix whose center corresponds to the source pixel and the other elements correspond

to neighboring pixels.

The most important aspect of convolution is the choice of kernel h(a,b) (often called

filtering mask). Different kernels produce different filtering effect. Figure 8-2 shows

some examples of kernels.

(a)  [ -1  -1  -1 ]      (b)  [ -1  -2  -1 ]      (c)  [ -1   0   1 ]
     [ -1  12  -1 ]           [  0   0   0 ]           [ -2   0   2 ]
     [ -1  -1  -1 ]           [  1   2   1 ]           [ -1   0   1 ]

Figure 8-2 Examples of Kernels


The 3 x 3 kernel in Figure 8-2(a) is used to increase contrast and accentuate detail in the image. The two 3 x 3 convolution kernels shown in Figure 8-2(b) and Figure 8-2(c) are used to generate the horizontal and vertical derivatives for the Sobel edge detector.

Gaussian Smoothing

Gaussian smoothing is a 2-D convolution operation for blurring images and removing

detail and noise. A 2-D Gaussian kernel (Figure 8-3) is used to approximate 2-D

Gaussian distribution with mean of zero (as described Equation 8-49). Gaussian

smoothing is similar to weighted mean filter with the average weighted more towards the

value of the central pixels. This is in contrast to the mean filter's uniformly weighted

average. Because of this, a Gaussian filter provides gentler smoothing and preserves

edges better than a similarly sized mean filter [122]. When σ is infinite, Gaussian filter

becomes mean filter.

H(a,b) = (1 / (2πσ²)) exp( −(a² + b²) / (2σ²) )        (8-49)

          [ 2   4   5   4   2 ]
          [ 4   9  12   9   4 ]
(1/115) × [ 5  12  15  12   5 ]
          [ 4   9  12   9   4 ]
          [ 2   4   5   4   2 ]

Figure 8-3 Normalized Gaussian kernel with σ = 1.4

For a Gaussian filter, the filter length is chosen in accordance with the diameter D of the

features to be smoothed. The rule of thumb is to set σ = (2D+1)/2 and to choose k >

2D+1. In practice, either k = 2D+3 or k = 2D+5 is common. A larger value of k ensures less ringing, but requires more computation.

The Gaussian filter requires more computation time than a mean filter to accomplish the

same amount of smoothing, given that the Gaussian filter exceeds the size of the

equivalent uniform filter and contains coefficients of several values and hence requires

multiplication (in contrast to simple addition for the mean filter) for performing the

convolution operation.
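The rule of thumb above can be wrapped in a small helper, sketched here with scipy's Gaussian filter; the translation of the kernel length k into scipy's truncate parameter is an implementation choice of this sketch, not part of the thesis.

import numpy as np
from scipy import ndimage

def gaussian_smooth_for_diameter(image, d):
    # Smooth features of diameter d or smaller: sigma = (2d + 1)/2, kernel length k = 2d + 3.
    sigma = (2 * d + 1) / 2.0
    k = 2 * d + 3
    truncate = (k // 2) / sigma      # scipy truncates the kernel at 'truncate' standard deviations
    return ndimage.gaussian_filter(image.astype(float), sigma=sigma, truncate=truncate)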

Edge Detector

The objective of edge detector is to find the edges of objects in an image. With the

detection of edges of an object, the object can be isolated and further analysis can be

performed. Many edge detectors use the idea of calculating spatial gradient on an image.

These edge detectors include Robert cross, Sobel edge detector, and Canny edge detector.

Canny edge detector is considered to be an optimal one. The Canny operator works in a

multi-stage process. First of all, the image is smoothed by Gaussian smoothing. Then a

simple 2-D first derivative operation is applied to the smoothed image to highlight

regions of the image with high first spatial derivatives.

Mathematical Morphological Operators

Morphological operators refer to a class of spatial filtering operations that are applied to

change the shape of a region in a binary image. That is, morphological operations replace

the binary value at a single pixel location by a value computed from the pixels values

within a neighborhood of chosen shape and size around that location.


The most basic morphological operations are erosion and dilation. Erosion is the

reduction in size of ON regions (an ON-valued pixel is black, and an OFF-valued pixel is white). This is most readily accomplished by iterative peeling of single-pixel layers from

the outer boundary of all ON regions. Dilation is the opposite process. Both operations

are usually applied iteratively to erode or dilate by many layers. An example for the

application of erosion is the removal of a layer, one or two pixels wide, of noisy

boundary pixels in a binary image. The analogous dilation process can be applied to fill

holes or to join disconnected lines with gaps up to two pixels in width.

Morphological operations are commonly used for noise reduction and feature detection,

with the objective that noise be reduced as much as possible without eliminating essential

features.

Morphological operations need a mask called structuring element. Structuring element is

similar to a filter mask S(i,j) of size K1 x K2 in which the coefficients take binary values.

Unlike filter masks, a structuring element need not be symmetric, and its origin must be

explicitly labeled. We apply a morphological operation to a pixel by first placing the

designated origin of the structuring element S at that pixel. S defines a neighborhood

Nbd(x,y) of the pixel (x,y).

Erosion

When a structuring element is put over a region, an ON pixel at the origin is set to OFF if

the structuring element does not completely overlap ON-valued pixels of this region.

Erosion is given by


I(x,y) = ON   if I(x,y) = ON AND I(i,j) = S(i − x, j − y) = ON for all (i,j) ∈ Nbd(x,y)
I(x,y) = OFF  otherwise        (8-50)

Dilation

When a structuring element is put over a region, an OFF pixel at the origin is set to ON if

any of the structuring element overlaps ON pixels of the region.

I(x,y) = ON   if I(x,y) = ON OR I(i,j) = S(i − x, j − y) = ON for any (i,j) ∈ Nbd(x,y)
I(x,y) = OFF  otherwise        (8-51)

Opening and Closing

The opening operation involves the application of erosion followed by dilation. The

effect of using a square- or disk-shaped structuring element for opening is to smooth

boundaries, to break narrow isthmuses, and to eliminate small noise regions. The closing

operation is to smooth boundaries, to join narrow breaks, and to fill small holes. Opening is used when the image has small noise regions; it is not used for narrow regions where there is a chance that the initial erosion might disconnect them. Opening also will eliminate

long or thin features. Closing is used when a region has become disconnected and there is

a need to restore connectivity. It is not used when different regions are located closely

such that the first iteration of dilation might connect them.

Other morphological considerations and operations


The choice of the size of the structuring element S and that of the number of iterations

represents a trade-off between the feature size and the number of required iterations. The

larger the size, the fewer the number of iterations, but more computation will be required

on a single iteration. The most common sizes range between 3 x 3 and 5 x 5. The size of

S should be no larger than the features.

Opening tends to eliminate ON-valued noise, while closing will reduce OFF-valued noise

(holes) but smooth sharp features such as corners. Preliminary testing on representative

images is usually necessary to determine a compromise between noise reduction and

feature retention.

Non-uniform Illumination Correction

Nonuniformities in scene illumination are very common in images. Significant spatial

nonuniformities in the illumination introduce bias into the intensity histogram and in turn

interfere with subsequent global image processing operators such as binarization.

Nonuniform illumination correction can be achieved by applying an operation known as

flat fielding. This involves the division of the original by the reference image showing the

slow background variations (but not the objects of interest). Unless otherwise available,

for example, in the form of a separately stored image of the scene background, a

reference image is often created from the original by low-pass filtering such as Gaussian

smoothing with relatively large filter length.


Appendix III The Nelder-Mead Simplex Method

The Nelder-Mead simplex method was introduced by Nelder and Mead for function

minimization [123] for a system with many variables. It is a systematic direct

optimization method very suitable for problems with a large search space such as

experiments involving many variables. It has advantages over traditional optimization

methods such as the one-variable-at-a-time approach and factorial designs in that it

requires fewer experiments to reach the optimum. It makes no assumptions about an

underlying model and requires no derivatives. As a result, it is simple, easy to implement

and efficient. Simplex methods are adaptive and evolutionary in the sense that they are sequential: subsequent trials move away from the poorest-performing vertices and move in the direction of improvement. Progress is based upon a relative ranking of the

values of the objective functions for different values of the parameters.

The simplex method was originally mainly used in analytical chemistry [124]. Over the

years, it gained popularity for various technical system optimization problems such as process control [125]. There are basically two broad versions of simplex methods: the basic simplex method by Spendley et al. and modified simplex methods [125]. Modified simplex methods are further divided into modified simplex (Nelder-Mead simplex), super modified [126], weighted centroid [127] and composite modified [128]. In this

appendix, emphasis will be on the basic simplex method and the Nelder-Mead simplex

method.

Basic Simplex Method


A simplex is a geometric figure having a number of vertices equal to one more than the number of variables in a system; that is, a simplex is defined by k+1 vertices for k

variables. In an optimization process, each vertex represents a set of values for all

variables.

For a two-variable optimization problem, the basic simplex method starts with three

initial trials (vertices), that is, the first simplex consists of three trials. After the initial

trials, the simplex process is sequential, with the addition and evaluation of one new trial

at a time. The simplex evaluates the response results of the trials that are included in the

current simplex, and searches systematically for the best values of the variables for the

next trial, which is the reflection of the worst trial against the geometric centroid of the

best and next-best trials. The optimization stops when the optimization objective is achieved or the objective function cannot be improved further. A two-variable simplex optimization is illustrated in Figure 8-4.

The search for optimum variable values follows the rules below:

i. The vertex (trial) with the least favorable response value in the current simplex is

rejected. A new vertex with a new set of variable values is computed by

reflection into the variable space opposite the undesirable result. The new vertex

(trial) replaces the least favorable vertex in the current simplex while other

vertices are retained. As a result, a new simplex is formed. The new vertex will

be evaluated and a new least favorable response in the newly-formed simplex

will be found. The process continues until an optimum response is reached.


ii. A rejected vertex is never revisited. This rule prevents oscillation between two vertices in which one vertex is the reflection of the other and both produce the least favorable results. The solution to this problem is to select the second least favorable vertex and move away from it.

iii. Calculated vertices beyond the boundaries of the variables are not used to form a

new simplex. Instead a vertex with very unfavorable results is chosen, forcing the

simplex to move away from the boundary.

In this research, each vertex represents a set of parameter values for an IQ operator.

Modified Simplex Method (Nelder-Mead)

The modified simplex method is very similar to the basic simplex method. However, it differs

from the basic simplex method in being capable of adjusting the simplex shape and size

depending on the results in each step. Thus it is also called the variable-size simplex

method. Two rules are added in the modified simplex method:

i. Expand in a direction of more favorable conditions

ii. Contract if a move was taken in a direction of less favorable conditions.

A graphical illustration of modified simplex method is shown in Figure 8-5.


Figure 8-4 Schematic Diagram of Basic Simplex Method

Figure 8-5 Schematic Diagram of Modified Simplex Method

The modified simplex method is illustrated in the flow chart (Figure 8-6). The labels used are the same as in Figure 8-5.

The different projection vertices away from the worst vertex W are calculated as follows:

R = C + α(C − W)        (8-52)

E = C + γ(C − W)        (8-53)

C+ = C + β+(C − W)        (8-54)

C− = C − β−(C − W)        (8-55)

where

W is the worst (least favorable) vertex;

C is the centroid of all the vertices in the simplex except the least favorable vertex W, i.e., the average value of the remaining vertices;

α is the reflection coefficient (default 1);

β+ is the positive contraction coefficient (default 0.5);

β− is the negative contraction coefficient (default 0.5);

γ is the expansion coefficient (default 2).

(In Figures 8-4 and 8-5, W denotes the least favorable vertex, R the reflection vertex, E the expansion vertex, C+ the positive contraction vertex and C− the negative contraction vertex.)
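The projection vertices of Equations 8-52 to 8-55 can be computed as in the following sketch, assuming each vertex is stored as a NumPy array of IQ Operator parameter values; this is an illustration, not the software developed in this work.

```python
import numpy as np

def candidate_vertices(simplex, worst_index, alpha=1.0, gamma=2.0,
                       beta_pos=0.5, beta_neg=0.5):
    """Return the reflection, expansion and contraction vertices computed
    from the centroid C of all vertices except the worst vertex W."""
    W = simplex[worst_index]
    others = np.delete(simplex, worst_index, axis=0)
    C = others.mean(axis=0)                 # centroid of remaining vertices
    R = C + alpha * (C - W)                 # reflection, Equation 8-52
    E = C + gamma * (C - W)                 # expansion, Equation 8-53
    C_plus = C + beta_pos * (C - W)         # positive contraction, Equation 8-54
    C_minus = C - beta_neg * (C - W)        # negative contraction, Equation 8-55
    return R, E, C_plus, C_minus
```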

Transformation of Constraints in the Utilization of Simplex Optimization

Simplex methods are mostly used to optimize problems without constraints; that is, there are no constraints on the variables. However, this condition is often not true for real problems. Thus, some measures must be taken to deal with constraints. In this research, the parameters of IQ Operators have certain constraints, that is, they have finite ranges. In addition, most parameters are of integer type. Because of that, rounding of parameter values is sometimes required.

In this research, a transformation is performed so that the search algorithm can explore the constrained regions of the parameter values in an unconstrained way. This is done by transforming the desired finite range of each parameter to an infinite range for the search algorithm. The transformation is applied as follows (a code sketch is given after the steps):

1. The search algorithm guesses values of the parameter Pi″ over the range −∞ to ∞.

2. Each guessed Pi″ is transformed to a value Pi′ in the range 0 to 1 using the following equation:

Pi′ = e^(Pi″) / ( e^(Pi″) + e^(−Pi″) )        (8-56)

3. Each Pi′ is transformed to a value in the desired range Pi,min to Pi,max using:

Pi = Pi,min + Pi′ ( Pi,max − Pi,min )        (8-57)
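A sketch of the transformation in Equations 8-56 and 8-57, assuming the optimizer proposes unconstrained values Pi″ while the IQ Operator receives values inside its finite range; the rounding mentioned above for integer parameters is also shown.

```python
import numpy as np

def to_constrained(p_unconstrained, p_min, p_max, integer=False):
    """Map an unconstrained guess to the finite parameter range."""
    e_pos = np.exp(p_unconstrained)
    e_neg = np.exp(-p_unconstrained)
    p_prime = e_pos / (e_pos + e_neg)        # Equation 8-56, value in (0, 1)
    p = p_min + p_prime * (p_max - p_min)    # Equation 8-57
    return np.round(p) if integer else p
```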

Simplex Optimization Stopping Criteria

When either of the following two conditions is met, the algorithm stops:

1. negative contraction produces a result greater than that of the reflection vertex;

2. positive contraction produces a result greater than that of the reflection vertex;


Figure 8-6 Flow Chart of Modified Simplex Algorithm

Objective Function

The objective function plays a crucial role in the simplex method because it is used to

evaluate the performance of a set of values for all variables (a vertex). For an

optimization problem with only one response variable, the choice of objective function in

general is an easy problem. However in most practical optimization problems more than

one response variable must be considered at the same time [125]. There is no simple solution to objective functions for multiple-response-variable problems because the response variables are often on different scales, the significance of different response variables differs, and the objectives for different response variables vary (i.e., some

response variables are to be maximized while others are to be minimized). In addition to

these difficulties, the description of optimization objectives is usually vague and

uncertain in the sense that the definitions of optimization objectives are not “black and

white”. For example, it is hard to say that a response of 100 is good but 99 is bad.

Despite the difficulties described above in determining objective functions, there are

some common methods of dealing with them. One common practice is the utilization of

fuzzy set theory with membership functions to form an optimization objective function

[125]. The approach is to form an aggregated joint response measure by combining

different response variables with different optimization objectives. The idea of

introducing fuzzy membership functions is to translate different response variables into a

measure that can be adequately compared and combined with others.
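A minimal sketch of that idea, assuming simple linear membership functions (a hypothetical choice): each response is mapped to a membership value in [0, 1] according to its own objective, and the memberships are then combined into a single objective for the simplex to maximize.

```python
def membership_maximize(x, low, high):
    """1 when x >= high, 0 when x <= low, linear in between."""
    return min(1.0, max(0.0, (x - low) / (high - low)))

def membership_minimize(x, low, high):
    return 1.0 - membership_maximize(x, low, high)

def aggregated_objective(responses, membership_funcs):
    """Combine the per-response memberships; taking the minimum is one
    common choice, so no single response is allowed to be very poor."""
    return min(f(x) for f, x in zip(membership_funcs, responses))
```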

Other Considerations on the Utilization of Simplex Method

Simplex methods are mostly used to optimize problems without constraints, that is, there

are no constraints on variables. However, this condition is often not true for real physical

problems. Thus, some measures are to be taken to deal with constraints. It is possible that

the simplex method converges to a non-optimal point even for simple problems [129]. It

was demonstrated by Barton et al. [130] that the choices of expansion and contraction coefficients have an impact on the performance of simplex methods.


Appendix IV Modified MaxMin Thresholding

The MaxMin thresholding method was developed by Torabi and is described in section

2.2.2.1. The threshold is the one giving the maximum minimum particle size as expressed

in Equation 8-58.

T = max_{j=0,…,k} [ min_{i=1,…,n} ( A_ij ) ]        (8-58)

where T is the selected threshold value and Aij is the area of the ith particle visible in the

image using the jth value of threshold. For each jth value of threshold, the Minimum

particle size is found. The total number of threshold values examined, k, is set to 220 in

Torabi’s research.
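A sketch of the original search in Equation 8-58; the helper particle_areas() is hypothetical and stands in for the thresholding and particle measurement done with Image J in this work, and it assumes particles are darker than the background.

```python
import numpy as np
from scipy import ndimage

def particle_areas(image, t):
    """Threshold at t and return the areas (pixel counts) of the particles."""
    binary = image < t                       # particles darker than background
    labels, n = ndimage.label(binary)
    if n == 0:
        return []
    return ndimage.sum(binary, labels, index=range(1, n + 1))

def maxmin_threshold(image, t_start=0, t_end=220, step=1):
    """Equation 8-58: pick the threshold maximizing the minimum particle size."""
    best_t, best_min_area = t_start, -1.0
    for t in range(t_start, t_end + 1, step):
        areas = particle_areas(image, t)
        if len(areas) == 0:
            continue
        min_area = float(np.min(areas))
        if min_area > best_min_area:
            best_t, best_min_area = t, min_area
    return best_t
```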

The MaxMin thresholding was successful in determining the threshold value required to

separate the background from the particles in an image. However, here it was found to be

computationally very expensive: it required about 220 iterations (about 3 seconds) to

determine the threshold. This 3 second delay was about 90% of the total time required for

processing an image and was significant when hundreds of images were to be processed.

With the objective of reducing this delay time, in this research the following

modifications were made to the original Torabi Max Min method:

(a) Since, in this work, Image J provided a histogram of number of pixels versus grey

level it was possible to identify the actual minimum grey level in an image instead

of using zero as Torabi was obliged to. This reduced the range of possible “best”

grey level values.


(b) The highest “best” grey level of the image was taken to be the median grey value

of the image instead of the very high value of 220 specified by Torabi. This was based upon the idea that the number of pixels belonging to particles accounts for less than 5% of the total pixels of an image and that the grey values of particles are less than the average grey value of the background, which itself could be

represented by the median grey value of the image. The median grey value of an

image in most cases is below 220, which means that the maximum possible

“best” threshold value in Equation 8-58 will be below 220. Thus, like the first

modification, this change also narrowed the range of possible thresholds to

search.

(c) The step size for the threshold values used in the search was increased. The step

size is the interval between a preceding threshold value and the next value

examined. In Torabi’s work, the step size of the iteration was set to 1, i.e. the

search for the best threshold value examined every possible threshold values

between the starting threshold value (0 in Torabi’s work) and the ending

threshold value (220 in Torabi’s work). In this work, the step size is set to 5, i.e.,

if the starting threshold value is 100, the next threshold value to be examined is

105. Figure 8-7 shows an example image. The results of thresholding of it using

different step sizes are shown in Figure 8-8, which is a plot of minimum particle

size versus threshold. From the curve, the threshold giving the maximum

minimum particle size can be located. The peaks in the curves represent the

maximum minimum particle size in certain regions. As can be seen, different step sizes generate different curves. The curve for a step size of 1 represents the thresholding results of all possible thresholds and accordingly acts as a benchmark to verify whether the use of other step sizes would be able to locate the true

peaks. It is observed that the application of other step sizes is unable to locate the

first peak though the second peak is located at step size of 5. However, what is

important is that the rough locations of major peaks are correctly identified by all

step sizes. This was very important because a further fine-resolution search could

be centered around the peaks to define the true peaks. In this work, it was

determined empirically that the step size of 5 should be sufficient to locate the

peaks.

The peak location-search using a larger step size provided the approximate location of

major peaks and ignored much smaller peaks. An assumption was made that the first two

peaks would be where the global threshold giving the maximum minimum particle size is

located. The assumption was well founded in that: 1) real particles tended to be darker, i.e., to have a low grey value, and therefore they will be detected at a low threshold; 2) the use of the first two peaks could avoid the situation where the first peak is noise, i.e., if the first

peak is noise, there is still the second peak. It is highly unlikely that both of the first two

peaks would be noise.


Figure 8-7 An Example Image


Figure 8-8 The Effect of Thresholding Step Size on Minimum Particle Size


Based on this assumption, a fine-resolution-search was carried out around a peak to

locate the local threshold which provided the maximum minimum particle size. The fine-

search method used in this work was the bisection algorithm. The modified search is

illustrated in Figure 8-9.

To describe how the modified search algorithm works, we use the function min(x) to represent the minimum particle size at threshold x. From Figure 8-9, we have min(a) < min(b) and min(c) < min(b). With the assumption that the maximum minimum particle size lies within [a, c], and provided that the curve between thresholds a and c is convex, the search is performed as follows (a code sketch follows the steps):

Step 1: Threshold the image using midpoint threshold (a+b)/2 and find min((a+b)/2).

Similarly find min((b+c)/2).

Step 2: Examine min((a+b)/2) and min((b+c)/2).

If min((a+b)/2) > min(b) and min(b) > min((b+c)/2), the maximum minimum particle

size (the true peak) must be within [a, b], so replace b with (a+b)/2 and replace c with b.

If min((b+c)/2) > min(b) and min(b) > min((a+b)/2), the peak must be within [b, c], and

therefore replace b with (b+c)/2 and replace a with b.

If min(b) > min((a+b)/2) and min(b) > min((b+c)/2), the peak must be within

[(a+b)/2,(b+c)/2], and therefore replace a with (a+b)/2 and replace c with (b+c)/2).

Step 3: If b−a = 1 or c−b = 1, b is the threshold giving the maximum minimum particle size and the search stops; otherwise return to Step 1.
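The steps above can be expressed as in the following sketch, assuming a hypothetical helper min_particle_size(t) that thresholds the image at integer threshold t and returns the minimum particle size found.

```python
def refine_peak(a, b, c, min_particle_size):
    """Narrow the bracket [a, b, c] (with min(b) currently the largest) until
    neighbouring thresholds differ by 1; b is then the located peak."""
    while (b - a) > 1 and (c - b) > 1:
        left, right = (a + b) // 2, (b + c) // 2
        m_left, m_b, m_right = (min_particle_size(left),
                                min_particle_size(b),
                                min_particle_size(right))
        if m_left > m_b and m_b > m_right:      # peak lies in [a, b]
            a, b, c = a, left, b
        elif m_right > m_b and m_b > m_left:    # peak lies in [b, c]
            a, b, c = b, right, c
        else:                                   # peak lies in [left, right]
            a, c = left, right
    return b
```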



Figure 8-9 Scheme of Modified Search for Threshold

The bisection search was performed on the first two peaks separately, and a threshold giving a local maximum minimum particle size was identified for each peak. The threshold giving the overall maximum minimum particle size was accepted as the global threshold.

The above modified MaxMin method was tested by applying it to 249 randomly selected

images. Results are tabulated in Table 8-3. There it can be seen that for 91.2% of the images the difference between the threshold selected by Modified MaxMin thresholding and that selected by the original MaxMin thresholding is less than or equal to 3. However, the modified MaxMin thresholding greatly reduced the number of iterations required to search for the global threshold, requiring an average of 12 iterations compared to the original 220: a reduction of more than 90%.


Table 8-3 Threshold Test of Modified MaxMin Thresholding

                                                                                      Number of Images    Percentage
Threshold of bisection MaxMin thresholding is the same as that of original MaxMin    171                 68.7%
Difference between the two thresholds is within 3                                    56                  22.5%
Difference between the two thresholds is more than 3                                 22                  8.8%

Thus, it appeared that there was very little difference in the threshold value selected by the modified MaxMin method and a large decrease in the number of iterations required.

Next the effect on classification performance was examined.

A set of 745 images (240 WOs and 505 WPs), used as training images in off-line image modification, was selected and thresholded using both MaxMin thresholding and Modified MaxMin thresholding. The classification results for the two thresholding methods are

tabulated in Table 8-4 and Table 8-5 respectively. As can be seen, there are 29 false

negatives and 49 false positives for MaxMin thresholding versus 16 and 25 for Modified

MaxMin thresholding, respectively. Thus, it was concluded that the Modified MaxMin

thresholding in fact improved the classification accuracy compared to the original

MaxMin thresholding. The classification error rates for the two methods are illustrated in

Figure 8-10, which shows that the error rates for Modified MaxMin thresholding were reduced in all respects, including the overall error rate, false negative rate and false positive rate, as compared to those of the original MaxMin thresholding. Modified MaxMin thresholding has an overall error rate of 5.5% versus 10.5% for the original MaxMin thresholding, an improvement of 5 percentage points.


An ROC analysis as illustrated in Figure 8-11 shows that the classifier obtained when the

modified MaxMin thresholding was applied was much superior to that of the original

MaxMin thresholding: the ROC curve for the former method always lies to the left of, or is superimposed on, that of the latter. The AUC (area under the ROC curve) for the classifier

obtained using Modified MaxMin thresholding is 0.974 compared to 0.924 for that

obtained using the original MaxMin thresholding. The reason for the classification

accuracy improvement was that Modified MaxMin thresholding was able to ignore some

noise at high grey levels which could have been included as particle information in the

case of the original MaxMin thresholding. That is, the modified MaxMin thresholding

was more resistant to noise at high grey levels.

The above analysis demonstrated that, in addition to vastly improving computational

speed by reducing the number of trial threshold values, the modified MaxMin

thresholding significantly improved the classification accuracy.

Table 8-4 Classification Confusion Matrix for MaxMin Thresholding

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      476                      29
Actual Class is WO      49                       191

Table 8-5 Classification Confusion Matrix for Modified MaxMin Thresholding

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      489                      16
Actual Class is WO      25                       215


(Data plotted in Figure 8-10: MaxMin thresholding — overall error rate 10.5%, false negative rate 5.7%, false positive rate 20.4%; Modified MaxMin thresholding — overall error rate 5.5%, false negative rate 3.2%, false positive rate 10.4%.)

Figure 8-10 Comparison of Classification Error Rates for two Different Thresholding Methods


Figure 8-11 ROC Curves for MaxMin Thresholding and Modified MaxMin Thresholding


Appendix V ADAPTIVE MACHINE LEARNING METHODS

The Intelligent Learning Machine (ILM)

The ILM is an incremental machine learning algorithm recently invented by Sayad [20]. It is incremental, on-line and operates in real time. The core of its intelligence is a customizable ILM

Weight Table (IWT) learned from the data. The framework of ILM is shown in Figure

8-12.


Figure 8-12 The Intelligent Learning Machine (ILM)

As shown in Figure 8-12, the learner builds the IWT from raw data, and the modeler selects

certain required elements of the IWT to build a model which will be used for prediction.

The IWT acts as a channel between the learner and the modeler. The separation of the

learner and the modeler provides an open architecture allowing different learning

methods to be incorporated into the ILM. In contrast to other learning algorithms, the

modeler and the learner in the ILM are separated, which is one of the prominent

advantages that the ILM presents. Because of this separation, the ILM can adopt many

learning techniques including Linear and Non-linear Regression, Linear and Non-linear

Classification, Bayesian Models, Markov Chain, Hidden Markov Models, Linear


Discriminant Analysis, Association Rules, OneR rule, Principal Component Analysis,

and Linear Support Vector Machines. Furthermore, ILM is not limited to these

techniques.

The IWT, as the core of the ILM, is a two-dimensional table (Table 8-6). In this table, n is the number of independent variables and m is the number of dependent variables. The basic unit of the ILM, as shown in Table 8-7, is an independent variable Xj, a dependent variable Xi, and a weight Wij. The Wij contains four elements, namely, the number of data values, the sum of variable Xj, the sum of variable Xi, and the sum of the products of variables Xi and Xj, but other elements can be added to the ILM if necessary.

Table 8-6 The General Structure of the ILM Weight Table

        X1    …    Xj    …    Xn
X1      W11   …    W1j   …    W1n
…       …     …    …     …    …
Xi      Wi1   …    Wij   …    Win
…       …     …    …     …    …
Xm      Wm1   …    Wmj   …    Wmn

As shown in Table 8-7, the basic unit of the Knowledge Table includes a dependent

variable Xi , an independent variable Xj and a weight function Wij:

Table 8-7 The Basic Unit of the ILM Weight Table

        Xi
Xj      Nij    Σxi    Σxj    Σxixj


Values of Xi and Xj are denoted xi and xj respectively. Wij consists of the following four

basic elements:

• Ni j is the total number of joint occurrences of xi and xj data values in the dataset

• Σxi is the sum of the values of variable Xi (i.e., xi) in the dataset, with the summation taken over the Nij records containing the values.

• Σxj is the sum of the values of variable Xj in the dataset, with the summation taken over the Nij records containing the values.

• Σxixj is the sum of the products of the values of variables Xi and Xj in the dataset, with the summation taken over the Nij records containing the values.

The structure of IWT allows the following actions to dynamically change the content of

IWT:

• learning, to add new data records into the IWT;

• forgetting, to remove data records from the IWT;

• growing, to add new variables into the IWT;

• contracting, to remove variables from the IWT;

• interacting, to extract a certain part of the IWT for modeling;

• co-operating, to join with other IWTs having the same structure.

These IWT actions provide the flexibility for various modeling needs.

The incremental, online and real-time characteristics of ILM dramatically reduce the

computation load in terms of adding new data records to the model, removing data

records and changing the model with the addition or removal of variables. In addition, the

co-operation feature of IWT allows parallel and distributed processing of large and


complex datasets. This feature along with the incremental learning nature presents ILM

with full potential for real-time process monitoring and pattern recognition.

Application of ILM to Naïve Bayesian Model for Image Classification

In this work the Bayesian model was used to estimate the probability that an image is

With Particle (WP) or Without Particle (WO). The estimated probability was calculated

based on two attributes of an image, i.e., “mean grey value” and “percentage area”. These

attributes are the "Xi" in the Knowledge Table of Table 8-6. The image is thus assigned to whichever class, WP or WO, has the higher estimated probability. Please refer to Chapter 2 for details of the probability calculation.
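As an illustrative sketch only (the full derivation is in Chapter 2), the decision can be written as below, assuming Gaussian class-conditional densities for the two attributes with means, standard deviations and priors taken from the WP and WO Knowledge Tables.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def classify(attributes, stats_wp, stats_wo, prior_wp=0.5, prior_wo=0.5):
    """attributes: (mean grey value, percentage area);
    stats_wp / stats_wo: list of (mean, std) per attribute for each class."""
    p_wp, p_wo = prior_wp, prior_wo
    for x, (mu_wp, s_wp), (mu_wo, s_wo) in zip(attributes, stats_wp, stats_wo):
        p_wp *= gaussian_pdf(x, mu_wp, s_wp)
        p_wo *= gaussian_pdf(x, mu_wo, s_wo)
    return "WP" if p_wp > p_wo else "WO"
```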

For a Naïve Bayesian model, only the diagonal cells of Table 8-6 are needed to estimate the desired WP or WO probability, since the other cells, which represent interactions between attributes, are not required and are ignored. In this research, to apply the ILM to the Naïve Bayesian classification model, two Knowledge Tables (IWTs) are needed, one for calculating the probability of WP and the other for calculating the probability of WO. The WP table contains data from images known to be WP and the WO table data from images known to be WO.

Table 8-8 shows a Knowledge Table from this work. It is noted that only diagonal cells are present. N1 and N2 are the numbers of occurrences of each attribute. In this work, since no attribute values are missing for any of the images, N1 and N2 are, in fact, the number of images (N) used to create the IWT. In the cases where there are missing


attribute values, Ni could be different for different cells of the Knowledge Table and

would need to be calculated accordingly.

Table 8-8 ILM Knowledge Table for N images

        X1                     X2
X1      N1   Σx1   Σx1²
X2                             N2   Σx2   Σx2²

The calculation of the mean and standard deviation of an attribute (using the first attribute as an example) is given in Equations 8-59 to 8-61:

N = N1 = N2        (8-59)

μ1 = (1/N) Σ_{i=1}^{N} x1i        (8-60)

σ1² = (1/N) Σ_{i=1}^{N} x1i² − ( (1/N) Σ_{i=1}^{N} x1i )²        (8-61)

In the above equations, subscript 1 indicates the first attribute and subscript i the ith image.

One of the major advantages of the ILM is its efficient incremental learning capability. Thanks to the assumption of attribute independence, the Naïve Bayesian model is intrinsically incremental. However, taking practical advantage of this property efficiently is not as straightforward as it seems. The ILM enables an extremely efficient and systematic way of merging new data into a Bayesian model by aggregating historical data


mathematically into the forms of summation for easy creation of different models without

revisiting and recalculating the previous data when new data become available and need to be incorporated. This is manifested when a new value of a particular attribute arrives. In this scenario, the new value is simply added to the corresponding summation of a cell in the Knowledge Table (IWT), as shown in Equations 8-62 to 8-64, and the new summation is used to update the model for calculating the Bayesian probabilities. Equations 8-62 to 8-66 show the calculations needed when a new attribute value x1,new of an image is added. The update of the other attributes in the IWT is exactly the same.

N_new = N + 1        (8-62)

Σ_{i=1}^{N_new} x1i = Σ_{i=1}^{N} x1i + x1,new        (8-63)

Σ_{i=1}^{N_new} x1i² = Σ_{i=1}^{N} x1i² + x1,new²        (8-64)

The calculation of the updated mean and standard deviation is therefore given by the following equations:

μ1,new = (1/N_new) Σ_{i=1}^{N_new} x1i        (8-65)

σ1,new² = (1/N_new) Σ_{i=1}^{N_new} x1i² − ( (1/N_new) Σ_{i=1}^{N_new} x1i )²        (8-66)
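A minimal sketch of one diagonal IWT cell and the incremental update of Equations 8-62 to 8-66; only the running sums are stored, so adding a new image never requires revisiting earlier data. The class name is illustrative and not part of the ILM software.

```python
class IWTCell:
    """Diagonal cell of the Knowledge Table for one attribute."""

    def __init__(self):
        self.n = 0           # N, number of values seen so far
        self.sum_x = 0.0     # running sum Σ x
        self.sum_x2 = 0.0    # running sum Σ x²

    def learn(self, x_new):
        # Equations 8-62 to 8-64: update the running sums with the new value
        self.n += 1
        self.sum_x += x_new
        self.sum_x2 += x_new ** 2

    def mean(self):
        return self.sum_x / self.n                                     # Equation 8-65

    def std(self):
        variance = self.sum_x2 / self.n - (self.sum_x / self.n) ** 2   # Equation 8-66
        return variance ** 0.5
```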


As can be seen, the calculations involved in incremental learning using the ILM are minimal compared to other popular incremental methods. This property is well suited to a real-time system because it saves significant computation time. This is often critical for a system that needs to be very responsive in an environment where large amounts of data are generated at very high speed.

Incremental Support Vector Machine (ISVM)

While the Support Vector Machine (SVM) has been widely studied, the study of ISVM

began just a few years ago [22, 131-138]. One of the known issues with the SVM is high

computation complexity since training a SVM requires solving a quadratic programming

(QP) problem with a number of coefficients equal to the number of training examples

[134]. QP is computationally intensive, takes a long time to converge and requires substantial memory. Thus, for a large dataset, especially in real time, standard numeric techniques for

QP are infeasible and generate a scalability issue for SVM. When adding new data

records to a large training set, an efficient ISVM becomes necessary to avoid batch

retraining.

An early study of ISVM by Syed [131] suggested retraining a classifier with all support

vectors (SV) and newly added samples. The reasoning behind this is as follows: the

resulting decision function of an SVM depends only on its SVs, i.e. training an SVM on

the SVs alone results in the same decision function as training on the whole data set.

Therefore one expects to obtain an incremental result that is equal to the non-incremental

result, if the last training set contains all SVs in the non-incremental case. The empirical


results from that study show that ISVM performs well even though this training

scheme provides approximate results.

Xiao [133] presented a similar technique for incremental training of the SVM. In his

research, all training samples are divided into three sets, a backup set (mostly of trivial

samples), a caching set (samples frequently appearing in SV set) and a work set (the

latest SV set). With the definition of these three types of samples, a Least-Recently Used

(LRU) scheme is implemented to optimally discard trivial samples without sacrificing the

classification precision.

Both ISVM methods mentioned above are approximate approaches. An exact incremental

learning of SVM was proposed by Cauwenberghs et al. [134]. In their method, the exact

solution is constructed recursively by adding one new data point at a time while retaining

the Kuhn-Tucker conditions on all previous data. Experiments with an example data set show that incremental learning offers a simple and computationally efficient scheme. The

procedure can be extended to SV regression.

Peng et al. [137] proposed an active learning scheme. The scheme first interrogates the

new data point by measuring the distance between the point and the hyper-plane defined by the entire SV set. A threshold is then applied to the measured distance. If the distance is smaller than the threshold, the new data point is added to the current SV set to form the new training set. Otherwise the new data point is discarded. This strategy does

achieve computational efficiency but the choice of threshold is empirical. This method in

some sense is desirable for real-time process monitoring in which data is generated very

fast.


An et al. [22] presented an even more aggressive incremental learning method by pre-

extracting the SV set for initial training and post-discarding training samples in an

attempt to lower the training load. The initial training phase consists of three steps. In the

first step, the pre-extracted SV set is selected based on the least distance between the samples in the two classes, and this is used as a work set to obtain the classifier. Next, the

obtained classifier is applied to classify samples which are not in the training set. Those

samples whose relative distances to the SV hyper-plane are greater than a threshold will

be added to the work set while others will become training samples for the next iteration.

Finally, the process returns to the first step and iterates until all training samples are correctly classified. In the incremental training phase, the new data points are classified using the current SV set, as in the second step of the initial training phase. The rest of the incremental

training is the same as the initial training. After the iteration terminates, some training

samples are discarded following the similar scheme proposed by Peng [137]. However,

Peng’s scheme was to discard the newly added training samples.

Although ISVM attracts much attention and has great potential, further studies are

required. ISVM so far has not been implemented in real-world applications. In

comparison, ILM has an advantage over ISVM for linear problems because once the IWT

in ILM is constructed, there is no need to work with the original raw data.

Incremental Neural Networks (INN)

Artificial neural networks (ANN) have been widely used in image processing and

analysis, process control and chemistry as they are universal function approximators for


linear, non-linear and even unknown relationships. In this literature review, however, the

attention will be given to incremental neural networks (INN), which are more desirable for real-time supervised or unsupervised clustering and classification applications. Among

INNs, fuzzy adaptive resonance theory (ART) and Fuzzy ARTMAP are emphasized.

Like other non-incremental learning algorithms, backpropagation-based neural networks encounter the problem that a classifier trained on one data set often suffers unacceptable classification accuracy on a new data set. From an operational point of view, such a

problem represents a serious limitation in real-world applications (e.g., automatic image

monitoring systems, robot learning) in which data sets tend to vary significantly. In this

context, the design of a robust incremental learning system capable of performing

efficiently and improving its generalization on different data sets is very desirable. Only

a few papers [24, 28, 31, 139-144] have been published that deal with the incremental-

learning NN, so it is still an open issue in the pattern-recognition literature.

The first true incremental neural network was proposed by Park et al. [31]. In that research, a training procedure was proposed that adapts the weights of a trained layered-perceptron artificial neural network to training data originating from a slowly varying non-stationary process. The resulting adaptively trained neural network, based on nonlinear programming techniques, was shown to adapt to new training data that are in conflict with earlier training data without affecting the neural network's response to data elsewhere. Fu

et al. [28] proposed an incremental backpropagation learning neural network (IBPLN) for

classification based on bounded weight adaptation and structural adaptation learning

rules. The main idea is to add a hidden unit (i.e., to change the topology of the network) if the neural network cannot accommodate the new instance through weight adaptation; this process is called neuron generation. A previously added hidden unit was deleted if its

output weight decayed to a predefined threshold value. This is called neuron elimination.

However, one of the drawbacks of IBPLN is that it cannot adapt to a new data set

containing new classes. A constructive, incremental learning system for regression

problems is introduced by Schaal [141].

Bruzzone et al. [24] proposes an incremental Radial Basis Function (RBF) based neural

network for the classification of remote-sensing images. The proposed classifier, a three-

layered network, allows adaptation to new classes possibly contained in the new data

sets. The initial training phase follows the same two-step procedure as training a normal

RBF neural network. That is, it first obtains the prototypes by applying a clustering

procedure to the training samples of each class, and then calculates the weight matrix by

minimizing the sum-of-squares error function. In the incremental training phase, a

similarity function between a new sample and its closest existing prototype is defined. If

a sample is found to be similar to its closest prototype, then the prototype is updated;

otherwise, a new prototype is generated because the new sample considered cannot be

efficiently represented by any of the currently existing prototypes. The generation of a new prototype means the addition of a new neuron with a new kernel function. Following the

prototype updating, the weight updating is performed again based on the error function.

This classifier is capable of performing incremental learning and approximately

preserving the ‘old’ knowledge in the incremental learning phase. Furthermore, it

exhibits the characteristics of self-organization and network topology adaptability.


Appendix VI Screen 3: Selection of IQ Operators by Task Specific Criteria

In Screen 3 an experimental design is used to generate images utilizing the IQ Operators

obtained from the Screen 2 selection. The objective of the experimental design is to

assess important “task specific criteria”. Task specific considerations included the effect

of an IQ Operator on task specific image quality characteristics and the task specific

computational characteristics of an IQ Operator. The latter term refers to the fact that for

in-line monitoring, calculation speed and simplicity are important characteristics.

In Screen 3 these considerations led to two criteria being used to select the most desirable

IQ Operators:

i. the effect on particle image size;

ii. the absence of interaction effects on particle image size with other IQ Operators.

IQ Operators (as shown in Table 8-9) with the low and high levels of their parameters

according to Table 5-1 were applied to various test patterns in combinations as dictated

by the statistical experimental design. Statistical experimental design analysis software

was developed and interfaced with the commercially available Image J program so that

the design could be implemented on images. In developing Screen 3, four trials were

conducted using test patterns as images and are summarized in Table 8-9. “Order of

Application” was a blocking variable. That is, each design was limited to one order of

application.


Table 8-9 Statistical Experimental Designs

Trial #   Type of Statistical Design   Number of Experiments   IQ Operators in Order of Application   Test Pattern
1         2^3                          8                       GER, GDIL, MD                          Test Pattern 1
2         2^2                          4                       MD, GB                                 Test Pattern 2
3         2^3                          8                       BR, MD, GB                             Test Pattern 3
4         2^4                          32                      BR, MD, GB, UNSHP                      Test Pattern 4

Software to implement a new color coded, scatterplot matrix method was developed to

enable rapid assessment of the results of the experimental designs. The criterion

examined by the scatterplot matrix was the “maximum effect on task specific image

quality”. That is, of all the many different possible combinations of image operators, the

ones which result in the most significant effect on particle size were rapidly identified.

The rationale for this criterion was to identify the most efficient and effective method of

improving image quality.

Results of the four trials are described in turn below:

Screen 3: Trial 1

In Statistical Design 1 a test pattern consisting of four uniform black squares on a white

background is used (Test Pattern 1 in Figure 8-13). This trial only serves as a

demonstration of how Screen 3 works on a test pattern. Three image operators were

investigated: grayscale erosion, grayscale dilation and the median filter. The two-level factorial design of these image operators is shown in Table 8-10. The graphical

illustration of the execution of the factorial design is given in Figure 8-14. In Figure 8-14,

the root of the tree is the test pattern, and the images on the leaves of the tree are the


resulting images after the sequential application of three image operators on the test

pattern.

Figure 8-13 Test Pattern Image 1

Figure 8-14 Graphical Illustration of the Execution of Two-Level Factorial Design on a Test Pattern



Table 8-10 Two-Level Factorial Design for an Image Operator Sequence

Run      GER   GDIL   MD   Result
Run 1    -1    -1     -1   1
Run 2    +1    -1     -1   a
Run 3    -1    +1     -1   b
Run 4    +1    +1     -1   ab
Run 5    -1    -1     +1   c
Run 6    +1    -1     +1   ac
Run 7    -1    +1     +1   bc
Run 8    +1    +1     +1   abc

Notes: -1 and +1 represent the low level and high level of each image operator, respectively.

The main effects and interaction effects were calculated as follows (a computational sketch is given after the formulas):

• Main Effects

Effect of GER = (-1+a-b+ab-c+ac-bc+abc)/4

Effect of GDIL= (-1-a+b+ab-c-ac+bc+abc)/4

Effect of MD = (-1-a-b-ab+c+ac+bc+abc)/4

• Interaction Effects

GER & GDIL = (1-a-b+ab+c-ac-bc+abc)/4

GER & MD = (1-a+b-ab-c+ac-bc+abc)/4

GDIL & MD = (1+a-b-ab-c-ac+bc+abc)/4
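The computational sketch below evaluates these contrasts, assuming the eight responses are supplied in the standard run order of Table 8-10 (1, a, b, ab, c, ac, bc, abc).

```python
def factorial_effects(r):
    """r: the eight responses in the order 1, a, b, ab, c, ac, bc, abc."""
    one, a, b, ab, c, ac, bc, abc = r
    return {
        "GER":       (-one + a - b + ab - c + ac - bc + abc) / 4,
        "GDIL":      (-one - a + b + ab - c - ac + bc + abc) / 4,
        "MD":        (-one - a - b - ab + c + ac + bc + abc) / 4,
        "GER*GDIL":  ( one - a - b + ab + c - ac - bc + abc) / 4,
        "GER*MD":    ( one - a + b - ab - c + ac - bc + abc) / 4,
        "GDIL*MD":   ( one + a - b - ab - c - ac + bc + abc) / 4,
    }
```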

A plot of two-way interaction effect of grayscale erosion and grayscale dilation on image

quality is illustrated in Figure 8-15.


Figure 8-15 Plot of Two-Way Interaction Effect of Grayscale Erosion and Grayscale Dilation

In Figure 8-15, the vertical axis represents the image quality, the horizontal axis represents the image operator erosion at its two levels, and the two lines show the change in image quality at the high and low levels of the image operator dilation. The plot indicates the trends of the main effects and interaction effects of the image operators on the image quality. More specifically, the plot shows that particle size decreases as the erosion level goes from low to high, while for dilation the opposite is true. The two slightly

converging lines indicate that there is a weak interaction effect between image operators

erosion and dilation.

The classical scatter plot is limited to a pair of variables. Thus, when more than two

variables are involved many such plots are required to include all the possible two way

combinations. The plots can be arranged as a matrix to form a “scatterplot matrix”. This

has been done in the literature for multiple two way scatterplots but its application to the

“interaction” plots from a statistical experimental design has not previously been

published. To assess the effects of image operators on image quality using a scatterplot

matrix, it is desirable to highlight the effects by color coding each cell of the matrix


according to the magnitude of the effects it shows, as illustrated in Figure 8-16. A diagonal cell shows the main effect of an independent variable on the response variable, and the off-diagonal cells show the interaction effect of two independent variables on the response variable. In this way, we are able to visually identify the major variables which have the

strongest contribution to image quality rapidly and easily. Figure 8-16 shows a newly

designed color-coded scatterplot matrix for a sequence of three operator operations. Each

scatter plot in the matrix represents one response variable and one or two independent

variables. In this case, the response variable is the image quality metric (particle size),

and the two independent variables are two IQ Operators. The horizontal axis of a scatter

plot is the IQ Operator indicated by the title of the column where the particular scatter

plot is located, and the other image operator is indicated inside that scatter plot. The

effects of image operators on image quality are color coded in the scatter plots only if the

magnitudes of the effects are above a predefined threshold. In this case, the threshold is

set to five percent of the original image quality (particle size). For the diagonal scatter

plots in the matrix, the coded colors are for main effects, while for the rest of scatter

plots, the coded colors are for interaction effects. The color is assigned to each scatter plot according to the color bar at the right side of the matrix. Figure 8-16 shows that

grayscale erosion has a strong main effect on image quality, while grayscale dilation has

a relatively weak main effect. The two-way interaction effect between grayscale erosion

and grayscale dilation exceeds five percent but it is less significant in comparison with

the main effects of grayscale erosion and grayscale dilation. All other effects are

negligible with magnitude less than five percent. With the criterion of selecting the

operators with the strongest effects on image quality, grayscale erosion and grayscale


dilation are selected in this case because they are the ones which most effectively and

efficiently change the image quality.

Figure 8-16 Color-coded Scatterplot Matrix for Sequence of Grayscale dilation, Grayscale erosion

and Median filter

Screen 3: Trial 2

In Trial 2 a test pattern consisting of four uniform black squares on a white background

with impulsive and white noise is used (Test Pattern 2 in Figure 8-17). Two image

operators were investigated: median filter and Gaussian Blur filter. The median filter

was chosen because there are impulsive noises in the Test Pattern 2, and the Gaussian

filter was selected due to the presence of white noise. The 2-Level factorial design of


these image operators is similar to that in Statistical Design 1. The color-coded scatterplot matrix for the image processing sequence meeting the two selection criteria mentioned above is shown in Figure 8-18. In this case, only the median filter is chosen while the Gaussian Blur is eliminated because its main effect is below the predetermined threshold of five percent.

Figure 8-17 Test Pattern Image 2

Figure 8-18 Color-coded Scatterplot Matrix for Sequence of Median and Gaussian Filters

Screen 3: Trial 3

In Trial 3 a test pattern consisting of four uniform black squares on a white background

with impulsive and white noise is used (Test Pattern 3 in Figure 8-19). Three image

operators were investigated: brightness shift, median filter and Gaussian filter. The

selection of brightness shift is based on the fact that both image background and


foreground are on the bright side of grey levels. A median filter was chosen because there

are impulsive noises in the Test Pattern 3, and the Gaussian filter was selected due to the

presence of white noise. The main and interaction effect results for the image processing sequence meeting the two selection criteria are plotted in Figure 8-20. As shown in Figure 8-20, all effects except the main effect of the median filter are weak. Accordingly, only the median filter was chosen to process the test pattern.

Figure 8-19 Test Pattern Image 3

Figure 8-20 Color-coded Scatterplot Matrix for Sequence of Brightness shift, Median and Gaussian

Filters


Screen 3: Trial 4

In Trial 4 a test pattern consisting of four uniform black squares on a white background

with impulsive and white noise is used (Test Pattern 4 in Figure 8-21). Four image

operators were investigated: brightness shift, median filter, Gaussian filter and the

unsharp mask. The selection of unsharp mask is due to the blurry edges of particles. The

main and interaction effect results for the image processing sequence meeting the two

selection criteria are plotted in Figure 8-22. As shown in Figure 8-22, the unsharp mask

has a dramatic main effect on image quality.

Figure 8-21 Test Pattern Image 4


Figure 8-22 Color-coded Scatterplot Matrix for the Sequence of Brightness shift, Median filter, Gaussian Filter and the Unsharp Mask


Appendix VII Case-Based Reasoning Classification

This appendix summarizes the results of an investigation into the possibility of using

Case-Based Reasoning Classification (termed CBR Classification) instead of Bayesian

Classification to assign a new image a WP or WO class label. The CBR Classification is

illustrated in Figure 8-23. With CBR Classification, the same attributes and similarity

measurement as in Table 5-18 were applied to retrieve the most similar image case in the

database. However, then the class label of this image case was retrieved and assigned to

be the class of the new testing image: image processing instructions were not used since

no image quality improvement was involved. This is the simplest way to employ CBR

Classification. Hybrid methods involving both image quality improvement and CBR

Classification could be envisioned but were not examined in this supplemental

investigation.

Figure 8-24 shows the software component developed to implement CBR Classification.

As illustrated, first a blanket image processing step including brightness adjustment and

background flattening was applied to the new input testing image. Next, the image

quality metrics and necessary attributes of particles of the new image were measured by

invoking the image measurement shared software component. The measurements were

then used by the database shared software component to locate the most similar image

case in the Reference Image Database to the new image. The class label associated with

the retrieved image case was then assigned to the new image.
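A minimal sketch of this retrieval-and-assign step, assuming each reference case stores an attribute vector and its WP/WO label and that similarity is measured as a weighted distance in attribute space; the actual attributes and weights are those of Table 5-18 and are not reproduced here.

```python
import numpy as np

def cbr_classify(new_attributes, reference_cases, weights=None):
    """Return the class label of the most similar reference image case."""
    x = np.asarray(new_attributes, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    best_label, best_dist = None, np.inf
    for attrs, label in reference_cases:          # (attribute vector, "WP"/"WO")
        d = np.sqrt(np.sum(w * (x - np.asarray(attrs, dtype=float)) ** 2))
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label
```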


Figure 8-23 The Schema of Case-Based Reasoning Classification



Figure 8-24 The Case-Based Reasoning Classification Component

The results of CBR Classification were obtained for the four trial image sets described in

Section 5.2.3 and compared with those of Torabi Adaptive Classification, Static IQMod

Classification and Adaptive IQMod Classification methods.

Test Trial 1: the Use of a New Set of Images Produced by Torabi

For this trial image set, the classification results of CBR Classification are shown in

Table 8-11.

Table 8-11 Confusion Matrix of Test Trial 1 Image Set Using CBR Classification

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      234                      2
Actual Class is WO      3                        422



Figure 8-25 Comparison of Classification Error Rates for Test Trial 1 Using Different Classification Methods

Figure 8-25 showed that CBR Classification performed reasonably well for this trial

image set with an overall error rate of 0.8%. This is slightly higher than the 0.6% by

Adaptive IQMod Classification. The ability of the CBR Classification to adapt to image

quality variability was not examined.

Test Trial 2: the Use of the Microgel Image Set Produced by Ing

For the Trial 2 image set, the classification results of CBR Classification are shown in

Table 8-12.

Table 8-12 Confusion Matrix of Test Trial 2 Image Set Using CBR Classification

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      87                       1
Actual Class is WO      0                        1256



Figure 8-26 Comparison of Classification Error Rates for Test Trial 2 Using Different Classification Methods

This image set was an "easy" set as all four classification methods showed excellent

overall classification error rates with the worst being 0.7% by the Torabi Adaptive

Classification. There was no statistically significant difference evident between CBR

Classification, IQMod Classification and Adaptive IQMod Classification for this image

set.

Test Trial 3: the Use of Images from New Extruder Runs Utilizing Injection of Particles with Low Additive Polyethylene Pelletized Feed

The classification confusion matrices for this set of images are tabulated in Tables 8-13 to 8-16, with the aggregated classification results for the four image subsets reported, and the classification error rates are illustrated in Figure 8-27. From these tables and the figure, we see that the best overall classification error rate is 0.5% and was achieved by the


Adaptive IQMod Classification method. This was followed by an error rate of 8.6% from

the Static IQMod Classification method. The CBR Classification method with an error

rate of 17.9% was significantly higher than the Adaptive IQMod and Static IQMod

Classification methods but significantly lower than the Torabi Adaptive method. Not

unexpectedly, CBR Classification was far less tolerant of image quality changes than

either Adaptive IQMod or Static IQMod as it didn’t involve any image quality

improvement process.

Table 8-13 Confusion Matrix for Test Trial 3 Image Subset (Run 3-1 to Run 3-4 in Table 3-2) Using Torabi Adaptive Classification

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      423                      686
Actual Class is WO      0                        363

Table 8-14 Confusion Matrix for Test Trial 3 Image Subset (Run 3-1 to Run 3-4 in Table 3-2) Using CBR Classification

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      845                      264
Actual Class is WO      0                        363

Table 8-15 Confusion Matrix for Test Trial 3 Image Subset (Run 3-1 to Run 3-4 in Table 3-2) Using Static IQMod Classification

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      982                      127
Actual Class is WO      2                        361

Table 8-16 Confusion Matrix for Test Trial 3 Image Subset (Run 3-1 to Run 3-4 in Table 3-2) Using Adaptive IQMod Classification

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      1101                     8
Actual Class is WO      5                        358



Figure 8-27 Classification Error Rates for Test Trial 3 Using Different Classification Methods

Test Trial 4: the Use of Images from New Extrusion Runs Utilizing Injection of Particles with High Additive Polyethylene Pelletized Feed

These images (the same set of images as in Run 3-6 in Table 3-2) proved the most

difficult of all. Recall that it was this set of images that later required the Reference

Image Database to be split by a decision rule. The split database was not used in this part

of the work. The classification results for the different classification methods are shown in Tables 8-17 through 8-20 and displayed in Figure 8-28. The results show that the overall error rate for CBR Classification is 32.1%, which is not significantly worse than the 29.7% of the Adaptive IQMod Classification method for this image set. It was also observed that the other

classification methods did not perform well with overall classification error rates of

36.7% for Torabi Adaptive method and 30.6% for Static IQMod method, respectively.


Table 8-17 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-6 in Table 3-2) Using Torabi Adaptive Classification

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      74                       117
Actual Class is WO      9                        143

Table 8-18 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-6 in Table 3-2) Using CBR Classification

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      91                       100
Actual Class is WO      2                        150

Table 8-19 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-6 in Table 3-2) Using Static IQMod Classification

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      88                       103
Actual Class is WO      2                        150

Table 8-20 Confusion Matrix for Test Trial 4 Batch-A Image Set (Run 3-6 in Table 3-2) Using Adaptive IQMod Classification

                        Predicted Class is WP    Predicted Class is WO
Actual Class is WP      175                      16
Actual Class is WO      86                       66



Figure 8-28 Classification Results for Test Trial 4 Image Set Using Different Classification Methods

In summary, application of CBR Classification to all four test trials demonstrated that,

although CBR Classification was generally an improvement over the Torabi Adaptive

Classification model, the Adaptive IQMod method developed in this research provided

superior results. The main reason for this is the ability of the Adaptive IQMod method to

better overcome the effect of variable image quality.


Appendix VIII Computation Time for Different Classification Methods

One aspect of measuring the efficiency of a classification method is the computation time

that it takes to process and classify an image. Table 8-21 shows the average computation

times required to process and classify one image for the four trial image sets specified in

Section 5.2.3. Adaptive IQMod Classification takes the most time (3.688 seconds per

image) while the Torabi Adaptive Classification required the least (1.796 seconds per

image). Case-Based Reasoning Classification and Static IQMod Classification required

1.894 and 3.631 seconds per image, respectively. This is not surprising as Adaptive

IQMod Classification involves the image quality improvement step which is the most

time-consuming procedure. The in-line image monitoring system used in this work

generates an image every 1 to 3 seconds. Thus, the total time for image generation and

interpretation is less than about 7 seconds in the worst case. The adequacy of this time

requirement depends on the needs of quality control. It was not of concern in this work

because for conventional plastics extrusion processes all of the times obtained were

considered to be more than sufficient. Also, if an exceptional need arises, various

measures could readily be explored to increase speed. For example, for Adaptive IQMod

Classification removing the intermediate image storing step could be the focus.

Table 8-21 Computation Time for Different Classification Methods

Method                              Average Time (seconds)
Torabi Adaptive Classification      1.796
CBR Classification                  1.894
Static IQMod Classification         3.631
Adaptive IQMod Classification       3.688


Appendix IX Statistics on the Estimation of Proportions

There is a need to determine how certain we are about the classification error rates

obtained for a set of images. In this thesis, this is accomplished by computing a

confidence interval. This appendix details how that was done.

For the training images, ten-fold cross validation was done each time a model was formulated. A total of 745 images were used. In each iteration of the cross validation, nine folds (i.e., 90%, or 671 of the 745 images) were used to fit the model, and the error rate was obtained by classifying the remaining fold of 74 images with that model. The process was repeated ten times, using a different held-out group each time. The error rates obtained for the different models, arising from the use of different image quality definitions to optimize the images being classified, are shown in Table 8-22.

Table 8-22 Errors from 10-Fold Cross Validation for Classification Models Created Based on Images Optimized Using Different Image Quality Definitions

Origin of Data               Raw     LS     WLS    DF     PD
Fold 1                        6       2      1      1      1
Fold 2                        4       2      3      2      0
Fold 3                        3       1      2      1      0
Fold 4                        5       1      1      1      0
Fold 5                        7       3      2      2      0
Fold 6                        5       2      1      1      0
Fold 7                        3       0      0      0      0
Fold 8                        2       0      2      1      0
Fold 9                        3       1      0      2      0
Fold 10                       3       1      2      2      0
Average Number of Errors     4.1     1.3    1.4    1.3    0.1
Standard Deviation           1.60    0.95   0.97   0.67   0.32
Total Number of Errors       41      13     14     13     1

Notes: Raw – Raw Images, LS – Least Squares Optimized Images, WLS – Weighted Least Squares Optimized Images, DF – Desirability Function Optimized Images, PD – Probability Density Difference Optimized Images.
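For illustration only, the ten-fold cross validation procedure that produced Table 8-22 could be sketched as follows. This is a minimal sketch assuming each image has already been reduced to a numeric feature vector, and it uses scikit-learn's Gaussian naive Bayes classifier as a stand-in for the Bayesian classification model of this thesis; the software actually used in this work was different.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB

def cross_validation_errors(X, y, n_folds=10, seed=0):
    """Count misclassified images in each held-out fold (as in Table 8-22).

    X: array of one feature vector per image; y: 1 for WP, 0 for WO.
    """
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    errors_per_fold = []
    for train_idx, test_idx in skf.split(X, y):
        model = GaussianNB().fit(X[train_idx], y[train_idx])   # fit on ~90% of the images
        predictions = model.predict(X[test_idx])                # classify the remaining fold
        errors_per_fold.append(int(np.sum(predictions != y[test_idx])))
    return errors_per_fold

# errors = cross_validation_errors(X, y)
# print(errors, sum(errors), np.mean(errors), np.std(errors, ddof=1))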


By inspection of Table 8-22 it can be seen that the probability density difference definition

of image quality obviously has the lowest error rate. Quantifying precision beyond the

information shown in Table 8-22 using confidence intervals encounters some uncertainty

because the error rates cannot really be considered as statistically independent quantities

[145]: in the cross validation process, calculation of each error rate uses many of the raw

images that were used for the other error rates. Also, when only ten error rates are used

in a confidence interval calculation, the conventional approach is to use the Student's t distribution. That calculation requires that the error rates be normally distributed. When 30 or more error rates are used, the Central Limit Theorem applies and the distribution of sample mean error rates can often be considered approximately normal; the standard normal variate, z, can then be used instead of t.
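As an aside, the conventional t-based interval just described could be computed directly from the ten fold error rates. The sketch below uses the Probability Density Difference column of Table 8-22 and an approximate fold size of 74.5 images; it is indicative only since, as noted above, the fold error rates are not statistically independent.

import numpy as np

# Errors per fold for the Probability Density Difference models (Table 8-22)
errors = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
rates = errors / 74.5                        # approximate fold size: 745 images / 10 folds

t_crit = 2.262                               # two-sided 95% t value for 9 degrees of freedom
mean = rates.mean()
half_width = t_crit * rates.std(ddof=1) / np.sqrt(len(rates))
print(f"{mean:.4f} +/- {half_width:.4f}")    # conventional t-based interval on the mean rate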

Confidence intervals (or, hypothesis tests which involve essentially the same

calculations) are conventionally used in data mining research and are calculated not from

the error rates but from a theoretical equation [110, 146]. That equation assumes that the

error rates obey a binomial distribution which can be approximated by a Normal

distribution. The approach is very convenient for test images since cross validation is

only carried out in the training step and so, error rates from cross validation are not

available. Closer examination of this method revealed that the most common method of

calculating the theoretical intervals is based upon yet another approximation [146].

When that approximation is used, the confidence interval is symmetric about the average

error rate. When it is not used then the interval is asymmetric. Despite the popularity of

the symmetric interval equation, in a comparison of these two theoretical equations [146],

the asymmetric interval was found to provide a very significantly superior estimate of the


confidence interval. Since the asymmetric theoretical equation was used in this work for

obtaining all confidence intervals shown in the bar graphs throughout the thesis, further

details, including a sample calculation, will be shown below. However, before progressing to that, Figure 8-29 shows a comparison of the various confidence intervals, including those calculated using the cross validation error rates of Table 8-22 for the various classification models resulting from different image quality definitions. It can be seen in Figure 8-29 that the asymmetric, theoretical confidence interval consistently provided the most pessimistic view of the data.

Figure 8-29 Comparison of Confidence Intervals

[Figure 8-29 (bar chart): error rate (%) for the Torabi Bayesian Model and the Least Squares, Weighted Least Squares, Desirability Function, and Probability Density Difference models, each shown with the Asymmetric Approximated CI, Symmetric Approximated CI, t-Distribution CI, and z-Distribution CI.]

Now, the calculation of the symmetric and asymmetric theoretical confidence intervals

will be described. The data of Table 8-23 will be used to illustrate the calculation. This

data shows typical results of the classification of a set of images.


Table 8-23 A Sample Classification Confusion Matrix

                       Predicted Class is WP    Predicted Class is WO
Actual Class is WP              489                       16
Actual Class is WO               25                      215

From Table 8-23, the total number of images in the sample (i.e., the sample size, n) is:

n = 489 + 16 + 25 + 215 = 745

The classification error rate, e, is given by:

e = y/n = (16 + 25)/745 = 0.055

where y is the total number of misclassified images.

It is known that the number of misclassified images, y, follows a binomial distribution. However, if ne > 5 when e ≤ 0.5, or if n(1-e) > 5 when e ≥ 0.5, then the binomial distribution can be approximated by a normal distribution; that is, y is approximately N(µ, σ) with µ = ne and σ = (ne(1-e))^0.5. With this assumption, the calculation of the

standard normal variate, z, for an observation of y misclassified images is given by

$$z = \frac{y - ne}{\sqrt{ne(1-e)}} \qquad 8\text{-}67$$

where e is the true classification error rate, y is the number of misclassified images and n

is the sample size. z then follows a standard normal distribution (a normal distribution

with a mean of zero and standard deviation of 1), that is, it is N(0,1).

The above equation can be rewritten as:

$$z = \frac{y/n - e}{\sqrt{e(1-e)/n}} \qquad 8\text{-}68$$


Solving the above equation for e gives the "asymmetric confidence interval" equation (termed the "score confidence interval" [146]):

$$e = \frac{\dfrac{y}{n} + \dfrac{z^2}{2n} \pm z\sqrt{\dfrac{y}{n^2} - \dfrac{y^2}{n^3} + \dfrac{z^2}{4n^2}}}{1 + \dfrac{z^2}{n}} \qquad 8\text{-}69$$

The value of z for the 95% confidence level is obtained from a z table; it is 1.96. The lower limit of the confidence interval is calculated by substituting the appropriate values of the variables into the equation:

$$e = \frac{\dfrac{41}{745} + \dfrac{1.96^2}{2 \times 745} - 1.96\sqrt{\dfrac{41}{745^2} - \dfrac{41^2}{745^3} + \dfrac{1.96^2}{4 \times 745^2}}}{1 + \dfrac{1.96^2}{745}} = 0.0408$$

The upper limit is:

$$e = \frac{\dfrac{41}{745} + \dfrac{1.96^2}{2 \times 745} + 1.96\sqrt{\dfrac{41}{745^2} - \dfrac{41^2}{745^3} + \dfrac{1.96^2}{4 \times 745^2}}}{1 + \dfrac{1.96^2}{745}} = 0.0738$$

The estimate of the 95% confidence interval is therefore:

0.0408 ≤ e ≤ 0.0738. We are 95% confident that the “actual” error rate is in that range.

More accurately, the meaning of this 95% confidence interval is as follows:

If 1000 samples of 745 images each were obtained and the 95% confidence interval

calculated as above for each one of these samples, about 950 of these confidence

intervals would contain the population mean of the error rate (the “actual” error rate) and

about 50 would not.


An approximate form of the confidence interval of Eqn. 8-69, frequently used in data mining applications, can be obtained by first rewriting Eqn. 8-68 as:

$$e = \frac{y}{n} \pm z_{\alpha/2}\sqrt{\frac{e(1-e)}{n}} \qquad 8\text{-}70$$

and then approximating the true error rate (the population mean error rate) on the right-hand side of the equation by the observed error rate, y/n, to obtain the symmetric confidence interval equation (termed the Wald confidence interval [146]):

$$e = \frac{y}{n} \pm z_{\alpha/2}\sqrt{\frac{\dfrac{y}{n}\left(1 - \dfrac{y}{n}\right)}{n}} \qquad 8\text{-}71$$

Now, repeating the above numerical example using Eqn. 8-71 we obtain:

$$e = \frac{41}{745} \pm 1.96\sqrt{\frac{\dfrac{41}{745}\left(1 - \dfrac{41}{745}\right)}{745}}$$

The 95% confidence limits are: 0.0387, 0.0714. For this specific example then, the values obtained represent about a 5% and 3% deviation, respectively, from the values obtained from the first, more exact, equation. In this thesis all results were calculated using Equation 8-69.
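As a check on the arithmetic, both interval equations can be evaluated directly. The short Python sketch below (function names are illustrative only) reproduces the limits obtained above for y = 41 misclassified images out of n = 745.

import math

def score_interval(y, n, z=1.96):
    """Asymmetric ("score") 95% confidence interval of Eqn. 8-69."""
    p = y / n
    centre = p + z**2 / (2 * n)
    half_width = z * math.sqrt(y / n**2 - y**2 / n**3 + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    return (centre - half_width) / denom, (centre + half_width) / denom

def wald_interval(y, n, z=1.96):
    """Symmetric ("Wald") 95% confidence interval of Eqn. 8-71."""
    p = y / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

print(score_interval(41, 745))   # approximately (0.0408, 0.0738)
print(wald_interval(41, 745))    # approximately (0.0387, 0.0714)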


9 REFERENCES

1. Torabi, K., S. Sayad, and S.T. Balke, On-line Adaptive Bayesian Classification for In-line Particle Image Monitoring in Polymer Film Manufacturing. Computers and Chemical Engineering, 2005. 30(1): p. 18-27.

2. Torabi, K., Data Mining Methods for Quantitative In-line Image Monitoring in Polymer Extrusion, Ph.D. Thesis, Department of Chemical Engineering and Applied Chemistry. University of Toronto: Toronto, ON, Canada. 2005

3. Baykut, A., et al., Real-time Defect Inspection of Textured Surfaces. Real-Time Imaging, 2000. 6(1): p. 17-27.

4. Bharati, M.H. and J.F. MacGregor, Multivariate image analysis for real-time process monitoring and control. Industrial & Engineering Chemistry Research, 1998. 37(12): p. 4715-4724.

5. Bharati, M.H., J.F. MacGregor, and W. Tropper, Softwood lumber grading through on-line multivariate image analysis techniques. Industrial & Engineering Chemistry Research, 2003. 42(21): p. 5345-5353.

6. Darwish, A.M. and A.K. Jain, A rule based approach for visual pattern inspection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 1988. 10(1): p. 56-68.

7. Farahani, F., et al. In-line color monitoring during polyethylene extrusion: Reflectance spectra and images. in 61st Annual Technical Conference ANTEC 2003. 2003. Nashville, TN, United States.

8. Gilmor, C., et al., In-Line Color Monitoring of Polymers During Extrusion Using a Charge Coupled Device Spectrometer: Color Changeovers and Residence Time Distributions. Polymer Engineering and Science, 2003. 43(2): p. 356-368.

9. Joeris, K., et al., In-situ microscopy: Online process monitoring of mammalian cell cultures. Cytotechnology, 2002. 38(1-2): p. 129-134.

10. Persoon, E., A pipelined image analysis system using custom integrated circuits. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 1988. 10(1): p. 110-116.

11. Reshadat, R., et al. In-line color monitoring of pigmented polyolefins during extrusion I: assessment. in Proceedings of the 1996 54th Annual Technical Conference. 1996. Part 3 (of 3). Indianapolis, IN.

12. Sayad, S., et al. In-line color monitoring of pigmented polyolefins during extrusion II: color prediction. in Proceedings of the 1996 54th Annual Technical Conference. 1996. Part 3 (of 3). Indianapolis, IN.

13. Torabi, K., et al. Data mining for image analysis: in-line particle monitoring in polymer extrusion. in Third International Conference on Data Mining. Data Mining III. 2002. Bologna, Italy: WIT Press.

14. Wang, H. and R. Kovacevic, On-line monitoring of the keyhole welding pool in variable polarity plasma arc welding. Proceedings of the Institution of Mechanical Engineers Part B-Journal of Engineering Manufacture, 2002. 216(9): p. 1265-1276.


15. Watano, S., et al., On-line monitoring of granule growth in high shear granulation by an image processing system. Chemical & Pharmaceutical Bulletin, 2000. 48(8): p. 1154-1159.

16. Yoda, H., et al., An automatic wafer inspection system using pipelined image processing techniques. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 1988. 10(1): p. 4-16.

17. Yu, K., J.A. Phillips, and A.J. Rein, Online Monitoring and Control of Escherichia-Coli Fermentation with Ftir/Atr Spectroscopy Implemented on a Pilot-Scale Fermenter. Abstracts of Papers of the American Chemical Society, 1994. 207: p. 100-BIOT.

18. Yu, H.L., et al., Digital imaging for online monitoring and control of industrial snack food processes. Industrial & Engineering Chemistry Research, 2003. 42(13): p. 3036-3044.

19. Ing, L. and S.T. Balke. In-line Measurement of Dispersed Phase Properties Using The Scanning Particle Monitor. in ANTEC 2002. 2002. San Francisco.

20. Sayad, S. and S.T. Balke. An intelligent learning machine. in Fourth International Conference on Data Mining, Data Mining IV. 2003. Rio De Janeiro, Brazil: Wessex Institute of Technology; COPPE/Federal University of Rio de Janeiro.

21. Dieterle, F., S. Busche, and G. Gauglitz, Growing neural networks for a multivariate calibration and variable selection of time-resolved measurements. Analytica Chimica Acta, 2003. 490(1-2): p. 71-83.

22. An, J.-L., Z.-O. Wang, and Z.-P. Ma. An incremental learning algorithm for support vector machine. in Machine Learning and Cybernetics, 2003 International Conference on. 2003.

23. Mouchaweh, M.S., et al., Incremental learning in Fuzzy Pattern Matching. Fuzzy Sets and Systems, 2002. 132(1): p. 49-62.

24. Bruzzone, L. and D. Fernàndez Prieto, An incremental-learning neural network for the classification of remote-sensing images. Pattern Recognition Letters, 1999. 20(11-13): p. 1241-1248.

25. Freund, Y. and R.E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. Journal of Computer and System Sciences, 1997. 55(1): p. 119-139.

26. Wienke, D., et al., Adaptive resonance theory based neural network for supervised chemical pattern recognition (FuzzyARTMAP) .2. Classification of post-consumer plastics by remote NIR spectroscopy using an InGaAs diode array. Chemometrics and Intelligent Laboratory Systems, 1996. 32(2): p. 165-176.

27. Wienke, D. and L. Buydens, Adaptive resonance theory based neural network for supervised chemical pattern recognition (FuzzyARTMAP) .1. Theory and network properties. Chemometrics and Intelligent Laboratory Systems, 1996. 32(2): p. 151-164.

28. Fu, L., H.-H. Hsu, and J.C. Principe, Incremental backpropagation learning networks. Neural Networks, IEEE Transactions on, 1996. 7(3): p. 757-761.

29. Wienke, D. and D.L. Buydens, Adaptive resonance theory based neural networks -- the ‘ART’ of real-time pattern recognition in chemical process monitoring? TrAC - Trends in Analytical Chemistry, 1995. 14(1): p. 398-406.


30. Carpenter, G.A., et al., Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. Neural Networks, IEEE Transactions on, 1992. 3(5): p. 698-713.

31. Park, D.C., M.A. El-Sharkawi, and R.J. Marks, II, An adaptively trained neural network. Neural Networks, IEEE Transactions on, 1991. 2(3): p. 334-345.

32. Carpenter, G.A., S. Grossberg, and J.H. Reynolds, ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Networks, 1991. 4(5): p. 565-588.

33. Torabi, K., S. Sayad, and S.T. Balke, On-line adaptive Bayesian classification for in-line particle image monitoring in polymer film manufacturing. Computers & Chemical Engineering, 2005. 30(1): p. 18-27.

34. Avcibas, I., B. Sankur, and K. Sayood, Statistical evaluation of image quality measures. J. Electron. Imaging, 2002. 11(2): p. 206-223.

35. Eskicioglu Ahmet, M. Quality measurement for monochrome compressed images in the past 25 years. in 2000 IEEE Interntional Conference on Acoustics, Speech, and Signal Processing. 2000. Istanbul, Turkey: Ieee.

36. Fiete, R.D. and T. Tantalo, Comparison of SNR image quality metrics for remote sensing systems. 2001. 40(4): p. 574-585.

37. Jacobson, R.E., An Evaluation of Image Quality Metrics. J. Photogr. Sci., 1995. 43(1): p. 7-16.

38. Shnayderman, A., A. Gusev, and M. Eskicioglu Ahmet. A multidimensional image quality measure using Singluar Value Decomposition. in Image Quality and System Performance. 2004. San Jose, California, USA: SPIE and IS&T.

39. Veldkamp, W.J.H. and N. Karssemeijer, Normalization of local contrast in mammograms. Medical Imaging, IEEE Transactions on, 2000. 19(7): p. 731-738.

40. Wang, Z. and A.C. Bovik, A universal image quality index. IEEE Signal Processing Letters, 2002. 9(3): p. 81.

41. Wang, Z., L. Lu, and A.C. Bovik, Video quality assessment based on structural distortion measurement. Signal Processing: Image Communication, 2004. 19(2): p. 121-132.

42. Wang, Z., H.R. Sheikh, and A.C. Bovik. No-reference perceptual quality assessment of JPEG compressed images. in Proceedings of ICIP 2002 International Conference on Image Processing. 2002. Piscataway, NJ, USA: IEEE.

43. Xu, W. and G. Hauske, Picture quality evaluation based on error segmentation. Proceedings of SPIE The International Society for Optical Engineering, 1994. 2308: p. 1454-1465.

44. Eskicioglu Ahmet, M. and S. Fisher Paul, Image quality measures and their performance. IEEE Transactions on Communications, 1995. 43(12): p. 2959-2965.

45. Aach, T., U. Schiebel, and G. Spekowius, Digital image acquisition and processing in medical x-ray imaging. Journal of Electronic Imaging, 1999. 8(1): p. 7-22.

46. Li, X. Blind image quality assessment. in Image Processing. 2002. Proceedings. 2002 International Conference on. 2002.


47. Caviedes, J. and S. Gurbuz. No-reference sharpness metric based on local edge kurtosis. in Proceedings of ICIP 2002 International Conference on Image Processing. 2002. Piscataway, NJ, USA: IEEE.

48. Turaga, D.S., Y. Chen, and J. Caviedes, No reference PSNR estimation for compressed pictures. Signal Processing: Image Communication, 2004. 19(2): p. 173-184.

49. Bojkovic, Z., Image quality estimation in subband coding techniques based on human visual system. International Conference on Communication Technology Proceedings, ICCT, 1996. 2: p. 651-653.

50. Caviedes, J. and F. Oberti, A new sharpness metric based on local kurtosis, edge and energy information. Signal Processing-Image Communication, 2004. 19(2): p. 147-161.

51. Siew, L.H., R.M. Hodgson, and E.J. Wood, Texture measures for carpet wear assessment. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 1988. 10(1): p. 92-105.

52. Gonzalez, R.C. and R.E. Woods, Digital Image Processing. 2002, Upper Saddle River, N.J.: Prentice Hall.

53. Blatt, R.J., et al., Automated quantitative analysis of angiogenesis in the rat aorta model using image-pro plus 4.1. Computer Methods and Programs in Biomedicine, 2004. 75(1): p. 75-79.

54. Belien, J.A.M., et al., Fully automated microvessel counting and hot spot selection by image processing of whole tumour sections in invasive breast cancer. Journal of Clinical Pathology, 1999. 52(3): p. 184-192.

55. Beksac, M.S., et al., An automated intelligent diagnostic system for the interpretation of umbilical artery Doppler velocimetry. European Journal of Radiology, 1996. 23(2): p. 162-167.

56. Zalewski, K. and R. Buchholz, Morphological analysis of yeast cells using an automated image processing system. Journal of Biotechnology, 1996. 48(1-2): p. 43-49.

57. Yeasin, M. and S. Chaudhuri, Development of an automated image processing system for kinematic analysis of human gait. Real-Time Imaging, 2000. 6(1): p. 55-67.

58. Wit, P. and H.J. Busscher, Application of an artificial neural network in the enumeration of yeasts and bacteria adhering to solid substrata. Journal of Microbiological Methods, 1998. 32(3): p. 281-290.

59. Tanaka, M. and A. Kayama, Automated image processing for fractal analysis of fracture surface profiles in high-temperature materials. Journal of Materials Science Letters, 2001. 20(10): p. 907-909.

60. Petropoulos, H., W.L. Sibbitt, and W.M. Brooks, Automated T-2 quantitation in neuropsychiatric lupus erythematosus: A marker of active disease. Journal of Magnetic Resonance Imaging, 1999. 9(1): p. 39-43.

61. Mahadevan, S. and D. Casasent, Automated image processing for grain boundary analysis. Ultramicroscopy, 2003. 96(2): p. 153-162.

62. Kuklin, A., S. Shams, and S. Shah, High throughput screening of gene expression signatures. Genetica, 2000. 108(1): p. 41-46.


63. Krooshoop, D., et al., An automated multi well cell track system to study leukocyte migration. Journal of Immunological Methods, 2003. 280(1-2): p. 89-102.

64. Gong, L., et al. Knowledge-based remote image processing and compression for efficient transmission. in Proceedings of the 1995 IS and T's 48th Annual Conference. 1995. Washington, DC: Is&t.

65. Chien, S.A. and H.B. Mortensen, Automating image processing for scientific data analysis of a large image database. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 1996. 18(8): p. 854-859.

66. Clement, V. and M. Thonnat, A Knowledge-Based Approach to Integration of Image Processing Procedures. CVGIP: Image Understanding, 1993. 57(2): p. 166 - 184.

67. Clouard, R., et al., Borg: a knowledge-based system for automatic generation of image processing programs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999. 21(2): p. 128-144.

68. Gong, L. and A. Kulikowski Casimir, Composition of image analysis processes through object-centered hierarchical planning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995. 17(10): p. 997-1009.

69. Grimm, F. and H. Bunke, Expert system for the selection and application of image processing subroutines. Expert Systems, 1993. 10(2): p. 61-74.

70. Matsuyama, T. Expert systems for image processing - knowledge-based composition of image analysis processes. in 9th International Conference on Pattern Recognition. 1988. Rome, Italy: Int Assoc for Pattern Recognition, Paris, Fr.

71. Toriu, T., H. Iwase, and M. Yoshida, Expert System for Image Processing. Fujitsu Sci Tech J, 1987. 23(2): p. 111-118.

72. Tanaka, T. and N. Sueda. Knowledge acquisition in image processing expert system 'EXPLAIN'. in Proceedings of the International Workshop on Artificial Intelligence for Industrial Applications. 1988. Hitachi, Jpn: IEEE, Industrial Electronics Soc, New York, NY, USA.

73. Aha, D.W., C. Marling, and I. Watson, Case-based reasoning commentaries: introduction. Knowledge Engineering Review, 2005. 20(3): p. 201-202.

74. Holt, A., et al., Medical applications in case-based reasoning. Knowledge Engineering Review, 2005. 20(3): p. 289-292.

75. Rissland, E.L., K.D. Ashley, and L.K. Branting, Case-based reasoning and law. Knowledge Engineering Review, 2005. 20(3): p. 293-298.

76. Kolodner, J.L., M.T. Cox, and P.A. Gonzalez-Caler, Case-based reasoning-inspired approaches to education. Knowledge Engineering Review, 2005. 20(3): p. 299-303.

77. Althoff, K.D. and R.O. Weber, Knowledge management in case-based reasoning. Knowledge Engineering Review, 2005. 20(3): p. 305-310.

78. Ficet-Cauchard, V., C. Porquet, and M. Revenu, CBR for the management and reuse of image-processing expertise: a conversational system. Engineering Applications of Artificial Intelligence, 1999. 12(6): p. 733-747.

79. Perner, P., A. Holt, and M. Richter, Image processing in case-based reasoning. Knowledge Engineering Review, 2005. 20(3): p. 311-314.


80. Perner, P., An architecture for a CBR image segmentation system, in Case-Based Reasoning Research and Development. 1999. p. 525-534.

81. Ficet-Cauchard, V., C. Porquet, and M. Revenu, An interactive case-based reasoning system for the development of image processing applications, in Advances in Case-Based Reasoning. 1998. p. 437-447.

82. Grimnes, M. and A. Aamodt, A two layer case-based reasoning architecture for medical image understanding, in Advances in Case-Based Reasoning. 1996. p. 164-178.

83. Jarmulak, J., E.J.H. Kerckhoffs, and P.P. van't Veen, Case-based reasoning in an ultrasonic rail-inspection system, in Case-Based Reasoning Research and Development. 1997. p. 43-52.

84. Perner, P., CBR-based ultra sonic image interpretation, in Advances in Case-Based Reasoning, Proceedings. 2001. p. 479-490.

85. Macura, R.T., et al., Computerized Case-Based Instructional-System for Computed-Tomography and Magnetic-Resonance-Imaging of Brain-Tumors. Investigative Radiology, 1994. 29(4): p. 497-506.

86. Cheetham, W. and I. Watson, Fielded applications of case-based reasoning. Knowledge Engineering Review, 2005. 20(3): p. 321-323.

87. Hinkle, D. and C. Toomey, Applying Case-Based Reasoning to Manufacturing. Ai Magazine, 1995. 16(1): p. 65-73.

88. Heider, R., Troubleshooting CFM56-3 engines for the Boeing737 using a CBR and data-mining, in Advances in Case-Based Reasoning. 1996. p. 512-518.

89. Cheetham, W. and J. Graf, Case-based reasoning in color matching, in Case-Based Reasoning Research and Development. 1997. p. 1-12.

90. Watson, I. and D. Gardingen, A Distributed Case-Based Reasoning Application for Engineering Sales Support in Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence 1999 Morgan Kaufmann Publishers Inc. p. 600-605

91. Varma, A. and N. Roddy, ICARUS: design and deployment of a case-based reasoning system for locomotive diagnostics. Engineering Applications of Artificial Intelligence, 1999. 12(6): p. 681-690.

92. Jurisica, I. and J. Glasgow, Applications of case-based reasoning in molecular biology. Ai Magazine, 2004. 25(1): p. 85-95.

93. Perner, P., Why case-based reasoning is attractive for image interpretation, in Case-Based Reasoning Research and Development, Proceedings. 2001. p. 27-43.

94. Jabbour, K., et al., Alfa - Automated Load Forecasting Assistant. Ieee Transactions on Power Systems, 1988. 3(3): p. 908-914.

95. Bartels, P.H., T. Gahm, and D. Thompson, Automated microscopy in diagnostic histopathology: From image processing to automated reasoning. International Journal of Imaging Systems and Technology, 1997. 8(2): p. 214-223.

96. Haddad, M., K.P. Adlassnig, and G. Porenta, Feasibility analysis of a case-based reasoning system for automated detection of coronary heart disease from myocardial scintigrams. Artificial Intelligence in Medicine, 1997. 9(1): p. 61-78.

97. Jarmulak, J., Case-based classification of ultrasonic B-scans: Case-base organisation and case retrieval, in Advances in Case-Based Reasoning. 1998. p. 100-111.


98. Leake, D.B., Case-Based Reasoning: Experiences, Lessons, and Future Directions. 1996, Menlo Park: AAAI Press/MIT Press.

99. Aamodt, A. and E. Plaza, Case-Based Reasoning - Foundational Issues, Methodological Variations, and System Approaches. Ai Communications, 1994. 7(1): p. 39-59.

100. De Mantaras, R.L., et al., Retrieval, reuse, revision and retention in case-based reasoning. Knowledge Engineering Review, 2005. 20(3): p. 215-240.

101. Perner, P., Are case-based reasoning and dissimilarity-based classification two sides of the same coin? Engineering Applications of Artificial Intelligence, 2002. 15(2): p. 193-203.

102. Zamperoni, P. and V. Starovoitov. How dissimilar are two gray-scale images. in DAGM. 1995. Berlin, Germany: Springer.

103. Santini, S. and R. Jain, Similarity measures. Ieee Transactions on Pattern Analysis and Machine Intelligence, 1999. 21(9): p. 871-883.

104. Perner, P., Content-Based Image Indexing and Retrieval in an Image Database for Technical Domains, in Lecture Notes in Computer Science, H.S. Smuelder, Editor. 1998, Springer: Berlin. p. 207-224.

105. Tsai, C.-Y., C.-C. Chiu, and J.-S. Chen, A case-based reasoning system for PCB defect prediction. Expert Systems With Applications, 2005. 28(4): p. 813-822.

106. Perner, P., Using CBR Learning for the Low-Level and Highlevel Unit of an Image Interpretation System, in Advances in Pattern Recognition, S. Singh, Editor. 1998, Springer: Berlin. p. 45-54.

107. McSherry, D., Precision and recall in interactive case-based reasoning, in Case-Based Reasoning Research and Development, Proceedings. 2001. p. 392-406.

108. Bailey, T.L. and C. Elkan. Estimating the accuracy of learned concepts. in International Joint Conference on Artificial Intelligence. 1993: Morgan Kaufmann Publisher.

109. Breiman, L. and P. Spector, Submodel Selection and Evaluation in Regression - the X-Random Case. International Statistical Review, 1992. 60(3): p. 291-319.

110. Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. in Proceedings of International Joint Conference on Artificial Intelligence. 1995. Montreal, Que., Canada.

111. Domingos, P. and M. Pazzani, On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 1997. 29(2-3): p. 103-130.

112. Rish, I. An empirical study of the naive Bayes classifier. in International Joint Conference on Artificial Intelligence. 2001.

113. Peli, E., In search of a contrast metric: Matching the perceived contrast of Gabor patches at different phases and bandwidths. Vision Research, 1997. 37(23): p. 3217-3224.

114. McLoughlin, K.J., P.J. Bones, and N. Karssemeijer, Noise equalization for detection of microcalcification clusters in direct digital mammogram images. Medical Imaging, IEEE Transactions on, 2004. 23(3): p. 313-320.

115. Zhang, D. and Z. Wang, Impulse noise detection and removal using fuzzy techniques. Electronics Letters, 1997. 33(5): p. 378-379.

116. Chen, T.-J., et al., A novel image quality index using Moran I statistics. Physics in Medicine and Biology, 2003. 48(8): p. N131.


117. Zhang, N.F., et al., Image sharpness measurement in the scanning electron microscope - Part III. Scanning, 1999. 21(4): p. 246-252.

118. Marziliano, P., et al. A no-reference perceptual blur metric. in Proceedings of ICIP 2002 International Conference on Image Processing. 2002. Piscataway, NJ, USA: IEEE.

119. Puttenstein, J.G., I. Heynderickx, and G. de Haan, Evaluation of objective quality measures for noise reduction in TV-systems. Signal Processing: Image Communication, 2004. 19(2): p. 109-119.

120. Venetsanopoulos, A., Digital Image Processing and Applications. 2002: Toronto.

121. Seul, M., L. O'Gorman, and M.J. Sammon, Practical Algorithms for Image Analysis. 2000, Cambridge, UK: Cambridge University Press.

122. Davies, E., Machine Vision: Theory, Algorithms, Practicalities. 1997, Academic Press: San Diego.

123. Olsson, D.M. and L.S. Nelson, Nelder-Mead Simplex Procedure for Function Minimization. Technometrics, 1975. 17(1): p. 45-51.

124. Walters, F., Sequential simplex optimization - An update. Analytical Letters, 1999. 32(2): p. 193+.

125. Walters, F.H., et al., Sequential Simplex Optimisation. 1991, Boca Raton, Florida: CRC Press LLC.

126. Routh, M.W., P.A. Swartz, and M.B. Denton, Performance of the super modified simplex. Analytical Chemistry, 1977. 49(9): p. 1422-1428.

127. Betteridge, D., A.P. Wade, and A.G. Howard, Reflections on the modified simplex--I. Talanta, 1985. 32(8, Part 2): p. 709-722.

128. Betteridge, D., A.P. Wade, and A.G. Howard, Reflections on the modified simplex--II. Talanta, 1985. 32(8, Part 2): p. 723-734.

129. Kelley, C.T., Detection and remediation of stagnation in the Nelder-Mead algorithm using a sufficient decrease condition. Siam Journal on Optimization, 1999. 10(1): p. 43-55.

130. Barton, R.R. and J.S. Ivey, Nelder-Mead simplex modifications for simulation optimization. Management Science, 1996. 42(7): p. 954-973.

131. Syed, N., H. Liu, and K. Sung. Incremental Learning with Support Vector Machines. in International Joint Conference on Artificial Intelligence. 1999. Stockholm, Sweden.

132. Carozza, M. and S. Rampone. Towards an incremental SVM for regression. in Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on. 2000.

133. Xiao, R., J. Wang, and F. Zhang. An approach to incremental SVM learning algorithm. in Tools with Artificial Intelligence, 2000. ICTAI 2000. Proceedings. 12th IEEE International Conference on. 2000.

134. Cauwenberghs, G. and T. Poggio, Incremental and Decremental Support Vector Machine Learning, in Advances in Neural Information Processing Systems. 2001, MIT Press.

135. Li, K. and H.-K. Huang. Incremental learning proximal support vector machine classifiers. in Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on. 2002.

136. Martin, M., On-line Support Vector Machines for Function Approximation. 2002.


137. Peng, B., X. Sun Zheng, and G. Xu Xiao. SVM-based incremental active learning for user adaptation for online graphics recognition system. in Proceedings of 2002 International Conference on Machine Learning and Cybernetics. 2002. Beijing, China: Hebei University; IEEE systems, Man and Cybernetics technical Comm. on Cybernetics.

138. Diehl Christopher, P. and G. Cauwenberghs. SVM Incremental Learning, Adaptation and Optimization. in International Joint Conference on Neural Networks 2003. 2003. Portland, OR, United States: The International Neural Network Society; The IEEE Neural Network Society.

139. Chakraborty, D. and N.R. Pal, A novel training scheme for multilayered perceptrons to realize proper generalization and incremental learning. Ieee Transactions on Neural Networks, 2003. 14(1): p. 1-14.

140. Sato, M. and S. Ishii, On-line EM algorithm for the normalized gaussian network. Neural Computation, 2000. 12(2): p. 407-432.

141. Schaal, S. and C.G. Atkeson, Constructive incremental learning from only local information. Neural Computation, 1998. 10(8): p. 2047-2084.

142. Schaal, S., C.G. Atkeson, and S.V. Vijayakumar, Scalable techniques from nonparametric statistics for real time robot learning. Applied Intelligence, 2002. 17(1): p. 49-60.

143. Su, J.B., J. Wang, and Y.G. Xi, Incremental learning with balanced update on receptive fields for multi-sensor data fusion. Ieee Transactions on Systems Man and Cybernetics Part B-Cybernetics, 2004. 34(1): p. 659-665.

144. Sugiyama, M. and H. Ogawa, Incremental construction of projection generalizing neural networks. Ieice Transactions on Information and Systems, 2002. E85D(9): p. 1433-1442.

145. Bengio, Y. and Y. Grandvalet, No unbiased estimator of the variance of K-fold cross-validation. Journal of Machine Learning Research, 2004. 5: p. 1089-1105.

146. Agresti, A. and B.A. Coull, Approximate is better than "exact" for interval estimation of binomial proportions. The American Statistician, 1998. 52(2): p. 119.