
CSC446 : Pattern Recognition

Prof. Dr. Mostafa Gadal-Haqq

Faculty of Computer & Information Sciences

Computer Science Department

AIN SHAMS UNIVERSITY

Lecture Note 2:

Chapter 1: Introduction to PRS


Reference: DHS, Pattern Classification, Chapter 1


An Intuitive Example: A Fish Sorting Machine

How to Build a PR System?


PR Example: A Fish Sorting Machine

• To illustrate the complexity of some of the types of problems involved, let us consider the following imaginary example:
  "Sort incoming fish on a conveyor belt according to species (e.g., sea bass and salmon) using optical sensing (images)."


Pattern Recognition System

Stages in Pattern Recognition Systems (block diagram):

objects → sensing → raw data (images) → preprocessing & feature extraction → pattern features → classification → decision (salmon / sea bass)


Stages in Pattern Recognition Systems

• Sensing
  – Use physical sensing devices (e.g., a camera) to acquire data from the object.
• Preprocessing
  – Use a segmentation operation to isolate the fish from one another and from the background.
• Feature Extraction
  – Reduce the data by extracting features that are intrinsic to each type of fish.
• Classification
  – Pass the features to a classifier for categorization. (A minimal pipeline sketch follows.)
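The following is a minimal sketch of these four stages as plain Python functions. The segmentation rule, the two features, and the classification thresholds are illustrative assumptions, not the algorithms used by the actual machine:

```python
import numpy as np

def sense(camera_frame):
    """Sensing: return the raw image acquired by the camera."""
    return np.asarray(camera_frame, dtype=float)

def preprocess(image, brightness_threshold=0.5):
    """Preprocessing: toy segmentation that keeps pixels brighter than a
    threshold (a stand-in for a real segmentation algorithm)."""
    return image * (image > brightness_threshold)

def extract_features(segmented):
    """Feature extraction: reduce the image to a small feature vector,
    here mean brightness ('lightness') and object width in pixels."""
    columns_with_fish = np.any(segmented > 0, axis=0)
    width = int(columns_with_fish.sum())
    lightness = float(segmented[segmented > 0].mean()) if width else 0.0
    return np.array([lightness, width])

def classify(x):
    """Classification: a toy threshold rule on the feature vector."""
    lightness, width = x
    return "salmon" if lightness < 0.6 and width < 40 else "sea bass"

# decision = classify(extract_features(preprocess(sense(frame))))
```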


The Fish Sorting Machine Example

• Sensing
  – Use an intensity camera to acquire fish images.
• Preprocessing
  – Use image segmentation techniques to segment the fish from each other and from the background.

(Segmentation example: J. Malcolm et al., "A Graph Cut Approach to Image Segmentation in Tensor Space," Workshop on Component Analysis Methods (in CVPR), 2007.)


The Fish Sorting Machine Example

• Features to extract from the sample images:
  – Length
  – Lightness
  – Width
  – Number and shape of fins
  – Position of the mouth, etc.


Feature Selection

• Candidate feature: length
  – Using training (design or learning) samples, we compute the histogram of the length for each class; it serves as an estimate of the class-conditional probability. (A sketch follows.)
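A small sketch of this step, using synthetic length measurements as a stand-in for real labelled training samples (the means, spreads, and bin choices are assumptions):

```python
import numpy as np

# Hypothetical training measurements (cm); in practice these come
# from labelled images of each species.
rng = np.random.default_rng(0)
salmon_lengths = rng.normal(loc=45, scale=5, size=200)
seabass_lengths = rng.normal(loc=55, scale=6, size=200)

# Shared bin edges so the two histograms are comparable.
bins = np.linspace(20, 80, 31)

# density=True normalises each histogram so it approximates the
# class-conditional density p(length | class).
p_len_given_salmon, _ = np.histogram(salmon_lengths, bins=bins, density=True)
p_len_given_seabass, _ = np.histogram(seabass_lengths, bins=bins, density=True)
```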


Feature Selection

Problem
• Length is a poor feature!
  – That is, we cannot make an accurate decision using length alone.

Solution
• Select another candidate feature, e.g., lightness, to enhance the classification process.


Decision Boundary in Feature Space

• Fish distribution according to their lightness.
  – The overlap between the two histograms is small compared to the length feature.

(Figure: lightness histograms for the two classes with the decision boundary / threshold marked.)


Decision Theory

The decision-theory task:

• How do we decide the threshold of the decision boundary, and what is its relationship to the cost?
  – How do we minimize the cost (error)?
• That is, reduce the number of sea bass that are classified as salmon, or the converse.
• This is the central task of decision theory. (A threshold-selection sketch follows.)
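A minimal sketch of the idea, assuming per-class arrays of a scalar feature (e.g., the synthetic lengths above): sweep candidate thresholds and keep the one with the lowest cost, where the two misclassification costs are illustrative parameters:

```python
import numpy as np

def best_threshold(salmon_vals, seabass_vals,
                   cost_salmon_as_bass=1.0, cost_bass_as_salmon=1.0):
    """Sweep candidate thresholds t (classify 'salmon' if value < t) and
    return the threshold with the smallest weighted error."""
    lo = min(salmon_vals.min(), seabass_vals.min())
    hi = max(salmon_vals.max(), seabass_vals.max())
    best_t, best_cost = None, np.inf
    for t in np.linspace(lo, hi, 200):
        err_salmon = np.mean(salmon_vals >= t) * cost_salmon_as_bass  # salmon called sea bass
        err_bass = np.mean(seabass_vals < t) * cost_bass_as_salmon    # sea bass called salmon
        if err_salmon + err_bass < best_cost:
            best_t, best_cost = t, err_salmon + err_bass
    return best_t, best_cost
```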


Feature Selection

• To increase the accuracy of the PR system, we use the most discriminative features. This is the task of the feature extractor:
  – Suppose that lightness and width were selected as the discriminative features; we then use their values to form the feature vector x.

(Figure: preprocessed fish image → feature extractor → x = [x1, x2]^T, where x1 = lightness and x2 = width.)
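A brief sketch of stacking such feature vectors into a data matrix; the numbers and labels below are made-up placeholders for measurements produced by the feature extractor:

```python
import numpy as np

# Hypothetical per-fish measurements from the feature extractor.
lightness = np.array([0.41, 0.55, 0.62, 0.38])   # x1 for four fish
width     = np.array([31.0, 44.0, 47.0, 29.0])   # x2 for the same fish

# Each row is one pattern x = [x1, x2]; stacking all patterns gives the
# data matrix X that the classifier works on.
X = np.column_stack([lightness, width])
labels = np.array(["salmon", "sea bass", "sea bass", "salmon"])
```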


Decision Boundary in Feature Space

(Figure: fish plotted in the lightness–width feature space with a decision boundary separating the two classes.)


Decision Boundary in Feature Space

• We might add other features that are not correlated with the ones we already have.
  – Care should be taken not to reduce performance by adding such "noisy features".
• Ideally, the best decision boundary is the one that provides optimal performance, as in the following figure:


Decision Boundary in Feature Space

(Figure: a complex decision boundary that separates every training sample correctly, i.e., zero training error.)


Model Generalization

• The previous decision boundary produces zero training error. However, our satisfaction is premature, because the central aim of designing a classifier is to correctly classify novel inputs.
• Model generalization means finding a decision boundary that remains optimal for novel inputs, not just for the training samples. This is the task of statistical pattern recognition. (A small train/test check is sketched below.)
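A minimal sketch of checking generalization on synthetic 2-D features (all values assumed): a 1-nearest-neighbour rule reaches zero training error by construction, but the held-out error is what actually measures generalization:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic (lightness, width) features for two species.
salmon = rng.normal([0.4, 32.0], [0.08, 4.0], size=(100, 2))
seabass = rng.normal([0.6, 45.0], [0.08, 4.0], size=(100, 2))
X = np.vstack([salmon, seabass])
y = np.array([0] * 100 + [1] * 100)

# Hold out 30% of the samples as "novel" inputs.
idx = rng.permutation(len(X))
train, test = idx[:140], idx[140:]

def predict_1nn(X_train, y_train, X_query):
    """1-NN memorises the training set, so its training error is zero."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[d.argmin(axis=1)]

train_error = np.mean(predict_1nn(X[train], y[train], X[train]) != y[train])
test_error = np.mean(predict_1nn(X[train], y[train], X[test]) != y[test])
# train_error == 0.0, but test_error estimates performance on novel inputs.
```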


Optimal Decision Boundary

(Figure: an optimal decision boundary that accepts a small training error in exchange for better performance on novel inputs.)


Another Example: Sorting Fruits

“Castleman, Digital Image Processing, Prentice-Hall, 1979”


Another Example: Sorting Fruits

• Complexity of the feature space.

(Figure: apples, lemons, oranges, and cherries plotted by redness versus diameter, with decision boundaries between the classes.)


Pattern Recognition Systems Design


Pattern Recognition Systems Design

• Designing a pattern recognition system involves two stages (see the sketch below):
  – Training: use training data so that the system learns the decision boundaries.
  – Testing: use testing data to measure how accurately the system recognizes new data.
• Challenges:
  – Representation: selection of discriminative features.
  – Matching: selection of a good classification model.
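A minimal sketch of the two stages, assuming numeric feature arrays like those above; the minimum-distance (nearest class mean) classifier here is an illustrative choice, not the method prescribed by the course:

```python
import numpy as np

def train(X_train, y_train):
    """Training stage: learn a decision rule from labelled data.
    Here, a minimum-distance (nearest class mean) classifier."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, means

def test(model, X_test, y_test):
    """Testing stage: measure accuracy on data unseen during training."""
    classes, means = model
    d = np.linalg.norm(X_test[:, None, :] - means[None, :, :], axis=2)
    y_pred = classes[d.argmin(axis=1)]
    return float(np.mean(y_pred == y_test))
```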


Difficulties of Representation

• "A program that could distinguish between male and female faces in a random snapshot would probably earn its author a Ph.D. in computer science." (A. Penzias, 1989)


Good Representation

• Should have some invariant properties (e.g., w.r.t. rotation, translation, scale, …)
• Account for intra-class variations
• Ability to discriminate pattern classes of interest
• Robustness to noise/occlusion
• Lead to simple decision making (e.g., a linear decision boundary)
• Low cost (affordable)


Good Representation

• A good representation leads to small intra-class variation, large inter-class separation, and a simple decision rule.
• A representation could consist of a vector of real-valued numbers, an ordered list of attributes, parts and their relations, …


Feature Selection/Extraction

• Each pattern is represented as a point in the d-dimensional feature space.
• Features and their desired invariance properties are domain-specific.
• How many features, and which ones, should be used to construct the decision boundary?
• Some features may be redundant!
• Curse of dimensionality: too many features cause problems, especially when we have a small number of training samples (see the sketch below).
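A small numerical illustration of the curse of dimensionality, using random synthetic points (an assumption): with a fixed number of samples, the fraction of feature-space cells that contain any data collapses as the dimensionality d grows, so density estimates become unreliable:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, bins_per_dim = 200, 10

for d in (1, 2, 3, 5, 10):
    # 200 random points in the unit hypercube [0, 1]^d.
    X = rng.random((n_samples, d))
    # Assign each point to a cell of a regular grid with 10 bins per axis.
    cells = set(map(tuple, (X * bins_per_dim).astype(int)))
    total_cells = bins_per_dim ** d
    print(f"d={d:2d}: {len(cells)} of {total_cells} cells occupied "
          f"({len(cells) / total_cells:.2%})")
```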


Decision Models

• Template Matching
  – Assumes very small intra-class variability.
  – Learning is difficult for deformable templates.
• Syntactic
  – Primitive extraction is sensitive to noise.
  – Describing a pattern in terms of primitives is difficult.
• Statistical
  – Assumes a density model for each class.
• Neural Network
  – Parameter tuning and local minima in learning.
• In practice, statistical and neural network approaches work well.


PRS Design Cycle

Start → Collect Data → Choose Features → Choose Model → Train Classifier → Evaluate Classifier → End
(Prior knowledge, e.g. invariances, informs these choices.)


PRS Design Cycle

• Data Collection
  – Collect an adequately large and representative set of examples and divide it into a training set (e.g., 70%) and a testing set (e.g., 30%).
• Feature Choice
  – Depends on the characteristics of the problem domain. Features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise.


PRS Design Cycle

• Model Choice
  – If unsatisfied with the performance of one classifier, move to another class of models.
• Training
  – Use the sample data to train the classifier.
  – Many different procedures can be used for training and evaluation:
    • Random sub-sampling
    • Bootstrap
    • Cross-validation


Different Training Methods

• Random sub-sampling
  – Repeat a simple holdout split k times.
• Bootstrap
  – The training set has the same size as the full data set D.
  – It is drawn by sampling with replacement. (See the sketch below.)
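A minimal sketch of both schemes, showing only the index bookkeeping for an assumed data set of n labelled samples:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 5        # assumed data-set size and number of repetitions

# Random sub-sampling: repeat a simple 70/30 holdout split k times.
for _ in range(k):
    idx = rng.permutation(n)
    train_idx, test_idx = idx[:int(0.7 * n)], idx[int(0.7 * n):]
    # ... train on train_idx, evaluate on test_idx, accumulate the error ...

# Bootstrap: draw a training set of size n by sampling WITH replacement;
# the samples never drawn (about 37% on average) can serve as a test set.
boot_idx = rng.integers(0, n, size=n)
out_of_bag = np.setdiff1d(np.arange(n), boot_idx)
```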


Different Training Methods

• Cross-validation: when data is particularly scarce, divide the data into k disjoint groups; test on the k-th group while training on the rest, and repeat for every group.
  – With one sample per group this becomes leave-one-out cross-validation. (See the sketch below.)
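A short sketch of generating the k folds over sample indices (data set size n is assumed); choosing k = n gives leave-one-out cross-validation:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k disjoint folds."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Example: 5-fold CV over 100 samples; k = 100 would be leave-one-out.
# for tr, te in kfold_indices(100, 5):
#     ... train on tr, evaluate on te, then average the k error estimates ...
```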


PRS Design Cycle

• Evaluation
  – Measure the error rate (performance) using the different training methods.
  – Switch from one set of features to another, or from one model to another, to improve accuracy, i.e., to minimize the error rate.


Performance of PR Systems

• Error rate (Prob. of misclassification)

• Speed

• Cost

• Robustness

• Reject option

• Return on investment


Computational Complexity

• What is the trade-off between computational ease and performance?
• How does an algorithm scale as a function of the number of features, patterns, or categories?


Limitations of PR Systems

• Humans have the ability to switch rapidly and seamlessly between different pattern recognition tasks.
• It is very difficult to design a device capable of performing such a variety of different classification tasks as well as humans can.


Summary

• Pattern recognition is extremely useful and is now part of many crucial computer applications.
• Pattern recognition is a very difficult problem with many complex sub-problems.
• Successful systems have been built in well-constrained domains.
• No single technique/model is suited to all pattern recognition problems.
• The use of object models, constraints, and context is necessary for identifying complex patterns.
• Careful sensor design and feature extraction can lead to simple classifiers.


Next Time

Mathematical Foundations
