
Page 1: Pattern Recognition Random Thoughts N.A. Graf UTeV March 2, 2000

Pattern Recognition Random Thoughts

N.A. Graf

UTeV

March 2, 2000

Page 2:

Pattern Recognition

• There are many kinds of patterns.
  – Visual, auditory, temporal, logical, …

• Using a broad enough interpretation, we can find pattern recognition in every intelligent activity.

• No single theory of pattern recognition can possibly cope with such a broad range of problems.

Page 3:

Overview

• Restrict our attention to the following 3 classes of pattern recognition techniques:

• Template Matching
  – Global, fixed patterns

• Hough Transform
  – Global, parameterized patterns

• Kalman Filter
  – Local, dynamic state following

Page 4:

• Suppose that we are working with visual patterns, and we know that the patterns of interest represent the 26 letters of the Roman alphabet.

• Then we can say that the pattern recognition problem is one of assigning the input to one of 26 classes.

• In general, we will limit ourselves to the problem of deciding if the input belongs to Class 1 or Class 2 or ... or Class c.

Page 5:

• An obvious approach is to compare the input with a standard pattern for each class, and to choose the class that matches best.

• The obvious problem with this approach is that it doesn't say what to compare or how to measure the degree of match.

Page 6:

Template Matching

• Once digitized, one can compare images bit by bit with a matching template to classify.

• Works very well in specific cases, but not in general (fonts, shearing, rotation, etc.)

Page 7:

Template Matching in HEP

• In high energy physics experiments, the detectors are fixed, so template matching is a good solution for fast characterization of events.

• Commonly used to trigger on charged particle tracks.

• Use Monte Carlo (MC) simulation to build up a library of the most probable patterns.

Page 8:

Parametric Feature Extraction

• Often, one is interested in extracting topological information from “images”.

• Finding “edges” in pictures.

• Finding “tracks” in events.

• For patterns which can be parameterized, such as curves, features can be identified using conformal mapping techniques.

Page 9:

The Hough Transform

• Patented by Paul Hough in 1962 as a technique for detecting curves in binary image data.

• Determines whether edge-detected points are components of a specific type of parametric curve.

• Maps “image space” points into “parameter space” curves by incrementing elements of an accumulator whose array indices are the curve parameters.

Page 10:

The Hough Transform

• Developed to detect straight lines using the slope-intercept form

  y = m x + b

• Every image point (x, y) gives rise to a line in the (m, b) accumulator.

• Curve parameters are identified as array maxima:
  – the location gives the parameters
  – the number of entries gives the number of contributing points

Page 11:

The Hough Transform

• Richard Duda and Peter Hart in 1972 introduced the ρ–θ parameterization.

[Figure: a line y = m x + b in the x–y plane, with normal distance ρ and normal angle θ.]

Page 12:

The Hough Transform

• The ρ–θ accumulator is incremented using the values of angle and radius that satisfy

  ρ = x cos θ + y sin θ

• Sinusoidal curves are produced.

• Intersection of the curves indicates the likely location of lines in the image.

• The normal form is periodic, limiting the range of values for the angle and eliminating the difficulties encountered with large slopes.
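The ρ–θ voting scheme can be sketched in a few lines (bin counts and the ρ range below are arbitrary choices for the sketch):

```python
import math

def hough_accumulate(points, n_theta=180, n_rho=200, rho_max=10.0):
    """For each point, increment the (theta, rho) bins satisfying
    rho = x*cos(theta) + y*sin(theta), with theta scanned over [0, pi)."""
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, y in points:
        for i in range(n_theta):
            theta = math.pi * i / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            j = int((rho + rho_max) / (2.0 * rho_max) * n_rho)
            if 0 <= j < n_rho:
                acc[i][j] += 1
    return acc

# Five collinear points on y = x; they all vote for theta = 135 deg, rho = 0.
acc = hough_accumulate([(t, t) for t in range(1, 6)])
peak = max((acc[i][j], i, j) for i in range(180) for j in range(200))
print(peak)  # (entries, theta index, rho index); all 5 points share one bin
```

Each point traces a sinusoid in (θ, ρ); the bin where the sinusoids intersect identifies the line, exactly as the bullets above describe.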

Page 13:

Finding Straight Lines in Images

Start with a digitized image.
Find the “edges”.
Apply the Hough transform.
Extract the features.

Page 14:

Other Curves

• The technique can be generalized to include arbitrary parametric curves.

• Finding charged tracks in a solenoidal field motivates the circle algorithm.

  F(x, y, a, b) = (x − a)² + (y − b)² − r² = 0

• Simplify by fixing one of the parameters, e.g. the radius.

Page 15:

Charged Tracks in HEP

• Want to find tracks that come from the origin.

• Construct the line connecting each measured point to the origin.

• The orthogonal bisector of this line passes through the circle’s center.

• Fill the accumulator with each point’s bisector.
  – one-to-many mapping

• Or fill the accumulator with the intersection of the bisectors from pairs of points.
  – many-to-one mapping
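The many-to-one mapping can be sketched directly: for a circle through the origin, the condition |c|² = |c − p|² reduces to the linear constraint 2 c·p = |p|² for each hit p, so two hits fix the center (the function name and sample hits below are illustrative):

```python
def center_through_origin(p1, p2):
    """Center (a, b) of the circle through the origin and two measured hits.
    Each hit p gives the linear constraint 2 c.p = |p|^2; solve the 2x2 system."""
    (x1, y1), (x2, y2) = p1, p2
    det = 2.0 * (x1 * y2 - y1 * x2)   # hits must not be collinear with the origin
    r1, r2 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2
    a = (y2 * r1 - y1 * r2) / det
    b = (x1 * r2 - x2 * r1) / det
    return a, b

# Two hits on the unit circle centered at (1, 0), which passes through the origin:
print(center_through_origin((2.0, 0.0), (1.0, 1.0)))  # -> (1.0, 0.0)
```

Filling an accumulator with such pair-wise centers concentrates the votes at the true circle center, just as the bisector intersections do.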

Page 16:

[Figure: hits p1 and p2 in the x–y plane; the intersection of their bisectors gives the center of the circle, and hence pT and φ_0.]

Page 17:

Resolution

• The resolution with which one can determine the curve parameters using the Hough transform is determined by the accumulator size:

  larger size ⇒ better resolution, but more resources

• Use the HT for pattern recognition, then fit the points which contributed to the functional form.

• Use the Adaptive HT (AHT):
  – Use a coarse array to find regions of interest
  – Backmap points to a finer-binned accumulator

Page 18:

HT Summary

• Works very well for well-defined problems.

• Ideally suited to modern, digital devices.

• Global, “democratic” method:
  – individual points “vote” independently

• Very robust against noise and inefficiency.

• Can be generalized to find arbitrary parameterized curves.

• The AHT offers a solution to the trade-off between speed and resolution.

Page 19:

The Kalman Filter

• In 1960 Rudolf Kalman published “A New Approach to Linear Filtering and Prediction Problems” in the ASME Journal of Basic Engineering.

• The best estimate for the state of a system and its covariance can be obtained recursively from the previous best estimate and its covariance matrix.

• Essential for real-time applications with noisy data, e.g. moon landing, stock market prediction, military targeting.

Page 20:

Running Average

• Discrete measurements a_n of a constant A.

• Compare starting over with each new measurement,

  Ā_n = (1/n) Σ_{i=1..n} a_i

with the recursive formula

  Ā_n = ((n − 1)/n) Ā_{n−1} + (1/n) a_n
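The recursion can be checked in a few lines:

```python
def running_average(samples):
    """Recursive mean: A_n = ((n-1)/n) * A_{n-1} + (1/n) * a_n."""
    avg = 0.0
    for n, a in enumerate(samples, start=1):
        avg = ((n - 1) / n) * avg + a / n
    return avg

data = [1.0, 2.0, 3.0, 4.0]
# The recursion reproduces the batch mean without storing past samples:
print(running_average(data), sum(data) / len(data))
```

Only the previous estimate and the sample count are carried along, which is exactly the property the Kalman filter generalizes.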

Page 21:

Filtering

• Another class of pattern recognition involves systems for which an existing state is known and one wishes to add additional information.

How does one reconcile new, perhaps noisy, information with an existing “best” estimate?

Page 22:

Dynamic System Description

• A discrete dynamic system is characterized at each time t_k by a state vector x_k, whose evolution is described by a time-dependent transformation:

  x_k = f_k(x_{k−1}) + w_k

  f_k: a deterministic function
  w_k: random disturbance of the system (process noise)

Page 23:

• Normally one only observes a function of the state vector, corrupted by some measurement noise:

  m_k = h_k(x_k) + ε_k

  m_k: vector of observations at time t_k
  ε_k: measurement noise

Page 24:

• The simplest case has both f and h linear:

  f_k(x_{k−1}) = F_k x_{k−1} + a_k

  h_k(x_k) = H_k x_k + b_k

Page 25:

Progressive Fitting

• There are three basic operations in the analysis of a dynamic system:

• Filtering:
  – estimation of the present state vector, based upon all the past measurements

• Prediction:
  – estimation of the state vector at a future time

• Smoothing:
  – improved estimation of the state vector at some time in the past, based upon all measurements taken up to the present time

Page 26:

Prediction

• One assumes that at a given initial point the state vector parameters x and their covariance matrix C are known.

• The parameter vector and covariance matrix are propagated to position i+1 via (the superscript i denoting the prediction from step i):

  x^i_{i+1} = F_i x_i + w_i

  C^i_{i+1} = F_i C_i F_i^T + Q_i

Page 27:

Filtering

• At position i+1 one has a measurement m_{i+1}, which can contain measurements of an arbitrary number of the state vector parameters.

• The question is how to reconcile this measurement with the existing prediction for the state vector at this position.

Page 28:

The Kalman Filter

• The Kalman Filter is the optimum solution in the sense that it minimizes the mean square estimation error.

• If the system is linear and the noise is Gaussian, then the Kalman Filter is the optimal filter; no nonlinear filter can do better.

Page 29:

• Combine the noisy measurement with the prior estimate:

  x_{i+1} = x^i_{i+1} + K_{i+1} ( m_{i+1} − H_{i+1} x^i_{i+1} )

where K_{i+1} is the Kalman Gain matrix:

  K_{i+1} = C^i_{i+1} H^T_{i+1} ( V_{i+1} + H_{i+1} C^i_{i+1} H^T_{i+1} )^{−1}

Page 30:

Kalman Filter Flow

• Begin with a prior estimate and covariance matrix.

• Predict ahead:

  x^i_{i+1} = F_i x_i + w_i

• Compute the Kalman Gain:

  K_{i+1} = C^i_{i+1} H^T_{i+1} ( V_{i+1} + H_{i+1} C^i_{i+1} H^T_{i+1} )^{−1}

• Update the estimate with the measurement:

  x_{i+1} = x^i_{i+1} + K_{i+1} ( m_{i+1} − H_{i+1} x^i_{i+1} )

• Loop back to the prediction step.

Page 31:

Kalman Filter Advantages

• Combines pattern recognition with parameter estimation

• Number of computations increases only linearly with the number of detectors.

• Estimated parameters closely follow real path.

• Matrix size limited to number of state parameters

Page 32:

Relationship to Least Squares Fitting

• To solve the overdetermined matrix equation

  M x = b

we seek the x_opt which minimizes the squared residual

  ε^T ε,   ε = M x_opt − b

Page 33:

• Consider the generalized weighted sum of squared residuals:

  χ² = ( M x_opt − b )^T W ( M x_opt − b )

• To minimize, expand, take the derivative with respect to x_opt, and set it to 0:

  dχ²/dx_opt = d/dx_opt ( x_opt^T M^T W M x_opt − b^T W M x_opt − x_opt^T M^T W b + b^T W b ) = 0

  2 ( M^T W M ) x_opt = 2 M^T W b

  x_opt = ( M^T W M )^{−1} M^T W b
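For the common straight-line case, x_opt = (MᵀWM)⁻¹MᵀWb reduces to a 2×2 normal-equations solve. A minimal sketch (the function name and the exact test data are illustrative):

```python
def wlsq_line(xs, ys, ws):
    """Weighted least squares fit of y = p0 + p1*x.
    Rows of M are (1, x_i), W = diag(w_i), b = y; solves (M^T W M) p = M^T W b."""
    S = Sx = Sxx = Sy = Sxy = 0.0
    for x, y, w in zip(xs, ys, ws):
        S += w; Sx += w * x; Sxx += w * x * x
        Sy += w * y; Sxy += w * x * y
    det = S * Sxx - Sx * Sx            # determinant of M^T W M
    p0 = (Sxx * Sy - Sx * Sxy) / det   # intercept
    p1 = (S * Sxy - Sx * Sy) / det     # slope
    return p0, p1

# Points exactly on y = 1 + 2x, unit weights:
print(wlsq_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [1.0, 1.0, 1.0]))  # -> (1.0, 2.0)
```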

Page 34:

• Consider the Kalman Filter solution for the measurement equation:

  b = M x

• For no a priori knowledge about x:

  C^{−1} = M^T V^{−1} M

• Giving the Kalman Gain:

  K = ( M^T V^{−1} M )^{−1} M^T V^{−1}

• The estimate for the state vector is:

  x = [ ( M^T V^{−1} M )^{−1} M^T V^{−1} ] b

Page 35:

• For a constant system state vector with an overdetermined system of linear equations and no a priori information, the Kalman filter reproduces the deterministic least squares result.

• In most cases, however, one does have prior knowledge and the Kalman filter’s advantage is the convenient way in which it accounts for this prior knowledge via the initial conditions.

• Basically a least squares best fit problem done sequentially rather than in batch mode.

Page 36:

Estimating a Constant

[Figure: estimated voltage vs. iteration for three cases: assumed measurement variance equal to, greater than, and less than the true variance.]
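The behaviour in the figure can be reproduced with a one-dimensional filter for a constant state (F = 1, H = 1, no process noise). The voltage readings below are invented for illustration; R is the assumed measurement variance, the knob varied in the figure (too large ⇒ sluggish response, too small ⇒ the estimate chases the noise):

```python
def estimate_constant(measurements, R, x0=0.0, P0=1.0e6):
    """Scalar Kalman filter for a constant: predict step is trivial (x, P unchanged)."""
    x, P = x0, P0
    history = []
    for m in measurements:
        K = P / (P + R)          # Kalman gain
        x = x + K * (m - x)      # update with the measurement
        P = (1.0 - K) * P        # updated variance
        history.append(x)
    return history

readings = [1.2, 0.8, 1.1, 0.9, 1.05, 0.95]
print(estimate_constant(readings, R=1.0)[-1])  # close to the sample mean, 1.0
```

With a diffuse prior (large P0) and no process noise, this filter reproduces the running average of the earlier slide, as the least-squares equivalence predicts.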

Page 37:

Track Fitting

• In the ’80s, Billoir and Frühwirth adapted the KF to track finding and fitting in HEP.

• Combined pattern recognition with parameter fitting.
  – Use the track state prediction to discriminate between multiple hits in detector elements.

• The dynamic system accommodates the physics:
  – multiple scattering
  – energy loss
  – magnetic field stepping

Page 38:

Fitting a Straight Line in 2D

• State Vector: x_i = ( y_i, a_i )^T

  y_i : position y at x = x_i
  a_i : slope dy/dx at x = x_i

• Track Model: straight line,

  x_{i+1} = F_i x_i

Page 39:

• The next position is simply the old one plus the slope times the interval:

  y_{i+1} = y_i + a_i ( x_{i+1} − x_i )

• The slope remains the same:

  a_{i+1} = a_i

• Therefore the transformation matrix is:

  F_i = ( 1  Δx )
        ( 0   1 )

Page 40:

• Ansatz for the initial state:

  x_1 = ( y_1 ),   C_1 = ( σ²  0 )
        ( a_1 )          ( 0   M )

with a_1 arbitrary and M >> 1.

• Predict the next state, x^1_2 = F_1 x_1 + w_1:

  ( y )^1   ( 1  Δx ) ( y_1 )   ( y_1 + a_1 Δx )
  ( a )_2 = ( 0   1 ) ( a_1 ) = ( a_1          )

Page 41:

• Propagate the covariance:

  C^1_2 = F_1 C_1 F_1^T = ( σ² + M Δx²   M Δx )
                          ( M Δx         M    )

• Filter with the measurement m_2 of y (variance σ², H = (1 0), weight G = 1/σ²):

  C_2 = ( (C^1_2)^{−1} + H^T G H )^{−1}  →  σ² ( 1      1/Δx  )   as M → ∞
                                              ( 1/Δx   2/Δx² )

  x_2 = C_2 ( (C^1_2)^{−1} x^1_2 + H^T G m_2 )  →  (  m_2            )
                                                   ( (m_2 − y_1)/Δx )

Page 42:

• The filtered position equals the measured position at the next surface, since we took the error on the predicted slope to be very large, i.e. we did not trust the prediction; the optimal solution is to use the measurement.

• The filtered slope is Δy/Δx, as we would expect for no a priori knowledge.

• The initial guess for the slope does not appear in the final result, since we had assigned the prediction a large uncertainty.

• We now have a good estimate for the slope and its uncertainty, and can iterate.
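The two-dimensional fit described on the preceding slides can be coded directly. This is a sketch with explicit 2×2 matrix algebra and no libraries; the function names and the noiseless test data are illustrative choices:

```python
def mmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def fit_line_kf(xs, ms, sigma2, M=1e12):
    """Kalman-filter fit of a straight line to measurements ms of y at positions xs.
    State (y, a), per the slides' ansatz: y_1 = first measurement (variance sigma2),
    slope unknown (variance M >> 1); no process noise (an ideal straight track)."""
    x = [[ms[0]], [0.0]]               # initial state
    C = [[sigma2, 0.0], [0.0, M]]      # initial covariance
    for i in range(1, len(xs)):
        dx = xs[i] - xs[i - 1]
        F = [[1.0, dx], [0.0, 1.0]]
        # Predict: x -> F x, C -> F C F^T
        x = mmul(F, x)
        C = mmul(mmul(F, C), transpose(F))
        # Gain for a measurement of y alone: H = (1 0), variance sigma2
        S = C[0][0] + sigma2
        K = [C[0][0] / S, C[1][0] / S]
        r = ms[i] - x[0][0]            # residual
        x = [[x[0][0] + K[0] * r], [x[1][0] + K[1] * r]]
        C = [[C[0][0] - K[0] * C[0][0], C[0][1] - K[0] * C[0][1]],
             [C[1][0] - K[1] * C[0][0], C[1][1] - K[1] * C[0][1]]]
    return x[0][0], x[1][0]            # y at the last surface, and the slope

# Exact measurements on y = 1 + 2x: the filter recovers slope 2 and y(3) = 7.
print(fit_line_kf([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0], sigma2=1.0))
```

After the second measurement the state matches the slide's hand computation, (m_2, (m_2 − y_1)/Δx); the remaining measurements then refine the slope estimate.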

Page 43:

[Figure: measured points and the fitted straight line.]

Page 44:

Kalman Filter Summary

• The Kalman Filter provides an elegant formalism for reconciling measurements with an existing hypothesis.

• Its progressive, or iterative, nature allows the algorithm to be cleanly implemented in software.

• The “Extended” KF removes the restriction to linear systems with Gaussian noise.

Page 45:

Summary

• I have barely scratched the surface in presenting these three techniques here this evening.

• There exists a broad spectrum of pattern recognition techniques, but these are fairly representative of the most-used ones.

• Go out and implement them!