Matching Lecture 10

Matching Lecture 10. Topics ID parade Frames Matching Examples Fuzzy Matching Scales of measurement



Matching

Lecture 10


Topics

• ID parade Frames

• Matching Examples

• Fuzzy Matching

• Scales of measurement


ID Parade Frames

• Classifying volunteers as clean
• Matching suspect to volunteers
• Reservation of parade facility, officers, volunteers
• Managing long-running process from decision to hold parade to payment of volunteers
• Accounting – payment to volunteers and billing of police authorities
• Historical record and analysis


Merging multiple frames

• Each frame produces its own model of the actors.

• E.g. Models of volunteer
  – For matching with suspect
  – For classification
  – For payment
  – For reservation

• In database design, this problem is called ‘view integration’


Miscellaneous Matching applications

• Many systems have a matching task at their core:
  – Shazam – sound sample matching
  – De-duping mailing lists
  – CD DB – CD recognition
  – COTS selection
  – IS development selection
  – fingerprint matching
  – patient/donor matching for transplant surgery
  – blood typing and matching
  – patients to clinical trials
  – interns to placements in hospitals
  – DNA samples
  – search request to locate relevant documents
  – incoming news items to information subscribers
  – number plate recognition in London’s Congestion Charging System
  – speech and writing recognition
  – patterns to material to minimise wastage


Shazam - 2580

• Shazam is a mobile phone application
• It can recognise 1.7 million tracks from a 30 sec sample
  – new tracks added at 5,000 a week
• The track details are texted back within about 30 secs
• It costs 50p + 9p call charge (surcharge only if successful)
• Your personal page shows the tracks you have tagged

• www.shazam.com


De-duping

C Wallace
West England University
Coldharbour Lane
Frenchay
Bristol
BS16 1QY

Ms C Wallace
Univ. of the West of England
Frenchay Campus
Coldharbour Lane
Bristol
BS16 1QY

One person or two?

A catalogue from O’Reilly

Mailing lists are reported to contain 25–40% duplicates.
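
One way to approach the "one person or two?" question is to normalise both records and measure how many words they share. The sketch below is only an illustration: the abbreviation table, the stop-word list and the choice of Jaccard similarity are assumptions, not something taken from the lecture or from any real de-duping product.

```python
import re

# Hypothetical abbreviation table; a real de-duper would need a far larger one.
ABBREVIATIONS = {"univ": "university"}
# Words assumed to carry little identifying information.
STOP_WORDS = {"ms", "mr", "mrs", "dr", "of", "the"}


def tokens(record):
    """Lower-case a record, expand abbreviations and drop stop words."""
    words = re.findall(r"[a-z0-9]+", record.lower())
    expanded = (ABBREVIATIONS.get(w, w) for w in words)
    return {w for w in expanded if w not in STOP_WORDS}


def similarity(a, b):
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


first = ("C Wallace, West England University, Coldharbour Lane, "
         "Frenchay, Bristol, BS16 1QY")
second = ("Ms C Wallace, Univ. of the West of England, Frenchay Campus, "
          "Coldharbour Lane, Bristol, BS16 1QY")

score = similarity(first, second)
print(f"similarity = {score:.2f}")
```

A threshold on the score (itself a judgement) would then decide whether the two records are treated as duplicates or passed to a person for review.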


CD DB

• Database of 2.5 million CDs, track details and supporting matter, run by Gracenote (www.gracenote.com)
• Used by media players to obtain track info
• Player sends signature of CD [sequence of track lengths in 1/4 sec] to match against the database (via HTTP)

• Application searches DB for best match and returns track info to media player.

• Matching algorithm described in US Patent 6,061,680
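
To make the idea of matching a track-length signature concrete, here is a minimal sketch. It is not the patented algorithm: the distance measure (total absolute difference in track lengths), the `tolerance` parameter and the sample data are all assumptions made for illustration.

```python
def signature_distance(a, b):
    """Total absolute difference in track lengths; None if track counts differ."""
    if len(a) != len(b):
        return None
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(query, database, tolerance=4):
    """Return the title of the closest stored signature within tolerance, else None."""
    best_title, best_dist = None, None
    for title, stored in database.items():
        dist = signature_distance(query, stored)
        if dist is None or dist > tolerance:
            continue
        if best_dist is None or dist < best_dist:
            best_title, best_dist = title, dist
    return best_title


# Made-up signatures: track lengths in quarter-seconds.
database = {
    "Album A": [720, 845, 910, 1002],
    "Album B": [720, 850, 915, 1002],
    "Album C": [600, 600, 600],
}

print(best_match([721, 845, 909, 1002], database))
```

The tolerance allows for small differences between pressings or drives; how large it should be is a tuning decision.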


Commercial Off-the-Shelf Software (COTS)

• Software exists for most business needs:
  – payroll
  – order processing
  – general ledger
  – human resources
  – e-commerce
  – e.g. SAP, SAGE ..

• but analysts need to match business needs to COTS capability, and customise generic software for local business rules.


Method selection

• DSDM Product Design Assistant

• “The Product design Assistant (PDA) provides the practitioner with an approach to determine which mechanisms and techniques are appropriate for their project”

• Table 1 Mechanism selection

• Table 2 Technique selection


Police ID parade

• Currently:
  – Suspect matched to Volunteers visually by officer
• Information System
  – Suspect and Volunteers modelled in database
  – System provides list of matching volunteers


Matching in general

• Matching tasks typically involve:
  – two sets of individuals, e.g.
    • the suspect / sampled track / DNA sample – the Requirement
    • the volunteers / 1.7 million stored tracks / DNA on file – the Resource
  – ‘adequate’ representations of both
  – a ‘fitness’ function which calculates how well matched a Resource is to the Requirement
• Matching processes:
  – Single or Batch?
    • Single: one Requirement to many Resources
    • Batch: many Requirements to many Resources (e.g. cutting)
  – Automatic, Interactive, Assistive
    • Automatic: matching fully automated
    • Interactive: user makes final selection, adjusts weights
    • Assistive: computer produces analyses which aid human selection


Single Allocation

• Allocation to a single Requirement:
  – ‘long list’ the Resources: eliminate the obviously unsuitable
  – compute fitness between Requirement and each remaining Resource
  – rank the Resources in fitness order for a ‘short list’
  – ? user selection from short list on basis of additional information unknown to system
• Interactive
  – User adjusts:
    • description of Requirement (e.g. the search term in Google)
    • fitness function (e.g. the weights in the ID parade)
  – and retries
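
The long-list, fitness and short-list steps can be sketched as a small pipeline. This is an illustrative outline only: the function names, the filter (same gender) and the measure (age gap, lower is better) are assumptions chosen to keep the example short.

```python
def shortlist(requirement, resources, is_plausible, non_fitness, size=3):
    """Filter to a long list, rank by non-fitness (lower is better), keep the top few."""
    long_list = [r for r in resources if is_plausible(requirement, r)]
    ranked = sorted(long_list, key=lambda r: non_fitness(requirement, r))
    return ranked[:size]


def same_gender(req, res):
    """Long-list filter: eliminate the obviously unsuitable."""
    return req["gender"] == res["gender"]


def age_gap(req, res):
    """Non-fitness measure: absolute age difference in years."""
    return abs(req["age"] - res["age"])


# Made-up data for illustration.
suspect = {"gender": "M", "age": 30}
volunteers = [
    {"name": "V1", "gender": "M", "age": 45},
    {"name": "V2", "gender": "F", "age": 30},
    {"name": "V3", "gender": "M", "age": 28},
    {"name": "V4", "gender": "M", "age": 33},
]

names = [v["name"] for v in shortlist(suspect, volunteers, same_gender, age_gap, size=2)]
print(names)
```

In the interactive variant, the user would change the requirement or swap in a different `non_fitness` function and call `shortlist` again.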


Single attribute Matching

• Fuzzy String Matching
  – Levenshtein distance
  – Soundex and Metaphone

• Age difference

• Scales of measurement


Fuzzy String Matching

• How close are two strings – words, DNA sequences?
• Levenshtein distance
  – is the number of single-character edits required to change one string into the other:
    • insert a letter
    • delete a letter
    • replace a letter
• E.g.
  – receipt & reciept & tecept – distance = 2 between each pair
• Need a theory of why the strings are different
  – A better theory for typing would be to count a transposition as 1 edit instead of 2
  – mutations in DNA matching
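
A minimal sketch of the Levenshtein distance, using the standard row-by-row dynamic programming formulation. It counts only insertions, deletions and replacements, so a transposition costs 2, as the slide notes.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes and replacements turning a into b."""
    # previous[j] holds the distance between the prefix of a seen so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # replace ca with cb, or keep it
            ))
        previous = current
    return previous[-1]


for pair in [("receipt", "reciept"), ("receipt", "tecept"), ("reciept", "tecept")]:
    print(pair, levenshtein(*pair))
```

The comparison is case-sensitive, so strings would normally be lower-cased first.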


Soundex and Metaphone

• Surnames in English have multiple spellings for similar sounds
  – Wallace and Wallis, Smith and Smythe
  – Errors caused by similar phonetics having different spelling
  – Useful where sound-text transliteration occurs in data capture
• e.g. Smith and Smythe
  – Soundex (Odell and Russell 1922) reduces every word to a letter and 3 digits – S530 for both
  – Metaphone (Philips 1990) smarter about English phonetics – SM0 for both
• Not perfect
  – Kris (K620 and XRS)
  – Chris (C620 and KRS)
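
The Soundex codes quoted above can be reproduced with a short function. This is a sketch of the common "American Soundex" rules as generally described (keep the first letter, code the remaining consonants, skip repeats, treat H and W as transparent); other variants exist, and Metaphone is not attempted here.

```python
# Consonant groups and their Soundex digits; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the first letter plus three digits, e.g. 'Smith' gives 'S530'."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != previous:
            result += code
        if c not in "HW":  # H and W do not separate letters with the same code
            previous = code
    return (result + "000")[:4]  # pad with zeros, then truncate to four characters


for name in ["Smith", "Smythe", "Wallace", "Wallis", "Kris", "Chris"]:
    print(name, soundex(name))
```

Because the first letter is kept as-is, Kris and Chris can never share a Soundex code, which is the weakness the slide points out.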


Fuzzy matching

• How close are two ages?

• Is the answer different for the identity parade and a dating agency?

[Figure: non-fitness (from 0.0) plotted against age, for Suspect vs Volunteer and for Ideal Person vs Date]


Multi-attribute Matching

• How to combine multiple attributes to create a single fitness measure?

• Age and Height are different to Build, Eye-colour, Gender and Ethnic origin.

• Distance in 2-D space:

[Figure: two points in the x–y plane separated by dx along x and dy along y; distance = Sqrt(dx^2 + dy^2)]


Multi-attribute matching

• Extract shows a simple Excel spreadsheet containing a suspect’s age, height and gender, and the same attributes for 10 volunteers
• Representation
  – Age is measured in years
  – Height in cm
  – Gender is M or F
• Fitness function
  – Calculate difference between suspect and volunteer attributes
  – Normalise differences to 0…1
  – Multiply by weights to express importance of each attribute
  – Sum of squared differences as Fitness function
  – Best-fit volunteer has minimum value for Fitness
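
The fitness calculation described above can be sketched outside a spreadsheet. Everything specific here is an assumption for illustration: the normalising ranges, the weights, the three made-up volunteers, and the reading that each weighted difference is squared before summing.

```python
# Assumed spans used to normalise numeric differences into 0..1.
RANGES = {"age": 50.0, "height": 50.0}
# Illustrative weights expressing the importance of each attribute.
WEIGHTS = {"age": 1.0, "height": 1.0, "gender": 2.0}


def differences(suspect, volunteer):
    """Normalised difference per attribute: 0 is identical, 1 is maximally different."""
    return {
        "age": min(abs(suspect["age"] - volunteer["age"]) / RANGES["age"], 1.0),
        "height": min(abs(suspect["height"] - volunteer["height"]) / RANGES["height"], 1.0),
        "gender": 0.0 if suspect["gender"] == volunteer["gender"] else 1.0,
    }


def fitness(suspect, volunteer, weights=WEIGHTS):
    """Sum of squared weighted differences; the best fit has the minimum value."""
    d = differences(suspect, volunteer)
    return sum((weights[k] * d[k]) ** 2 for k in d)


# Made-up data, not the spreadsheet from the lecture.
suspect = {"age": 30, "height": 180, "gender": "M"}
volunteers = {
    "A": {"age": 32, "height": 178, "gender": "M"},
    "B": {"age": 30, "height": 180, "gender": "F"},
    "C": {"age": 45, "height": 165, "gender": "M"},
}

best = min(volunteers, key=lambda name: fitness(suspect, volunteers[name]))
print(best, {name: round(fitness(suspect, v), 4) for name, v in volunteers.items()})
```

Squaring means that one large mismatch counts for more than several small ones, in the same way as the 2-D distance formula.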


Scales of Measurement

• Nominal – names or categories
  – E.g. Eye-colour, Ethnic origin, Telephone number, ISBN
  – Valid operations: =, not =
• Partly Ordered Scales, e.g. grandparent, parent, uncle, child, cousin
  – Pairs are ordered but no overall ordering
• Ordinal – ranks
  – E.g. 1, 2, 3 in the Derby; 1st, 2.1, 2.2, 3rd class; slight, medium, heavy build
  – Valid operations: <, =, >
  – Invalid operations: +, - (gap between 1 and 2 is not the same as between 2 and 3)
  – Non-parametric statistics may apply
• Interval – arbitrary zero value
  – E.g. Temperature in degrees F, date in Julian Calendar
  – Valid operation: - (minus)
  – Invalid: +, * (but differences are Ratio)
• Ratio
  – E.g. Length, age
  – Valid operations: +, *, /, standard statistical operations
• Multi-dimensional scales (index numbers)
  – E.g. Miles/gallon, IQ
  – Compound of several scales of measurement


Suspect/Volunteer attributes

• Nominal – names or codes

• Ordinal – ranks

• Interval – no true zero value

• Ratio


Transforming and Scaling

• To combine different attributes, we need to transform Nominal, Ordinal and Interval values to Ratio scales

• This cannot be done objectively, so judgement is involved

• Scaling and weights need to be adjustable to fine-tune matching

• => Learning Frame (later)
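
As a rough sketch of what such transformations might look like, the functions below turn nominal, ordinal and interval values into a 0 to 1 difference that can be weighted and combined. Each one embeds a judgement, flagged in the comments: the rank table, the equal spacing of ranks and the `span` parameter are assumptions.

```python
# Assumed ordering of build categories; the equal spacing between ranks is a judgement.
BUILD_RANK = {"slight": 0, "medium": 1, "heavy": 2}


def nominal_distance(a, b):
    """Nominal values can only be compared for equality: 0 if same, 1 if different."""
    return 0.0 if a == b else 1.0


def ordinal_distance(a, b, ranks=BUILD_RANK):
    """Treat ranks as equally spaced and scale the rank gap to 0..1."""
    return abs(ranks[a] - ranks[b]) / (len(ranks) - 1)


def interval_distance(a, b, span):
    """Differences on an interval scale are ratio values; scale by an assumed span."""
    return min(abs(a - b) / span, 1.0)


print(nominal_distance("blue", "brown"))
print(ordinal_distance("slight", "medium"))
print(interval_distance(1980, 1990, span=40))
```

Keeping the rank table and spans as data rather than code makes them easy to adjust when fine-tuning the matching.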


Sensitivity Analysis

• Arbitrary weights can be adjusted to see what effect their variation has on the final selection

• ? How much would each weight have to change before the first choice is demoted?

• Excel analysis
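
The same question can be explored in code rather than Excel. The sketch below steps one weight through a list of trial values and reports the first value at which the first choice changes. The scoring rule (sum of squared weighted differences, lower is better), the candidate data and the function names are assumptions for illustration.

```python
def score(weights, diffs):
    """Sum of squared weighted differences; lower is a better match."""
    return sum((weights[k] * diffs[k]) ** 2 for k in diffs)


def first_choice(weights, candidates):
    """Name of the candidate with the lowest score."""
    return min(candidates, key=lambda name: score(weights, candidates[name]))


def tipping_point(weights, candidates, attribute, trial_values):
    """First trial value of one weight at which the first choice changes, else None."""
    baseline = first_choice(weights, candidates)
    for value in trial_values:
        trial = {**weights, attribute: value}
        if first_choice(trial, candidates) != baseline:
            return value
    return None


# Made-up normalised differences (0..1) between a suspect and two volunteers.
candidates = {
    "V1": {"age": 0.1, "height": 0.4},
    "V2": {"age": 0.3, "height": 0.1},
}
weights = {"age": 1.0, "height": 1.0}

print(first_choice(weights, candidates))
print(tipping_point(weights, candidates, "age", [1.0, 1.25, 1.5, 1.75, 2.0]))
```

A weight whose tipping point is close to its current value marks a selection that is sensitive to that judgement.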


Tutorial Questions

• Explain the user’s interaction with Shazam to tag a track using a sequence diagram.

• Choose a matching problem with which you are familiar (or choose one from the list)

• Identify the ‘requirement’ and the ‘resources’ and suggest appropriate representations of each

• Identify a suitable fitness function for this problem