Crime Section, Central Statistics Office.

Case Study: Matching Criminal Justice Administrative Datasets in the Absence of Common Unique Identifiers. Crime Section, Central Statistics Office.

Page 1: Crime Section, Central Statistics Office

Crime Section, Central Statistics Office.

Page 2: Crime Section, Central Statistics Office

Acknowledgments

The Crime Section would like to acknowledge the assistance provided by the Probation Service in this project.
  ◦ In particular, we would like to thank Michael Donnellan and Aidan Gormley.

Page 3: Crime Section, Central Statistics Office

Connectivity between the various Criminal Justice Database Systems

The Challenge - Absence of unique identifier

The Solution – CSO statistical matching.

Results of matching exercise

Future Goals

Page 4: Crime Section, Central Statistics Office

• Robust links between PULSE and CCTS.
• Tenuous link between PULSE/CCTS and Probation.
• Need to make these links into strong links - but how?

Page 5: Crime Section, Central Statistics Office

A common unique identifier allows rapid integration of datasets.
  ◦ The common identifiers between PULSE and CCTS include Charge No. and Summons No.
  ◦ These are linked to the Person PULSE ID in PULSE, to allow linking by individual.

Result: Able to produce statistics combining police and court outcome data.

However, there is a problem....

Page 6: Crime Section, Central Statistics Office

No such common identifier exists between CCTS/PULSE and Probation.
  ◦ The Probation Service uses its own unique identifiers.
  ◦ There is no link between these and PULSE identifiers such as Person PULSE ID and Court Outcome number.

Cannot link the datasets and cannot produce statistics.

Page 7: Crime Section, Central Statistics Office

But a solution exists:
  ◦ If persons in the separate systems can be matched across variables that exist in both systems, then a table linking unique identifiers can be produced.
  ◦ Variables such as first name, surname, date of birth and address exist in both systems.
  ◦ These can be used to link the two systems.

This is the basis of the CSO solution.

Page 8: Crime Section, Central Statistics Office

The CSO received a test dataset from the Probation Service, for years 2007 and 2008.
  ◦ Over 8,700 data orders with corresponding info.

First, a manual matching exercise was carried out to test feasibility.
  ◦ Matching by first names, surnames, addresses and dates of birth on over 7,800 probation records.
  ◦ A random sample of 800 records was taken.
  ◦ It took 8.5 person-days to process this 10% sample.
  ◦ At this rate, it would have taken over 90 person-days to process the entire dataset.
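The projection above can be checked with a short calculation. This is a minimal sketch that assumes effort scales linearly with the number of records; the function name is illustrative and not part of the CSO's work. The "over 90" figure is consistent with scaling the sample effort to roughly 8,700 records.

```python
SAMPLE_RECORDS = 800        # size of the manually matched sample (from the slide)
SAMPLE_PERSON_DAYS = 8.5    # effort spent on that sample (from the slide)


def projected_person_days(total_records):
    """Scale the sample effort linearly to a dataset of total_records (assumption)."""
    return SAMPLE_PERSON_DAYS * total_records / SAMPLE_RECORDS


estimate_8700 = projected_person_days(8700)
estimate_7800 = projected_person_days(7800)
print(estimate_8700, estimate_7800)
```

Under the same assumption, 7,800 records would take a little over 80 person-days, so the exact projection depends on which record count is treated as the full dataset.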

Page 9: Crime Section, Central Statistics Office

The next step was to automate the matching process for the entire dataset.
  ◦ A fully automated matching solution is not really possible.
  ◦ A mixed-model method, incorporating automatic and manual matching, was used to achieve 99% matching.
  ◦ 70% of records were matched automatically, without human involvement.
  ◦ This match was on first name, surname and date of birth.

Page 10: Crime Section, Central Statistics Office

Additional sorting/matching algorithms were used to simplify manual matching of the remaining 28%.
  ◦ There were four additional stages, with a progressively increasing human role.
  ◦ These were to identify cases where age or address data does not match, for example.
  ◦ The processes are still mainly automated and algorithm based, so fast to run.

The entire process was completed in 2 man-days.
  ◦ 99% of all the records (7,800+) matched.
  ◦ Compared to the projected 90+ man-days.
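The staged approach described above, where each stage only handles what earlier stages could not match, can be sketched as follows. This is an illustrative outline rather than the CSO's implementation; `run_stages`, the stage names and the toy lookups are all hypothetical.

```python
def run_stages(records, stages):
    """Run matchers in order; each stage sees only records still unmatched.

    `stages` is a list of (stage_name, matcher) pairs, where matcher(record)
    returns a matched ID or None.
    """
    matched = {}
    remaining = list(records)
    for stage_name, matcher in stages:
        still_unmatched = []
        for record in remaining:
            result = matcher(record)
            if result is None:
                still_unmatched.append(record)
            else:
                matched[record["probation_id"]] = (stage_name, result)
        remaining = still_unmatched
    return matched, remaining


# Toy stand-ins for real stages (hypothetical lookups, made-up IDs).
exact = {"P1": 101}
relaxed = {"P2": 202}
stages = [
    ("exact", lambda r: exact.get(r["probation_id"])),
    ("relaxed", lambda r: relaxed.get(r["probation_id"])),
]
records = [{"probation_id": "P1"}, {"probation_id": "P2"}, {"probation_id": "P3"}]
matched, for_manual_review = run_stages(records, stages)
```

Whatever is left in `for_manual_review` after the last automated stage is the residue a person would have to match by hand.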

Page 11: Crime Section, Central Statistics Office

Step One.
Both datasets are sorted by names, addresses and dates of birth.
NB: All datasets shown are merely representations, not actual data.
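A minimal sketch of such a sort is shown here. The field names, the DD/MM/YYYY date format and the ordering of the sort keys are assumptions for illustration; the records are the fictional ones used elsewhere in the slides.

```python
from datetime import datetime


def sort_key(record):
    """Order by surname, first name, date of birth, then address."""
    # Parse the date so that ordering is chronological, not alphabetical.
    dob = datetime.strptime(record["dob"], "%d/%m/%Y").date()
    return (
        record["surname"].lower(),
        record["first_name"].lower(),
        dob,
        record["address"].lower(),
    )


records = [
    {"surname": "Tudor", "first_name": "Liz", "dob": "30/01/1986", "address": "Raleighs"},
    {"surname": "Great", "first_name": "Alex", "dob": "06/06/1982", "address": "Royal Palace"},
    {"surname": "Great", "first_name": "Alex", "dob": "01/01/1982", "address": "Royal Palace"},
]
ordered = sorted(records, key=sort_key)
```

Sorting both datasets with the same key puts likely counterparts in similar positions, which makes later visual comparison easier.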

Page 12: Crime Section, Central Statistics Office

These are large datasets.

Page 13: Crime Section, Central Statistics Office
Page 14: Crime Section, Central Statistics Office

Step Two.
The probation and PULSE records are matched automatically by names and date of birth, using SAS.
  ◦ 70% of entries are matched automatically this way.
  ◦ For each probation ID, the corresponding PULSE IDs are listed.
  ◦ People may have multiple PULSE IDs for each probation ID.
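The original matching was done in SAS; the Python sketch below only illustrates the idea of an exact match on first name, surname and date of birth, with one probation ID mapping to several PULSE IDs. Field names and records are hypothetical.

```python
from collections import defaultdict


def key_of(record):
    """Comparison key built from the fields both systems share."""
    return (record["first_name"].lower(), record["surname"].lower(), record["dob"])


def auto_match(probation_records, pulse_records):
    """Map each probation ID to every PULSE ID sharing its name and date of birth."""
    pulse_index = defaultdict(list)
    for rec in pulse_records:
        pulse_index[key_of(rec)].append(rec["pulse_id"])
    matches, unmatched = {}, []
    for rec in probation_records:
        pulse_ids = pulse_index.get(key_of(rec))
        if pulse_ids:
            matches[rec["probation_id"]] = pulse_ids
        else:
            unmatched.append(rec["probation_id"])
    return matches, unmatched


probation = [
    {"probation_id": "ZZ1522", "first_name": "Alex", "surname": "Great", "dob": "06/06/1982"},
    {"probation_id": "ZM1533", "first_name": "Liz", "surname": "Tudor", "dob": "30/01/1986"},
]
pulse = [
    {"pulse_id": 2085343, "first_name": "Alex", "surname": "Great", "dob": "06/06/1982"},
    {"pulse_id": 2085345, "first_name": "ALEX", "surname": "Great", "dob": "06/06/1982"},
    {"pulse_id": 1085389, "first_name": "Elizabeth", "surname": "Tudor", "dob": "30/01/1986"},
]
matches, unmatched = auto_match(probation, pulse)
match_rate = len(matches) / len(probation)
```

In this toy data "Liz" does not equal "Elizabeth", so that record stays unmatched and is left for the later, more permissive stages.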

Page 15: Crime Section, Central Statistics Office

Step Three.
The next step is to ensure that surnames with the prefix “O’” are recorded in the same manner in both datasets.
  ◦ This step has minimal human involvement.
  ◦ One dataset records “O’ ” as “O”.
  ◦ This is not detected or matched in the initial stage.
  ◦ It can be fixed with an automatic software “Replace” function.

When the automatic matching (Step Two) is run again, 85% of records now match automatically.
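One way to apply such a replacement is to map every spelling of the prefix to a single canonical form before matching. The sketch below is an assumption about how this could be done, not the function the CSO used; the pattern and the choice of an upper-case canonical form are illustrative.

```python
import re

# Leading "O" followed by an apostrophe (straight or curly) and/or spaces.
_O_PREFIX = re.compile(r"^O(?:\s*['’]\s*|\s+)(?=\S)", re.IGNORECASE)


def normalise_surname(surname):
    """Collapse O'Brien, O’ Brien and O Brien to the same form as OBrien."""
    cleaned = surname.strip()
    return _O_PREFIX.sub("O", cleaned).upper()


variants = ["O'Brien", "O’ Brien", "O Brien", "OBrien"]
normalised = {normalise_surname(v) for v in variants}
```

Surnames that merely start with the letter O, such as a hypothetical "Oban", are left alone apart from the change of case.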

Page 16: Crime Section, Central Statistics Office

Step Four.
◦ The next step is to match on cases where the surname and date of birth match and first names are closely related.
◦ This step has more human involvement. Geographical info is used as a further check. This allows us to find aliases.
◦ Example shown here:

It is clear that although “Liz” and “Elizabeth”, and “Alex” and “Lex” differ, they refer to the same person.

| Match | Probation Fake ID | Fake PULSE ID | First Name (Probation) | Surname (Probation) | Date of Birth (Probation) | First Letter | First Name (PULSE) | Surname (PULSE) | Date of Birth (PULSE) | Address Line 1 (Probation) | Address Line 1 (PULSE) | Address Line 2 (Probation) | Address Line 2 (PULSE) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Yes | ZZ1522 | 2085343 | Alex | Great | 01/01/1982 | A | Alexander | Great | 06/06/1982 | Royal Palace | Royal Palace | Macedon | Macedon |
| Yes | ZZ1522 | 2085343 | Alex | Great | 01/01/1982 | A | Alexander | Great | 06/06/1982 | Royal Palace | Royal Palace | Macedon | Macedon |
| Yes | ZZ1522 | 2085345 | Alex | Great | 01/01/1982 | A | Lex | Great | 06/06/1982 | Royal Palace | On Campaign | Macedon | Macedon |
| Yes | ZZ1522 | 2085345 | Alex | Great | 01/01/1982 | A | Lex | Great | 06/06/1982 | Royal Palace | On Campaign | Macedon | Macedon |
| Yes | ZZ1522 | 2085345 | Alex | Great | 01/01/1982 | A | Lex | Great | 06/06/1982 | Royal Palace | On Campaign | Macedon | Macedon |
| Yes | ZM1533 | 1085389 | Liz | Tudor | 01/01/1900 | L | Elizabeth | Tudor | 30/01/1986 | Raleighs | Raleighs | Essex | Essex |
| Yes | ZM1533 | 1085389 | Liz | Tudor | 01/01/1900 | L | Elizabeth | Tudor | 30/01/1986 | Raleighs | Raleighs | Essex | Essex |
| Yes | ZM1533 | 1085389 | Liz | Tudor | 01/01/1900 | L | Elizabeth | Tudor | 30/01/1986 | Raleighs | Raleighs | Essex | Essex |
| Yes | ZM1533 | 1085391 | Liz | Tudor | 01/01/1900 | L | Elizabeth | Tudor | 30/01/1986 | Raleighs | Raleighs | Essex | Essex |
| Yes | ZM1533 | 1085391 | Liz | Tudor | 01/01/1900 | L | Elizabeth | Tudor | 30/01/1986 | Raleighs | Raleighs | Essex | Essex |
| Yes | ZM1533 | 1085391 | Liz | Tudor | 01/01/1900 | L | Elizabeth | Tudor | 30/01/1986 | Raleighs | Raleighs | Essex | Essex |
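A rough sketch of how alias candidates might be flagged for a human reviewer follows. The nickname lookup, the prefix rule and all field names are hypothetical; the slides do not say how "closely related" first names were detected.

```python
# Hypothetical nickname lookup; a real one would be far larger.
NICKNAMES = {"liz": "elizabeth", "alex": "alexander", "lex": "alexander"}


def canonical_first_name(name):
    """Map a known short form to its full form; otherwise return the name lower-cased."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def first_names_related(a, b):
    """True if two first names share a canonical form or one starts with the other."""
    a_key, b_key = a.strip().lower(), b.strip().lower()
    if not a_key or not b_key:
        return False
    if canonical_first_name(a) == canonical_first_name(b):
        return True
    return a_key.startswith(b_key) or b_key.startswith(a_key)


def alias_candidate(prob, pulse):
    """Return a review row when surname and date of birth agree and first names are related."""
    if prob["surname"].lower() != pulse["surname"].lower():
        return None
    if prob["dob"] != pulse["dob"]:
        return None
    if not first_names_related(prob["first_name"], pulse["first_name"]):
        return None
    return {
        "probation_id": prob["probation_id"],
        "pulse_id": pulse["pulse_id"],
        # Address agreement is supporting evidence for the human reviewer.
        "address_agrees": prob["address"].lower() == pulse["address"].lower(),
    }


prob = {"probation_id": "ZM1533", "first_name": "Liz", "surname": "Tudor",
        "dob": "30/01/1986", "address": "Raleighs"}
pulse = {"pulse_id": 1085389, "first_name": "Elizabeth", "surname": "Tudor",
         "dob": "30/01/1986", "address": "Raleighs"}
candidate = alias_candidate(prob, pulse)
```

The function only proposes pairs; consistent with the slide, the decision that two records are the same person is left to a reviewer, with the address flag as a further check.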

Page 17: Crime Section, Central Statistics Office

Step Five.
◦ Additional matching steps are then carried out.
◦ One is to check for matching first names, surnames and geographical info, but where dates of birth differ. Special checks can identify matching cases here.
◦ Another set of checks involves searching for matching first name and date of birth but slightly different surnames.

All these steps lead to a match of over 95%.
The final step is a fully manual operation to match the remaining 5%.
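The slides do not specify the "special checks", so the following is only a sketch of what relaxed rules of this kind could look like. The date rule (day and month swapped, or exactly one field different), the similarity threshold of 0.85, and the `area` field standing in for geographical info are all assumptions.

```python
from datetime import date
from difflib import SequenceMatcher


def surnames_similar(a, b, threshold=0.85):
    """True if the surnames are near-identical by difflib's similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def dob_plausible_typo(a, b):
    """True if day and month are swapped, or exactly one of day/month/year differs."""
    swapped = a.year == b.year and (a.day, a.month) == (b.month, b.day)
    fields_differing = sum([a.day != b.day, a.month != b.month, a.year != b.year])
    return a != b and (swapped or fields_differing == 1)


def review_reason(prob, pulse):
    """Name the relaxed rule under which a pair deserves a look, or return None."""
    same_first = prob["first_name"].lower() == pulse["first_name"].lower()
    same_surname = prob["surname"].lower() == pulse["surname"].lower()
    if same_first and same_surname and prob["area"] == pulse["area"]:
        if dob_plausible_typo(prob["dob"], pulse["dob"]):
            return "dob_differs"
    if same_first and prob["dob"] == pulse["dob"] and not same_surname:
        if surnames_similar(prob["surname"], pulse["surname"]):
            return "surname_differs"
    return None


a = {"first_name": "Mary", "surname": "Byrne", "dob": date(1980, 3, 2), "area": "Cork"}
b = {"first_name": "Mary", "surname": "Byrnes", "dob": date(1980, 3, 2), "area": "Cork"}
c = {"first_name": "Mary", "surname": "Byrne", "dob": date(1980, 2, 3), "area": "Cork"}
reasons = (review_reason(a, b), review_reason(a, c))
```

Pairs flagged this way would still need confirmation, since a relaxed rule can also bring together two different people with similar details.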

Page 18: Crime Section, Central Statistics Office

The CSO produced detailed results from this linkage.

Tables were produced showing:
  ◦ Table A: Number of subsequent First Offences (recidivism), during the period 2008-11, by individuals with probation orders issued in 2007-08
  ◦ Table B: Subsequent First Offences (recidivism), during the period 2008-11, by individuals with probation orders issued in 2007-08, as a percentage of the Original Primary Offence
  ◦ Table C: Subsequent First Offence (recidivism) by individuals, during the period 2008-11, with probation orders issued in 2007-08, as a percentage of total original primary offences
  ◦ Table D: Subsequent First Offence (recidivism) during the period 2008-11 of individuals with probation orders issued in 2007-08, as a % of total subsequent First Offences

Unfortunately, we can show only sample data here.
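Once records are linked, tables of this shape are cross-tabulations of original offence group against subsequent offence group. The sketch below builds Table A-style counts and, on one reading of Table B, row percentages; the pairs are made up and the function names are illustrative.

```python
from collections import Counter


def cross_tabulate(pairs):
    """Count (original_group, subsequent_group) pairs, as in a Table A-style grid."""
    return Counter(pairs)


def row_percentages(counts):
    """Express each cell as a percentage of its original-offence row total."""
    row_totals = Counter()
    for (original, _subsequent), n in counts.items():
        row_totals[original] += n
    return {cell: 100.0 * n / row_totals[cell[0]] for cell, n in counts.items()}


# Made-up linked records: (original primary offence group, subsequent first offence group).
pairs = [("08", "08"), ("08", "13"), ("08", "08"), ("10", "10")]
counts = cross_tabulate(pairs)
shares = row_percentages(counts)
```

Dividing by column totals or by the grand total instead would give the other percentage views; which denominator each published table uses is not stated in the slides.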

Page 19: Crime Section, Central Statistics Office

Table A: Number of subsequent First Offences (recidivism), during the period 2008-11, by individuals with probation orders issued in 2007-08

Rows: Original Primary Offence. Columns: Subsequent First Offence group (N).

Subsequent First Offence groups:
  Group 01: Homicide offences
  Group 02: Sexual Offences
  Group 03: Assaults, Attempts and Threats to Murder, Harassment and Related Offences
  Group 04: Dangerous and Negligent Acts
  Group 05: Kidnapping and Related Offences
  Group 06: Robbery and Related offences
  Group 07: Burglary and Related Offences
  Group 08: Theft and Related Offences
  Group 09: Fraud and Related offences
  Group 10: Drug offences
  Group 11: Weapons and Explosives Offences
  Group 12: Crimes against Property
  Group 13: Public Order Offences
  Group 14: Road Traffic Offences
  Group 15: Offences against Justice

| Original Primary Offence | Total | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 01 Homicide Offences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 02 Sexual Offences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 03 Attempts/Threats to Murder, Assaults, Harassments and Related offences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 04 Dangerous or Negligent Acts | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 05 Kidnapping and Related Offences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 06 Robbery, Extortion and Hijacking Offences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 07 Burglary and Related Offences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 08 Theft and Related Offences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 09 Fraud, Deception and Related Offences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10 Controlled Drug Offences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 11 Weapons and Explosives Offences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 12 Damage to Property and to the Environment | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 13 Public Order and other Social Code Offences | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14 Road and Traffic Offences (n.e.c.) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15 Offences against Government, Justice Procedures and Organisation of Crime | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 16 Offences Not Elsewhere Classified | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 99 Not Stated | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Page 20: Crime Section, Central Statistics Office

Table D: Subsequent First Offence (recidivism) during the period 2008-11 of individuals with probation orders issued in 2007-08 as a % of total subsequent First Offences

Rows: Original Primary Offence. Columns: Subsequent First Offence group (%).

Subsequent First Offence groups:
  Group 01: Homicide offences
  Group 02: Sexual Offences
  Group 03: Assaults, Attempts and Threats to Murder, Harassment and Related Offences
  Group 04: Dangerous and Negligent Acts
  Group 05: Kidnapping and Related Offences
  Group 06: Robbery and Related offences
  Group 07: Burglary and Related Offences
  Group 08: Theft and Related Offences
  Group 09: Fraud and Related offences
  Group 10: Drug offences
  Group 11: Weapons and Explosives Offences
  Group 12: Crimes against Property
  Group 13: Public Order Offences
  Group 14: Road Traffic Offences
  Group 15: Offences against Justice
  Group 16: Miscellaneous Offences

| Original Primary Offence | 01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 01 Homicide Offences | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 02 Sexual Offences | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 03 Attempts/Threats to Murder, Assaults, Harassments and Related offences | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 04 Dangerous or Negligent Acts | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 05 Kidnapping and Related Offences | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 06 Robbery, Extortion and Hijacking Offences | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 07 Burglary and Related Offences | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 08 Theft and Related Offences | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 09 Fraud, Deception and Related Offences | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 10 Controlled Drug Offences | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 11 Weapons and Explosives Offences | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 12 Damage to Property and to the Environment | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 13 Public Order and other Social Code Offences | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 14 Road and Traffic Offences (n.e.c.) | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 15 Offences against Government, Justice Procedures and Organisation of Crime | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 16 Offences Not Elsewhere Classified | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| 99 Not Stated | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
| Total | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |

Page 21: Crime Section, Central Statistics Office

Further development of the matching model.
  ◦ To incorporate text analysis and fuzzy matching.
  ◦ To develop a fully automatic process to match to 99%.

Page 22: Crime Section, Central Statistics Office

This project shows a simple, effective solution to integrating datasets in the absence of a common identifier.

This project doesn’t invalidate the importance of development of unique identifiers.
  ◦ But it does allow matching of records where it is not feasible to retroactively apply any planned common identifier.

This method is not limited to Criminal Justice Administrative Data.
  ◦ It can be applied to any datasets with common information on names, dates of birth etc.