The world’s libraries. Connected. ISNI Assignment ISNI Annual General Assembly, Frankfurt 2014 October 2014 OCLC Janifer Gatenby EMEA Program Manager Metadata OCLC

ISNI Assignment

Page 1: ISNI Assignment

ISNI Assignment

ISNI Annual General Assembly, Frankfurt 2014

October 2014

OCLC

Janifer Gatenby

EMEA Program Manager Metadata, OCLC

Page 2: ISNI Assignment

Provisional: Unassigned 9,953,505

Provisional: Possible 701,157

Assigned 8 million

Assigned ISNIs October 2014

2+ independent sources: 3,956,454

3+ VIAF sources: 494,002

Unique name: 3,233,924

Single source (JISC names, BOEK, Ringgold): 342,234

Total: 8,026,614
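
As a quick arithmetic check, the four assignment categories above add up to the stated total. A minimal sketch in Python; the dictionary and variable names are illustrative only, and the figures are copied from the slide.

```python
# Counts of assigned ISNIs by assignment basis, October 2014 (figures from the slide).
assigned_by_basis = {
    "2+ independent sources": 3_956_454,
    "3+ VIAF sources": 494_002,
    "Unique name": 3_233_924,
    "Single source (JISC names, BOEK, Ringgold)": 342_234,
}

# Sum the categories and print with thousands separators.
total_assigned = sum(assigned_by_basis.values())
print(f"{total_assigned:,}")  # 8,026,614
```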

Page 3: ISNI Assignment

ISNI Assignment: Batch loading

Independent matching sources

3 VIAF sources

Page 4: ISNI Assignment

ISNI Matching

Name

Title

Partial title

Rare title word

Date

Publisher

Personal affiliation

Organisation affiliation

ISBN, ISWC, ISAN, DOI +

Other name identifier e.g. IPI, VIAF, IPD

Instrument

Linked entities

Dewey classification

Scores are collected from each judge (ice skating style)

Lowered for common surnames and common titles

Score > .85 = match

Score >.6 but <.85 = possible match
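
The thresholds above can be sketched as a small classifier. This is a minimal illustration, not the ISNI matching code: it assumes the judges' scores have already been combined into one value between 0 and 1 (the slide does not say how they are combined), and it assumes that a score of exactly .85 is treated as a possible match and a score of exactly .6 as no match, since the slide leaves both boundaries open. All names are hypothetical.

```python
from enum import Enum


class MatchOutcome(Enum):
    MATCH = "match"
    POSSIBLE = "possible match"
    NO_MATCH = "no match"


# Thresholds quoted on the slide.
MATCH_THRESHOLD = 0.85
POSSIBLE_THRESHOLD = 0.6


def classify_score(score: float) -> MatchOutcome:
    """Map an already-combined matching score to an outcome.

    Assumption: boundary values (exactly 0.85 or 0.6) fall into the
    lower, more cautious category.
    """
    if score > MATCH_THRESHOLD:
        return MatchOutcome.MATCH
    if score > POSSIBLE_THRESHOLD:
        return MatchOutcome.POSSIBLE
    return MatchOutcome.NO_MATCH


for s in (0.92, 0.7, 0.4):
    print(s, classify_score(s).value)
```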

Page 5: ISNI Assignment

ISNI Assignment: Batch loading

Unique name

Single source

Page 6: ISNI Assignment

Central database - Trust

+ % confidence

- % confidence

Provisional: Unassigned 9+ million

Provisional: Possible ≈ 638,000

Assigned ≈ 8 million

Assignment is curated

Authoritative

Unique

Trustful

Persistent

Assignment only if confident

Publicly accessible: www.isni.org

Matching algorithms

Data sampling

Anomaly checks

Quality assurance processes

End user input notes

Page 7: ISNI Assignment

Confidence

The two main problems for maintaining persistence are

• duplicates needing to be merged

• undifferentiated identities needing to be split

ISNI errs on the side of making duplicates rather than mixed identities

Thus the batch load process (usually) makes a provisional record

• where there is no match (for fear of making a duplicate assignment)

• where there is a low confidence match (for fear of making a mixed identity or a duplicate assignment)

• where a matching record already has another local ID for the same source, regardless of the strength of the match (for fear of making a mixed identity)
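
The three provisional cases above can be sketched as a decision function. This is an illustration under stated assumptions, not the actual batch load program: the function and parameter names are hypothetical, "low confidence" is assumed to mean a score at or below the .85 match threshold from the ISNI Matching slide, and the slide's "(usually)" signals that the real process has exceptions not modelled here.

```python
from typing import Optional

# Assumption: taken from the ISNI Matching slide (score > .85 = match).
MATCH_THRESHOLD = 0.85


def batch_load_outcome(
    best_score: Optional[float],
    matched_record_has_other_local_id_from_same_source: bool = False,
) -> str:
    """Return "provisional" or "matched" for one incoming batch record.

    best_score is None when no candidate record was found at all.
    """
    # Case 1: no match, so a provisional record is made.
    if best_score is None:
        return "provisional"
    # Case 2: low confidence match, so a provisional record is made.
    if best_score <= MATCH_THRESHOLD:
        return "provisional"
    # Case 3: the matched record already carries a different local ID from
    # the same source; provisional regardless of match strength.
    if matched_record_has_other_local_id_from_same_source:
        return "provisional"
    return "matched"


print(batch_load_outcome(None))
print(batch_load_outcome(0.7))
print(batch_load_outcome(0.95, True))
print(batch_load_outcome(0.95))
```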

Page 8: ISNI Assignment

Procedures for maximizing assignment

• Refinement of matching algorithms

• E.g. introduced rare title word;

• Now ignoring date of birth 1900

• Re-import program

• Rematch with new rules

• Rematch after new data added

• ISNI Quality Team: Data sampling

• assessing impact of single source

• Recommendations for program changes

• New criteria

• Assessing uncommon surname assignment

• Rules for online rich assignment

Page 9: ISNI Assignment

Online: Guarantee assignment – Personal Name

ISNIs will be automatically assigned where there are no possible matches in these cases:

There are matches with a database record with a different source

A personal name is unique and includes a surname and forename

The request includes an “isNot” statement

The metadata supplied is considered rich as per these cases:

• Full date of birth and death supplied

• Year of birth + 1 title or instrument + 1 related name (co-author or affiliated institution)

• 1 title or instrument + 1 external URL link of type encyclopaedia, home page (not social network page) + 1 related name (co-author or affiliated institution)

The request is resolving a possible match by including a PPN
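
The three "rich metadata" cases for personal names can be sketched as a simple check. This is a minimal sketch, not the ISNI request schema: the PersonRequest class and its field names are hypothetical, and it assumes the caller has already filtered reference_urls down to encyclopaedia or home page links (no social network pages).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonRequest:
    """Hypothetical container for the fields the richness rules mention."""
    birth_date: Optional[str] = None   # full date of birth
    death_date: Optional[str] = None   # full date of death
    birth_year: Optional[int] = None
    titles_or_instruments: List[str] = field(default_factory=list)
    related_names: List[str] = field(default_factory=list)    # co-authors or affiliated institutions
    reference_urls: List[str] = field(default_factory=list)   # encyclopaedia or home page links only


def is_rich_person(req: PersonRequest) -> bool:
    has_work = len(req.titles_or_instruments) >= 1
    has_related = len(req.related_names) >= 1
    has_url = len(req.reference_urls) >= 1

    # Case 1: full date of birth and death supplied.
    full_dates = req.birth_date is not None and req.death_date is not None
    # Case 2: year of birth + 1 title or instrument + 1 related name.
    year_work_related = req.birth_year is not None and has_work and has_related
    # Case 3: 1 title or instrument + 1 external URL + 1 related name.
    work_url_related = has_work and has_url and has_related

    return full_dates or year_work_related or work_url_related
```

For example, a request with only a year of birth and one title is not rich under these rules; adding one co-author makes it rich.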

Page 10: ISNI Assignment

Online: Guarantee assignment – Organisation Name

ISNIs will be automatically assigned where there are no possible matches in these cases:

There are matches with a database record with a different source

An organisation name is unique and does not consist only of abbreviations

The metadata supplied is considered rich as per these cases:

• Includes LOCODE &

• Organisation type &

• Organisation URL

The request is resolving a possible match by including a PPN
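
For organisations the richness rule is a conjunction: all three elements must be present. A minimal sketch, assuming a hypothetical OrganisationRequest class whose field names are not taken from the ISNI request schema:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrganisationRequest:
    """Hypothetical container for the three elements the slide lists."""
    locode: Optional[str] = None
    organisation_type: Optional[str] = None
    url: Optional[str] = None


def is_rich_organisation(req: OrganisationRequest) -> bool:
    # Rich only when LOCODE, organisation type and URL are all supplied
    # (missing or empty values count as not supplied).
    return all([req.locode, req.organisation_type, req.url])
```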

Page 11: ISNI Assignment

Maximizing assignment

Enter a request record online (Web page or via API)

Batch loaded records – passive method

• Quality Team manual fixes

• OCLC periodic re-match runs

• Matches from later batch loading & online activity

Batch loaded records – active method

• Resolve possible matches found by the system

• Search the database for candidate records for merging

• Enrich a record with URLs to external sources such as author’s web pages, Wikipedia, IMDB, MusicBrainz, Discogs, etc.

Source   May 2012   % assigned   Oct 2014   % assigned
ALCS       41,523       63.86%     49,157       76.66%
PROL        2,205       35.24%      4,143       66.18%
PROQ       65,122       12.89%    243,481       48.19%
AUVLU           0           0%      1,716       48.28%
ICLA            0           0%      2,208       97.61%

Page 12: ISNI Assignment

Finding possible matches

Command                   What it finds
Cn: proq & bs: [01]*      All your records with a possible match
Cn: proq & bs: 1*         Exact duplicates
Cn: proq & bs: 09*        Probably your duplicates
Cn: proq & bs: 08*        Most likely matches
Cn: proq & bs: 07*        Possible matches
Cn: proq & bs: 06*        Possible matches, lower match confidence
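
The commands above differ only in the source code (here proq) and the score band pattern, so a contributor could generate them for their own code. A small sketch that reproduces the strings exactly as written on the slide; the band labels and the function name are hypothetical, and the command syntax itself is copied from the slide, not from ISNI documentation.

```python
# Band patterns copied from the slide; the keys are illustrative labels.
BAND_PATTERNS = {
    "all": "[01]*",
    "exact": "1*",
    "probable": "09*",
    "likely": "08*",
    "possible": "07*",
    "possible_low": "06*",
}


def possible_match_command(source_code: str, band: str) -> str:
    """Build a search command string for one source code and score band."""
    return f"Cn: {source_code} & bs: {BAND_PATTERNS[band]}"


print(possible_match_command("proq", "all"))
```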

DECISIONS

• Records should merge

• One of the records should split (note to QT)

• Different identities

Page 13: ISNI Assignment

Resolving Possible Matches

Page 14: ISNI Assignment

Compare Screen

Page 15: ISNI Assignment

Adding a new record – Michel Calame

Page 16: ISNI Assignment

Adding a new record

Page 17: ISNI Assignment

Adding a new record

Page 18: ISNI Assignment

Adding a new record for an Organisation

Page 19: ISNI Assignment

New Organisation form

Page 20: ISNI Assignment

Adding your source to an existing record

Page 21: ISNI Assignment

Adding your source to an existing record

Page 22: ISNI Assignment

Correcting and enriching

These are all the same person. The second has an incorrect DOB = 1900

Page 23: ISNI Assignment

Enriching

You can add a source note or general note to any database record; your code does not need to be present

Page 24: ISNI Assignment

Reporting errors

The general note will trigger an email to the ISNI Quality Team for attention

Page 25: ISNI Assignment

Atom Pub API (Machine to machine)

• Requests and replacements (you can replace your existing data citing local identifier)

• Request

  • Atom Pub Header

  • Content = Request in the ISNI XML Request schema

• Documentation

  • ISNI Atom Pub API guidlines.doc

  • ISNI request.xsd (XML schema)

  • ISNI request schema.doc (describes the schema)

  • ISNI response.xsd (XML schema)

  • ISNI response schema.doc (describes the schema)

Page 26: ISNI Assignment

Documentation: Data Submission

Documents relating to data submission

• ISNI tab delimited format

• ISNI tab delimited format organisations

• ISNI data element values

• ISNI XML request schema

• ISNI XML request schema document

• ISNI Atom Pub interactive request requirements

• ISNI Data contributors usage guidelines

• ISNI database source profiles RAG information

• ISNI bulk load submission

Documents relating to data submission output

• ISNI XML response schema

• ISNI XML response schema document

• ISNI XML notification schema

• bulk load assigned ISNIs.xsd

• bulk load ISNI not assigned.xsd

• bulk load too many matches.xsd

• ISNI Data contributors reports and notifications guidelines

Page 27: ISNI Assignment

ISNI Charges

Enquiry: no charge

Resolving possible match: no charge

Resolving non match: no charge

Correcting information or adding information to an existing record: no charge

Adding a source to a record (status is assigned, provisional or suspect) or adding a new record: 100 p.a. free; ISNI request rate*

Page 28: ISNI Assignment

What is requested from ISNI Data Contributors?

Ingest ISNIs

Act on notifications (new assignments, changed assignments, errors and queries)

Assist in reviewing possible matches (Exact matches then possible matches)

Add a note to any record found with an error

Keep data up to date (become a RAG or use the services of an existing one)

Supply URI