OAIster: What’s with the Weird Name? Kat Hagedorn UM Library Information Technology November 28, 2005

Page 1

OAIster: What’s with the Weird Name?

Kat Hagedorn

UM Library Information Technology

November 28, 2005

Page 2

What is OAIster?

  • Is/was a means for UM to test the OAI protocol… (hence the name)
  • A method for sharing metadata among institutions and groups of people
  • A means of developing a search service for end-users worldwide

Page 3

Basics of OAI

Page 4

What does OAIster collect?

  • Harvests all metadata from all OAI data providers (within reason)
  • Only keeps metadata that points to digital objects, e.g., articles, photographs, datasets, etc. in digitized form
  • All available via search service…
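
The harvesting step on this slide can be sketched as follows. This is a minimal illustration, not OAIster's actual harvester: `list_records_url` builds a standard OAI-PMH `ListRecords` request, `extract_records` pulls the Dublin Core fields out of a response, and `points_to_digital_object` is a hypothetical stand-in for the "only keeps metadata that points to digital objects" rule (here assumed to mean "has an http(s) identifier"). The `example.org` base URL and the sample record are placeholders.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_records(xml_text):
    """Return one dict per record: DC element name -> list of values."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("{%s}record" % OAI_NS):
        fields = {}
        for el in rec.iter():
            if el.tag.startswith("{%s}" % DC_NS):
                name = el.tag.split("}", 1)[1]
                fields.setdefault(name, []).append((el.text or "").strip())
        records.append(fields)
    return records


def points_to_digital_object(fields):
    """Assumed heuristic: keep a record if any dc:identifier is a web link."""
    return any(
        value.lower().startswith(("http://", "https://"))
        for value in fields.get("identifier", [])
    )


# Placeholder response from a hypothetical data provider.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample photograph</dc:title>
          <dc:identifier>http://example.org/images/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    kept = [r for r in extract_records(SAMPLE_RESPONSE) if points_to_digital_object(r)]
    print(kept)
```

A real harvester would also follow `resumptionToken` values to page through large result sets; that is left out here to keep the sketch short.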

Page 5

Searching OAIster

  • Time to show off OAIster… http://www.oaister.org/

Page 6

A little history

  • Service is now 3.5 years old
  • Started with 66 data providers and a little over 200K records
  • Now have 572 data providers and “a little” over 6 million records
  • 37% US, 63% international

Page 7

Visibility of OAI

  • Surprising who hasn’t made their metadata shareable through OAI
      • Harvard, Yale, Stanford… the big ones
  • Initially perplexing, but now clearer:
      • always done at the end
      • only recently thought of at initiation of projects
      • truthfully, many institutions not collaborative…

Page 8

Examples of data providers

  • Many data providers are huge, e.g.,
      • arXiv: physics preprint and postprint articles
      • pubmed: medical articles, although restricted
      • pictureaustralia: images from government and academic institutions in Australia
      • lcoa: Library of Congress digital archives
      • usc: University of Southern California census data

Page 9

Examples of data providers

  • Most are small, though
      • Many around 100 records
  • Value of making their records available
      • increased visibility
      • inclusion in bigger search service than theirs
      • incorporation in Yahoo! Search

Page 10

Yahoo! Search

  • Two years ago, collaborated with team at Yahoo! Search to send our metadata to them for indexing
      • e.g., “gardens at albury” in Yahoo! Search
      • know it’s not static html roboting
      • <dc:relation>IspartOf Victorian Railways collection.</dc:relation>
  • Many, many more hits
  • Also send metadata to Google

Page 11

System design

[Diagram. Labeled components: OAI-enabled DC records; Non-OAI-enabled DC records; UM harvester; Record storage; XSLT transformation tool; XSL stylesheets (per source type); BibClass indexes; Search interface (XPAT)]

Page 12

Transformation of metadata

  • Most metadata needs to be brushed off
      • adding an http:// to the front of URLs
  • Or raked
      • removing instances of <![CDATA[
  • Or wrung out
      • instead of “Where’s Waldo,” it’s “Where’s the incorrect UTF-8 character?”
  • And should be normalized…
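
The three kinds of cleanup named on this slide can be sketched in a few lines. The slides describe the production pipeline as XSLT-based, so the Python below is only an illustration of the same ideas; the function names are hypothetical.

```python
import re


def add_scheme(url):
    """'Brush off': prefix http:// when the value has no URL scheme."""
    url = url.strip()
    if re.match(r"^[A-Za-z][A-Za-z0-9+.-]*://", url):
        return url
    return "http://" + url


def strip_cdata(text):
    """'Rake': drop CDATA wrappers but keep the text they enclose."""
    return text.replace("<![CDATA[", "").replace("]]>", "")


def find_bad_utf8(raw):
    """'Wring out': return byte offsets where UTF-8 decoding of raw fails."""
    offsets = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            offsets.append(pos + err.start)
            pos += err.start + 1  # skip the offending byte and keep looking
    return offsets


if __name__ == "__main__":
    print(add_scheme("www.oaister.org/"))
    print(strip_cdata("<![CDATA[Victorian Railways collection.]]>"))
    # A Latin-1 encoded e-acute (0xE9) is not valid UTF-8 on its own.
    print(find_bad_utf8(b"caf\xe9 au lait"))
```

Reporting offsets rather than silently replacing bytes makes it possible to tell a data provider exactly where a record is broken.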

Page 13

Why normalize?

Sample date values

    <date>2-12-01</date>
    <date>2002-01-01</date>
    <date>0000-00-00</date>
    <date>1822</date>
    <date>between 1827 and 1833</date>
    <date>18--?</date>
    <date>November 13, 1947</date>
    <date>SEP 1958</date>
    <date>235 bce</date>
    <date>Summer, 1948</date>
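
To make the problem concrete, here is a minimal sketch of one possible normalization rule applied to the sample values above. It is an assumption for illustration, not the rule OAIster uses: reduce each value to a single four-digit year when exactly one can be read, and otherwise return `None` so the record can be flagged instead of guessed at.

```python
import re


def normalize_year(value):
    """Return a four-digit year string if one is unambiguous, else None."""
    text = value.strip()
    # ISO-like values: YYYY or YYYY-MM-DD. Treat the 0000 placeholder as missing.
    match = re.fullmatch(r"(\d{4})(-\d{2}-\d{2})?", text)
    if match:
        return match.group(1) if match.group(1) != "0000" else None
    # Free text with exactly one plausible year, e.g. "November 13, 1947".
    years = re.findall(r"\b(1\d{3}|20\d{2})\b", text)
    if len(years) == 1:
        return years[0]
    # Ranges, partial years, BCE dates and two-digit years are left unresolved.
    return None


SAMPLES = [
    "2-12-01", "2002-01-01", "0000-00-00", "1822", "between 1827 and 1833",
    "18--?", "November 13, 1947", "SEP 1958", "235 bce", "Summer, 1948",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), "->", normalize_year(sample))
```

Under this rule six of the ten samples resolve to a year; the remaining four ("2-12-01", "0000-00-00", the 1827 to 1833 range, "18--?" and "235 bce" among them) need either richer rules or a human decision, which is the point of the slide.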

Page 14

Why use a CV?

Sample subject values

    <subject>30,51,52</subject>
    <subject>1852, Apr. 22. E[veritt] Judson, letter to Philuta [Judson].</subject>
    <subject>Slavery--United States--Controversial literature</subject>
    <subject>view of interior with John Henry sculpture</subject>
    <subject>Particles (Nuclear physics) -- Research.</subject>

Page 15

Best practices

  • Fixing more than half of the data providers is cumbersome
  • Individuals at OAI-enabled institutions started a “Best Practices” group to inform data providers what they ought to do
  • http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents

Page 16

2nd phase OAI

  • “Best Practices” group sponsored by the Digital Library Federation, which also…
  • Sponsors our latest grant
      • Better and more easily calculated statistics
      • Search interface improvements
      • Clustering / classification techniques
      • Using richer metadata

Page 17

Clustering / classification

  • Using automated means to take a selection of metadata and determine “what it’s about”
  • Working with Emory University (one of our grant partners) to test their tool
  • Results will be integrated into search so users can search within a smaller group of OAIster records

Page 18

Using richer metadata

  • Data providers must use simple Dublin Core
  • Very sparse schema for describing objects
      • dc:title must contain main title, sorted title and alternative titles
      • dc:subject doesn’t distinguish between geographical, hierarchical, temporal…

Page 19

Using richer metadata

  • Encouraging use of richer metadata, especially MODS (Metadata Object Description Schema) from LOC
  • Developed testbed for grant deliverables
      • currently only shows MODS work… http://www.hti.umich.edu/m/mods/

Page 20

Other stuff

  • Well, make it smaller somehow…
  • Clean up Boolean interface
      • squinch fields together
      • include more normalization
  • Make it available through federated search
  • Proselytize sharing metadata
  • Test, test, test

Page 21

Contact me

Kat Hagedorn
UM Library Information Technology
[email protected]
www.oaister.org