Cataloging and Metadata: What does the Future Hold – Issues and Perspectives
Michael Norman, Head of Content Access Management, University of Illinois at Urbana-Champaign


http://netfiles.uiuc.edu/manorman/ILS.ppt


Please, not another how we do it good!

• It will be okay. I’ve got a lot of good things to show you, and I hope to advance the discussion on these important issues and solutions.

• I will recount some of the successes we have had – and detail some of the mistakes we have made this past year or so.

• Good, quality, shareable metadata is so damn important


@ UIUC Library

• We have learned a lot through the various digitization projects we have been involved with, including the Open Content Alliance and the Illinois Harvest project, and we will be starting the Google Digitization Project in the next few months.

• We have learned quite a bit about cataloging and metadata, access systems, search and metasearch, digital preservation, and better ways to make all this information findable.


Conversation about the ILS

Today, I want to have a dialogue with you about where we think we are with online catalogs, metadata, other access options outside the library world, and metasearch, and about where we go from here to offer better search for our users. We are at a critical stage. Our online catalog is not very good at allowing users to find what they seek. There are better options.


Problems with our online catalogs

• Search is difficult

• Does not include many of the available resources in our collections, including images, digital collections, many of our electronic resources, and archival materials

• Our metadata does not include much of the pertinent information needed to make a judgment about a resource

• Our metadata is hard to discern


Better Options than our Online CATS

I’m almost at a point where I’d advise our users at the University of Illinois at Urbana-Champaign to begin their searches outside our online catalog (particularly in Microsoft Live Book Search, Amazon, and Google Book Search).

Then, after getting their results, they can come back and search our catalog to see if we have the item (in either a digital or print version).

It does not make me very happy to say that.


Why search elsewhere first?

• Users can evaluate a resource much better through a search at Microsoft, Amazon and Google

• Information such as tables of contents, indexes, bibliographies, cover images, cover data, summaries, biographies and reviews makes it easier to determine whether a resource helps one’s research


Amazon.com


Key Phrases – Amazon’s CAPs and SIPs


Amazon’s CAPs and SIPs

• Capitalized Phrases (CAPs) are people, places, events, or important topics mentioned frequently in a book.

• Statistically Improbable Phrases (SIPs) are the most distinctive phrases in the text of books in Amazon’s Search Inside! program. To identify SIPs, Amazon scans the text of all books in the Search Inside! program. If it finds a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book.
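Amazon does not publish its exact method, so what follows is only a rough Python sketch of the frequency-ratio idea described above: count the two-word phrases in one book, compare each phrase’s relative frequency there with its relative frequency across a reference corpus, and rank phrases by that ratio. The function names, the two-word phrase length, the `min_count` cutoff and the add-one smoothing are all assumptions of this sketch, not Amazon’s algorithm.

```python
import re
from collections import Counter


def phrases(text, n=2):
    """Split text into lowercase words and return every n-word phrase."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def improbable_phrases(book_text, corpus_texts, n=2, min_count=2, top_n=5):
    """Rank a book's phrases by how over-represented they are versus a corpus.

    Hypothetical helper: score = (phrase frequency in the book) divided by
    (add-one smoothed phrase frequency across the whole corpus).
    """
    book_counts = Counter(phrases(book_text, n))
    corpus_counts = Counter()
    for text in corpus_texts:
        corpus_counts.update(phrases(text, n))

    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())
    vocabulary = len(corpus_counts) + 1  # +1 leaves room for unseen phrases

    scores = {}
    for phrase, count in book_counts.items():
        if count < min_count:
            continue  # skip phrases that are rare even inside this book
        book_freq = count / book_total
        corpus_freq = (corpus_counts[phrase] + 1) / (corpus_total + vocabulary)
        scores[phrase] = book_freq / corpus_freq

    return sorted(scores, key=scores.get, reverse=True)[:top_n]


book = ("The new prince must rely on his own arms. The new prince must be feared. "
        "A new prince and the state.")
corpus = [book, "The state of the union is strong.", "The state must be governed."]
print(improbable_phrases(book, corpus))
```

In this toy run, "new prince" ranks first because it repeats within the one book, while "the state" is dropped for occurring only once there. A real system would also filter stop-word phrases and would need a far larger corpus before the ratios mean much.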


Machiavelli in Amazon


CAPs, Search Inside the Book, etc.


Interlinking of Citations


Amazon’s Inside the Book


Microsoft Live Books


Microsoft Live – Inside Book - Index


Microsoft Live Books - Bibliographies


Google Book Search


Google Book Search – Full text content


Google – Publisher supplied


Google – TOCs, Summaries


Google – References from books, articles, related books


Google’s Metadata Records


Google’s Metadata Records (continued)


Multiple sources of data

• Amazon, Microsoft, and Google are getting this data from various sources, including publishers, vendors such as Bowker, the digitization of materials, and metadata harvested from evaluative sources.

• Millions of full-text or partial full-text resources


Still far behind in breadth of collection

• Amazon, Google and Microsoft still don’t have it right. When you do a search, you are searching everything: a search in Microsoft’s service runs across the entire body of full-text content, and it is hard to do an advanced search by title, author, series title, publisher, etc.

• They do not have the breadth of titles or sources that we or OCLC WorldCat have; we have a couple hundred years of collecting on them. In 5 to 6 years, yes, they probably will, and eventually we may be able to search across 60 million full-text resources.


Why Amazon, Microsoft, Google?

• Why am I showing what Amazon, Microsoft and Google are doing with regard to search? To make us all feel bad? Maybe. Just a little.

• Really, to show the alternatives to our online catalogs: what is out there.

• But also to show some of the opportunities: how we can do better.

• Central to this is metadata: creating surrogate records that help lead users to what they want.


UIUC work with Open Content Alliance


Examples of digitized books


Downloading of resources


The present


NCSU Endeca Catalog


Vanderbilt’s Primo


Vanderbilt Primo title level


Oklahoma State’s Aquabrowser


Title level - Aquabrowser


Aquabrowser – Searchable TOCs and Summaries


UIUC Various Access Systems
• Voyager ILS
• CONTENTdm – digital images
• DSpace – IDEALS, Illinois Institutional Repository
• DLXS – digital text
• Olive – newspapers and serials
• Online Research Resources (ORR) – local electronic resources management system
• Discover/SFX OpenURL knowledge base


Metasearch – Is it the answer?


UIUC’s Information Gateway


Easy Search Results (metasearch)


Illinois Harvest – metasearch across formats from OAI Harvesting


Illinois Harvest – results with images, learning objects, digitized books, and streaming audio
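The slides do not describe how Illinois Harvest is built, but OAI-PMH harvesting in general works by paging through `ListRecords` responses. The Python sketch below builds the request URLs and pulls Dublin Core titles and the `resumptionToken` out of one response page. The repository base URL and the trimmed sample response are made up; a real harvester would fetch each page over HTTP and keep looping until the token comes back empty.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

BASE_URL = "https://repository.example.edu/oai"  # hypothetical repository endpoint


def list_records_url(resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (first page or continuation)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return BASE_URL + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (titles, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iterfind(".//oai:record//dc:title", NS)]
    token = root.find(".//oai:resumptionToken", NS)
    # A missing or empty token means the harvest is complete.
    next_token = token.text if token is not None and token.text else None
    return titles, next_token


# Trimmed, made-up response (real headers also carry a datestamp, etc.).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>The Prince</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

titles, token = parse_page(SAMPLE)
print(titles, token)
print(list_records_url(token))
```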


Positives

• They are pulling in metadata from multiple sources, including publishers, intermediate vendors and digitization projects

• They are adding value such as Google Maps and textual analysis

• We are still cataloging for a surrogate record environment, and we have got to move beyond that quickly.

• We do not have the metadata structures to pull in and incorporate much of the data that is out there: the metadata that Amazon, Microsoft and Google are bringing to bear.


Possibilities
• We have access to the same sources of metadata.
• We can get ONIX feeds from publishers.
• We can harvest tables of contents, indexes and bibliographies from the works we are digitizing.
• We can add cover images, book reviews, summaries and abstracts.
• We can crunch data and perform data mining as well as they can.
• With the help of OCLC, we can layer such applications as WorldCat Identities and authority control on top of all this.


WorldCat Identities


WorldCat Identities - Machiavelli


WorldCat Identities Display


Identities - Continued


Metadata
• MARC records still have a role to play.
• But MARC cannot be the only game in town anymore. It is not a flexible enough structure or standard to accommodate researchers’ needs, especially with the technological opportunities we have today.

• It cannot accommodate much of the data we need to produce interconnectivity (linking) between resources.


MARC – Where are we now?

• Libraries – we still do most of our cataloging in MARC

• Other viable schemas – Dublin Core (both Simple and Qualified), MODS, MARCXML

• Preservation metadata schemas (such as PREMIS)

• Content standards (such as AACR2 and CCO)

• Controlled vocabularies (such as LCSH, TGN, AAT and other applicable vocabularies)

• Transmission standards (such as METS)
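To make the relationship between MARC and a simpler schema such as Dublin Core concrete, here is a minimal Python sketch of a crosswalk from a handful of MARC fields to simple Dublin Core elements. The mapping only loosely follows the spirit of the Library of Congress MARC-to-Dublin Core crosswalk and covers just a few fields; the tuple layout used for the MARC record and the sample values (including "Example Press") are hypothetical, and this is not a parser for real MARC files.

```python
# Simplified mapping: MARC tag -> list of (Dublin Core element, subfield codes to keep).
CROSSWALK = {
    "100": [("creator", "a")],
    "245": [("title", "ab")],
    "260": [("publisher", "b"), ("date", "c")],
    "520": [("description", "a")],
    "650": [("subject", "a")],
    "700": [("contributor", "a")],
}


def marc_to_dc(fields):
    """Map (tag, [(code, value), ...]) MARC fields to a dict of DC element -> values."""
    dc = {}
    for tag, subfields in fields:
        for element, codes in CROSSWALK.get(tag, []):
            parts = [value for code, value in subfields if code in codes]
            if parts:
                # Drop trailing ISBD punctuation such as " /", ",", "." from the value.
                text = " ".join(parts).strip(" /:;,.")
                dc.setdefault(element, []).append(text)
    return dc


# Made-up sample record in the hypothetical in-memory layout.
SAMPLE_RECORD = [
    ("100", [("a", "Machiavelli, Niccolò,"), ("d", "1469-1527.")]),
    ("245", [("a", "The prince /"), ("c", "Niccolò Machiavelli.")]),
    ("260", [("b", "Example Press,"), ("c", "1999.")]),
    ("650", [("a", "Political science"), ("v", "Early works to 1800.")]),
]

print(marc_to_dc(SAMPLE_RECORD))
```

Fields without a mapping (here the 245 statement of responsibility and the 650 form subdivision) are simply left out, which is part of why simple Dublin Core loses detail compared with MARC.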


ONIX (Online Information Exchange)

• ONIX is a standard format that publishers use to distribute electronic information about their books to wholesale, e-tail and retail booksellers, and other publishers.

• Standard XML template for organizing data storage
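As a rough illustration of how a library might pull catalog-enriching data out of a publisher’s ONIX feed, the Python sketch below reads a heavily simplified ONIX 2.1-style product record with the standard library’s ElementTree. Real ONIX messages carry many more elements (and ONIX 3.0 restructures several of them), so treat the element names as a loose approximation; the record content is invented, and the `OtherText` code used for a table of contents is an assumption to check against the ONIX code lists.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified ONIX 2.1-style message.
ONIX_SAMPLE = """<ONIXMessage>
  <Product>
    <RecordReference>example.com.0001</RecordReference>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <Title>
      <TitleType>01</TitleType>
      <TitleText>The Prince</TitleText>
    </Title>
    <Contributor>
      <ContributorRole>A01</ContributorRole>
      <PersonName>Niccolò Machiavelli</PersonName>
    </Contributor>
    <OtherText>
      <TextTypeCode>04</TextTypeCode>
      <Text>Chapter I; Chapter II; Chapter III</Text>
    </OtherText>
  </Product>
</ONIXMessage>"""

ISBN13_TYPE = "15"  # ProductIDType value for ISBN-13
TOC_CODE = "04"     # assumed TextTypeCode for "table of contents"; verify in the code list


def isbn13(product):
    """Return the ISBN-13 of a Product element, or None if it has none."""
    for pid in product.findall("ProductIdentifier"):
        if pid.findtext("ProductIDType") == ISBN13_TYPE:
            return pid.findtext("IDValue")
    return None


def extract_products(xml_text):
    """Pull ISBN, title, contributors and table-of-contents text from each Product."""
    root = ET.fromstring(xml_text)
    products = []
    for product in root.iter("Product"):
        products.append({
            "isbn": isbn13(product),
            "title": product.findtext("Title/TitleText"),
            "authors": [c.findtext("PersonName") for c in product.findall("Contributor")],
            "toc": [
                other.findtext("Text")
                for other in product.findall("OtherText")
                if other.findtext("TextTypeCode") == TOC_CODE
            ],
        })
    return products


print(extract_products(ONIX_SAMPLE))
```

The extracted table of contents is exactly the kind of data that has no natural home in a MARC record but could be indexed alongside it.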


Metadata Encoding & Transmission Standard (METS)

• The METS schema provides a flexible mechanism for encoding descriptive, administrative, and structural metadata for a digital library object, and for expressing the complex links between these various forms of metadata.

• It provides a useful standard for the exchange of digital library objects between repositories.

• METS provides the ability to associate a digital object with behaviours or services.
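A minimal Python sketch of the METS sections named above: a descriptive metadata section (`dmdSec`) wrapping a simple Dublin Core title, a file section (`fileSec`) pointing at page images, and a structural map (`structMap`) tying those pages to the book. The identifiers (`DMD1`, `FILE1`, ...), the `USE="master"` value, the file URLs and the MIME type are placeholders, and a production METS document would normally also carry a header, administrative metadata and schema locations.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the serialized XML reads mets:, xlink:, dc: instead of ns0:.
for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Return an ElementTree qualified name such as '{http://...}tag'."""
    return f"{{{ns}}}{tag}"


def build_mets(title, page_urls):
    """Build a minimal METS document for a digitized book (hypothetical helper)."""
    mets = ET.Element(q(METS_NS, "mets"))

    # Descriptive metadata: a Dublin Core title wrapped inline in a dmdSec.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    ET.SubElement(xml_data, q(DC_NS, "title")).text = title

    # File inventory and structural map, filled in page by page below.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), USE="master")
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"), TYPE="physical")
    book = ET.SubElement(struct_map, q(METS_NS, "div"), TYPE="book", DMDID="DMD1")

    for n, url in enumerate(page_urls, start=1):
        file_id = f"FILE{n}"
        # fileSec entry: where the page image lives.
        file_el = ET.SubElement(file_grp, q(METS_NS, "file"), ID=file_id, MIMETYPE="image/tiff")
        ET.SubElement(file_el, q(METS_NS, "FLocat"), {"LOCTYPE": "URL", q(XLINK_NS, "href"): url})
        # structMap entry: the page's position in the book, pointing back at its file.
        page = ET.SubElement(book, q(METS_NS, "div"), TYPE="page", ORDER=str(n))
        ET.SubElement(page, q(METS_NS, "fptr"), FILEID=file_id)

    return ET.tostring(mets, encoding="unicode")


print(build_mets("The Prince", ["https://example.org/p1.tif", "https://example.org/p2.tif"]))
```

Because the structural map points at files by ID rather than embedding them, other metadata (an ONIX record, a MARCXML record, a harvested table of contents) could be attached as further `dmdSec` entries without changing the page structure.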


Interconnectivity

• We can start to create the search environment that allows one to move from:
  • citation
  • to full-text content
  • to other works about or cited within a work
  • continuing to the next full-text resource

• Each year over the next 7 years, we will be able to move from full-text content to full-text content

• Moving from bibliography to bibliography, citation to citation: OpenURL can show us the way
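OpenURL (ANSI/NISO Z39.88) carries a citation as key/value pairs that a link resolver, such as an SFX knowledge base, can turn into links to full text or local holdings. The Python sketch below builds an OpenURL 1.0 key/encoded-value (KEV) query for a book citation; the resolver base URL and the function name are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

RESOLVER = "https://resolver.example.edu/openurl"  # hypothetical link resolver base URL


def book_openurl(title, author_last, isbn=None, date=None):
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link for a book citation."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",  # KEV metadata format for books
        "rft.genre": "book",
        "rft.btitle": title,
        "rft.aulast": author_last,
    }
    # Optional keys are sent only when they are known.
    if isbn:
        params["rft.isbn"] = isbn
    if date:
        params["rft.date"] = date
    return RESOLVER + "?" + urlencode(params)


if __name__ == "__main__":
    link = book_openurl("The Prince", "Machiavelli", date="1532")
    print(link)
    # Round-trip check: the resolver sees the decoded citation values.
    print(parse_qs(urlparse(link).query)["rft.btitle"])  # ['The Prince']
```

Generating a link like this for every citation harvested from a bibliography is one way the "citation to citation" movement could be wired up in practice.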


Automating Metadata Generation

• I’m the chair of the Automating Metadata Generation Task Force formed by the ALCTS Big Heads of Technical Services, and we will have a white paper out this fall outlining the capabilities and possibilities of automating the creation of metadata records.

• And, yes, we can automate many of our processes for creating metadata.


Our structures and standards cannot support this presently

• We can’t fit a lot of this data into a MARC record

• There are no real standards for indexes, tables of contents, citations and bibliographies, though markup languages can accommodate them. To pull these valuable data easily from a resource, we need to be able to identify and harvest them easily

• We can get this data from publishers for recent publications and pull it from digitization projects for older materials

• Pull it all together using a metadata record, ONIX and a METS wrapper


New Systems
• Need a system that can read MARC and XML, or has the ability to easily convert MARC to MARCXML
• Allows search across surrogate records and full-text content
• Relevancy ranking
• User can easily discern the different formats pulled in through metasearch (monographs, articles, images, datasets, citations, etc.)

• Strong structured search and also powerful keyword indexing

• Easy to determine how best to get this piece of information (e.g., Open WorldCat)
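Converting MARC to MARCXML is largely mechanical once a record has been parsed. The Python sketch below serializes a record held in a hypothetical in-memory form (a leader, control fields, and data fields with indicators and subfields) into the MARC 21 slim (MARCXML) schema using ElementTree. Parsing binary ISO 2709 MARC files is out of scope here, and in practice an existing MARC library would normally handle both steps; the sample record values are illustrative.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def to_marcxml(leader, control_fields, data_fields):
    """Serialize a record to MARCXML.

    control_fields: [(tag, value)]
    data_fields:    [(tag, ind1, ind2, [(code, value), ...])]
    """
    record = ET.Element(f"{{{MARC_NS}}}record")
    ET.SubElement(record, f"{{{MARC_NS}}}leader").text = leader
    for tag, value in control_fields:
        ET.SubElement(record, f"{{{MARC_NS}}}controlfield", tag=tag).text = value
    for tag, ind1, ind2, subfields in data_fields:
        field = ET.SubElement(record, f"{{{MARC_NS}}}datafield", tag=tag, ind1=ind1, ind2=ind2)
        for code, value in subfields:
            ET.SubElement(field, f"{{{MARC_NS}}}subfield", code=code).text = value
    return ET.tostring(record, encoding="unicode")


# Illustrative record: a 24-character leader, one control field, two data fields.
LEADER = "00000nam a2200000 a 4500"
CONTROL = [("001", "example0001")]
DATA = [
    ("100", "1", " ", [("a", "Machiavelli, Niccolò,"), ("d", "1469-1527.")]),
    ("245", "1", "4", [("a", "The prince /"), ("c", "Niccolò Machiavelli.")]),
]

XML_TEXT = to_marcxml(LEADER, CONTROL, DATA)
print(XML_TEXT)
```

Once records are in XML, they can be indexed, transformed or wrapped (for example inside a METS `dmdSec`) with the same general-purpose XML tools used for other metadata.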


New Systems (continued)

• Ability to harvest data from multiple sources

• Ability to keep this data current and accurate
• Ability to track changes to this data, ensuring we always keep the best version
• Have to automate a lot of these processes
• Technologies exist to allow us to do it
• Collaboration
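Tracking changes to harvested data can start simply: compare each newly harvested version of a record with the stored one, field by field, and log what was added, removed or changed before deciding which version to keep. The Python sketch below does only that comparison; the flat field-to-value record layout, the function name and the example records are assumptions, and choosing "the best" value still needs local rules or human review.

```python
def diff_records(old, new):
    """Compare two versions of a record, each a dict of field -> value.

    Returns (kind, field, old_value, new_value) tuples, where kind is
    'added', 'removed' or 'changed'; unchanged fields are skipped.
    """
    changes = []
    for field in sorted(old.keys() | new.keys()):
        if field in old and field in new:
            if old[field] == new[field]:
                continue  # identical in both versions
            kind = "changed"
        elif field in new:
            kind = "added"
        else:
            kind = "removed"
        changes.append((kind, field, old.get(field), new.get(field)))
    return changes


stored = {"title": "The prince", "date": "1999", "summary": "Old blurb"}
harvested = {"title": "The Prince", "date": "1999", "toc": "Chapter I; Chapter II"}
for change in diff_records(stored, harvested):
    print(change)
```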
