Anatomy of Aggregate Collections: The Example of Google Print for Libraries Brian Lavoie Senior Research Scientist OCLC Research OCLC Members’ Council Meeting October 2005


Anatomy of Aggregate Collections: The Example of Google Print for Libraries

Brian Lavoie
Senior Research Scientist
OCLC Research

OCLC Members’ Council Meeting
October 2005


Aggregate collections

Boundaries between local and external collections increasingly blurred …
• Resource sharing (digital/network technologies)
• Cooperative collection management (resource allocation)

Shift in focus to resources of the “system” (or subsets of the system), rather than individual collections

Need data to support/illuminate system-wide perspective
• Characterize/analyze aggregate collections

WorldCat: largest aggregate collection
• Aggregate holdings of >20,000 libraries
• Bridge from local to system-wide perspective


The system-wide print book collection as represented in WorldCat (January 2005)

[Bar chart: number of WorldCat records, axis from 0 to 60,000,000]
• Total WorldCat records: ~55 million
• Language-based monographs: ~41 million
• Language-based monographs, excluding government documents and theses/dissertations: ~35 million
• Language-based monographs, excluding government documents and theses/dissertations, in print format only: ~32 million print books

More information: http://www.oclc.org/research/presentations/lavoie/cni2005.ppt


Google Print for Libraries

Aggregate collection of print books
• Aggregate print book holdings of five major research libraries (Harvard, Michigan, Oxford, NYPL, and Stanford)

Focus on copyright issues; very little discussion of Google Print for Libraries as an aggregate collection
• What are characteristics of this aggregate collection?
• How does it relate to the system-wide collection?
• WorldCat: useful data source for analysis

Lavoie, Connaway, Dempsey: “Anatomy of Aggregate Collections: The Example of Google Print for Libraries” D-Lib (September 2005)
• http://www.dlib.org/dlib/september05/lavoie/09lavoie.html


G5 coverage of system-wide print book collection

[Pie chart: share of the system-wide print book collection]
• Held by at least one G5 library: 33% (10.5 million unique books)
• Not held: 67%
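The 33% figure can be checked against the counts quoted on the slides. The sketch below is only illustrative arithmetic on those rounded numbers; the function and constant names are hypothetical, not part of any OCLC tool.

```python
# Rounded counts quoted on the slides; results are approximate.
SYSTEM_WIDE_PRINT_BOOKS = 32_000_000  # print books in WorldCat (January 2005)
G5_UNIQUE_BOOKS = 10_500_000          # held by at least one Google 5 library


def coverage(held: int, population: int) -> float:
    """Share of the population held by the aggregate collection."""
    if population <= 0:
        raise ValueError("population must be positive")
    return held / population


g5_coverage = coverage(G5_UNIQUE_BOOKS, SYSTEM_WIDE_PRINT_BOOKS)
print(f"Held by at least one G5 library: {g5_coverage:.0%}")
print(f"Not held: {1 - g5_coverage:.0%}")
```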


Holdings overlap

[Pie chart: share of G5 books by number of G5 libraries holding them]
• Held by 1: 61%
• Held by 2: 20%
• Held by 3: 10%
• Held by 4: 6%
• Held by 5: 3%

Potential redundancy rate of 40 percent
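A rough check of the overlap figures, assuming the redundancy rate is the share of books held by two or more of the five libraries (the slide does not state the formula). With the rounded slice values this gives 39%, close to the quoted 40 percent. The same distribution also implies about 18 million total holdings. Variable names are hypothetical.

```python
# Share of G5 books by number of holding libraries (rounded slide values).
overlap_share = {1: 0.61, 2: 0.20, 3: 0.10, 4: 0.06, 5: 0.03}
G5_UNIQUE_BOOKS = 10_500_000

# Assumption: "potential redundancy" = books held by more than one library.
shared = sum(share for n, share in overlap_share.items() if n >= 2)

# Average number of G5 libraries holding each book.
avg_copies = sum(n * share for n, share in overlap_share.items())

# Implied total holdings across the five libraries.
implied_holdings = G5_UNIQUE_BOOKS * avg_copies

print(f"Held by 2+ libraries: {shared:.0%}")
print(f"Average holdings per book: {avg_copies:.2f}")
print(f"Implied total holdings: {implied_holdings / 1e6:.1f} million")
```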


Language distribution

Language      Google 5   System-wide
English       0.49       0.52
German        0.10       0.08
French        0.08       0.08
Spanish       0.05       0.06
Chinese       0.04       0.04
Russian       0.04       0.03
Italian       0.03       0.03
Japanese      0.02       0.04
Hebrew        0.02       0.01
Arabic        0.01       0.01
Portuguese    0.01       0.01
Polish        0.01       0.01
Dutch         0.01       0.01
Latin         0.01       0.01
Korean        0.01       0.01
Swedish       0.01       < 0.01
All others    0.07       0.08

More than 430 languages in Google 5 collection


Cumulative age distribution of G5 holdings

[Line chart: x-axis Years; y-axis Proportion Published During or Prior To Current Year, from 0 to 1]

More than 80 percent of Google 5 collection still in copyright


Works

[Bar chart: manifestations vs. works, axis from 0 to 35,000,000]
• Manifestations: Google 5 10.5 million; system-wide 32 million
• Works: Google 5 9.1 million; system-wide 26.1 million

Coverage slightly higher (35%)

Holdings overlap slightly greater (56% held uniquely)
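The work-level coverage can be reproduced from the four counts on the chart. This is a minimal sketch on the rounded slide figures; the dictionary layout and function name are hypothetical.

```python
# Counts quoted on the slide (rounded to 0.1 million).
counts = {
    "manifestations": {"google5": 10_500_000, "system_wide": 32_000_000},
    "works": {"google5": 9_100_000, "system_wide": 26_100_000},
}


def coverage_by_unit(data):
    """G5 coverage of the system-wide collection for each bibliographic unit."""
    return {unit: c["google5"] / c["system_wide"] for unit, c in data.items()}


unit_coverage = coverage_by_unit(counts)
for unit, share in unit_coverage.items():
    print(f"{unit:15s} {share:.1%}")
```

Counting works rather than manifestations raises coverage by roughly two percentage points, which matches the "slightly higher" callout.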


Some speculation …

What results would have been obtained if a different group of libraries had been selected?

What incremental extensions to coverage can be obtained by adding additional library collections to original Google 5?

Chose 5 new libraries:
• Small US liberal arts college
• Large US public university
• Large US private university
• Large US metropolitan library
• Large Canadian university


Beyond the Google 5 …

                      “New” Google 5   “Original” Google 5
Total holdings:       ~8 million       ~18 million
Total unique books:   5.9 million      10.5 million
% of system-wide:     18 percent       33 percent
Redundant holdings:   26 percent       42 percent

Impact by library type: % of holdings unique relative to original G5 collection:

Large US metropolitan library: 39 percent (most unlike G5)
Large US private university: 25 percent
Large Canadian university: 23 percent
Large US public university: 21 percent
Small US liberal arts college: 13 percent (most like G5)
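The percentages in the comparison follow from the holdings and unique-book counts, assuming redundant holdings are those beyond one copy per unique book. The slides do not give the formula, so this is an assumption that happens to reproduce the quoted values. The class and its methods are hypothetical names.

```python
from dataclasses import dataclass

SYSTEM_WIDE_PRINT_BOOKS = 32_000_000  # rounded slide figure


@dataclass(frozen=True)
class Aggregate:
    name: str
    total_holdings: int  # approximate ("~") on the slide
    unique_books: int

    def redundant_share(self) -> float:
        """Assumed definition: holdings beyond one copy per unique book."""
        return (self.total_holdings - self.unique_books) / self.total_holdings

    def system_share(self) -> float:
        """Unique books as a share of the system-wide print book collection."""
        return self.unique_books / SYSTEM_WIDE_PRINT_BOOKS


groups = [
    Aggregate("New Google 5", 8_000_000, 5_900_000),
    Aggregate("Original Google 5", 18_000_000, 10_500_000),
]

for g in groups:
    print(f"{g.name}: {g.system_share():.0%} of system-wide, "
          f"{g.redundant_share():.0%} redundant holdings")
```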


“The Google 10”

[Diagram: Original Google 5 (10.5 million books) within the Google 10 collection]

Google 10 collection: 12.3 million books
+ 1.8 million (17%)

Diminishing returns?
• Original G5: ~18 million holdings, 58% unique
• New G5: ~8 million holdings, 22% unique
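The diminishing-returns comparison can be sketched as new unique books gained per holding contributed. This assumes the "% unique" values are unique books divided by approximate total holdings, which is not stated on the slide. With these rounded inputs the new group's ratio comes out at 22.5%, in line with the quoted 22%. Names are hypothetical.

```python
# Rounded slide figures; holdings totals are approximate.
ORIGINAL_G5_UNIQUE = 10_500_000
ORIGINAL_G5_HOLDINGS = 18_000_000
NEW_BOOKS_ADDED = 1_800_000   # books not already in the original G5 collection
NEW_G5_HOLDINGS = 8_000_000

google10_total = ORIGINAL_G5_UNIQUE + NEW_BOOKS_ADDED
growth = NEW_BOOKS_ADDED / ORIGINAL_G5_UNIQUE

# Assumed definition: unique books contributed per holding.
yield_original = ORIGINAL_G5_UNIQUE / ORIGINAL_G5_HOLDINGS
yield_new = NEW_BOOKS_ADDED / NEW_G5_HOLDINGS

print(f"Google 10 collection: {google10_total / 1e6:.1f} million books")
print(f"Growth over original G5: {growth:.1%}")
print(f"Unique per holding, original G5: {yield_original:.1%}")
print(f"Unique per holding, new G5: {yield_new:.1%}")
```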


Anatomy of aggregate collections

Mass digitization programs and other aggregate collections increasingly common features of library landscape

Effective decision-making/planning aided by convergence on set of standard questions that help map out anatomy of aggregate collections

Example: mass digitization programs
• What are characteristics of overarching population of materials that is target of digitization effort?
• How much of population will digitization effort cover?
• What is potential degree of redundancy?
• What bibliographic unit is focus of digitization (e.g., manifestations, expressions, works)?
• What number of participants and combination of institution types is optimal for obtaining maximum benefit with minimum cost?


Aggregate collections and WorldCat

WorldCat more than tool for cataloging and reference; also strategic resource for managing aggregate collections

OCLC Group Services
• http://www.oclc.org/groupservices/

OCLC WorldCat Collection Analysis Service
• http://www.oclc.org/collectionanalysis/

OCLC Research data-mining activities
• Web site: http://www.oclc.org/research/projects/mining/