Anatomy of Aggregate Collections: The Example of Google Print for Libraries
Brian Lavoie, Senior Research Scientist, OCLC Research
OCLC Members’ Council Meeting, October 2005
Aggregate collections
Boundaries between local and external collections increasingly blurred …
• Resource sharing (digital/network technologies)
• Cooperative collection management (resource allocation)
Shift in focus to resources of the “system” (or subsets of the system), rather than individual collections
Need data to support/illuminate system-wide perspective
• Characterize/analyze aggregate collections
WorldCat: largest aggregate collection
• Aggregate holdings of >20,000 libraries
• Bridge from local to system-wide perspective
The system-wide print book collection as represented in WorldCat (January 2005)
[Bar chart: number of WorldCat records by category]
• Total WorldCat records: ~55 million
• Language-based monographs: ~41 million
• Language-based monographs, excluding government documents and theses/dissertations: ~35 million
• Language-based monographs, excluding government documents and theses/dissertations, in print format only: ~32 million print books
More information: http://www.oclc.org/research/presentations/lavoie/cni2005.ppt
Google Print for Libraries
Aggregate collection of print books
• Aggregate print book holdings of five major research libraries (Harvard, Michigan, Oxford, NYPL, and Stanford)
Focus on copyright issues; very little discussion of Google Print for Libraries as an aggregate collection
• What are characteristics of this aggregate collection?
• How does it relate to the system-wide collection?
• WorldCat: useful data source for analysis
Lavoie, Connaway, Dempsey: “Anatomy of Aggregate Collections: The Example of Google Print for Libraries,” D-Lib (September 2005)
• http://www.dlib.org/dlib/september05/lavoie/09lavoie.html
G5 coverage of system-wide print book collection
[Pie chart: share of system-wide print books held by the Google 5]
• Held by at least one G5 library: 33% (10.5 million unique books)
• Not held: 67%
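The coverage figure follows from two numbers quoted on the slides: roughly 32 million print books system-wide and 10.5 million unique books in the Google 5. A minimal sketch of that arithmetic is below; the function and constant names are illustrative and do not come from the original study.

```python
# Figures quoted on the slides, in millions of books.
SYSTEM_WIDE_PRINT_BOOKS = 32.0  # print books in WorldCat, January 2005 (approximate)
GOOGLE_5_UNIQUE_BOOKS = 10.5    # unique books held by at least one G5 library


def coverage(held: float, population: float) -> float:
    """Return the fraction of the population held by the group."""
    return held / population


g5_coverage = coverage(GOOGLE_5_UNIQUE_BOOKS, SYSTEM_WIDE_PRINT_BOOKS)
print(f"Held by at least one G5 library: {g5_coverage:.0%}")
print(f"Not held: {1 - g5_coverage:.0%}")
```

Because the system-wide total is itself an approximation, the result should be read as "about one third" rather than as a precise share.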
Holdings overlap
[Pie chart: G5 unique books by number of holding libraries]
• Held by 1: 61%
• Held by 2: 20%
• Held by 3: 10%
• Held by 4: 6%
• Held by 5: 3%
Potential redundancy rate of 40 percent
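An overlap breakdown of this kind can be sketched by counting, for each unique book, how many libraries in the group hold it. The example below uses a small hypothetical holdings table (the book identifiers and holdings are invented for illustration, not drawn from WorldCat) and assumes that the potential redundancy rate is the share of books held by more than one library, which appears consistent with the 61 percent held uniquely.

```python
from collections import Counter

# Hypothetical toy data: book identifier -> set of libraries holding it.
holdings = {
    "book-a": {"Harvard"},
    "book-b": {"Harvard", "Michigan"},
    "book-c": {"Stanford"},
    "book-d": {"Oxford", "NYPL", "Stanford"},
    "book-e": {"Michigan"},
}


def overlap_distribution(holdings_by_book):
    """Map number of holding libraries -> share of unique books."""
    counts = Counter(len(libraries) for libraries in holdings_by_book.values())
    total = len(holdings_by_book)
    return {n: counts[n] / total for n in sorted(counts)}


dist = overlap_distribution(holdings)

# Assumption: "potential redundancy" = share of books held by more than one library.
potential_redundancy = 1 - dist.get(1, 0.0)

for n, share in dist.items():
    print(f"Held by {n}: {share:.0%}")
print(f"Potential redundancy: {potential_redundancy:.0%}")
```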
Language distribution
(proportion of collection)

Language      Google 5   System-wide
English       0.49       0.52
German        0.10       0.08
French        0.08       0.08
Spanish       0.05       0.06
Chinese       0.04       0.04
Russian       0.04       0.03
Italian       0.03       0.03
Japanese      0.02       0.04
Hebrew        0.02       0.01
Arabic        0.01       0.01
Portuguese    0.01       0.01
Polish        0.01       0.01
Dutch         0.01       0.01
Latin         0.01       0.01
Korean        0.01       0.01
Swedish       0.01       < 0.01
All others    0.07       0.08

More than 430 languages in Google 5 collection
Cumulative age distribution of G5 holdings
[Line chart: x-axis, Years; y-axis, Proportion Published During or Prior To Current Year (0 to 1)]
> 80 percent of Google 5 collection still in copyright
Works
[Bar chart: Manifestations vs. Works, Google 5 and System-wide]

                 Google 5       System-wide
Manifestations   10.5 million   32 million
Works            9.1 million    26.1 million

• Coverage slightly higher (35%)
• Holdings overlap slightly greater (56% held uniquely)
Some speculation …
What results would have been obtained if a different group of libraries had been selected?
What incremental extensions to coverage can be obtained by adding additional library collections to original Google 5?
Chose 5 new libraries:
• Small US liberal arts college
• Large US public university
• Large US private university
• Large US metropolitan library
• Large Canadian university
Beyond the Google 5 …

                      “New” Google 5   “Original” Google 5
Total holdings:       ~8 million       ~18 million
Total unique books:   5.9 million      10.5 million
% of system-wide:     18 percent       33 percent
Redundant holdings:   26 percent       42 percent

Impact by library type: % of holdings unique relative to original G5 collection:
• Large US metropolitan library: 39 percent (most unlike G5)
• Large US private university: 25 percent
• Large US public university: 21 percent
• Large Canadian university: 23 percent
• Small US liberal arts college: 13 percent (most like G5)
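The redundant-holdings percentages in the comparison are consistent with a simple ratio: the share of total holdings that remains once each unique book has been counted once. The sketch below reproduces both figures from the slide's totals; the function name is illustrative, and the formula is an assumption about how the figures were derived rather than a statement of the study's method.

```python
def redundancy_rate(total_holdings: float, unique_books: float) -> float:
    """Share of holdings that duplicate a book already counted once.

    Assumed formula: 1 - unique_books / total_holdings.
    """
    return 1 - unique_books / total_holdings


# Figures from the slide, in millions; total holdings are approximate.
new_g5 = redundancy_rate(total_holdings=8.0, unique_books=5.9)
original_g5 = redundancy_rate(total_holdings=18.0, unique_books=10.5)

print(f"New G5 redundant holdings:      {new_g5:.0%}")
print(f"Original G5 redundant holdings: {original_g5:.0%}")
```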
“The Google 10”
[Diagram: original Google 5 collection extended by the five new libraries]
• Original Google 5: 10.5 million books
• Google 10 collection: 12.3 million books, + 1.8 million (17%)
Diminishing returns?
• Original G5: ~18 million holdings, 58% unique
• New G5: ~8 million holdings, 22% unique
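The incremental gain from the second group of libraries can be checked with the figures above. The sketch below assumes that the 17 percent figure is growth relative to the original 10.5 million books and that "22% unique" is the number of added books divided by the new group's roughly 8 million holdings; both readings match the slide's numbers but are inferences, not statements from the source.

```python
# Figures from the slides, in millions.
ORIGINAL_G5_BOOKS = 10.5  # unique books in the original Google 5
GOOGLE_10_BOOKS = 12.3    # unique books after adding the five new libraries
NEW_G5_HOLDINGS = 8.0     # total holdings of the five new libraries (approximate)

# Books contributed by the new group that the original group did not hold.
added_books = GOOGLE_10_BOOKS - ORIGINAL_G5_BOOKS

# Assumption: the 17 percent figure is growth over the original collection.
growth_over_original = added_books / ORIGINAL_G5_BOOKS

# Assumption: "22% unique" is added books as a share of the new group's holdings.
share_of_new_holdings = added_books / NEW_G5_HOLDINGS

print(f"Added books: {added_books:.1f} million")
print(f"Growth over original G5: {growth_over_original:.1%}")
print(f"Share of new-group holdings that were new: {share_of_new_holdings:.1%}")
```

On these assumptions, fewer than one in four of the new group's holdings extended the collection, which is the pattern the slide labels diminishing returns.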
Anatomy of aggregate collections
Mass digitization programs and other aggregate collections increasingly common features of library landscape
Effective decision-making/planning aided by convergence on set of standard questions that help map out anatomy of aggregate collections
Example: mass digitization programs
• What are characteristics of overarching population of materials that is target of digitization effort?
• How much of population will digitization effort cover?
• What is potential degree of redundancy?
• What bibliographic unit is focus of digitization (e.g., manifestations, expressions, works)?
• What number of participants and combination of institution types is optimal for obtaining maximum benefit with minimum cost?
Aggregate collections and WorldCat
WorldCat more than tool for cataloging and reference; also strategic resource for managing aggregate collections
OCLC Group Services
• http://www.oclc.org/groupservices/
OCLC WorldCat Collection Analysis Service
• http://www.oclc.org/collectionanalysis/
OCLC Research data-mining activities
• Web site: http://www.oclc.org/research/projects/mining/