

For permission to copy, contact [email protected]

2005 Geological Society of America 61

Geosphere; October 2005; v. 1; no. 2; p. 61–77; doi: 10.1130/GES00011.1; 7 figures.

A collaborative system for sharing paleontology collections data

Kenneth G. Johnson1

Harry F. Filkorn

Mary Stecheson

Department of Invertebrate Paleontology, Natural History Museum of Los Angeles County, 900 Exposition Boulevard, Los Angeles, California 90007, USA

ABSTRACT

Museum collections provide primary

data for paleontologists, and recent advanc-

es in information technology have revolu-

tionized how museums collect and share

this information. However, many natural

history museums have huge collections and

small budgets, so museum scientists are challenged to keep these critical data current

and available to the public. We suggest

that establishing an open collaboration

through the Internet is one possible solution

to this challenge. To achieve this solution,

we have implemented a Web-based collec-

tions catalog to encourage collaborative

maintenance of collections data as a shared

resource. Anyone can search the catalog via

a simple interface designed for any stan-

dard Web browser, and Web users can also

be authorized to add information or update

records as stratigraphic and taxonomic

concepts change. The goal is to establish two-way communication between our catalog

and the scientific community wherein

the museum shares its collections and re-

lated data, and in return the community

contributes new data acquired through use

of the collections. The catalog also provides

a basic function for building links with on-

line publications and other data sources. As

data exchange standards become accepted,

these links can be used to create metada-

tabases that could lead to global networks

of collections, taxonomic, stratigraphic, and

bibliographic information. By providing an

efficient mechanism to locate and synthesize large volumes of disparate information,

such loosely integrated systems have resulted

in rapid progress in disciplines of the life

and physical sciences, and they represent

one way forward into a data-rich future for

paleontology.

1 Current Address: Department of Palaeontology, Natural History Museum, Cromwell Road, London SW7 5BD, UK.

Keywords: geoinformatics, paleontology,

collections.

INTRODUCTION

Fossil specimens are the best record of the

occurrence of a particular organism at a spe-

cific time and place (Allmon and Poulton,

2000), so collections are the raw data of pa-

leontology. Collections are required for sub-

sequent researchers to check and reinterpret

previous work, and they are an important

source of new information that can be released

by the arrival of new technologies and new

research questions. For example, collections

have been used in studies based on morpho-

metric analysis, molecular methods including

DNA sequencing, and various geochemical

techniques (Suarez and Tsutsui, 2004; Allmon, 2005). Collections held by museums become

especially important in cases where

original exposures are no longer available for

collecting, as is commonly the case for man-

made exposures produced during road build-

ing, quarrying, or construction. However, col-

lections of fossils are only useful if they are

accessible to potential users. Traditional use of 

paleontology collections required researchers

to visit museums and work with material on-

site or resort to secondary sources in the pub-

lished literature. In reality, much of the infor-

mation about the contents of paleontology

collections is passed along by word of mouth, as a kind of folklore: for example, Heinz Lowenstam

was a professor at the California Institute

of Technology, so his collections might

be held by an institution in Southern Califor-

nia. Obviously, this is not the most efficient

method to advertise the availability of impor-

tant research collections. For at least a decade,

it has been clear that the World Wide Web is

an ideal forum to publish collections catalogs.

Besides widespread availability and ease of 

access, the Internet offers the additional ben-

efit of allowing databases to be integrated into

new networks of bioinformatics and geoinfor-

matics (Graham et al., 2004). Such networks

enable researchers to address questions regarding

the large-scale history of regional or global diversity in response to global environmental

change (e.g., Jackson and Johnson,

2000; Alroy et al., 2001), and are an inevitable

part of the future of paleontology.

Most natural history collections belong to

public or nonprofit institutions that hold their

collections in the public trust (American As-

sociation of Museums, 2005). However, many

of these institutions have recently been subject

to budget shortfalls (Dalton, 2003; Suarez and

Tsutsui, 2004) that have reduced support for

collections. At the same time, changing organizational

priorities have resulted in the

transfer of collections to a smaller number of 

institutions (Gropp, 2003). For example, the

Department of Invertebrate Paleontology at

the Natural History Museum of Los Angeles

County (LACMIP) currently contains collec-

tions that formerly belonged to the University

of Southern California, the University of Cal-

ifornia at Los Angeles, the California Institute

of Technology, and California State Univer-

sity, Northridge. The consequence of these

transfers is that relatively small staffs are car-

ing for many large and important collections

that are critical to the future of paleontology.

Besides limitations in manpower, there is an increasing shortage of expertise. With reduced

staff, most institutions do not have in-house

experts who can serve as taxonomic authorities

across the entire spectrum of fossil groups represented

in their enormous combined collections. Without this expert knowledge in-house,

it is difficult to adequately maintain and improve

collections without enlisting the support

of experts in the broader paleontological community.

Figure 1. An example of a specimen lot from the Department of Invertebrate Paleontology at the Natural History Museum of Los Angeles County (LACMIP) collections, including paper labels containing potentially useful information that should be incorporated into the LACMIP specimen catalog.

This outside assistance must come

from the researchers using museum collec-

tions to address questions in their own spe-

cialized fields, whether Cambrian trilobites of 

the Great Basin or Pleistocene mollusks of 

western North America. Collections managers

provide free access to specimens and data, but

sharing must become a two-way street. The

research community using these resources

must contribute its expertise to ensure contin-

ued access to high-quality information. Other

fields of research within bioinformatics are

reaching the same conclusion (Eiden, 2004;

Wilson, 2005). To help achieve this, we have

developed a Web-based collections catalog

that can be jointly managed by the museum


Figure 2. A schematic model illustrating the architecture underlying the Department of Invertebrate Paleontology at the Natural History

Museum of Los Angeles County (LACMIP) catalog. All information is stored in a relational database and is accessible through four

user interfaces. Web forms and REST-style Web services can be used to search, browse, and add information into the system. Software

underlying each component is indicated in parentheses, including Apache Web server, PostgreSQL database management system, and

SQL and PHP programming languages. Connections between Web server and clients on the World Wide Web can be encrypted using

the mod_ssl module available with the Apache Web server software.

collections staff and research community as a

shared resource. The LACMIP holds more than five million

specimens, primarily from the western United

States, including the world’s largest collec-

tions of Cretaceous and Neogene mollusks

from western North America. Our collections

have been built over the past 90 yr and include

the important university collections mentioned

above that were transferred to the museum as

local universities decided to eliminate their re-

search collections. The department is currently

housed in an off-site facility about a half mile

from the main museum. This site contains col-

lections storage as well as laboratories and

staff offices. Within the collections space, the

fossils are stored in 674 steel cabinets. Spec-

imens collected from the same locality are

stored together, and the entire main collections

are arranged first according to geologic age

(Cambrian to Quaternary) and then by geo-

graphic place (country, state, county) within

each age. Each steel cabinet has a set of draw-

ers containing specimen lots. These are groups

of specimens from a single collecting locality

that have been identified as belonging to the

same taxon. Over the years, each lot may have accumulated a group of paper labels that contains

information regarding the fossil collecting

locality and sometimes multiple taxonomic

determinations made by different

researchers who have studied the material. For

example, the gastropod illustrated in Figure 1

has four different hand-written and typed la-

bels that contain such data. One of our chal-

lenges is to capture these data and make them

available to the public.

Cataloging of the collection was started in

the 1960s with the development of a card-

based locality register. In this system, each lo-

cality was given a unique number and a card

with essential geographic and stratigraphic in-

formation. These numbers were attached to

specimens and became the primary identifi-

cation of specimen lots in the collection. A

similar card file system was developed for

type and figured specimens. Each type speci-

men was associated with a unique number and

was cross-referenced with specimen identifi-

cation and bibliographic information. During

the late 1980s these data were entered by hand

into a custom collections management system developed in Borland Paradox. Nontype specimens

that had never been figured in publications

were not cataloged. However, the card

system continued to be maintained in parallel

with the computer database and was consid-

ered the standard. In 2002 we extracted the

data from the legacy database and reformed it

into a new system.

THE LACMIP COLLECTIONS CATALOG

Our goal was to build an electronic catalog

that could meet the following objectives: (1)

The catalog must allow the rapid acquisition

of basic taxonomic, stratigraphic, geographic,

and bibliographic information. The majority

of these data need to be entered manually by

part-time staff with little training, mainly vol-

unteers and work-study students, so we have

developed entry forms with pick lists to min-

imize typing of long and unfamiliar scientific

names; (2) The catalog must be accessible


Figure 3. Forms for searching for collecting localities in the Department of Invertebrate Paleontology at the Natural History Museum

of Los Angeles County (LACMIP) catalog. A: Search form allows users to specify values for various fields. Continued on next page.

from any computer connected to the Internet.

To achieve this, we decided to take advantage

of a Web architecture approach and the existing mature technology developed for e-commerce

sites on the World Wide Web. This

decision was made both to streamline the de-

velopment process and to allow access for the

broad community of research scientists con-

tributing to the site as well as museum staff 

working in other locations; (3) The system

must be able to share information with outside

data networks in geoinformatics and other sys-

tems in our own institution. Therefore, we

used a multitiered application architecture to

facilitate this sharing; (4) The system must al-

low links to be made directly from online tax-

onomic publications to the type and figured

specimens in our collections. These are among

the most important materials in our collec-

tions, and we strive to maximize their expo-

sure for convenient use by the research com-

munity; and (5) Images of specimens,

collecting localities, and digital copies of field

notes, maps, and other resources must also be

available for remote use.

With these objectives in mind, we have de-

veloped a flexible, modular system that can be

adapted to changing technology because in-

dividual components can be added, modified,

or removed as necessary. This will allow the

system to be improved incrementally as new

technologies become available. For example,

the current system does not include collections

management functions so it cannot be used to

track loans, insurance values, or the physical

location of specimen lots. Our institutional

Office of the Registrar performs many of these

tasks, and we are building automated links

from their registration system to our collection

catalog. Our system also does not include a

sophisticated geographic information system

to allow mapping or geospatial analysis, nor have we attempted to track complex synonymies

and changes in taxonomic practice. Instead

we plan to take advantage of other tools

developed especially for these purposes. For

example, we would likely cede responsibility

for maintaining taxonomic information to oth-

er systems when distributed taxonomic dictio-

naries become available for fossils. The

LACMIP electronic catalog has been designed

to publish information regarding our collec-

tions only.

The new LACMIP system has been developed as a Web-based, client-server database

with multiple interfaces (Fig. 2). The data are

stored in a relational database as a backend,

using the PostgreSQL database system

(PostgreSQL Global Development Group,

2005). Some of the basic business logic is im-

plemented on this server including checks for

referential integrity and triggers that enforce

data updates. At the moment there are four

interfaces to the data. The simplest is an

interface that communicates via the SQL da-

tabase programming language (Wikipedia,

2005) used for administration and mainte-

nance. Three interfaces written in the PHP

scripting language (PHP Group, 2005) run via

an Apache Web server (Apache Software

Foundation, 2005). Two of these interfaces are

Web forms that allow input, searching, and

browsing of the data using standard Web

browsers on any machine connected to the In-

ternet. One is composed of simple read-only

forms accessible to the public, and the second

interface includes data input forms and accepts

user authentication using secure protocols.

Figure 3. (Continued.) B: Results for a search for localities from Redding Formation in Shasta County include 88 localities. Note that there can be multiple entries for each field, for example, the age of locality LACMIP 10726 has been refined from Cretaceous to Turonian by Harry Filkorn in August 2004. Public view cannot be modified. Continued on next page.

The third interface is a set of basic Web

services built under a Web architecture (Ja-

cobs, 2004) or ‘‘REST-like’’ philosophy

(Fielding, 2000) that allows integration with

other systems.
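
The database-level checks mentioned above (referential integrity and triggers) can be illustrated with a minimal sketch. It is not the LACMIP schema: the table names, column names, and the trigger itself are hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL only so that the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: a lot must point at an existing locality, and a
# determination must point at an existing lot.
conn.executescript("""
CREATE TABLE locality (locality_id INTEGER PRIMARY KEY);
CREATE TABLE lot (
    lot_id        INTEGER PRIMARY KEY,
    locality_id   INTEGER NOT NULL REFERENCES locality(locality_id),
    current_taxon TEXT
);
CREATE TABLE determination (
    det_id INTEGER PRIMARY KEY AUTOINCREMENT,
    lot_id INTEGER NOT NULL REFERENCES lot(lot_id),
    taxon  TEXT NOT NULL
);
-- Assumed example of a trigger that enforces an update: every new
-- determination becomes the lot's current taxon.
CREATE TRIGGER determination_sets_current
AFTER INSERT ON determination
BEGIN
    UPDATE lot SET current_taxon = NEW.taxon WHERE lot_id = NEW.lot_id;
END;
""")

conn.execute("INSERT INTO locality (locality_id) VALUES (1)")
conn.execute("INSERT INTO lot (lot_id, locality_id) VALUES (100, 1)")
conn.execute("INSERT INTO determination (lot_id, taxon) VALUES (100, 'Chione')")
current = conn.execute(
    "SELECT current_taxon FROM lot WHERE lot_id = 100"
).fetchone()[0]

# A lot that refers to a locality that does not exist is rejected.
try:
    conn.execute("INSERT INTO lot (lot_id, locality_id) VALUES (101, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Keeping such rules in the database rather than in each interface means that the SQL, Web form, and Web service interfaces would all be subject to the same checks.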

Data models for systematics collections

have been described in detail elsewhere (As-

sociation of Systematics Collections Commit-

tee on Computerization and Networking,

1992; Morris, 2000; Pullan et al., 2000; Ra-

guenaud et al., 2002), and further analysis is

not warranted here. Our underlying database

structure is loosely based on these other mod-

els. The goal was to keep the schema rela-

tively simple but to capture as much useful

information as possible. The subject areas in-

clude localities, taxonomy, lots, people, im-

ages, and a bibliography. One critical differ-

ence between our model and many other

systems is that we track multiple interpreta-

tions for most data fields. That is, data are

never deleted as new information is added.

This is in keeping with the fundamental par-

adigm of collections data as tools for online

collaboration. All additions are time stamped

and marked with the name of the person who

made the contribution. This allows researchers

to know who added the information and when

it was added. Therefore, anyone who is inter-

ested can track changes in the system.
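
The practice of adding interpretations without deleting earlier ones can be sketched as an append-only table. This is an illustration under stated assumptions, not the LACMIP data model: the table and column names are hypothetical, sqlite3 stands in for PostgreSQL, and the contributor and date of the first entry are placeholders. The second entry follows the example of locality LACMIP 10726, whose age was refined from Cretaceous to Turonian by Harry Filkorn in August 2004.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per interpretation of a locality's age.
conn.execute("""
CREATE TABLE locality_age (
    entry_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    locality_id INTEGER NOT NULL,
    age         TEXT NOT NULL,
    entered_by  TEXT NOT NULL,
    entered_on  TEXT NOT NULL
)
""")

def add_age(locality_id, age, entered_by, entered_on):
    """Append a new interpretation; earlier rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO locality_age (locality_id, age, entered_by, entered_on) "
        "VALUES (?, ?, ?, ?)",
        (locality_id, age, entered_by, entered_on),
    )

def age_history(locality_id):
    """Return every interpretation in the order it was entered."""
    return conn.execute(
        "SELECT age, entered_by, entered_on FROM locality_age "
        "WHERE locality_id = ? ORDER BY entry_id",
        (locality_id,),
    ).fetchall()

# The dates are passed in explicitly to keep the sketch deterministic;
# a production system would more likely stamp them automatically.
add_age(10726, "Cretaceous", "unknown (placeholder)", "unknown (placeholder)")
add_age(10726, "Turonian", "Harry Filkorn", "2004-08")
history = age_history(10726)
```

Because rows are only ever appended, the most recent interpretation is the last row returned, and the full list serves as the change history for that field.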

Figure 3. (Continued.) C: In contrast, authorized users may add additional information using controls along left margin of form. Continued below.

Figure 3. (Continued.) D: Clicking the control for Unit results in a new form that can be used to add additional information regarding stratigraphic units. This simple mechanism allows researchers to update the catalog as they use it from any computer connected to the Internet.

Figure 4. A new collecting locality can be added using this Web form.

Locality-associated information includes

geographic, stratigraphic, and collection data.

Our use of locality is similar to the concept

of collecting event used in the ASC model

(Association of Systematics Collections Committee on Computerization and Networking,

1992). In theory it would be possible to make

multiple collections from the same geographic

and stratigraphic context, but in practice many

repeated collections are not from precisely the

same context. Therefore, we consider each

new collection as a new locality in our system.

The collector, field number, and date of col-

lection are associated with the collecting lo-

cality in the LACMIP data model. Geographic

data are categorized as political place names

(city, county, state or province, country) and

supplemented by detailed written descriptions

provided by collectors. Geospatial data are in-

cluded where available and provided by the

collector (usually in the form of United States

township/range system or latitude/longitude),

but standardized georeferencing remains to be

completed. Stratigraphic information is limit-

ed to stratigraphic units (member, formation,

group) and associated age range. The chron-

ostratigraphic units used in the system are the

internationally accepted standard stage names

(Geological Society of America, 1999). Ad-

ditional information on stratigraphy and age

can be included in the text description for each

locality.

A specimen lot is a group of specimens

from a collecting locality that has been sorted

out and identified as belonging to a particular

species or higher taxon. In theory all speci-

mens identified as the same species from a

single locality would be contained in one lot,

but in practice there might be more than one

lot of this species because of specimen abun-

dance, limitations in container size, or special

use of individual specimens from a lot (illus-

tration, geochemical analysis, etc.). Informa-

tion associated with specimen lots includes

taxonomic determinations, the number of 

specimens in the lot, and whether the specimen

has been cited in a published work. Digital

images of specimens are provided for

some specimen lots.

Figure 5. A simple pick list mechanism can be used to select taxonomic names. A: For example, when adding a new lot, determination is selected using a pick list. In this form, user is searching for the genus Chione. Continued on next page.

Managing taxonomic data is a complex

problem, and data models have been devel-

oped to track synonymies, changes in rank,

splitting, and the detailed consequences of 

changing taxonomic concepts and practice

(Taxonomic Databases Working Group, 2004;

Shattuck, 2005). The LACMIP catalog records

updates to determinations of specimen lots

and allows users to search for lots using su-

praspecific classification. We use a combina-

tion of our legacy database and data from the

United States Department of Agriculture In-

tegrated Taxonomic Information System

(ITIS) (ITIS, 2005) as the starting point for

mollusks and corals, and we could easily in-

tegrate other taxonomic dictionaries as they

become available for fossil groups. Multiple

determinations can be included for each spec-

imen lot, and we have implemented a basic

system for tracking synonyms to aid in the

consistent application of taxon names.

Although collecting localities and specimen

lots are the basic units of information in our

catalog, we also maintain information regard-

ing associated personnel, digital images, and

a bibliography relevant to the LACMIP col-

lections. These supplementary modules have

been kept simple. People associated with the

collections include collectors, collections

maintenance staff, authorized users of the cat-

alog, and specialists who have contributed

data to the system. A basic bibliographic table

that allows publications to be associated with

localities and specimen lots is also main-

tained. Most collection localities are associ-

ated with maps, and these are referenced as

publications in the bibliography. In our current

catalog, images are maintained as digital files

on a fileserver at two resolutions. Thumbnails

are small compressed files with widths of 150

pixels for photographs and 300 pixels for field

maps or other scanned images. High-resolution

images are also available to the public at

widths of 450 pixels for specimens and 800

pixels for scanned materials. Image file data

are maintained in a basic image database as-

sociated with our catalog so that they can be

published over the World Wide Web.
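
The four image widths given above can be collected in a small lookup table. The sketch below is illustrative only: the function name and category labels are hypothetical, and it assumes that height is scaled in proportion to width, which the text does not state.

```python
# Target widths in pixels, taken from the text above.
TARGET_WIDTHS = {
    ("thumbnail", "photograph"): 150,
    ("thumbnail", "scan"): 300,
    ("full", "photograph"): 450,
    ("full", "scan"): 800,
}

def scaled_size(width, height, resolution, kind):
    """Return (width, height) after resizing to the target width.

    Assumes the aspect ratio is preserved (an assumption of this sketch).
    """
    target = TARGET_WIDTHS[(resolution, kind)]
    return target, max(1, round(height * target / width))

thumb = scaled_size(3000, 2000, "thumbnail", "photograph")
full_map = scaled_size(1600, 1200, "full", "scan")
```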

A WEB INTERFACE TO THE

COLLECTIONS CATALOG

There are both public and restricted Web

interfaces to the LACMIP collections catalog

(Fig. 3A–D). The public interfaces allow re-

searchers to browse the catalog over the World

Wide Web (Johnson et al., 2005a; Fig. 3B).

Note that we track multiple interpretations for


Figure 5. (Continued.) B: One genus is found by that search and can be selected by choosing the Yes control. Continued on next page.

most data fields. The name of the person who

made each entry and the date of entry are in-

dicated in parentheses. Researchers can

browse through a set of localities or specimen

lots or can download the information for local

use. Data can be downloaded as delimited text

files that include only the most up-to-date in-

formation, because the full information asso-

ciated with any particular locality cannot be

represented in a simple two-dimensional table

if multiple interpretations are present for any

piece of information. For printing specimen

lot labels or hard copies of locality informa-

tion, portable document format (PDF) files

can be downloaded. A thumbnail is shown if 

images are available, and higher-resolution

images can be viewed by clicking on the

thumbnail. The restricted Web forms can be

accessed using our secure Web server. Muse-

um collections staff and researchers interested

in contributing to the system are assigned user

names and passwords that are required to ac-

cess this part of the site. Restricted forms for

searching and browsing the catalog are similar

to the public pages except they allow input of 

additional data.
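
The reason the delimited download carries only the most up-to-date values can be seen in a short sketch: a flat table has one cell per field, so a field with several interpretations must be reduced to one value. The field names, the entry-order column, and the helper functions are hypothetical, and the rows are illustrative rather than copied from the catalog.

```python
import csv
import io

# (locality number, field, value, entry order); illustrative rows only.
entries = [
    (10726, "age", "Cretaceous", 1),
    (10726, "age", "Turonian", 2),
    (10726, "county", "Shasta", 1),
]

def latest_values(rows):
    """Keep only the most recent value for each (locality, field) pair."""
    latest = {}
    for locality, field, value, _order in sorted(rows, key=lambda r: r[3]):
        # Later entries overwrite earlier ones for the same field.
        latest.setdefault(locality, {})[field] = value
    return latest

def to_delimited(rows, fields, delimiter="\t"):
    """Write one line per locality with a single value per field."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["locality"] + fields)
    for locality, values in sorted(latest_values(rows).items()):
        writer.writerow([locality] + [values.get(f, "") for f in fields])
    return buffer.getvalue()

text = to_delimited(entries, ["age", "county"])
```

In this sketch the earlier Cretaceous interpretation is absent from the exported file, although it would remain available through the browsing forms.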

The initial entry of locality and lot records

into the catalog can only be performed by mu-

seum collections staff. There are data entry

forms for each of the main subject areas (Fig.

4), written as standard hypertext markup lan-

guage (HTML) Web forms. An online data en-

try guide is provided to ensure consistent data

input, and pick lists have been implemented

where possible to minimize typographical er-

rors. For example, when a determination is

made, there are several steps to selecting a

taxon name (Fig. 5A–D). Also, modern Web

browsers have autocomplete functions that

may reduce typographic errors. There is a sim-

ple mechanism to increase the consistent use

of taxonomic names via tracking synonyms.

Junior synonyms can be associated with senior

synonyms so that when a junior synonym is

requested as a taxonomic determination, both

that name and the senior synonym are re-

turned as determinations. In general, this

interface has been designed to minimize

potential data-entry errors because much in-

formation is hand keyed into the catalog by

assistants who may have limited geological or

taxonomic expertise. However, information is

not proofed and all data entered into the sys-

tem are immediately available to the research

community.
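
The synonym mechanism described above can be illustrated with a short sketch. The lookup table and both taxon names below are placeholders invented for the example, and the sketch assumes a single level of synonymy (it does not follow chains of synonyms).

```python
def determinations_for(name, senior_of):
    """Return the requested name plus its senior synonym, if one is recorded."""
    result = [name]
    senior = senior_of.get(name)
    if senior is not None and senior != name:
        result.append(senior)
    return result


# Placeholder names: maps each junior synonym to its senior synonym.
senior_of = {"Genus oldname": "Genus validname"}

print(determinations_for("Genus oldname", senior_of))
print(determinations_for("Genus validname", senior_of))
```

Requesting the junior name yields both names as determinations, whereas requesting a name with no recorded senior synonym yields only that name.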

Both the public and authorized researchers

can search and browse the data using the pro-

vided set of Web forms. For example, to find

all localities in Shasta County from the Redd-

ing Formation (Fig. 3A), a researcher needs to

fill in the appropriate fields on the locality

search form. In this case, a total of 88 local-

ities is returned, and users may browse

through them one by one (Fig. 3B).

Figure 5. (Continued.) C: The pick list can then be used to select the appropriate subgenus and species within the genus. If a particular taxon is not found in the system, it can be added by selecting the entry for New in the pick list. Continued on next page.

Alternatively, a researcher could return to the search

form and limit or refine the search (using the

Modify Search control), or the researcher

could download the entire data set either as a

text file or a PDF-formatted file that is ready

to be printed. Authorized researchers see a

slightly different view (Fig. 3C), because they

are able to add information. The labels asso-

ciated with each line of data are now controls

that may be used to access additional forms

for data entry. For example, to add new in-

formation regarding the stratigraphic unit of a

locality, a contributor would click on the but-

ton marked Unit to use the appropriate form

(Fig. 3D).
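
A form-driven locality search of the kind described above might translate filled-in fields into a parameterized query, as in the following sketch. The table layout, column names, and sample rows are assumptions for illustration; an in-memory SQLite database stands in for the production database, and blank form fields are simply ignored.

```python
import sqlite3


def search_localities(conn, **filled_fields):
    """Return ids of localities matching every search-form field that was filled in."""
    allowed = {"state", "county", "formation"}  # hypothetical searchable columns
    clauses, params = [], []
    for column, value in filled_fields.items():
        if column not in allowed:
            raise ValueError(f"unknown search field: {column}")
        if value:  # blank form fields do not restrict the search
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT locality_id FROM locality WHERE {where} ORDER BY locality_id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE locality ("
    "locality_id INTEGER PRIMARY KEY, state TEXT, county TEXT, formation TEXT)"
)
# Invented rows; only the Shasta/Redding pairing comes from the example in the text.
conn.executemany(
    "INSERT INTO locality VALUES (?, ?, ?, ?)",
    [
        (1, "California", "Shasta", "Redding"),
        (2, "California", "Shasta", "Other Formation"),
        (3, "California", "Other County", "Redding"),
    ],
)
hits = search_localities(conn, county="Shasta", formation="Redding")
print(hits)
```

Refining a search, as with the Modify Search control, corresponds to calling the function again with additional or different fields.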

Searching and browsing for specimen lots

is similar to working with locality data and

can be performed using a similar set of Web

forms (Fig. 6A–C). A search can be per-

formed for both lot information and the lo-

cality from which the lots were collected. For

example, a search for the gastropod genus

Paosia from the Redding Formation (Fig. 6A)

returns eight lots from a selection of localities

(Fig. 6B). Data for this list of lots can then be

downloaded as a text file by selecting the

Download Lot List, or labels for specimen

trays can be produced by selecting Create

Labels. Information for one of the lots (lot

LACMIP 10726-2) is shown in Figure 6C.

However, the downloaded data will not in-

clude all of the information associated with

this lot because this information cannot be or-

ganized into a simple two-dimensional table.

This lot has been identified several times, first

as Oonia? californica (Gabb, 1864), later as

Paosia colusaensis (Anderson, 1958), and

most recently as Paosia californica (Gabb,

1864). In addition, the specimen lot has been

cited in two publications (Jones et al., 1978;

Squires and Saul, 2004) as type specimen

LACMIP 10810. Several images are also

available that can be downloaded in high res-

olution. Information about locality LACMIP

10726 is at the bottom of the lot page, includ-

ing a map. This series of Web forms provides

the primary interface for the catalog. Similar

forms exist to search, browse, and add biblio-

graphic and biographic information. A com-

prehensive user guide that will assist research-

ers with use of the system, including standards

for data entry, is available through a link on

all of the forms.

As of May 2005, our entire locality register

of 27,970 collections has been included in the

catalog. To date 28,197 specimen lots have

been cataloged comprising 601,409 individual

specimens. We estimate that this includes

20% of our complete collection, but we do

not have precise estimates for the total size of 

the collection. In fact, during the cataloging

process we are finding that the previous at-

tempts to estimate collection size probably are

25%–30% lower than the true figure. A sim-

ilar result may be obtained during cataloging

of other large paleontological collections. The

majority of these records is derived from our

extensive holdings of Neogene Mollusca from

Southern California.

Figure 5. (Continued.) D: In this case Chione (Chionista) fluctifraga is selected as the determination for the new lot.

Cataloging of this material

was determined to be a priority due to the

potential use for studies of the impact of re-

gional environmental change on shallow ma-

rine communities. In addition, our complete

set of type and figured specimen lots has been

incorporated, including 10,429 specimens.

These are the most important components of 

the collection, so they were a priority for

cataloging.
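
The figures quoted above imply some rough derived quantities, worked through in the sketch below. These are back-of-envelope values only: the text does not say whether the 20% estimate refers to lots or to specimens, so both readings are computed, and none of the derived numbers appear in the catalog itself.

```python
lots_cataloged = 28_197
specimens_cataloged = 601_409
fraction_cataloged = 0.20  # rough estimate quoted in the text

# Average number of specimens in a cataloged lot.
mean_specimens_per_lot = specimens_cataloged / lots_cataloged

# Implied size of the full collection under each reading of the 20% figure.
implied_total_lots = lots_cataloged / fraction_cataloged
implied_total_specimens = specimens_cataloged / fraction_cataloged

print(f"mean specimens per lot: {mean_specimens_per_lot:.1f}")
print(f"implied total lots: {implied_total_lots:,.0f}")
print(f"implied total specimens: {implied_total_specimens:,.0f}")
```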

WEB SERVICES

There are several problems with the type of 

Web forms interface outlined above. Most se-

rious is the requirement for human intervention

to locate a particular piece of information re-

garding a particular locality or specimen lot.

This means that it is difficult to generate direct

links to information, for example to link from

another Web site to one particular locality. Second,

the “Web spider” programs used by

standard Web search engines to index Web pag-

es cannot access Web forms easily. To over-

come these limitations, we have designed a

simple Web interface to the LACMIP catalog

that allows direct linking to individual locality,

specimen lot, type specimen, and digital image

records. We have followed a REST-like archi-

tecture (Fielding, 2000) that takes advantage of 

existing Web protocols to allow access to our

data from outside systems. Each data resource

is represented by a Web address or uniform

resource locator (URL). These addresses are static

and easy to construct if the user knows what

he or she is looking for. For example, the URL

http://ip.nhm.org/ipdatabase/locality/17575 will

return whatever information we have regarding

locality 17575, and the URL

http://ip.nhm.org/ipdatabase/lot/10762-2 will link directly to

information about specimen lot 10762-2. Similar

links exist for type specimens and images

of specimen lots. For example, the URL

http://ip.nhm.org/ipdatabase/type/9786 links directly

to type specimen LACMIP 9786. The returned

pages are not static Web pages but are gener-

ated by the Web server at each request so they

are always up to date. As standard schemas for

the publication of paleontological specimen

data become available, we will be able to pub-

lish extensible markup language (XML)–

formatted information using this mechanism.
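
Because the addresses follow a fixed pattern, an outside system could construct them programmatically. The helper below is a hypothetical sketch based only on the three example URLs quoted above; the function name is invented, no URL pattern for images is given in the text (so none is assumed here), and the addresses may no longer resolve.

```python
from urllib.parse import quote

BASE_URL = "http://ip.nhm.org/ipdatabase"     # base path as quoted in the text
RESOURCE_TYPES = ("locality", "lot", "type")  # kinds with example URLs in the text


def resource_url(kind, identifier):
    """Build the direct link for one catalog record."""
    if kind not in RESOURCE_TYPES:
        raise ValueError(f"unsupported resource type: {kind}")
    # Percent-encode the identifier so it is safe as a single path segment.
    return f"{BASE_URL}/{kind}/{quote(str(identifier), safe='')}"


print(resource_url("locality", 17575))
print(resource_url("type", 9786))
print(resource_url("lot", "26814-1"))
```

A journal or compilation that stores only the record type and number can regenerate the link at any time, which is what makes the static addresses suitable for citation.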

Figure 6. Specimen lot data can be browsed and new information can be added to the Department of Invertebrate Paleontology at the Natural History Museum of Los Angeles County (LACMIP) collections catalog using Web forms. A: A search for the gastropod genus Paosia from the Redding Formation is performed by entering Paosia and Redding in the appropriate fields. Continued on next page.

As a test of this Web services interface, we

developed a system that allows joint queries

across both the Holocene and fossil mollusk

collections at the Natural History Museum of 

Los Angeles County (LACM; Johnson et al.,

2005b). In our institution, most departments

use different systems that are appropriate for

the needs of each department. For example,

the LACM Holocene malacology database re-

quires no treatment of stratigraphy, and the in-

vertebrate paleontology database has no way

to track water depth. In the joint search tool,

searches are performed on a subset of fields

from the LACM malacology and LACMIP da-

tabases, and the results include links back into

the original databases so users can access

more complete information that might not be

contained in both systems (Fig. 7A–C).
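
The idea behind the joint search tool, querying only fields shared by both databases and linking back to each source for full detail, can be sketched as follows. This is not the tool's actual code: the shared field list, the record identifiers, the sample records, and the malacology link pattern (an example.org placeholder) are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical fields that both catalogs are assumed to share.
SHARED_FIELDS = ("genus", "species")


@dataclass(frozen=True)
class Hit:
    source: str     # which catalog the record came from
    record_id: str  # identifier within that catalog
    link: str       # URL of the full record in the original system


def joint_search(catalogs, **criteria):
    """Search every catalog on shared fields and link back to each source."""
    unknown = set(criteria) - set(SHARED_FIELDS)
    if unknown:
        raise ValueError(f"not a shared field: {sorted(unknown)}")
    hits = []
    for source, (records, link_for) in catalogs.items():
        for record in records:
            if all(record.get(f, "").lower() == v.lower()
                   for f, v in criteria.items()):
                hits.append(Hit(source, record["id"], link_for(record["id"])))
    return hits


# Invented records: the fossil catalog stores a formation, the Holocene one a depth.
fossil = [{"id": "F1", "genus": "Terebralia", "species": "sp.", "formation": "Unit A"}]
holocene = [
    {"id": "H1", "genus": "Terebralia", "species": "sp.", "depth_m": 2},
    {"id": "H2", "genus": "Chione", "species": "sp.", "depth_m": 5},
]
catalogs = {
    "LACMIP": (fossil, lambda i: f"http://ip.nhm.org/ipdatabase/lot/{i}"),
    "LACM": (holocene, lambda i: f"http://example.org/malacology/lot/{i}"),
}
hits = joint_search(catalogs, genus="terebralia")
for hit in hits:
    print(hit.source, hit.record_id, hit.link)
```

Fields that exist in only one system, such as formation or water depth, never enter the query; they are reached through the returned link.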

DISCUSSION

Developing any information system re-

quires compromise. Our priority has been to

publish as much collections-related informa-

tion as possible with limited resources. The

quality of these data varies, but even imperfect

data can be useful (Lieberman and Kaesler,

2000). Furthermore, we acknowledge that as

museum curators we will never have the re-

sources to fully verify the immense volume of 

information held in our catalog. Instead, the

paleontological community must help with

this never-ending task. The LACMIP catalog

is a living document that is constantly being

improved by museum staff and database users,

and the information published in it should not

be used uncritically in large compilations. Al-

though effort is made to publish only accurate

information, there is large variation in the

quality of the information included in the cat-

alog. Stratigraphic and taxonomic concepts

change with time, and these updates are not

always included in the system.

Figure 6. (Continued.) B: This search returned eight lots from various collecting localities; selecting one of the buttons on the left of the page will return more information for a particular lot. Continued on next page.

Indeed, we hope that users will help improve the data, and

we ask that the authors publishing studies

based on information in the LACMIP catalog help

update the catalog with new information

and interpretations resulting from their research.

We also expect authors to include citations

to the LACMIP catalog if they have

used it as a data source.

Besides sharing information with our com-

munity of researchers, we encourage links to

the LACMIP catalog from online versions of 

publications that make use of our collections.

Such links should enhance greatly the utility

of research collections catalogs (National Re-

search Council, 2002). For example, papers

published in the online version of the Journal

of Paleontology or Geosphere could contain

direct links from specimen or locality citations

to the LACMIP catalog. Such links allow

readers rapid access to the most up-to-date in-

formation available. Changes in the interpre-

tation of stratigraphy, environment, or taxo-

nomic classification cannot be tracked in a

static document, but the static document can

provide links back to systems that can be

changed. Similar links could be incorporated

into compilations of paleontological occurrences

based on published records, thus allowing

users direct access to the underlying data and

allowing database administrators to automatically

track revisions in data associated

with museum collections. In the current im-

plementation we have adopted a Web archi-

tecture approach rather than a more complex

Web services approach. The benefit of this

type of interface is that it can be implemented

right now—the protocols exist, and they are

simple to use. The only software required to

view the catalog is a standard Web browser.

Furthermore, as new data standards and mes-

saging protocols develop we will be able to

accommodate them into new versions of the

LACMIP collections catalog.

Probably the main obstacle to the wide-

spread adoption of community-based catalogs

is encouraging qualified researchers to con-

tribute hard-earned data to a collaborative sys-

tem. There are several potential models to rec-

tify this problem, some of which offer a

“carrot,” and others that threaten a “stick.”

For example, we could require some level of 

contribution as a condition for providing loans

of specimens or access to the collections catalog,

but so far we reject this approach because

it might result in reduced collections use. An alternative is to provide a mechanism

by which contributors could receive some

form of professional credit in the form of mea-

sures that could be added to curricula vitae or

management reports used in professional per-

formance reviews. To achieve this, we plan to

implement an electronic recorder or score-

board that lists the number and type of data

contributed by each member of the commu-

nity using the catalog. As links develop into

the catalog from online journals or other pub-

lications, this scoreboard could be used to

track usage of particular types of information,

and the resulting track record could be used

to weight contributions from individual re-

searchers in the same way that publications

are weighted based on the number of times

that they are cited in works of other authors.

However, in the end, researchers and other us-

ers of information in the LACMIP catalog

must take on part of the responsibility for

maintaining this shared resource.

Figure 6. (Continued.) C: For example, the complete record for lot 10726-2 includes taxonomic determination, type status, citation information, and collecting locality details. In this case images of the specimen and a map of the collecting locality are available.

Figure 7. The Web services interface to the Department of Invertebrate Paleontology at the Natural History Museum of Los Angeles County (LACMIP) catalog has been used to construct a joint search tool for the catalogs of the Department of Invertebrate Paleontology and the Malacology Section (LACM) of the Natural History Museum of Los Angeles County. A: This search form allows researchers to locate specimens from two different data sets. Continued on next page.

As a community, we all require high-quality information

to place fossils in the proper taxonomic,

stratigraphic, and geologic context because the

scientific value of paleontological collections

lies as much in this context as in the fossils

themselves. Unfortunately, with the funding

levels currently available for the support of 

collections, museum staff will never be able

to maintain and update all of the information

for researchers and other users of the catalog.

The resulting bottleneck will impede progress

by limiting the availability of up-to-date in-

formation. To avoid this, the community of 

paleontologists must perform as much of the

required maintenance and updating as their

collections use dictates. These data must be-

come a shared resource maintained by all as

we move together into a data-rich future for

paleontology.
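
The contribution scoreboard proposed above has not been specified in detail, but its core tally could be as simple as the following sketch. The event log, contributor names, and data-type labels are invented for illustration, and weighting by subsequent usage is left out.

```python
from collections import Counter, defaultdict


def build_scoreboard(contributions):
    """Count contributions per contributor, broken down by type of data."""
    board = defaultdict(Counter)
    for contributor, data_type in contributions:
        board[contributor][data_type] += 1
    return board


def ranked(board):
    """Return (contributor, total) pairs, largest total first."""
    totals = {name: sum(counts.values()) for name, counts in board.items()}
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Invented log of (contributor, type of data) events.
log = [
    ("researcher_a", "determination"),
    ("researcher_a", "stratigraphic unit"),
    ("researcher_b", "determination"),
    ("researcher_a", "determination"),
]
board = build_scoreboard(log)
for name, total in ranked(board):
    print(name, total, dict(board[name]))
```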

ACKNOWLEDGMENTS

Much of the data in the LACMIP collections catalog was entered by our predecessors including J.M. Alderson, L.T. Groves, G. Kennedy, P.G. Owen, and E.C. Wilson. Our team of work study students, research associates, and volunteers includes M. Alonso, J. Cline, S. Cowles, A. Fu, B. Gillies, L. Moore, H. Murdock, L.R. Saul, J. Severe, R.J. Stanton Jr., and J. Wiggins. We thank C.M. Kelly for producing many of the photographs. W. Allmon, W. Kiessling, D. Pentcheff, A. Valdes, and R. Wetzer provided useful suggestions for improving this contribution. We gratefully acknowledge the support of the United States National Science Foundation (grant DBI-0237337).

REFERENCES CITED

Allmon, W.D., 2005, The importance of museum collec-

tions in paleontology: Paleobiology, v. 31, p. 1–5.

Allmon, W.D., and Poulton, T.P., 2000, The value of fossil

collections, in White, R.D. and Allmon, W.D., eds.,

Guidelines for the management and curation of invertebrate

fossil collections: Paleontological Society Special Publication 10, p. 5–24.

Alroy, J., and 24 others, 2001, Effects of sampling standardization

on estimates of Phanerozoic marine diversification:

Proceedings of the National Academy of Sciences,

v. 98, p. 6261–6266.

American Association of Museums, 2005, Code of ethics for

museums: http://www.aam-us.org/museumresources/ethics/coe.cfm (May 2005).

Anderson, F.M., 1958, Upper Cretaceous of the Pacific coast:

Geological Society of America Memoir 71, 378 p.

Apache Software Foundation, 2005, Apache http server,

v. 1.3.33: http://www.apache.org (February 2005).

Figure 7. (Continued.) B: A search for genus Terebralia in both LACM and LACMIP collections results in a total of six lots, including four from fossil localities and two sites from the Holocene. Continued on next page.

Association of Systematics Collections Committee on Computerization

and Networking, 1992, An information model for biological collections:

http://www.nscalliance.org/bioinformatics/asc%20model/Ascmodrpt.pdf (December 2004).

Dalton, R., 2003, Natural history collections in crisis as

funding is slashed: Nature, v. 423, p. 575.

Eiden, L.E., 2004, A two-way bioinformatic street: Science,

v. 306, p. 1437, doi: 10.1126/science.1107196.

Fielding, R.T., 2000, Architectural styles and the design of 

network-based software architectures [Ph.D. thesis]: Irvine,

University of California, http://www1.ics.uci.edu/%7Efielding/pubs/dissertation/top.htm (October 2004).

Gabb, W.M., 1864, Description of the Cretaceous fossils:

Palaeontology, v. 1, p. 55–236.

Geological Society of America, 1999, 1999 geologic time

scale: http://www.geosociety.org/science/timescale/timescl.htm (January 2005).

Graham, C.H., Ferrier, S., Huettman, F., Moritz, C., and

Peterson, A.T., 2004, New developments in museum-

based informatics and applications in biodiversity

analysis: Trends in Ecology and Evolution, v. 19,

p. 497–503, doi: 10.1016/j.tree.2004.07.006.

Gropp, R.E., 2003, Are university natural science collections

going extinct?: Bioscience, v. 53, p. 550.

Integrated Taxonomic Information System, 2005, Integrated

Taxonomic Information System: http://www.itis.usda.gov (February 2005).

Jackson, J.B.C., and Johnson, K.G., 2000, Life in the last

few million years, in Erwin, D.H., and Wing, S.L.,

eds., Deep time: Paleobiology’s perspective: Paleobi-

ology, supplement to v. 26, p. 221–235.

Jacobs, I., ed., 2004, Architecture of the World Wide Web,

volume one: http://www.w3.org/TR/2004/REC-Webarch-20041215 (December 2004).

Johnson, K.G., Filkorn, H.F., and Stecheson, M., 2005a,

Collections catalog of the Department of Invertebrate

Paleontology, Natural History Museum of Los Angeles County: http://ip.nhm.org (February 2005).

Johnson, K.G., Valdes, A., and Groves, L.T., 2005b, Extinct

and extant molluscs in the collections of the Natural

History Museum of Los Angeles County: http://ip.nhm.org/nhmsearch/findlots.php (May 2005).

Jones, D.L., Sliter, W.V., and Popenoe, W.P., 1978, Mid-

Cretaceous (Albian to Turonian) biostratigraphy of 

northern California: Annales de Museum l’Histoire

Naturelle de Nice, v. 4, p. xxii.1–xxii.13.

Lieberman, B.S., and Kaesler, R.L., 2000, The scientific

value of natural history museum collections, in White,

R.D. and Allmon, W.D., eds., Guidelines for the man-

agement and curation of invertebrate fossil collections:

Paleontological Society Special Publication 10,

p. 109–117.

Morris, P.J., 2000, A model for invertebrate paleontology collections information, in White, R.D. and Allmon,

W.D., eds., Guidelines for the management and cura-

tion of invertebrate fossil collections: Paleontological

Society Special Publication 10, p. 155–260.

National Research Council, 2002, Geoscience data and col-

lections: National resources in peril: National Acade-

my of Sciences, 128 p.

PHP Group, 2005, PHP version 4.3.10: http://www.php.net

(February 2005).

PostgreSQL Global Development Group, 2005, PostgreSQL,

version 8.0: http://www.postgresql.org (February 2005).

Pullan, M.R., Watson, M.F., Kennedy, J.B., Raguenaud, C.,

and Hyam, R., 2000, The Prometheus Taxonomic

Model: A practical approach to representing multiple

classifications: Taxon, v. 49, p. 55–75.

Raguenaud, C., Pullan, M.R., Watson, M.F., Kennedy, J.B.,

Newman, M.F., and Barclay, P.J., 2002, Implementa-

tion of the Prometheus Taxonomic Model: A compar-

ison of database models and query languages and an

introduction to the Prometheus Object-Oriented Mod-

el: Taxon, v. 51, p. 131–142.


Figure 7. (Continued.) C: Links are available on the page along with results that allow researchers convenient access to the additional information in the LACMIP catalog. For example, the full information for specimen lot 26814-1 is available from http://ip.nhm.org/ipdatabase/lot/26814-1.

Shattuck, S., 2005, Biolink, version 2.0: http://www.ento.csiro.au/biolink/index.html (January 2005).

Squires, R.L., and Saul, L.R., 2004, The pseudomelaniid

gastropod Paosia from the marine Cretaceous of the

Pacific slope of North America and a review of the

age and paleobiogeography of the genus: Journal of 

Paleontology, v. 78, p. 484–500.

Suarez, A.V., and Tsutsui, N.D., 2004, The value of mu-

seum collections for research and society: Bioscience,

v. 54, p. 66–74.

Taxonomic Databases Working Group, 2004, International

Working Group on Taxonomic Databases: http://www.tdwg.org (November 2004).

Wikipedia, 2005, SQL, in Wikipedia, the free encyclopedia:

http://en.wikipedia.org/wiki/SQL (April 2005).

Wilson, E.O., 2005, Systematics and the future of biology:

Proceedings of the National Academy of Sciences of the United States of America, v. 102, p. 6520–6521, doi: 10.1073/pnas.0501936102.

MANUSCRIPT RECEIVED BY THE SOCIETY 18 FEBRUARY 2005

REVISED MANUSCRIPT RECEIVED 10 MAY 2005

MANUSCRIPT ACCEPTED 17 MAY 2005

Printed in the USA