Digital Archives at the National Library of Medicine
A presentation at the MLA Session
Lighting the Path: Digital Repositories in the Real World
May 24, 2004
by Diane Boehr
Cataloging Unit Head, National Library of Medicine, National Institutes of Health, Health & Human Services
[email protected]
Scope
Historical medical works
The NLM Archive
PubMed Central
Considerations as you begin a project
It will take much longer than you anticipate
You will learn a great deal about topics outside your normal work duties
Be willing to take baby steps and make a start
It is very rewarding to see the fruits of your labor
HMD Projects
Historical Anatomies
Medicine in the Americas
Historical Anatomies
http://www.nlm.nih.gov/exhibition/historicalanatomies/home.html
Provides high-resolution downloadable scans of selected important images from illustrated anatomical atlases dating from the 15th to the 20th century
Titles and images selected by Michael North, Head of Rare Books and Early Manuscripts
Consists of large JPEGs and zoomable digitized images from the books and a brief bibliographical and historical introduction to each title
Technical details
The imaging for this project is contracted out
The contractor makes archival quality TIFF files (800 ppi resolution) and from that, thumbnail and JPEG images are made for the site, using Adobe Photoshop
Zoomifyer Pro is used to create the pan and zoom images
The TIFF files are backed up on CD-ROMs
Search and retrieval
Individual images do not have any metadata associated with them at this time
Bibliographic citations on the site match the LocatorPlus records
As the focus of the site is selected individual images from the books, rather than the entire text, there are currently no links from the LocatorPlus records for the individual titles to images on the Web site
Sample screen
Medicine in the Americas
Monographic original source materials on the development of medicine in the New World published prior to 1914 are being digitized in their entirety
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books)
Technical details
Digitizing is being done in-house
Books are scanned, and from the initial scan a photocopy and a TIFF file are created
Photocopies are scanned to create OCR Word text files, which are then manually reviewed and cleaned up to create a searchable, downloadable PDF text in a modern font
The TIFF file is used to reproduce the typeface and layout of the original published work
Mounting of these texts on the Web and the XML coding of the Word files are done using the NLM Bookshelf platform
Bookshelf was developed by NCBI for medical texts supplied by publishers in SGML or other desktop publishing formats
The platform has an existing template that allows record creators to input metadata easily without needing to know XML
Search and Retrieval
Bookshelf site only supports keyword searching
Standard bibliographic data from LocatorPlus and brief historical data are included with the text
Catalog records have hot links to the Bookshelf site
Timeframes
Both projects went from planning to implementation in about one year, although both projects will be adding more material to their sites
Use of standard, off-the-shelf products or existing technologies made implementation easier
NLM Archives
A site to store material of permanent value that has been published on the NLM Web site, but is now outdated or superseded
Searchable, yet clearly distinguished from current material
What do we mean by permanent?
Three aspects to permanence were identified:
1) Identifier validity: The extent to which the given name or identifier will always provide access to the same resource
2) Resource availability: The extent to which a given resource is guaranteed to remain available in electronic form
3) Content invariability: The extent to which the content of the resource could change
NLM Permanence Ratings
Four categories of permanence have been defined:
1) Permanent, unchanging content: NLM has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content will not change.
2) Permanent, stable content: NLM has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content is subject only to minor corrections or additions.
3) Permanent, dynamic content: NLM has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content could be revised or replaced.
4) Permanence not guaranteed: NLM has made no commitment to retain this resource. It could become unavailable at any time. Its identifier could be changed.
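
The four ratings can be read as combinations of the three permanence aspects. The following is a minimal sketch of that reading, not NLM's actual implementation; the class and field names (PermanenceRating, ContentChange, is_permanent) are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ContentChange(Enum):
    """Aspect 3: how much the content of a resource may change."""
    UNCHANGING = "will not change"
    STABLE = "minor corrections or additions only"
    DYNAMIC = "may be revised or replaced"
    NO_COMMITMENT = "no commitment"


@dataclass(frozen=True)
class PermanenceRating:
    level: int
    label: str
    identifier_valid: bool     # aspect 1: identifier always reaches the resource
    resource_available: bool   # aspect 2: commitment to keep the resource available
    content: ContentChange     # aspect 3: content invariability


# Lookup table keyed by the rating number used on the slides.
RATINGS = {
    r.level: r
    for r in (
        PermanenceRating(1, "Permanent, unchanging content", True, True, ContentChange.UNCHANGING),
        PermanenceRating(2, "Permanent, stable content", True, True, ContentChange.STABLE),
        PermanenceRating(3, "Permanent, dynamic content", True, True, ContentChange.DYNAMIC),
        PermanenceRating(4, "Permanence not guaranteed", False, False, ContentChange.NO_COMMITMENT),
    )
}


def is_permanent(level: int) -> bool:
    """True for the three ratings that carry an availability commitment."""
    return RATINGS[level].resource_available
```

Ratings 1 through 3 differ only in the content aspect, which is why the sketch keeps it as a separate enumeration rather than a flag.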
Workflows
Permanence ratings are assigned when a resource is promoted to the NLM Web site
Default permanence ratings are generated based on the category to which the resource belongs
Resource creators use a template which adds basic metadata, in addition to the category and permanence rating
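
Generating a default rating from the document category might look roughly like the sketch below. The slides do not list the real categories or their defaults, so every category name, default level, and the fallback value here is hypothetical, as is the idea that a creator can override the default.

```python
# Hypothetical categories and default levels, for illustration only.
DEFAULT_LEVEL_BY_CATEGORY = {
    "press release": 1,    # assumed: published once, never edited
    "fact sheet": 3,       # assumed: kept available but revised over time
    "meeting notice": 4,   # assumed: no commitment to retain
}
FALLBACK_LEVEL = 4  # assumed: unknown categories get no commitment


def default_permanence(category: str) -> int:
    """Return the default permanence level (1-4) for a document category."""
    key = category.strip().lower()
    return DEFAULT_LEVEL_BY_CATEGORY.get(key, FALLBACK_LEVEL)


def assign_permanence(category: str, override=None) -> int:
    """Use the category default unless a valid level is supplied explicitly."""
    if override is None:
        return default_permanence(category)
    if override not in (1, 2, 3, 4):
        raise ValueError("permanence level must be 1, 2, 3, or 4")
    return override
```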
Templates
Metadata input template is a feature of TeamSite, our Web content management software
No knowledge of HTML is needed to use these templates
Minimal set of required fields, with default values or drop-down menus supplied wherever possible
Required metadata
1) Title
2) Heading
3) Date first published
4) Date last modified
5) Next scheduled review date
6) Publisher
7) Rights
8) Contact e-mail
9) Language
10) Document category
11) Permanence level
12) URL
The NLM metadata set is based on Dublin Core, with some local adaptations
The full scheme may be seen at http://www.nlm.nih.gov/tsd/cataloging/metafilenew.html
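
A check that a record carries all twelve required fields could be sketched as follows. The field keys are assumed spellings of the names on the slide, not the element names of the NLM scheme, and the function is illustrative rather than a description of how TeamSite enforces the template.

```python
# Assumed keys for the twelve required fields listed above.
REQUIRED_FIELDS = (
    "title",
    "heading",
    "date_first_published",
    "date_last_modified",
    "next_scheduled_review_date",
    "publisher",
    "rights",
    "contact_email",
    "language",
    "document_category",
    "permanence_level",
    "url",
)


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or blank, in list order."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

An empty result means the record meets the minimal set; anything else names the fields the creator still has to fill in.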
Workflows
Every resource has the minimal metadata assigned by the resource creator
Permanent resources are routed to the Cataloging Section
  Complete MARC bibliographic records are created
    Includes standardized access points, including MeSH and an NLM classification number
    Accessible in LocatorPlus
    Distributed to the utilities and other NLM licensees
The enhanced metadata created in Cataloging is then added back to the header information of the online resource
Preliminary metadata and the enhanced versions can be seen by clicking on "View source"
Basic metadata
Enhanced metadata
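
Writing metadata into the header of an HTML page is commonly done with meta elements. The sketch below merges the creator's basic metadata with the additions from Cataloging and renders the result; the element names in the usage note (DC.Title, DC.Subject) follow the general Dublin Core convention and are assumptions, not the names NLM uses.

```python
from html import escape


def merge_metadata(basic: dict, enhanced: dict) -> dict:
    """Combine basic metadata with cataloging additions; enhanced values win."""
    merged = dict(basic)
    merged.update(enhanced)
    return merged


def render_meta_tags(metadata: dict) -> str:
    """Render each name/value pair as one HTML meta element per line."""
    return "\n".join(
        '<meta name="{}" content="{}">'.format(escape(name), escape(str(value)))
        for name, value in metadata.items()
    )
```

For example, render_meta_tags(merge_metadata({"DC.Title": "Fact Sheet"}, {"DC.Subject": "Libraries, Medical"})) yields two meta lines that would appear under "View source".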
Archive Design
Separate, distinct, but integral part of the NLM Web site
Searchable with standard NLM search software: Mindserver from Recommind
Archive contents
Out-of-date resources: older material that was once up on the site, but is no longer of current interest
Earlier versions of current documents that have undergone major revisions
Still to come
Archiving non-HTML files, such as PDF, video and audio clips, etc.
Archiving resources from areas in the library which do not get promoted through TeamSite
Impact on Cataloging
PubMed Central (PMC)
A bibliographic record must exist in the NLM catalog before a journal is added to PMC
Records must be created if the title is not already in the catalog
  Downloaded from OCLC
  Skeletal record created from local template
  High-priority, 24 hr. turnaround time
Records are then fully cataloged
If the title is already in the catalog, holdings must be updated
  Indicate the title is available in PMC
  Range of issues
  Any embargo periods
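
The PMC steps above can be sketched as a small decision function. This is one reading of the slides, not a documented procedure: it assumes that downloading from OCLC and creating a skeletal record are alternatives, and the function and parameter names are hypothetical.

```python
def pmc_cataloging_steps(in_catalog: bool, in_oclc: bool = False) -> list:
    """List the cataloging steps needed before a journal is added to PMC."""
    if in_catalog:
        # The record already exists, so only the holdings change.
        return [
            "update holdings: PMC availability, range of issues, embargo periods",
        ]
    # No record yet: create one quickly, then complete it.
    if in_oclc:
        first_step = "download record from OCLC"
    else:
        first_step = "create skeletal record from local template"
    return [
        first_step + " (high priority, 24-hour turnaround)",
        "fully catalog the record",
    ]
```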
Impact on Cataloging
NLM Archive
Cataloger creates core level MARC records for any new resource on the NLM Web site rated Permanent
  View the site, as well as utilize metadata supplied by record creator for descriptive data
  Supply MeSH and NLM classification
  Establish authorized name headings in the national authority file
Transfer this enhanced metadata back to the resource
Impact on Cataloging
HMD projects
Minimal impact on Cataloging
  Books being digitized already have records in the catalog
  HMD has its own cataloging staff who can make links between existing catalog records and digitized material
Impact on Cataloging
Despite the increased workload, we think archiving projects are enhanced when catalogers are involved in the projects
Catalogers increase their knowledge by becoming involved in these projects