NATIONAL LIBRARY OF MEDICINE PubMed Central Edwin Sequeira National Library of Medicine May 26, 2004




What is PubMed Central?

• Digital archive of life sciences journals
  • includes health policy, bioinformatics and other fields

• Participation is open to journals that:
  • are covered by a major abstracting/indexing service, or
  • have 3 editorial board members with current grants from major non-profit funding agencies

• Free access to full-text articles and supporting data

• Integrated with PubMed and other bibliographic and factual databases in NCBI’s Entrez network


PMC Basic Policy

• Journal deposits an authoritative electronic copy that meets PMC data quality standards:
  • full-text XML
  • original high-resolution graphics
  • PDF
  • supplementary data

• Journal may delay free access to its content
  • research articles are generally free in a year or less

• Copyright is retained by publisher or author

• Deposits (and free access permissions) are permanent
  • journal may stop depositing new material but may not withdraw material already deposited
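The delay and permanence rules above can be illustrated with a small Python sketch. The `Deposit` record, the month-based delay and the `AppendOnlyArchive` class are hypothetical names chosen for illustration, not part of PMC's actual software; the sketch only shows how a free-access date follows from a publication date plus a journal-chosen delay, and how an archive can accept new deposits while refusing withdrawals.

```python
from __future__ import annotations

import calendar
from dataclasses import dataclass
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass(frozen=True)
class Deposit:
    """Hypothetical record for one deposited article."""
    article_id: str
    published: date
    delay_months: int  # journal-chosen delay before free access

    def free_access_date(self) -> date:
        return add_months(self.published, self.delay_months)

    def is_free(self, today: date) -> bool:
        return today >= self.free_access_date()


class AppendOnlyArchive:
    """Hypothetical archive: new deposits are accepted, withdrawals are refused."""

    def __init__(self) -> None:
        self._items: dict[str, Deposit] = {}

    def deposit(self, item: Deposit) -> None:
        if item.article_id in self._items:
            raise ValueError(f"{item.article_id} is already deposited")
        self._items[item.article_id] = item

    def withdraw(self, article_id: str) -> None:
        # Policy from the slide: material already deposited may not be withdrawn.
        raise PermissionError(f"cannot withdraw {article_id}: deposits are permanent")

    def free_articles(self, today: date) -> list[str]:
        return sorted(a for a, d in self._items.items() if d.is_free(today))


archive = AppendOnlyArchive()
archive.deposit(Deposit("a1", date(2003, 6, 15), 12))
archive.deposit(Deposit("a2", date(2004, 1, 31), 1))
print(archive.free_articles(date(2004, 5, 26)))  # ['a2']
```

Clamping to the last day of the month keeps the arithmetic well defined for dates such as January 31, which here becomes February 29, 2004.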


PMC Archiving Model

• Multiple copies of archive on DVD and tape

• Catalog database tracks what’s where

Journal files (input): SGML or XML in publisher’s DTD; images, PDFs, supplementary data files

Processing:
• Convert SGML to PMC XML common format
• Convert images to Web display format

Stored files:
• high-resolution image files
• supplementary data files
• PDFs
• PMC XML files (common DTD)
• Web display images
• source SGML/XML files

Storage: PMC Archive; PMC Public Access Database

Output: PMC search results, TOC pages, full-text pages, and others, created as online display pages dynamically from the PMC database
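The conversion step in this flow can be sketched in Python with the standard library's `xml.etree.ElementTree`. The publisher markup (`Article`, `Title`, `Abstract`, `Para`) is invented for illustration, and the output uses only a handful of NLM-style element names (`article`, `front`, `article-meta`, `title-group`, `article-title`, `abstract`, `body`, `p`); it is not a complete or DTD-valid document, and image conversion is not shown.

```python
import xml.etree.ElementTree as ET


def to_common_format(publisher_xml: str) -> ET.Element:
    """Map a hypothetical publisher markup onto a few NLM-style elements."""
    src = ET.fromstring(publisher_xml)

    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")

    # Publisher <Title> -> <title-group><article-title>
    title_group = ET.SubElement(meta, "title-group")
    title = ET.SubElement(title_group, "article-title")
    title.text = (src.findtext("Title") or "").strip()

    # Publisher <Abstract> (plain text) -> <abstract><p>
    abstract_text = src.findtext("Abstract")
    if abstract_text:
        abstract = ET.SubElement(meta, "abstract")
        ET.SubElement(abstract, "p").text = abstract_text.strip()

    # Every publisher <Para>, in document order -> <body><p>
    # (inline child markup is flattened to its text in this sketch)
    body = ET.SubElement(article, "body")
    for para in src.iter("Para"):
        ET.SubElement(body, "p").text = "".join(para.itertext()).strip()

    return article


SAMPLE = (
    "<Article><Title> Example title </Title>"
    "<Abstract>Short abstract.</Abstract>"
    "<Body><Para>First paragraph.</Para><Para>Second paragraph.</Para></Body>"
    "</Article>"
)
print(ET.tostring(to_common_format(SAMPLE), encoding="unicode"))
```

Each incoming source DTD would need its own mapping function of this kind, but everything downstream (storage, display, exchange) only has to understand the one common format.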


Why???

Why XML?

• Preserves the structure of an article

• Lends itself to intelligent processing
  • citation matching, selective searching, etc.

• Human readable, not dependent on any particular technology

• Portable

Why Free?

• Readers provide another level of quality control

• The more eyes, the better
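To illustrate the "intelligent processing" point, here is a Python sketch of citation matching and selective searching over structured XML. The reference markup is loosely modeled on NLM-style `ref`/`citation` elements with `source`, `year`, `volume` and `fpage` children; the lookup index and its IDs are hypothetical. Because the fields are tagged, a citation can be matched on journal, year, volume and first page, and a search can be restricted to the abstract instead of the whole text.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

KEY_FIELDS = ("source", "year", "volume", "fpage")


def citation_key(citation: ET.Element) -> tuple[str, ...]:
    """Normalize the tagged fields of one citation into a lookup key."""
    return tuple((citation.findtext(f) or "").strip().lower() for f in KEY_FIELDS)


def match_citations(article_xml: str, index: dict[tuple[str, ...], str]) -> dict[str, str | None]:
    """Map each reference id to a known article id, or None if unmatched."""
    root = ET.fromstring(article_xml)
    matches: dict[str, str | None] = {}
    for ref in root.iter("ref"):
        citation = ref.find("citation")
        if citation is not None:
            matches[ref.get("id")] = index.get(citation_key(citation))
    return matches


def abstract_mentions(article_xml: str, term: str) -> bool:
    """Selective search: look for a term in the abstract only, not the body."""
    root = ET.fromstring(article_xml)
    return any(
        term.lower() in "".join(p.itertext()).lower()
        for p in root.iterfind(".//abstract/p")
    )


ARTICLE = """<article>
<front><article-meta><abstract><p>We study protein folding.</p></abstract></article-meta></front>
<body><p>Ribosomes are discussed only in the body.</p></body>
<back><ref-list>
<ref id="r1"><citation><source>J Biol Chem</source><year>2001</year><volume>276</volume><fpage>123</fpage></citation></ref>
<ref id="r2"><citation><source>Nature</source><year>1999</year><volume>400</volume><fpage>5</fpage></citation></ref>
</ref-list></back>
</article>"""

# Hypothetical index of already-archived articles.
INDEX = {("j biol chem", "2001", "276", "123"): "ART-0001"}

print(match_citations(ARTICLE, INDEX))         # {'r1': 'ART-0001', 'r2': None}
print(abstract_mentions(ARTICLE, "ribosome"))  # False
print(abstract_mentions(ARTICLE, "Protein"))   # True
```

The same queries against a PDF or page image would first require recovering which words belong to the title, the abstract or a reference, which is exactly the structure the XML preserves.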


Digital Journal Archiving Issues

• Ensuring quality of source materials

• Active use to ensure effective preservation

• Distribution of content to collaborating archives for added security
  • standard agreement covering rights and responsibilities of the archiving organization

• Basic toolset for archive duplication / exchange:
  • common interchange DTD
  • standard file names
  • unique object and accession IDs
  • possibly, core software for loading content to a database and displaying it online
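The exchange toolset can be sketched in Python: a standard file-naming function plus a checksum manifest that a collaborating archive can use to confirm its copy is complete and unaltered. The naming convention and function names are hypothetical, and SHA-256 via Python's `hashlib` is one reasonable choice of checksum, not a documented PMC practice.

```python
from __future__ import annotations

import hashlib
import re


def standard_file_name(journal: str, volume: int, issue: int, fpage: int, ext: str) -> str:
    """Hypothetical convention: <journal>-v<volume>-i<issue>-p<first page>.<ext>."""
    slug = re.sub(r"[^a-z0-9]+", "", journal.lower())
    return f"{slug}-v{volume:03d}-i{issue:02d}-p{fpage:04d}.{ext}"


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Record a SHA-256 digest for every file in the archive."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in sorted(files.items())}


def verify_copy(manifest: dict[str, str], copy: dict[str, bytes]) -> list[str]:
    """List files missing from, or altered in, a duplicate archive.

    Extra files present only in the copy are not reported in this sketch.
    """
    problems = []
    for name, digest in manifest.items():
        data = copy.get(name)
        if data is None:
            problems.append(f"missing: {name}")
        elif hashlib.sha256(data).hexdigest() != digest:
            problems.append(f"checksum mismatch: {name}")
    return problems


name = standard_file_name("Bull Med Libr Assoc", 1, 1, 1, "xml")
print(name)  # bullmedlibrassoc-v001-i01-p0001.xml

manifest = build_manifest({name: b"<article/>"})
print(verify_copy(manifest, {name: b"<article/>"}))           # []
print(verify_copy(manifest, {}))                              # ['missing: bullmedlibrassoc-v001-i01-p0001.xml']
print(verify_copy(manifest, {name: b"<article></article>"}))  # ['checksum mismatch: bullmedlibrassoc-v001-i01-p0001.xml']
```

Predictable names let two archives compare holdings file by file, and the digests catch silent corruption that a name comparison alone would miss.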


Genesis of the NLM Journal DTDs

In the beginning (Jan 2000) … custom handling of each journal (with different DTDs)

Within months … we need a common DTD … enter the PMC DTD: keep it simple … a simple DTD that accommodates a growing variety of incoming DTDs? Really?

Summer / Fall 2001 … we completely redesign and expand the PMC DTD

Early 2002 … Harvard / Mellon says “can we share?”

Early 2003 … we have the NLM modular DTD suite

… and we’ve learned that an Archiving DTD should not be a Publishing DTD


NLM Journal DTDs

• Journal Archiving and Interchange XML DTD
  • common format for storing and distributing content supplied in a variety of “source” DTDs
  • developed in cooperation with the Mellon Foundation E-journal archiving program

• Journal Publishing XML DTD for original tagging of content at source

• Adopted by HighWire Press, JSTOR and many others

• Technical advisory group includes American Physical Society, HighWire Press, JSTOR, Microsoft


What To Archive?

“…you don't know what you've got

Till it's gone”

– Joni Mitchell


What the World Needs Now

• Journal production (authoring and copy-editing) using XML-based tools
  • published article comes from the XML, not vice versa

• Straightforward, universal standard for defining ownership and access rights, similar to copyright indication
  • evolving flavors of Open Access
  • changes of ownership

• Other operational, free archives that can form a collaborative archiving network
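The point that the published article should come from the XML can be illustrated with a Python sketch that derives a display page from NLM-style XML. The element paths and the `render_html` name are assumptions for illustration; a production renderer would also handle figures, tables, citations and inline formatting, which this sketch flattens to plain text.

```python
import html
import xml.etree.ElementTree as ET


def render_html(article_xml: str) -> str:
    """Derive a minimal HTML display from NLM-style XML (the XML is the source)."""
    root = ET.fromstring(article_xml)
    title = (root.findtext(".//article-title") or "").strip()
    parts = [f"<h1>{html.escape(title)}</h1>"]
    for p in root.iterfind("./body/p"):
        # Inline markup such as <italic> is flattened to plain text here.
        text = "".join(p.itertext()).strip()
        parts.append(f"<p>{html.escape(text)}</p>")
    return "\n".join(parts)


SOURCE = (
    "<article><front><article-meta><title-group>"
    "<article-title>Archiving &amp; Access</article-title>"
    "</title-group></article-meta></front>"
    "<body><p>Free <italic>full-text</italic> access.</p></body></article>"
)
print(render_html(SOURCE))
# <h1>Archiving &amp; Access</h1>
# <p>Free full-text access.</p>
```

Because the display is generated from the XML rather than the other way round, a correction made once in the XML reaches every derived format.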


Back Issue Digitization

• Create a complete digital archive of PMC journals for today’s “if not online, it doesn’t exist” user

• Cover-to-cover digital copy of everything up to the point where the journal began producing an electronic copy

• Publisher gets free, unencumbered copy

• First complete archive, Bulletin of the Medical Library Association (1911), released in November 2003

• Expected collaboration with Wellcome Trust and UK Joint Information Systems Committee (JISC)


Digitized Samples: [sample images not reproduced]


Find Out More

PubMed Central home: http://www.pubmedcentral.gov/

NLM Journal XML DTDs: http://dtd.nlm.nih.gov/