Emerging Standards for Libraries and Publishers
Cliff Morgan, John Wiley & Sons Ltd
UKSG briefing session, 15-17 April 2002

What I’ll be covering
- Identifiers
- Metadata
- E-books

What I won’t be covering
- Graphics (e.g. JPEG, GIF, PNG, SVG)
- Character sets (ASCII, Unicode)
- Relationship models (RDF, Topic Maps/XTM)
- E-commerce (UN/EDIFACT, XML-edi, ebXML)
- XML stuff (Schemas, XLink, XSL, XSLT, etc.)
- Usage stats standards (e.g. COUNTER, ANSI/NISO Z39.7-1995)
- Rights metadata (XrML, ODRL)

Identifiers
- ISSN
- ISBN
- SICI
- BICI
- PII
- DOI
- ISTC
- Multimedia identifiers

ISBN
- International Standard Book Number
- ISO 2108
- e.g. 0-471-92755-4
- Geog location/language - publisher/imprint - title (print format) - check character
- Has been a standard for > 30 years
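
The check character at the end of a 10-digit ISBN can be sketched as follows. This is a minimal illustration of the usual modulus-11 rule (weights 10 down to 2 over the first nine digits, with X standing for 10); the function names are made up for this example and are not part of any standard library.

```python
def isbn10_check_char(first9: str) -> str:
    """Return the ISBN-10 check character for the first nine digits."""
    # Weights run 10, 9, ..., 2 across the nine digits.
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """Check a hyphenated or plain ISBN-10 against its check character."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or not chars[:9].isdigit():
        return False
    return isbn10_check_char(chars[:9]) == chars[9]


print(is_valid_isbn10("0-471-92755-4"))  # the example ISBN from the slide
```

The hyphens only separate the four parts for human readers; the check works on the bare ten characters.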

New ISBN
- ISBN is being revised - 13 digits from 1/1/05
- Can double capacity by giving a 979 prefix
- Issues:
  - hexadecimal or decimal?
  - limit ISBN to print - do something else for electronic? versions? formats?
  - assign to components (e.g. chaps)?
  - should number be completely dumb?
  - metadata deposit at assignment?
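
As a rough sketch of what a 13-digit ISBN could look like, the snippet below assumes the decimal option wins and that the number follows the existing EAN-13 bar code scheme, where current books carry a 978 prefix and the check digit uses alternating weights of 1 and 3. Both points are assumptions for illustration, since the revision is still open; the function names are hypothetical.

```python
def ean13_check_digit(first12: str) -> str:
    """EAN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str, prefix: str = "978") -> str:
    """Re-express an ISBN-10 as 13 digits under the given prefix (assumed scheme)."""
    chars = isbn10.replace("-", "").replace(" ", "")
    core = chars[:9]  # the old check character is dropped and recalculated
    if len(chars) != 10 or not core.isdigit():
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    first12 = prefix + core
    return first12 + ean13_check_digit(first12)


print(isbn10_to_isbn13("0-471-92755-4"))
```

A 979 prefix would give a second, equally large number space, which is where the doubling of capacity comes from.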

ISSN
- International Standard Serial Number
- ISO 3297
- e.g. 0749-503X
- If publisher has not applied for an ISSN, any 3rd party can apply for their own data management needs
- Different media get different ISSNs, e.g. print ISSN is different from CD-ROM ISSN
- But different file formats don’t get different ISSNs, so offline is different from online, but PDF is same as HTML
- If online contains only abstracts of print full text, no new ISSN for e-version
- If use print and eISSNs, must change both if title changes
- http://www.issn.org:8080/English/pub/getting-checking
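
The eighth character of an ISSN is also a check character. A minimal sketch of the usual modulus-11 rule (weights 8 down to 2 over the first seven digits, X for 10) is shown below; the function names are illustrative only.

```python
def issn_check_char(first7: str) -> str:
    """Return the ISSN check character for the first seven digits."""
    # Weights run 8, 7, ..., 2 across the seven digits.
    total = sum((8 - i) * int(d) for i, d in enumerate(first7))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """Check an ISSN written as NNNN-NNNC or NNNNNNNC."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    return issn_check_char(chars[:7]) == chars[7]


print(is_valid_issn("0749-503X"))  # the example ISSN from the slide
```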

SICI
- Serial Item and Contribution Identifier
- ANSI/NISO Z39.56-1996 - reaffirmed
- e.g. issue = 0749-503X(20010115)18:1<>1.0.TX;2-X
- e.g. article = 0749-503X(20010115)18:1<1:YGPIWG>2.0.TX;2-X
- (Check digits in above examples have not been calculated.)
- Well used at issue level - bar codes
- Less used at article level
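
To make the anatomy of the two example SICIs easier to see, here is a sketch that splits one into its parts with a regular expression. The group names (chronology, enumeration, CSI, DPI, MFI and so on) follow my reading of Z39.56 and should be treated as assumptions; the pattern only covers the simple shapes shown on the slide and does not verify the check character.

```python
import re

# Assumed layout: ISSN (chronology) enumeration <location:title code> CSI.DPI.MFI;version-check
SICI_RE = re.compile(
    r"^(?P<issn>\d{4}-\d{3}[\dX])"
    r"\((?P<chronology>[^)]*)\)"
    r"(?P<enumeration>[^<]*)"
    r"<(?P<contribution>[^>]*)>"
    r"(?P<csi>\d)\.(?P<dpi>\d)\.(?P<mfi>[A-Z]{2})"
    r";(?P<version>\d)-(?P<check>[0-9A-Z#])$"
)


def parse_sici(sici: str) -> dict:
    """Split a SICI string into named parts; raises ValueError if it does not match."""
    match = SICI_RE.match(sici)
    if match is None:
        raise ValueError(f"unrecognised SICI: {sici!r}")
    parts = match.groupdict()
    # The contribution segment is empty for an issue, "location:title code" for an article.
    location, _, title_code = parts.pop("contribution").partition(":")
    parts["location"] = location
    parts["title_code"] = title_code
    return parts


article = parse_sici("0749-503X(20010115)18:1<1:YGPIWG>2.0.TX;2-X")
print(article["issn"], article["enumeration"], article["title_code"])
```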

SICIs at Article Level
- Requires publication info - but publishers want to assign article IDs before publication
- Long-winded
- Unfortunate syntax for Internet transfer (<>, #) - needs SGML entifying and hex encoding
- Unclear what to do with special characters in Title Code
- Not unique ID if two untitled articles on same page (e.g. Letters)
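
The syntax problem can be illustrated with Python's standard library: the angle brackets have to be turned into entities before a SICI can sit inside SGML/HTML/XML text, and percent-encoded (hex) before it can travel inside a URL. The helper names below are made up for this sketch.

```python
from html import escape
from urllib.parse import quote, unquote


def sici_for_markup(sici: str) -> str:
    """Replace <, > and & with entities so the SICI can sit in marked-up text."""
    return escape(sici)


def sici_for_url(sici: str) -> str:
    """Percent-encode every reserved character so the SICI can go in a URL."""
    return quote(sici, safe="")


sici = "0749-503X(20010115)18:1<1:YGPIWG>2.0.TX;2-X"
print(sici_for_markup(sici))
print(sici_for_url(sici))
print(unquote(sici_for_url(sici)) == sici)  # the encoding is reversible
```

Either form is noticeably longer and harder to read than the original, which is part of the "long-winded" complaint.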

C = Contribution, not Component
- SICI allows identification of article, issue ToC, issue Index and article abstract (DPIs of 0, 1, 2, 3 respectively)
- No way of using SICI to identify any other component (such as Figure, Table, Section)
- Not surprising since it’s a canonicalisation nightmare
- http://sunsite.berkeley.edu/SICI/version2.html

BICI
- Book Item and Component Identifier
- NISO DSFTU (Draft Standard for Trial Use)
- e.g. 0387119787(1982)<174:ADTATO>2.2.TX;1-Q
- ISBN, date, location, title, component type, etc.
- Trial was Aug 2000 to Jan 2002 - not much evidence of use
- Many issues the same as for SICI, but also less business push

PII
- Publisher Item Identifier
- Proposed in 1995 by ACS, AIP, APS, IEEE and Elsevier, but never became a standard
- e.g. S0749-503X011234
- Some publishers use as internal id since doesn’t suffer from any of the SICI problems
- But no registration/maintenance agency

DOI
- Digital Object Identifier
- ANSI/NISO Z39.84-2000
- e.g. issue = 10.1002/yea.v18:1
- e.g. article = 10.1002/yea.1234
- Well established in academic journals publishing - esp. ‘cos of CrossRef
- 4.2 million DOIs deposited to date
- http://www.doi.org

Some publishing issues regarding DOIs
- What are they assigned to?
- Need for matching URL, so can’t assign to anything you wouldn’t give a URL to
- Individual publishers need to decide their DOI structure
- Doesn’t have to be human-friendly but must be unique, easily generated, and matched with URL
- Application profiles for different genres

Processes
- Apply to Registration Agency (IDF, CDI, CrossRef, Enpia, LON) for Registrant Prefix
- For individual DOIs, batch-process - generate DOIs and URLs from electronic metadata and send to RA for deposit
- DOIs never change (even if journal changes ownership) but matched URLs (or other locators) can
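
The split between a permanent DOI and a changeable URL can be sketched in a few lines. The prefix/suffix split at the first slash is how DOIs are structured; the proxy address https://doi.org/ and the example.org locations are assumptions for illustration, and the dictionary merely stands in for whatever the Registration Agency actually stores.

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple:
    """Split a DOI into registrant prefix and publisher-chosen suffix."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def proxy_url(doi: str, proxy: str = "https://doi.org/") -> str:
    """Build a resolvable link; the proxy address is an assumption."""
    split_doi(doi)  # validate before building the link
    return proxy + quote(doi, safe="/")


# Hypothetical deposit: the DOI is the key and never changes ...
locations = {"10.1002/yea.1234": "http://publisher-a.example.org/yea/1234"}
# ... but the matched URL can be updated, e.g. when the journal changes hands.
locations["10.1002/yea.1234"] = "http://publisher-b.example.org/yea/1234"

print(split_doi("10.1002/yea.1234"))
print(proxy_url("10.1002/yea.v18:1"))
```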

ISTC
- International Standard Textual Work Code
- ISO Committee Draft 21047 - circulated Oct 01, voting finished Jan 02: progressed to Enquiry stage
- http://www.nlc-bnc.ca/iso/tc46sc9/21047.htm
- e.g. 0A9-2002-1223F332-0 (RA + year + WorkID + check)
- A Work (= abstract creation) id - replaces the ISWC(L)
- Creator-centric - authors may apply to ISTC Agency directly or via agents or via publisher
- Requires metadata deposit too
- Publishers therefore need to capture these numbers if they’ve been assigned to Works
- Will authors really bother with this?

A couple of non-text, non-graphic IDs you might want to know about
- ISAN
- ISWC

ISAN
- International Standard Audiovisual Number
- ISO Draft International Standard 15706
- e.g. 153C-7365-B36F-844C-N
- Can be issued to movies, trailers, TV programmes, episodes or series, ads, multimedia works if A/V component is significant
- http://www.nlc-bnc.ca/iso/tc46sc9/isan.htm
- Work has also started on a V-ISAN for Versions

ISWC
- International Standard Musical Work Code (used to be ISWC(T))
- ISO 15707
- e.g. T-034524680-1
- Identifies any musical work, including arrangements, movements, medleys, samples
- http://www.iswc.org/iswc/iswc/en/html/home.html

Metadata
- Resource discovery (Dublin Core, OAI-PMH), incl. linking (CrossRef)
- Product metadata (ONIX and ONIX for Serials)
- Preservation metadata (OAIS)
- I am not going to talk about library-specific sets such as MARC, Z39.50, AACR2, etc.

Dublin Core
- Defined Universal Bibliographic Language for Internet Navigation and Coherent Online Resource Exploration [not really!]
- ANSI/NISO Z39.85
- DC 1.1 (simple, unqualified set of 15 elements)
- Qualified set (DCQ? dcterms?) needed to do anything more than basic - not standard yet
- DC has been mandated by UK Government (“e-GMS”)
- Application Profiles will deal with defined local extensions via namespace declarations
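
As a small illustration of simple Dublin Core and of what a namespace declaration looks like in practice, the sketch below builds a record with Python's standard XML library. The fifteen element names and the namespace URI are those of DC 1.1; the wrapping "metadata" element and the sample values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

ET.register_namespace("dc", DC_NS)  # serialise the namespace with the "dc" prefix


def dc_record(fields: list) -> str:
    """Serialise (element, value) pairs as namespaced Dublin Core XML."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple Dublin Core element: {name}")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dc_record([
    ("title", "Emerging Standards for Libraries and Publishers"),
    ("creator", "Cliff Morgan"),
    ("date", "2002-04"),
]))
```

A local extension would be declared the same way, with its own namespace URI and prefix alongside the dc one.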

OAI-PMH
- Open Archives Initiative Protocol for Metadata Harvesting
- Not really an archive in the sense of repository, more of a political statement and a metadata harvesting protocol
- Came out of the E-print community, but they welcome commercial publishers
- Supported by DLF and CNI
- Uses simple (unqualified) Dublin Core as its metadata
- e.g. <creator>Cliff Morgan</creator>
- Version 2 of protocol due for release June 2002
- http://www.openarchives.org
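
Harvesting is done with ordinary HTTP requests that name one of six protocol verbs. The sketch below only builds such a request URL and does not contact any server; the base URL is hypothetical, and oai_dc is the metadata prefix the protocol uses for unqualified Dublin Core.

```python
from urllib.parse import urlencode

# The six request types defined by the protocol.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request(base_url: str, verb: str, params: dict = None) -> str:
    """Build an OAI-PMH request URL for a repository's base URL."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(params or {})
    return base_url + "?" + urlencode(query)


# Hypothetical repository address, for illustration only.
print(oai_request("http://archive.example.org/oai", "ListRecords",
                  {"metadataPrefix": "oai_dc"}))
```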

CrossRef metadata set
- CrossRef matches the metadata in a citation with the metadata in its Metadata Database (MDDB), which includes the DOI for the resource
- Participating publishers (91 of ‘em) deposit the m/data with DOI into the MDDB
- To date, 3.7M DOIs, covering 5000+ jnls
- http://www.crossref.org

New version
- Version 2 much more complicated - full schema is 113 pages long
- In addition to journals, covers books and conference proceedings, at whole title and chapter level
- Some element names are different from CrossRef 1.0

ONIX
- OnLine Information eXchange
- Latest release is 2.0
- Original focus was message format for books through the trade, but is fast becoming a universal metadata set for describing publications
- http://www.editeur.org
- ONIX being championed by a number of publishers and online retailers
- Swedish Royal Library using ONIX as an input medium

ONIX for Serials
- Provides rich cataloguing information for agents, librarians, users
- Supports alerting, despatch and library check-in
- Structured, multi-level bibliographic descriptions, including ToCs
- Descriptions for library holdings (direct to OPACs)
- Draft 2 just released this month
- Subscription Package Record provides product catalogue info about subscription packages
- Serial Title Record provides catalogue info about an individual serial
- Serial Item Record provides structured multi-level bibliographic description of serial parts

So is the CrossRef set like the ONIX for Serials set?
- No
- They both include metadata that can be used to describe journals, issues and articles
- But they don’t use the same element names
- CrossRef has mapped to ONIX but not to ONIX for Serials yet - but has said will support when released

OpenURL
- NISO Work Item
- Separates metadata for resource from metadata for location
- Resolver services (such as SFX, CrossRef) make the context-sensitive link
- Solves the “appropriate copy” problem, where more than one legit copy of an article may be available to a library, e.g. local holding, consortium, aggregator service, mirror site, publisher

OpenURL metadata
- OpenURL comprises BASEURL and QUERY
- BASEURL identifies the resolver; QUERY is a resource description
- e.g. (simplified): http://resolver.ukoln.ac.uk/genre=article&atitle=Information%20gateways:…&issn=14684527&volume=24&spage=40&aulast=Heery&aufirst=Rachel
- Genres defined as “referent-types”, such as book, chapter, journal, article, conf proc and paper, dissertation, patent, report - each has its own metadata spec
- High-level concept is the Bison-Futé model
- http://www.dlib.org/dlib/july01/vandesompel/07vandesompel.html
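
The BASEURL plus QUERY idea can be sketched as below. The key names (genre, issn, volume, spage, aulast, aufirst) are taken from the slide's example; joining the two halves with "?" is my assumption about the full syntax, since the slide's example is simplified, and the article title is left out because it is truncated on the slide. The second resolver address is hypothetical.

```python
from urllib.parse import urlencode


def build_openurl(base_url: str, description: list) -> str:
    """Join a resolver address (BASEURL) and a resource description (QUERY)."""
    return base_url + "?" + urlencode(description)


# The resource description stays the same ...
article = [
    ("genre", "article"),
    ("issn", "14684527"),
    ("volume", "24"),
    ("spage", "40"),
    ("aulast", "Heery"),
    ("aufirst", "Rachel"),
]

# ... while each library points it at its own resolver.
print(build_openurl("http://resolver.ukoln.ac.uk/", article))
print(build_openurl("http://resolver.library.example.org/", article))
```

Because only the BASEURL differs, the same citation can lead each user to the copy their own library regards as appropriate.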

Preservation metadata
- OAIS (Open Archival Information System) underlies all digital preservation models
- Nothing to do with OAI
- Based on SIPs (Submission Info Packages), AIPs (Archival Info Packages) and DIPs (Dissemination Info Packages)
- The Producer wraps the stuff up in a SIP, it gets ingested into an AIP, and sent out as a DIP

Some other metadata activities
- LOM - Learning Object Metadata
- IMS - Instructional Management Systems (builds on LOM)
- PRISM - Publishing Requirements for Industry Standard Metadata
- MEG - cross-sectoral Metadata for Education Group
- SCORM - Sharable Content Object Reference Model - US DoD project, also builds on IMS/LOM

How are we supposed to cope with all these metadata sets?
- A publisher’s metadata becomes an important asset for describing product to the outside world, esp. for trading and linking
- If publishers have their publications in electronic form, the metadata will be in there in the file so it just needs extracting and mapping to whatever metadata set the publisher chooses
- Production issue: who checks the metadata?
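
The "extract once, map many times" idea might look something like the sketch below. The internal field names are entirely hypothetical; the target names on the right are simple Dublin Core elements and the OpenURL keys seen earlier. Returning the fields that could not be mapped gives whoever checks the metadata something concrete to look at.

```python
# Hypothetical in-house field names mapped to two target metadata sets.
CROSSWALKS = {
    "dublin_core": {
        "article_title": "title",
        "author_surname": "creator",
        "journal_issn": "identifier",
    },
    "openurl": {
        "article_title": "atitle",
        "author_surname": "aulast",
        "journal_issn": "issn",
        "volume": "volume",
        "first_page": "spage",
    },
}


def map_record(record: dict, target: str) -> tuple:
    """Return (mapped fields, source fields the target set has no slot for)."""
    mapping = CROSSWALKS[target]
    mapped = {mapping[f]: v for f, v in record.items() if f in mapping}
    dropped = sorted(f for f in record if f not in mapping)
    return mapped, dropped


extracted = {
    "article_title": "An example article",
    "author_surname": "Heery",
    "journal_issn": "14684527",
    "volume": "24",
    "first_page": "40",
}
print(map_record(extracted, "openurl"))
print(map_record(extracted, "dublin_core"))
```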

E-books
- OEBPS - Open E-Book Publication Structure
- Three components:
  a) XML DTD for content
  b) DC-based metadata (but some non-compliant qualifier attributes)
  c) description of package’s structure, reading order, navigation
- Many OEB files are just (a)
- Version 2 being worked on, esp. M&I, and Rights

Formats
- Front runners are Adobe E-Book Reader (PDF based) and Microsoft Reader (.lit based)
- .lit limited to simple stuff, and not as robust as PDF, but can’t underestimate M/soft
- New versions of Adobe will have built-in DOI capability

Text reflow
- Acrobat 5 introduced structured PDF
- The Holy Grail synthesis of structure and presentation
- Writes a PDF file in XML(ish)
- Asserts reading order
- Allows for reflow into different reader devices
- Works best for simple only, but good start

Conclusions
- There are lots of standards out there
- Some of them compete with one another
- Not all of them are formal
- They may change over time
- Publishing industry standards are not only developed by the publishing industry
- Not always easy to judge the winners