2003 CrossRef Annual Member Meeting Technical Working Group
Technical Meeting, September 17, 2003
London
Agenda
9:00 - 9:05 Introduction
9:05 - 9:30 Review of system operation, configuration & issues
9:30 - 10:00 TOC Syndication Using RSS (Tony Hammond, Elsevier)
10:00 - 10:15 Coffee Break
10:15 - 11:15 New system features
11:15 - 11:45 Related technology topics
11:45 - ? Open discussion
Operational Issues
Current configuration and planned changes
System utilization / query response times
Conflicts
Schema Issues
Procedures
Misc. (special characters, getting XML, reports …)
Where to get help
Current Configuration & Planned Changes
Three Dell 2650 servers running Java:
• Web / synchronous queries
• Batch processing: deposits and asynchronous queries
Database: Sun E450 (4 × 480 MHz Sparc II, 4 GB memory) running Oracle 8.1.7
Network: 100 Mb switch, 10 Mb full duplex
Planned changes:
• Replace the database machine with a faster Sun box and better disks
• Add a load balancer (service) on the front end so the Web component can run on more than one machine (a good reason to switch to http://doi.crossref.org)
• Add a shadow ethernet drop from the co-location facility
System Utilization
[Figure: system utilization by hour of day (0, 8, 16), Saturday through Friday; vertical axis 0 to 90]
System utilization driven by database use
Operations availability (1/03 – 8/03): 97.6%
Performance: average response time is 0.656 seconds for a successful query and 1.2 seconds for a failed query
Deposit performance depends on the number of jobs in the queue (4 run simultaneously)
Deposit completion times: 50% < 5 min, 12% < 1 hr, 16% < 6 hr, 7% < 12 hr, 10% < 24 hr, 5% > 24 hr
System Utilization
Query response time
[Figure: query response time by hour of day (0, 8, 16), Saturday through Friday; vertical axis 0 to 60; series (1), (2 to 9), (10 to 24)]
System Utilization
Query response time
[Figure: query response time by hour of day (0, 8, 16), Saturday through Friday; vertical axis 0 to 1000; series (25 to 49), (50 to 99), (100+)]
Conflicts
A conflict is created when 2 (or more) DOIs are deposited with the same metadata:
Journal / Author / Volume / Issue / Page / Article Title / Sequence-Number
Happens when:
• No page numbers available (Ok)
• Limited meta-data, no author or article title (Ok)
• Same article title (‘Book Reviews’) on the same page, no author (Ok)
• Two publishers host the same article (Bad)
• Publisher wants to assign a new DOI, naming convention change (Ok?)
What can be done about them:
• Re-deposit one of the DOIs and change the meta-data (auto-resolve)
• An administrator can erase the conflict (both DOIs remain as peers). A meta-data search will then yield no (ambiguous) results or multiple hits
• An administrator can make one the ‘prime’ and the other an ‘alias’. The ‘alias’ is essentially deleted; it can never be updated again (unless we do an ‘un-do’)
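The conflict rule above can be illustrated with a minimal Python sketch. It groups deposited records by the metadata fields the slide lists and reports any group that holds more than one DOI. The field names, the record layout and the 10.5555 DOIs are assumptions made for illustration; this is not the CrossRef implementation.

```python
from collections import defaultdict

# Fields the slide lists as the conflict key; the dictionary keys are assumed names.
KEY_FIELDS = ("journal", "author", "volume", "issue", "page", "title", "sequence")


def find_conflicts(deposits):
    """Return lists of DOIs whose metadata is identical on every key field."""
    groups = defaultdict(list)
    for record in deposits:
        # Missing fields compare as None, so sparse records collide easily.
        key = tuple(record.get(field) for field in KEY_FIELDS)
        groups[key].append(record["doi"])
    return [dois for dois in groups.values() if len(dois) > 1]


# Hypothetical deposits: two 'Book Reviews' items with no author or page.
deposits = [
    {"doi": "10.5555/a1", "journal": "J. Example", "volume": "12", "issue": "3", "title": "Book Reviews"},
    {"doi": "10.5555/a2", "journal": "J. Example", "volume": "12", "issue": "3", "title": "Book Reviews"},
    {"doi": "10.5555/b1", "journal": "J. Example", "volume": "12", "issue": "3", "page": "101", "title": "Book Reviews"},
]
conflicts = find_conflicts(deposits)
```

Adding a page number to one record is enough to separate it from the others, which is why sparse metadata is the usual cause.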
Conflicts
What you see in your log:
<record_diagnostic status="Warning">
  <doi>10.1046/j.1365-2141.2003.04548.x</doi>
  <msg>Added with conflict</msg>
  <conflict_id>44487</conflict_id>
  <dois_in_conflict>
    <doi>10.1046/j.1365-2141.2003.04528.x</doi>
  </dois_in_conflict>
</record_diagnostic>
<batch_data>
  <record_count>24</record_count>
  <success_count>23</success_count>
  <warning_count>1</warning_count>
  <failure_count>0</failure_count>
</batch_data>
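A depositor could scan a log like the one on this slide for conflict warnings. The Python sketch below is one possible approach. The element names come from the slide; wrapping the fragment in a `<log>` root element is an assumption, since the slide shows only part of the log.

```python
import xml.etree.ElementTree as ET

# Log fragment copied from the slide.
LOG_FRAGMENT = """
<record_diagnostic status="Warning">
  <doi>10.1046/j.1365-2141.2003.04548.x</doi>
  <msg>Added with conflict</msg>
  <conflict_id>44487</conflict_id>
  <dois_in_conflict>
    <doi>10.1046/j.1365-2141.2003.04528.x</doi>
  </dois_in_conflict>
</record_diagnostic>
<batch_data>
  <record_count>24</record_count>
  <success_count>23</success_count>
  <warning_count>1</warning_count>
  <failure_count>0</failure_count>
</batch_data>
"""


def conflict_warnings(log_fragment):
    """Return (doi, conflict_id, other_dois) for each record added with a conflict."""
    # The <log> wrapper is hypothetical; it only gives the fragment a single root.
    root = ET.fromstring(f"<log>{log_fragment}</log>")
    found = []
    for diag in root.iter("record_diagnostic"):
        conflict_id = diag.findtext("conflict_id")
        if conflict_id is not None:
            others = [d.text for d in diag.iterfind("dois_in_conflict/doi")]
            found.append((diag.findtext("doi"), conflict_id, others))
    return found


warnings_found = conflict_warnings(LOG_FRAGMENT)
```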
Conflicts
The conflict report (we email them now, soon to be available on www.crossref.org)
Conflict report for 10.1016 created on Jul 24,2003
===========================================
ConfID: 105
CauseID: 14902006
OtherID: 1711,
JT: Analytica Chimica Acta
MD: Louwerse, 346 ,3,285,1997,Monitoring time-varying concentrations in sample streams by multiple input chromatography
DOI: 10.1016/S0003-2670(97)00138-4 (105-null )
DOI: 10.1016/S0003-2670(97)90065-9 (105-null )
===========================================
…
======== Number Of Conflicts: 25673 ========
Conflicts
Un-resolved conflicts (as of July 24)
Prefix   Count    Prefix   Count    Prefix   Count
10.1006  30       10.1055  1        10.1002  300
10.1021  13       10.1109  18       10.1034  9
10.1103  7        10.1023  81       10.5555  1
10.1191  8        10.1097  2927     10.1057  64
10.1111  3        10.1067  20       10.1113  1
10.1046  133      10.1038  222      10.1211  1
10.1079  2        10.1177  306      10.1263  66
10.1017  49       10.1007  1        10.1076  20
10.1054  15       10.1086  23       10.1345  1
10.1016  25673    10.1053  109      10.1350  1
                                    10.1354  20
Schema Issues
Versions
Currently at 2.0.5. Several small changes have brought the schema to 2.0.5.3 (do not include the minor revision number in the declarations)
When a major change occurs (like forward linking) the schema will be revised and you must change the declaration to pick up the changes (the system will recognize multiple schemas)
<?xml version="1.0" encoding="UTF-8"?>
<doi_batch version="2.0.5"
    xmlns="http://www.crossref.org/schema/2.0.5"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.crossref.org/schema/2.0.5 http://www.crossref.org/schema/2.0.5/crossref.xsd">
  <head>
Schema Issues
Character entities
Schema (generally) does not do character entities
Xerces will understand certain basic Latin entities:
&amp; &lt; &gt; &quot;
For others you must numerically encode them
Greek small letter alpha => &#945; or &#x3B1;
http://www.w3.org/TR/REC-html40/sgml/entities.html#h-24.2.1
http://www.unicode.org/
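One way to produce numeric encodings before building a deposit file is sketched below in Python. It uses only the standard library; the function name `numeric_encode` is an assumption, and the sketch emits decimal references (the hexadecimal form is equally valid).

```python
from xml.sax.saxutils import escape


def numeric_encode(text):
    """Escape &, < and >, then replace every non-ASCII character with &#NNN;."""
    escaped = escape(text)
    # 'xmlcharrefreplace' turns each unencodable character into a decimal reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


alpha = numeric_encode("α")
surname = numeric_encode("Chęcińska")
mixed = numeric_encode("Smith & Wesson")
```

Here `alpha` is `&#945;`, matching the alpha example on the slide.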
Schema Parser
http://www.crossref.org/06members/55InstructionsforNewSchema.html
Procedures
DOI Ownership Transfer
When a journal is transferred between two publishers:
1) Notify CrossRef via email; supply journal title, old and new prefix
2) Supply a list of DOIs (CrossRef can generate the list if needed)
3) Supply new URLs (if needed)
4) CrossRef processes the transfer
   - changes ownership in CrossRef MDDB
   - changes administration in handle system
DOI Changes & Conflict Resolution
When a new DOI is to be assigned to replace another DOI (due to error or the need to restructure the suffix composition):
1) Construct an XML deposit file for the new DOIs (same metadata as the old DOIs)
2) Deposit it and examine the log for conflict warnings
3) Contact CrossRef so an administrator can ‘overwrite’ the old DOIs with the new DOIs
   - deprecates old DOI in MDDB
   - aliases old DOI to new DOI in handle system
Schema Issues
Publishers Still Using the DTD
aamramerindhygametamonhamsaphaarnoldaspbsbiomedcenbrillbrpscabicupdekkerediciones
elseviereurrespsocfdcommfuninaprgeosocietyhhshhs-mosbyhindawiilsiinschemengisasjmrykargerlawerllibrapharm
lwwmaneymaryannemcbmitnaturepengppharmapressppressroysocmedsageschweizsivbspringertaylorurfiwileywscientific
Special Characters
The query: |j phys B at mol opt phys||36||1129||||
In a Browser (IE 6.0):
13616455,09534075|Journal of Physics B Atomic Molecular and Optical Physics|Tr bert|36|6|1129|2003|full_text||10.1088/0953-4075/36/6/305
In a DOS command window:
13616455,09534075|Journal of Physics B Atomic Molecular and Optical Physics|Trábert|36|6|1129|2003|full_text||10.1088/0953-4075/36/6/305
Byte string of the author’s name:
Trábert: [0]=0x54,[1]=0x72,[2]=0xa0,[3]=0x62,[4]=0x65,[5]=0x72,[6]=0x7
What was deposited:
<surname>Tr bert</surname> (the character after "Tr" is #160 = #xA0)
Special Characters
The query: |j phys org chem||16||213||||
In a Browser (IE 6.0):
10991395,08943230|Journal of Physical Organic Chemistry|ChciDska|16|4|213|2003|full_text||10.1002/poc.596
In a DOS command window:
10991395,08943230|Journal of Physical Organic Chemistry|Ch↓ciDska|16|4|213|2003|full_text||10.1002/poc.596
Byte string of the author’s name:
Ch↓ciDska: [0]=0x43,[1]=0x68,[2]=0x19,[3]=0x63,[4]=0x69,[5]=0x44,[6]=0x73,[7]=0x6b,[8]=0x61
0x19 is unprintable, 0x44 is a ‘D’
What was deposited:
<surname>Chęcińska</surname>
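The byte strings on these two slides are consistent with each character being reduced to its low 8 bits, and with the same byte being drawn differently by different viewers. The Python sketch below reproduces both effects. The truncation rule is an assumption inferred from the bytes shown, not a documented description of the system.

```python
def low_bytes(text):
    """Keep only the low 8 bits of each code point (assumed failure mode)."""
    return bytes(ord(ch) & 0xFF for ch in text)


raw = low_bytes("Chęcińska")
# Format the bytes the same way the slide lists them.
hex_dump = ",".join(f"[{i}]=0x{b:02x}" for i, b in enumerate(raw))

# One byte, two renderings: code page 437 (DOS) versus Latin-1 (browser).
dos_view = bytes([0xA0]).decode("cp437")
browser_view = bytes([0xA0]).decode("latin-1")
```

U+0119 (ę) collapses to 0x19 and U+0144 (ń) collapses to 0x44, the letter ‘D’. Byte 0xA0 decodes to ‘á’ under code page 437 and to a no-break space under Latin-1.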
Special Characters
<?xml version = "1.0" encoding = "UTF-8"?>
<doi_batch version = "0.3">
…
<body>
  <doi_record type = "full_text">
    <doi_data>
      <doi>10.1002/poc.596</doi>
      <url/>
    </doi_data>
    <journal_article_metadata>
      <article>
        <author sequence = "first">
          <given_name/>
          <surname>Chęcińska</surname>
        </author>
        <date type = "print">
          <year>2003</year>
Getting the XML for DOIs
In order to update meta-data you need to have the XML
An update is a complete re-write: fields omitted from the update are removed from the database
The update can be in Schema even if the original deposit was in DTD
CrossRef can now retrieve the XML for a list of DOIs.
DOIs can be from different submissions
Extracted XML will be in the same format as the deposit/update
Email the list of DOIs to [email protected]
Depositor Report
Query Formulation
Journal queries differ from Conference Proceedings and Book queries
Journals:
ISSN | TITLE/ABBREV | FIRST AUTHOR | VOLUME | ISSUE | START PAGE | YEAR | RESOURCE TYPE | KEY | DOI
Books and conference proceedings:
ISBN/ISSN | SER_TITLE | VOL_TITLE | AUTHOR/EDITOR | VOLUME | EDITION_NUMBER | PAGE | YEAR | COMPONENT_NUMBER | RESOURCE_TYPE | KEY | DOI
For Conference Proceedings:
SER_TITLE => <conference><proceedings_metadata><series_metadata><titles><title> (one, optional)
VOL_TITLE => <conference><proceedings_metadata><proceedings_title> (must have one)
Note: <conference><conference_paper><titles><title> is not searchable
For Books:
SER_TITLE => <book><book_metadata><series_metadata><titles><title>
VOL_TITLE => <book><book_metadata><titles><title>
(1..6 of either of these)
Note: <book><content_item><titles><title> is not searchable
Note: For series titles to matter you must assign them a DOI when depositing
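The journal query format can be assembled mechanically. The Python sketch below joins the ten journal fields in the order given on the slide, leaving unknown fields empty. The function name and the lowercase field names are assumptions chosen for this example.

```python
# Field order taken from the journal query format; the identifiers are assumed names.
JOURNAL_FIELDS = (
    "issn", "title", "first_author", "volume", "issue",
    "start_page", "year", "resource_type", "key", "doi",
)


def journal_query(**values):
    """Build a piped journal query; fields that are not supplied stay empty."""
    unknown = set(values) - set(JOURNAL_FIELDS)
    if unknown:
        raise ValueError(f"unknown query fields: {sorted(unknown)}")
    return "|".join(str(values.get(field, "")) for field in JOURNAL_FIELDS)


query = journal_query(title="j phys B at mol opt phys", volume=36, start_page=1129)
```

With these inputs `query` equals `|j phys B at mol opt phys||36||1129||||`, the query shown on the Special Characters slide.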
Where to get help
How to get started: http://www.crossref.org/02publishers/23how_start.html
How to deposit: http://www.crossref.org/02publishers/24upload_spec.html
How to query: http://www.crossref.org/02publishers/25query_spec.html
System help pages (more technical): http://doi.crossref.org/doc/userdoc.html
RSS And TOCs
New System Features
Tracking IDs
XML Query Format
Enhanced XML Results
Forward Matching
Forward Linking
Real Time Queries
Tracking IDs
Problem: Deposit logs received via email are error prone
Solution: Allow depositors to interrogate the system
Using the batch ID you specify in the XML file:
<doi_batch xmlns="http://www.crossref.org/schema/2.0.5" ...
  <head>
    <doi_batch_id>2003-08-11-1006001</doi_batch_id>
    <timestamp>20030811104844000</timestamp>
    <depositor>
      <name>Dale Langley</name>
Perform an HTTP GET:
http://doi.crossref.org/servlet/submissionDownload?usr=<USR>&pwd=<PWD>&doi_batch_id=2003-08-11-1016008&type=result
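The request URL can be built from the batch ID as sketched below in Python. The servlet path and parameter names are the ones shown on the slide; the function name and the credentials are placeholders, and the sketch only constructs the URL without sending it.

```python
from urllib.parse import urlencode

BASE = "http://doi.crossref.org/servlet/submissionDownload"


def submission_result_url(usr, pwd, doi_batch_id):
    """Return the URL that asks for the result log of one deposit batch."""
    params = {"usr": usr, "pwd": pwd, "doi_batch_id": doi_batch_id, "type": "result"}
    return BASE + "?" + urlencode(params)


# 'myuser' and 'mypass' are hypothetical credentials.
url = submission_result_url("myuser", "mypass", "2003-08-11-1016008")
```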
XML Query Format
Problem: Piped queries can be awkward to deal with and not extensible
Solution: Create an XML schema for queries
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.crossref.org/qschema/1.0" xmlns="http://www.crossref.org/qschema/1.0">
To use some of the new CrossRef features you must query using this format:
• Request multiple hits
• Request forward matching
• Turn off fuzzy matching (on a field by field basis)
• Use tracking IDs on query jobs
• Match on NULL fields
http://doi.crossref.org/doc/tech/crossref_query_input.xsd
Enhanced XML Query Results
Problem: We wanted to supply more information about the query results
Solution: Create an extended XML schema for query results
The existing XML query result format is still available (no schema or DTD)
http://doi.crossref.org/doc/tech/crossref_query_output.xsd
Results in this format can show:
• Multiple hits: when the metadata search resolved to more than one DOI (ambiguous results produce NO match in a normal query)
• Fuzzy matching indicators
Enhanced XML Query Results
Input:
<?xml version = "1.0" encoding="UTF-8"?>
<query_batch version="1.0" xmlns = "http://www.crossref.org/qschema/1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <head>
    <email_address>[email protected]</email_address>
    <doi_batch_id>SomeTrackingID1</doi_batch_id>
  </head>
  <body>
    <query key="MyKey1" enable-multiple-hits="false">
      <journal_title>Develop. Dynamics</journal_title>
      <author match="null"></author>
      <volume>223</volume>
      <first_page>426</first_page>
      <year>2002</year>
    </query>
  </body>
</query_batch>
Output:
<?xml version="1.0" encoding="UTF-8" ?>
<crossref_result version="1.0" xmlns="http://www.crossref.org/qschema/1.0">
  <query_result>
    <head>
      <email_address>[email protected]</email_address>
      <doi_batch_id>SomeTrackingID1</doi_batch_id>
    </head>
    <body>
      <query key="MyKey1" status="resolved">
        <doi>10.1002/dvdy.10084</doi>
        <issn>10970177</issn>
        <journal_title match="fuzzy">Developmental Dynamics</journal_title>
        <volume match="exact">223</volume>
        <issue>3</issue>
        <first_page match="exact">426</first_page>
        <year match="exact">2002</year>
        <publication_type>full_text</publication_type>
      </query>
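A client might read the match indicators from such a result as sketched below in Python. The sample is abridged from the output on this slide, with closing tags added so it parses; those closing tags and the function name are assumptions, since the slide shows only the start of the response.

```python
import xml.etree.ElementTree as ET

NS = {"q": "http://www.crossref.org/qschema/1.0"}

# Abridged from the slide; the closing tags are assumed.
SAMPLE_RESULT = """
<crossref_result version="1.0" xmlns="http://www.crossref.org/qschema/1.0">
  <query_result>
    <body>
      <query key="MyKey1" status="resolved">
        <doi>10.1002/dvdy.10084</doi>
        <journal_title match="fuzzy">Developmental Dynamics</journal_title>
        <volume match="exact">223</volume>
        <first_page match="exact">426</first_page>
        <year match="exact">2002</year>
      </query>
    </body>
  </query_result>
</crossref_result>
"""


def summarize(result_xml):
    """Return key, status, DOI and per-field match indicators for each query."""
    root = ET.fromstring(result_xml)
    summary = []
    for query in root.iterfind(".//q:query", NS):
        # Strip the namespace from each tag and keep only fields carrying 'match'.
        matches = {
            child.tag.split("}")[1]: child.get("match")
            for child in query
            if child.get("match") is not None
        }
        summary.append({
            "key": query.get("key"),
            "status": query.get("status"),
            "doi": query.findtext("q:doi", namespaces=NS),
            "matches": matches,
        })
    return summary


results = summarize(SAMPLE_RESULT)
```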
Forward Matching
Problem: Members must re-query CrossRef to find DOIs from recent deposits
Solution: Create a stored query mechanism where CrossRef ‘remembers’ failed queries, who sent them and notifies them when a deposit resolves them
Request forward matching in the query
<query key="some-unique-key" enable-multiple-hits="false" forward-match="true">
  <journal_title>Develop. Dynamics</journal_title>
  <author match="null"></author>
  <volume>223</volume>
  <first_page>426</first_page>
  <year>2002</year>
</query>
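A forward-match query batch could be generated rather than written by hand. The Python sketch below builds one with the standard library, using the element and attribute names shown on the slides; the function name, its arguments and the email address are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

QNS = "http://www.crossref.org/qschema/1.0"


def forward_match_query(batch_id, email, key, journal_title, volume, first_page, year):
    """Return a query_batch document holding one forward-match query."""
    ET.register_namespace("", QNS)  # serialize with a default namespace

    def tag(name):
        return f"{{{QNS}}}{name}"

    batch = ET.Element(tag("query_batch"), {"version": "1.0"})
    head = ET.SubElement(batch, tag("head"))
    ET.SubElement(head, tag("email_address")).text = email
    ET.SubElement(head, tag("doi_batch_id")).text = batch_id
    body = ET.SubElement(batch, tag("body"))
    query = ET.SubElement(body, tag("query"), {
        "key": key,
        "enable-multiple-hits": "false",
        "forward-match": "true",
    })
    ET.SubElement(query, tag("journal_title")).text = journal_title
    ET.SubElement(query, tag("author"), {"match": "null"})
    ET.SubElement(query, tag("volume")).text = str(volume)
    ET.SubElement(query, tag("first_page")).text = str(first_page)
    ET.SubElement(query, tag("year")).text = str(year)
    return ET.tostring(batch, encoding="unicode")


# The email address is a placeholder.
xml_text = forward_match_query(
    "SomeTrackingID1", "user@example.org", "some-unique-key",
    "Develop. Dynamics", 223, 426, 2002,
)
```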
Forward Matching
Each stored query is identified by user-assigned attributes:
• The query must be given a unique ‘key’
• The request must have a unique ‘batch-id’
Identical queries are stored once; multiple users may be associated with one query
When a stored query matches, an email is sent with the XML results (one email per query)
Users may ‘poll’ for stored queries that have matched:
http://doi.crossref.org/servlet/downloadStoredQueries?usr=<username>&pwd=<password>&startDate=<startDate>&endDate=<endDate>
Start and end dates are inclusive and must be formatted as yyyy-mm-dd
Forward Matching
Processing Forward Matches
1. Upon receipt the query (if it initially fails) is sorted based on title
2. When deposits are made, titles trigger corresponding stored queries
3. Batch job will be run nightly to process all triggered stored queries
4. If the batch does not complete in the allowed window it will pick up from where it left off in the next interval
5. Weekend intervals will be longer than weekday intervals
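The processing steps above can be pictured with a small in-memory model, sketched in Python below. It is only an illustration under stated assumptions: stored queries are indexed by lower-cased title, and the allowed window is modeled as a count of queries per run (`budget`). The class and method names are hypothetical.

```python
from collections import defaultdict, deque


class StoredQueries:
    """Toy model: failed queries wait by title until a deposit triggers them."""

    def __init__(self):
        self.by_title = defaultdict(list)  # title -> stored query keys
        self.triggered = deque()           # queries waiting for the batch job

    def store(self, title, key):
        """Step 1: remember a query that initially failed."""
        self.by_title[title.lower()].append(key)

    def deposit(self, title):
        """Step 2: a deposit with this title triggers the matching stored queries."""
        for key in self.by_title.get(title.lower(), []):
            if key not in self.triggered:
                self.triggered.append(key)

    def run_batch(self, budget):
        """Steps 3 and 4: process up to 'budget' queries; the rest wait for the next run."""
        done = []
        while self.triggered and len(done) < budget:
            done.append(self.triggered.popleft())
        return done


store = StoredQueries()
for key in ("q1", "q2", "q3"):
    store.store("Developmental Dynamics", key)
store.deposit("developmental dynamics")
first_night = store.run_batch(budget=2)
second_night = store.run_batch(budget=2)
```

A larger `budget` on weekends would correspond to the longer weekend interval in step 5.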
Forward Linking
A new service to be deployed by year end (pending board final approval)
Allow the retrieval of DOIs for articles that cite another article
Builds on existing CrossRef transactions
Publishers deposit metadata for their articles
Publishers query for the references in those articles to get DOIs
If CrossRef knows which references go with which articles we can tell owners of the cited articles who ‘cites’ them
Forward Linking
Include the list of references with the article when registering the DOI
• Combine deposit and query into one step
Or, upload a list of references for an article
Example:
[source document DOI] tab [||||||||reference DOI]
[source document DOI] tab [issn|journal|author|volume|issue|page|year]
[source document DOI] tab [reference DOI]
…
Constraints:
• The source document DOI must exist in CrossRef prior to using this method.
• If the reference is supplied as a DOI, that DOI must exist in CrossRef
• For bulk loads no metadata query results will be returned.
• One cited reference per line
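Lines for a reference upload could be produced as sketched below in Python. The sketch covers two of the forms in the example: a bare reference DOI and a piped metadata reference. The function name, the dictionary keys and the 10.5555 DOIs are assumptions for illustration.

```python
# Field order of the piped metadata form shown in the example.
REFERENCE_FIELDS = ("issn", "journal", "author", "volume", "issue", "page", "year")


def reference_line(source_doi, reference):
    """Return one upload line: the source DOI, a tab, then one cited reference.

    'reference' is either a DOI string or a dict of metadata fields.
    """
    if isinstance(reference, str):
        return f"{source_doi}\t{reference}"
    piped = "|".join(str(reference.get(field, "")) for field in REFERENCE_FIELDS)
    return f"{source_doi}\t{piped}"


by_doi = reference_line("10.5555/src", "10.5555/ref")
by_metadata = reference_line(
    "10.5555/src",
    {"journal": "Develop. Dynamics", "volume": 223, "page": 426, "year": 2002},
)
```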
Forward Linking
Forward links (aka Cited-By lists) will be retrieved by a simple query:
• Send in the DOI of the target article (a new HTTP request)
• Receive an extended set of meta-data (in an XML response) for each of the citing articles
Name                        Comment
Full Journal Title
Abbreviated Journal Title
ISSNs                       More than 1
Journal DOI
Article Title               Required element in schema deposits
Contributors                All authors, given and surname, up to a max of 10 (when >10 list as et al)
Volume
Issue
Page                        Page range
Article DOI
Article Identifier          <identifier> or <item_number>
Real Time Queries
Real time queries are done on the fly:
• When a user clicks on a link
• When the page is constructed for display
The current HTTP GET interface is being used by some for real time queries:
• No guaranteed level of service, but single queries get a response in about 1 second
• Does place an unintended load on the system
A ‘premium’ query service will be available that operates on a different interface:
• Better, more deterministic access to the CrossRef system
• Less overhead than the synchronous HTTP GET interface
• In testing now!
Related Technologies
The Handle System
Parameter Passing
Open URL
Local Link Servers
Multiple Resolution
The Handle System
What is it ?
A network accessible database for retrieving name-value pairs
A “hash” where the DOI is the key
A system built by CNRI and licensed for use by the IDF (DOI)
How does DOI / CrossRef use it ?
CrossRef is a service to input and update the name-value pairs
http://dx.doi.org is a resolver that knows what we want to do with DOIs
DOI could (and may someday) operate on any look-up mechanism
For key=10.xxxx/yyyy retrieve the name-value pair called ‘URL’ and redirect to the value string (which should be a URL)
Important: DOI and Handle are not the same thing! DOIs are “handles” which are intended to behave a certain way
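The "hash" view of the Handle System can be sketched in a few lines of Python. The dictionary stands in for the network database, and the function mimics what a resolver does with the ‘URL’ pair. The DOI, the target URL and the status codes returned are assumptions for illustration only.

```python
# Stand-in for the handle database: DOI -> name-value pairs (hypothetical data).
handle_db = {
    "10.5555/12345678": {"URL": "http://www.example.org/article/12345678"},
}


def resolve(doi, db):
    """Look up the 'URL' pair for a DOI and describe the redirect a resolver would issue."""
    record = db.get(doi)
    if record is None or "URL" not in record:
        return (404, None)
    return (302, record["URL"])


found = resolve("10.5555/12345678", handle_db)
missing = resolve("10.5555/unknown", handle_db)
```

Updating the value stored under ‘URL’ changes where the DOI resolves without changing the DOI itself.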
OpenURL and DOIs/CrossRef
CrossRef helps solve the appropriate copy problem by providing a ‘reverse’ DOI lookup (DOI in / meta-data out)
http://doi.crossref.org/servlet/query?id=10.1006/jmbi.2000.4282&pid=<USR>:<PWD>
CrossRef offers an OpenURL 1.0 compliant resolver
http://doi.crossref.org/resolve?pid=<USR>:<PWD>&aulast=Maas LRM&title=JOURNAL OF PHYSICAL OCEANOGRAPHY&volume=32&issue=3&spage=870&date=2002
(This resolver will redirect you to the target document)
OpenURL and DOI are complementary technologies
What does an OpenURL look like
• Start with the BaseURL:
  http://resolver.example.org?
• Add the fixed elements:
  url_ver=z39.88-2003 (version)
  &url_ctx_fmt=ori:fmt:kev:mtx:ctx (declare the ContextObject format)
• Add the referrer:
  &rfr_id=ori:rfr:publisher.com (identifier for the OpenURL “Referrer”: the domain of the referrer)
• Add the referent:
  &rft_id=ori:doi:10.1126/science.275.5304.1320
  &rft_id=ori:pmid:9036860 (include identifiers)
  &rft_val_fmt=ori:fmt:kev:mtx:journal (declare the metadata format: shows the referent (rft) format (fmt) is by value (val), and indicates we are using key-encoded-values (kev) as defined in the “journal” matrix (mtx) in the registry)
  &rft.genre=article&rft.atitle=Isolation of a common receptor for …&rft.jtitle=Science&rft.aulast=Bergelson&rft.auinit=J… (add the metadata elements, substituting actual values for the item being referenced)
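The assembly steps on this slide can be followed in code. The Python sketch below builds the query string in the same order, using the keys and identifier values shown on the slide. The function name and its parameters are assumptions, and the values follow the slide rather than any later revision of the standard.

```python
from urllib.parse import urlencode


def build_openurl(base_url, referrer_id, referent_ids, metadata):
    """Assemble an OpenURL: fixed elements, referrer, then referent."""
    pairs = [
        ("url_ver", "z39.88-2003"),              # version
        ("url_ctx_fmt", "ori:fmt:kev:mtx:ctx"),  # ContextObject format
        ("rfr_id", referrer_id),                 # who is sending the link
    ]
    pairs += [("rft_id", rid) for rid in referent_ids]   # identifiers, repeatable
    pairs.append(("rft_val_fmt", "ori:fmt:kev:mtx:journal"))
    pairs += [(f"rft.{name}", value) for name, value in metadata.items()]
    # Keep ':' and '/' readable; other reserved characters are percent-encoded.
    return base_url + "?" + urlencode(pairs, safe=":/")


openurl = build_openurl(
    "http://resolver.example.org",
    "ori:rfr:publisher.com",
    ["ori:doi:10.1126/science.275.5304.1320", "ori:pmid:9036860"],
    {"genre": "article", "jtitle": "Science", "aulast": "Bergelson"},
)
```

A list of pairs is used instead of a dictionary because `rft_id` appears more than once.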
A&I (Ovid) as link source
[Figure: screenshots showing the link source, the link menu and the link target]
OpenURL Linking: SFX & CrossRef
[Figure: linking diagram for an OpenURL-aware source; labels: References, DOI Server, OpenURL, Metadata, DOI link]
http://www.sfx.edu/? doi=10.1034/j.1399-0039.2000.560502.x
http://dx.doi.org/ doi=10.1034/j.1399-0039.2000.560502.x
Multiple Resolution
Re-hosted journal (the appropriate copy problem)
1. Publisher A produces and hosts a journal from 1999 through 2002
2. Publisher B acquires the journal in 2003 and hosts all back issues. Pre-2003 DOIs are transferred to publisher B and the URLs are reset to publisher B’s
3. A customer subscribes to publisher A and wants to go there for pre-2003 issues
Journal available from more than one site (not a mirror)
Availability of alternative services (print, rights clearance …)
Supplemental material
Manifestations & Relations
Multiple Resolution – things to consider
Multiple resolution is not always desirable (probably not, most of the time)
• In some contexts the link should behave as a single resolution
Implementation concerns
• Impact on the publisher’s Web site must be small (display and behavior)
• The first step of the resolution process (when the user clicks) must be fast
• Large data transfers may not be practical (link updates)
• Do not place too high a load on any single point (i.e. Handle)
Policy concerns
• Access control (who can get on the MR lists for an item)
• Action control (who decides when MR will be available to the user)
• Quality control (who monitors ‘landing page’ behavior)
Multiple Resolution – what's next
CrossRef and Copyright Clearance Center have started a prototype
A white paper to be released this fall
• Define the operational issues
• Define the governance issues
• Present possible technical options
Construct a demonstration prototype
Not a fully functional solution
Multiple Resolution
The technical issues are surmountable
The political and business case issues will be much more difficult
CrossRef Technical Working Group
Join the TWG !!!
Monthly teleconferences
Mailing list [email protected]
Mail list archives available on www.crossref.org
Planning an ‘in-person’ TWG for later this year