TaxPub: An Extension of JATS for Taxonomic Descriptions Terry Catapano, Plazi Leiden, Netherlands 2013-02-14



• Journal Article Tag Suite
• Formerly NLM/NCBI Journal Archiving and Publishing Tag Suite
• Version 1, 2002
• PubMed Central
• Widely adopted by STM publishers
• Now NISO JATS
  o ANSI/NISO Z39.96-2012

• DTD Spectrum
• Legacy/Loose <----> Prospective/Strict
• <---------- cost of application ---------->
• Archiving (Green) DTD
• Publishing (Blue) DTD
• Now also Authoring and Book DTDs
• Offers extensibility features

Taxonomic Descriptions

• “Treatment”
• Discussion of the features/distribution of a related group of organisms, “taxon”
• Formal conventions
  o ICZN, ICBN, etc...
• Frequently parts of publications
• Cited as discrete objects
• 200+ year history

Linnaeus, Systema Naturae, 10th Edition, 1767-1770

Taekul, C., N. F. Johnson, L. Masner, A. Polaszek and Rajmohana K. 2010. World species of the genus Platyscelio Kieffer (Hymenoptera, Platygastridae). ZooKeys 50: 97-126.

Treatment Components

• Nomenclature
  o Name
  o Authority
  o Status, etc…
• Description
• Materials Examined
  o Specimens
  o Collection
  o Deposit
• Diagnosis, Distribution, Etymology, Key, etc…

Background: TaxonX

• NSF/DFG Funded Project
• Extraction of species data from taxonomic literature of Ants
• TaxonX schema for markup of corpus
• c. 500 publications; c. 11,000 treatments
• Development continued by Plazi

Legacy Literature: Challenges

• Text accuracy
• Formal/Editorial Variety
• Condensed Information
• Loose schema, higher costs of application

New Literature: Rationale

Matt Yoder et al., Development of the Hymenoptera Anatomy Ontology: Implications for Systematics and Literature Mark-up

• Extension of Publishing (“Blue”) DTD
• Parsimony: largely rely on base DTD
• “tp:” namespace
• Available throughout
  o <tp:taxon-name>: scientific names
  o <tp:descriptive-statement>: morphology
  o <tp:materials-citation>: specimens; gene sequences
• Within <body>
  o <tp:treatment> + subelements

            <p>A further undescribed <tp:taxon-name rank="genus">Nixonia</tp:taxon-name> species related to <tp:taxon-name rank="species">N. lamorali</tp:taxon-name> emerged from processing of samples collected in Kogelberg Biosphere Reserve (50km east of Cape Town). This species may usurp <tp:taxon-name rank="species">N. gigas</tp:taxon-name>...</p>

<tp:taxon-name>, cont'd

• @reg: regularized form of name
• object-id: identifier(s) for name
  o semantics of xlink attrs?
• @*-part-type: semantics for name components
  o string
  o use URIs: here terms from Darwin Core vocabulary


<tp:taxon-name rank="species" reg="Nixonia lamorali"><object-id object-id-type="LSID" xlink:href="urn:lsid:biosci.ohio-state.edu:osuc_concepts:184923"/><tp:taxon-name-part taxon-name-part-type="dwc:genus" reg="Nixonia">N.</tp:taxon-name-part><tp:taxon-name-part taxon-name-part-type="dwc:specificEpithet">lamorali</tp:taxon-name-part></tp:taxon-name>
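One way a consumer might read such a name is sketched below in Python with the standard library. It is a minimal sketch, not part of TaxPub itself: the `tp` namespace URI and the helper name `read_taxon_name` are assumptions made for illustration, and the sample simply repeats the element above with namespace declarations added.

```python
import xml.etree.ElementTree as ET

# The xlink URI is the standard one; the tp URI is an assumption for this sketch.
NS = {
    "tp": "http://www.plazi.org/taxpub",
    "xlink": "http://www.w3.org/1999/xlink",
}

SAMPLE = (
    '<tp:taxon-name xmlns:tp="http://www.plazi.org/taxpub" '
    'xmlns:xlink="http://www.w3.org/1999/xlink" '
    'rank="species" reg="Nixonia lamorali">'
    '<object-id object-id-type="LSID" '
    'xlink:href="urn:lsid:biosci.ohio-state.edu:osuc_concepts:184923"/>'
    '<tp:taxon-name-part taxon-name-part-type="dwc:genus" reg="Nixonia">N.</tp:taxon-name-part>'
    '<tp:taxon-name-part taxon-name-part-type="dwc:specificEpithet">lamorali</tp:taxon-name-part>'
    '</tp:taxon-name>'
)


def read_taxon_name(elem):
    """Collect the regularized name, identifiers, and typed parts of a name.

    Hypothetical helper, named here only for illustration.
    """
    href = "{%s}href" % NS["xlink"]
    parts = {}
    for part in elem.findall("tp:taxon-name-part", NS):
        # Prefer the regularized form (@reg) over the abbreviated printed text.
        value = part.get("reg") or (part.text or "").strip()
        parts[part.get("taxon-name-part-type")] = value
    return {
        "rank": elem.get("rank"),
        "reg": elem.get("reg"),
        "ids": [obj.get(href) for obj in elem.findall("object-id")],
        "parts": parts,
    }


name = read_taxon_name(ET.fromstring(SAMPLE))
print(name["reg"], name["ids"], name["parts"])
```

Because @reg carries the expanded genus, the abbreviated "N." in the running text can still be resolved to "Nixonia" without guessing from context.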

• Relatively undeveloped
• Modeling of descriptions challenging
  o complex, if formal, natural language
• Segment text
  o <tp:descriptive-statement>
• Delineate components
  o <tp:descriptive-statement-part>
• Normalize/Annotate
  o <tp:descriptive-statement-part>

... <tp:descriptive-statement>Length 7.0 mm</tp:descriptive-statement>; <tp:descriptive-statement>completely black</tp:descriptive-statement>, <tp:descriptive-statement>tarsi lighter</tp:descriptive-statement> (figs. 2A, B); <tp:descriptive-statement> wings infuscate throughout, brownish</tp:descriptive-statement>...

...<tp:descriptive-statement><tp:descriptive-statement-part descriptive-statement-part-type="character"><object-id xlink:href="HAO:0000992"/>tarsi</tp:descriptive-statement-part> <tp:descriptive-statement-part descriptive-statement-part-type="state">lighter</tp:descriptive-statement-part></tp:descriptive-statement>...
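The character/state split above could be read back out roughly as follows. This is a hedged sketch in Python: the `tp` namespace URI and the function name `statement_parts` are assumptions, and the sample is the statement above with its closing tags written out and namespaces declared.

```python
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"       # assumed namespace URI for the tp: prefix
XLINK = "http://www.w3.org/1999/xlink"   # standard XLink namespace

SAMPLE = (
    f'<tp:descriptive-statement xmlns:tp="{TP}" xmlns:xlink="{XLINK}">'
    '<tp:descriptive-statement-part descriptive-statement-part-type="character">'
    '<object-id xlink:href="HAO:0000992"/>tarsi</tp:descriptive-statement-part>'
    '<tp:descriptive-statement-part descriptive-statement-part-type="state">'
    'lighter</tp:descriptive-statement-part>'
    '</tp:descriptive-statement>'
)


def statement_parts(stmt):
    """List each part with its type, printed text, and ontology term (if any)."""
    rows = []
    for part in stmt.findall(f"{{{TP}}}descriptive-statement-part"):
        oid = part.find("object-id")
        rows.append({
            "type": part.get("descriptive-statement-part-type"),
            # itertext() also picks up text that follows the empty <object-id/>.
            "text": "".join(part.itertext()).strip(),
            "term": oid.get(f"{{{XLINK}}}href") if oid is not None else None,
        })
    return rows


rows = statement_parts(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```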

<p>Spreading shrub; stems erect,<Categorical uri="http://ontology.org/plant/stem-color"> <State uri="http://ontology.org/plant/greenish">greenish</State></Categorical>. Leaves deciduous early in summer (particularly when infected with Diseasomyces), oblong, apex obtuse, glabrous or weakly hirsute; stipules sharply pointed, <Quantitative uri="http://ontology.org/plant/stipule-width"><value value="3.2">3,2mm</value></Quantitative> wide, <Categorical uri="http://ontology.org/plant/stipule-color"><State uri="http://ontology.org/plant/black">black</State> or <State uri="http://ontology.org/plant/brown">darkish brown,</State></Categorical>extremely rarely yellow, often shallowly joined around the node; spines stout.</p>

• <tp:collecting-event>: how, when collected
  o <tp:collecting-location>: where collected
• <object-id>: current location

<tp:materials-citation>, cont'd

<tp:material-citation><named-content content-type="dwc:individualCount">1</named-content> <named-content content-type="dwc:sex">male</named-content>,
    <tp:collecting-event>
        <tp:collecting-location>
            <tp:location location-type="dwc:country">South Africa</tp:location>
            <tp:location location-type="dwc:stateProvince">Western Cape</tp:location>
            <tp:location location-type="dwc:locality">Langberg Farm, (3 km 270° W Langebaanweg)</tp:location>
            <tp:location location-type="dwc:verbatimCoordinates">32°58.461&#8217;S 18°07.344&#8217;E</tp:location>
        </tp:collecting-location>
        <named-content content-type="dwc:verbatimDate">12&#8211;19 Mar 2003</named-content>
    </tp:collecting-event>,
    <named-content content-type="dwc:recordedBy">S. van Noort</named-content>,
    <named-content content-type="dwc:samplingProtocol">Malaise trap, LW02-N2-M175</named-content>,
    <named-content content-type="dwc:locationRemarks">Sand Plain Fynbos</named-content>,
    <object-id content-type="dwc:collectionCode">SAM-HYM-P030184</object-id>,
    <object-id content-type="dwc:catalogNumber">OSUC 256954</object-id>), (<object-id content-type="dwc:institutionCode">SAMC</object-id>)
</tp:material-citation>
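Since every component is typed with a Darwin Core term, a citation like this can be flattened into a simple occurrence record. The Python sketch below shows one possible approach on a shortened copy of the citation; the `tp` namespace URI and the function name `darwin_core_record` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"  # assumed namespace URI for the tp: prefix

# Shortened copy of the citation above, with the namespace declared.
SAMPLE = (
    f'<tp:material-citation xmlns:tp="{TP}">'
    '<named-content content-type="dwc:individualCount">1</named-content> '
    '<named-content content-type="dwc:sex">male</named-content>, '
    '<tp:collecting-event><tp:collecting-location>'
    '<tp:location location-type="dwc:country">South Africa</tp:location> '
    '<tp:location location-type="dwc:stateProvince">Western Cape</tp:location>'
    '</tp:collecting-location>'
    '<named-content content-type="dwc:verbatimDate">12\u201319 Mar 2003</named-content>'
    '</tp:collecting-event>, '
    '<object-id content-type="dwc:institutionCode">SAMC</object-id>'
    '</tp:material-citation>'
)


def darwin_core_record(citation):
    """Map every element typed with a dwc: term to a {term: text} entry."""
    record = {}
    for elem in citation.iter():
        # tp:location uses @location-type; named-content and object-id use @content-type.
        key = elem.get("location-type") or elem.get("content-type")
        if key and key.startswith("dwc:"):
            record[key[len("dwc:"):]] = "".join(elem.itertext()).strip()
    return record


record = darwin_core_record(ET.fromstring(SAMPLE))
print(record)
```

The sketch keeps only the last value when a term repeats; a real converter would need a policy for repeated terms.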

• tp:location:
  o @location-type:
    - URI (Darwin Core)
    - string
• named-content: all other components

tp:treatment and Sub-Elements

• <tp:treatment-meta>
  o bibliographic metadata for treatments
  o standalone treatments
• <tp:nomenclature>: required
  o <tp:taxon-name>: required
  o other elements...
• <tp:treatment-sec>
  o @sec-type
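The two "required" rules above can be checked outside a DTD validator as well. The Python sketch below is illustrative only: the `tp` namespace URI, the function name `check_treatment`, and the choice of `tp:taxon-treatment` as the wrapper element (taken from the example markup in this deck) are assumptions.

```python
import xml.etree.ElementTree as ET

TP = "http://www.plazi.org/taxpub"  # assumed namespace URI for the tp: prefix


def check_treatment(treatment):
    """Return a list of problems; an empty list means both required elements exist."""
    problems = []
    nomenclature = treatment.find(f"{{{TP}}}nomenclature")
    if nomenclature is None:
        problems.append("missing tp:nomenclature")
    elif nomenclature.find(f"{{{TP}}}taxon-name") is None:
        problems.append("tp:nomenclature lacks tp:taxon-name")
    return problems


good = ET.fromstring(
    f'<tp:taxon-treatment xmlns:tp="{TP}"><tp:nomenclature>'
    '<tp:taxon-name>Nixonia masneri</tp:taxon-name>'
    '</tp:nomenclature></tp:taxon-treatment>'
)
bad = ET.fromstring(f'<tp:taxon-treatment xmlns:tp="{TP}"/>')

print(check_treatment(good))
print(check_treatment(bad))
```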

<tp:taxon-treatment>
    <tp:nomenclature>
        <tp:taxon-name rank="dwc:species" auth-code="iczn">
            <tp:taxon-name-part taxon-name-part-type="dwc:genus">Nixonia</tp:taxon-name-part>
            <tp:taxon-name-part taxon-name-part-type="dwc:specificEpithet">masneri</tp:taxon-name-part>
            <object-id xlink:href="urn:lsid:zoobank.org:act:51495B19-AA60-4560-AAC6-2EED4110C0ED"/>
        </tp:taxon-name>
        <tp:taxon-authority>van Noort &amp; Johnson</tp:taxon-authority>
        <tp:taxon-status>sp. n.</tp:taxon-status>
        <xref ref-type="fig" rid="f1">Figures 1A&#8211;F</xref>
    </tp:nomenclature>

        <tp:nomenclature-citation-list>
            <tp:nomenclature-citation>
                <tp:taxon-name>Nixonia</tp:taxon-name><xref rid="B7">Masner, 1958, 101</xref>
                <comment>Original description. Type: <tp:taxon-name>Nixonia pretiosa</tp:taxon-name> Masner, by monotypy and original designation. For subsequent taxonomic literature see <xref rid="B4">Johnson (1992)</xref> or The Genera of <tp:taxon-name>Platygastroidea</tp:taxon-name> of the World (<ext-link xlink:href="http://purl.oclc.org/NET/hymenoptera/platygastroidea">http://purl.oclc.org/NET/hymenoptera/platygastroidea</ext-link>).</comment>
            </tp:nomenclature-citation>
        </tp:nomenclature-citation-list>
    </tp:nomenclature>

<tp:treatment-sec sec-type="Materials Examined">

<title>Type material</title>


<tp:treatment-sec sec-type="Diagnosis">


<p> Most similar to ... </p>


<tp:treatment-sec sec-type="Etymology">


<p> Named in honour of Lubomír Masner, ...</p>


<tp:treatment-sec sec-type="Distribution">

<title>Distribution and habitat association</title>

<p> Currently only known from two widely spaced localities.... </p>


<tp:treatment-sec sec-type="Description">


<tp:treatment-sec>, cont'd

Keys

• Identify subordinate taxa within higher taxon (e.g., species in genus)

• No model in TaxPub

• Use existing JATS table model

• Use <ext-link> or <related-object>

Keys, cont'd

<tp:treatment-sec sec-type="Key">

<title>Key to species of Nixonia</title>

<p>Online interactive key...</p>




<tr content-type="lead"> <td><target id="key1">1</target></td>

<td>Third antennal segment shorter than, or subequal to, second antennal segment</td>

<td><xref>2</xref></td> </tr>

<tr content-type="graphic"> <td> <graphic xlink:href="" />




Future Work

• Extension to “Green”/Archiving DTD
  o For legacy literature
• Descriptive data (e.g., keys, characters, states)
• Tools
  o XSLT stylesheets for rendition/proofing
  o XSLT stylesheets for conversion to external formats
  o Development of supporting vocabularies
  o Schematron for profiling
  o Stand-alone validator
• Implementations
  o EJT
  o Smithsonian
  o Zootaxa

