Gaurav Vaidya, Nico Cellinese, Hilmar Lapp.
Abstract
To be computationally reproducible and efficient, integration of disparate data depends on shared entities whose matching meaning (semantics) can be computationally assessed. For biodiversity data one of the most prevalent shared entities for linking data records is the associated taxon concept. Unlike Linnaean taxon names, the traditional way in which taxon concepts are provided, phylogenetic definitions are native to phylogenetic trees and offer well-defined semantics that can be transformed into formal, computationally evaluable logic expressions. These attributes make them highly suitable for phylogeny-driven comparative biology by allowing computationally verifiable and reproducible integration of taxon-linked data against Tree of Life-scale phylogenies. To achieve this, the first step is transforming phylogenetic definitions from the natural language text in which they are published to a structured interoperable data format that maintains strong ties to semantics and lends itself well to sharing, reuse, and long-term archival. To this end, we developed the Phyloreference Exchange Format (Phyx), a JSON-LD-based text format encompassing rich metadata for all elements of a phylogenetic definition, and we created a supporting software library, phyx.js, to streamline computational management of such files. Together they form a foundation layer for digitizing and computing with phylogenetic definitions of clades. © 2022 Vaidya et al.
Keywords: Clade definitions; Computational semantics; Data curation; Data standard; JSON-LD; Phylogenetics; Semantic web
Year: 2022 PMID: 35186448 PMCID: PMC8855714 DOI: 10.7717/peerj.12618
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. The three basic types of phylogenetic definitions.
(1) Minimum clade definitions, which designate the smallest clade that includes at least two internal specifiers (in this case, ‘B’ and ‘C’); (2) Maximum clade definitions, which designate the largest clade that includes one or more internal specifiers (‘B’) but excludes one or more external specifiers (‘A’), and (3) apomorphy-based definitions, which designate the clade that arises from the first appearance of a specified trait (‘D’) that is synapomorphic with an internal specifier (‘C’). Redrawn from De Queiroz & Gauthier (1990).
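The minimum and maximum clade definitions in the caption above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is a hypothetical four-tip topology written as nested tuples (not the tree drawn in Figure 1), and the helper names `tips`, `clades`, `minimum_clade`, and `maximum_clade` are invented here and are not part of phyx.js.

```python
def tips(node):
    """Return the set of tip labels below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Yield every clade (subtree) of the tree, starting with the root."""
    yield node
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def minimum_clade(tree, internal):
    """Smallest clade that includes all internal specifiers."""
    wanted = set(internal)
    matches = [c for c in clades(tree) if wanted <= tips(c)]
    return min(matches, key=lambda c: len(tips(c)), default=None)


def maximum_clade(tree, internal, external):
    """Largest clade that includes all internal and no external specifiers."""
    wanted, unwanted = set(internal), set(external)
    matches = [
        c for c in clades(tree)
        if wanted <= tips(c) and not (unwanted & tips(c))
    ]
    return max(matches, key=lambda c: len(tips(c)), default=None)


# Hypothetical topology, assumed for illustration only.
tree = ("A", ("B", ("C", "D")))

print(minimum_clade(tree, ["B", "C"]))       # ('B', ('C', 'D'))
print(maximum_clade(tree, ["C"], ["B"]))     # ('C', 'D')
```

Apomorphy-based definitions are left out of the sketch because they depend on where a trait first appears, which a bare topology does not record.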
Fields in the Phyx document.
Fields indicated with * are for use by supporting software (such as phyx.js) but are not mapped to RDF properties, and will thus not be converted into RDF.
| Field | Description | Type | Example |
|---|---|---|---|
| @context | | IRI | Depends on the version of the Phyx standard being used. Currently, this should be |
| @type | Additional RDF types for the top-level object. (“owl:Ontology” will be added automatically.) | Array of either IRIs or CURIEs | [ “ |
| owl:imports | A list of OWL ontologies to be imported during reasoning. (See | Array of IRIs | [ “ |
| doi | The Digital Object Identifier (DOI) for this Phyx file. | DOI | |
| source | A citation to this Phyx file. | Citation | See example of Citation below. |
| defaultNomenclaturalCodeIRI* | The default nomenclatural code to be used in this file, for both phylogenies and phyloreferences. This will only be used for nodes and taxon concept specifiers that don’t have a nomenclatural code set. | IRI | |
| phylogenies | A list of phylogenies in this Phyx file. | Array of Phylogenies | See example of Phylogeny below. |
| phylorefs | A list of phyloreferences in this Phyx file. | Array of Phyloreferences | See example of Phyloreference below. |
Figure 2. Relationships and references between different types in the Phyx format.
Note that not all citation objects are fully expanded.
RDF prefixes used in phyloreferencing.
| Prefix | Ontology or vocabulary | IRI |
|---|---|---|
| rdf | Resource Description Framework (RDF) | |
| owl | Web Ontology Language (OWL) | |
| CDAO | Comparative Data Analysis Ontology (CDAO) | |
| dwc | Darwin Core | |
| tc | TDWG Taxon Concept LSID Ontology | |
| tn | TDWG Taxon Name LSID Ontology | |
| phyloref | The Phyloreferencing Ontology | |
| tcan | Ontology for Taxon Concepts And Names | |
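Prefixes such as these let a compact identifier (CURIE) like owl:Ontology stand in for a full IRI. The following is a minimal sketch of that expansion, not the mechanism phyx.js or a JSON-LD processor actually uses. The function name `expand_curie` is hypothetical, and only the `rdf` and `owl` namespaces are filled in because those are the standard W3C IRIs; the remaining prefixes are deliberately left out rather than guessed.

```python
# Only the two W3C namespaces are listed; the other prefixes in the table
# would be added from the Phyx context, which is not reproduced here.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand_curie(value, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full IRI when the prefix is known.

    Values with an unknown prefix, or values that already look like
    full IRIs (local part starting with '//'), are returned unchanged.
    """
    prefix, sep, local = value.partition(":")
    if sep and prefix in prefixes and not local.startswith("//"):
        return prefixes[prefix] + local
    return value


print(expand_curie("owl:Ontology"))  # http://www.w3.org/2002/07/owl#Ontology
print(expand_curie("rdf:type"))      # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
```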
Fields in a phyloreference object.
Fields indicated with * are for use by supporting software (such as phyx.js) but are not mapped to RDF properties, and will thus not be converted into RDF (or OWL). The phyx.js software copies the value of the ‘phylorefType’ field into the ‘rdf:type’ property during transformation to RDF.
| Field | Description | Type | Example |
|---|---|---|---|
| @id | The identifier for this phyloreference. | IRI | #Alligatoroidea |
| label | A name for the clade defined by the phyloreference. For clade definitions digitized from the PhyloCode, the name will follow PhyloCode naming conventions. | String | Alligatoroidea |
| phylorefType | The type of this phyloreference. | Enumeration | One of phyloref:PhyloreferenceUsingMaximumClade, phyloref:PhyloreferenceUsingMinimumClade or phyloref:PhyloreferenceUsingApomorphy |
| scientificNameAuthorship | The authors who created this clade definition. | Citation | See the Citations section above for an example citation. |
| namePublishedIn | If the label is a scientific name, then this field records the publication in which that name was first published. | Citation | See the Citations section above for an example citation. |
| definitionSource | The publication in which this clade definition was first published. | Citation | See example citation above. |
| definition | | String | Alligator mississippiensis and all crocodylians closer to it than to Crocodylus niloticus or Gavialis gangeticus. |
| internalSpecifiers* | A list of internal specifiers (defined as taxonomic units) that must be included in the clade. | Array of taxonomic units | See taxonomic unit examples above. |
| externalSpecifiers* | A list of external specifiers (defined as taxonomic units) that must be excluded from the clade. | Array of taxonomic units | See taxonomic unit examples above. |
| apomorphy | If used, indicates that this phyloreference designates the clade that arises from the first appearance of this trait that is synapomorphic with an internal specifier. In this case, exactly one internal specifier and no external specifiers must be provided. The trait is described with the following fields: | Object | |
| - @type | | IRI | Must be |
| - bearingEntity* | An IRI that identifies the entity bearing the phenotypic quality if the phenotype referenced by this apomorphy can be decomposed into a quality and the entity bearing the quality (EQ model). | IRI | |
| - phenotypicQuality* | An opaque IRI that identifies the phenotypic quality if the phenotype referenced by this apomorphy can be decomposed into entity and quality (EQ model). See | IRI | Defaults to |
| - definition | | String | A complete turtle shell as inherited by |
| expectedResolution* | A dictionary mapping phylogeny identifiers to objects that record the nodeLabel (the label of the node on that phylogeny that this phyloreference is expected to resolve to) as well as an optional description (describing why that node was chosen). | Dictionary | "expectedResolution": { "#phylogeny0": { "nodeLabel": "Gavialis gangeticus", "description": "Only representative of Gavialoidea in this phylogeny." }} |
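The specifier counts implied by the three definition types can be checked mechanically. The sketch below encodes only the rules as worded in this section (at least two internal specifiers for a minimum clade, at least one internal and one external for a maximum clade, exactly one internal and no external for an apomorphy); phyx.js may apply different or additional checks. `check_phyloref` and `taxon` are hypothetical helpers, and the `@type` values of taxonomic units are omitted.

```python
MINIMUM = "phyloref:PhyloreferenceUsingMinimumClade"
MAXIMUM = "phyloref:PhyloreferenceUsingMaximumClade"
APOMORPHY = "phyloref:PhyloreferenceUsingApomorphy"


def check_phyloref(phyloref):
    """Return a list of problems found; an empty list means none."""
    n_internal = len(phyloref.get("internalSpecifiers", []))
    n_external = len(phyloref.get("externalSpecifiers", []))
    kind = phyloref.get("phylorefType")
    problems = []
    if "apomorphy" in phyloref or kind == APOMORPHY:
        if n_internal != 1 or n_external != 0:
            problems.append(
                "apomorphy: needs exactly one internal and no external specifiers"
            )
    elif kind == MINIMUM:
        if n_internal < 2:
            problems.append("minimum clade: needs at least two internal specifiers")
    elif kind == MAXIMUM:
        if n_internal < 1 or n_external < 1:
            problems.append(
                "maximum clade: needs at least one internal and one external specifier"
            )
    return problems


def taxon(name):
    # Hypothetical shorthand for a taxonomic unit; "@type" is left out.
    return {"hasName": {"nameComplete": name}}


alligatoroidea = {
    "@id": "#Alligatoroidea",
    "label": "Alligatoroidea",
    "phylorefType": MAXIMUM,
    "internalSpecifiers": [taxon("Alligator mississippiensis")],
    "externalSpecifiers": [
        taxon("Crocodylus niloticus"),
        taxon("Gavialis gangeticus"),
    ],
}

print(check_phyloref(alligatoroidea))  # []
```

Returning a list of messages rather than raising on the first problem keeps the check usable during curation, where a file may need several fixes at once.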
Fields in taxonomic unit objects identified solely by IRI.
| Field | Description | Type | Example |
|---|---|---|---|
| @id | | IRI | |
Fields in a taxon or taxon concept object.
| Field | Description | Type | Example |
|---|---|---|---|
| @type | | IRI | Must be |
| hasName | | Object | |
| - @type | | IRI | Must be |
| - nomenclaturalCode | The nomenclatural code under which this taxon name was created. | IRI | |
| - nameComplete | | String | Alligator mississippiensis |
| - genusPart | The genus portion of the taxon name. | String | Alligator |
| - specificEpithet | The specific epithet portion of the taxon name. | String | mississippiensis |
| - label | The full taxon name, including any authority or year components. | String | Alligator mississippiensis (Daudin, 1802) |
| nameAccordingTo | Publication or authors whose circumscription of the taxon is intended to be used. If omitted, the nominal taxon concept (in the sense of | String | |
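As a rough illustration of how the name fields relate to each other, the sketch below assembles a `hasName` object from a binomial name and an optional authority string. The helper `taxon_name` is hypothetical, and the `@type` and `nomenclaturalCode` fields are omitted because their required values are not given in this section.

```python
def taxon_name(name_complete, authority=None):
    """Build a taxon object whose hasName fields follow the table above.

    genusPart and specificEpithet are filled in only for two-word
    (binomial) names; anything else keeps just nameComplete.
    """
    name = {"nameComplete": name_complete}
    parts = name_complete.split()
    if len(parts) == 2:
        name["genusPart"], name["specificEpithet"] = parts
    if authority:
        # label is the full name including authority or year components.
        name["label"] = f"{name_complete} {authority}"
    return {"hasName": name}


unit = taxon_name("Alligator mississippiensis", "(Daudin, 1802)")
print(unit["hasName"]["genusPart"])   # Alligator
print(unit["hasName"]["label"])       # Alligator mississippiensis (Daudin, 1802)
```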
Fields in a phylogeny object.
Fields marked with * are for use by supporting software (such as phyx.js) but are not mapped to RDF properties, and will thus not be converted into RDF (or OWL).
| Field | Description | Type | Example |
|---|---|---|---|
| @id | The identifier for this phylogeny. | IRI | #phylogeny0 |
| label | A label describing this phylogeny. | String | Fig 1 from |
| newick | | String | (Parasuchia,(rauisuchians,Aetosauria,(sphenosuchians,(protosuchians,(mesosuchians,(Hylaeochampsa,Aegyptosuchus,Stomatosuchus,(Allodaposuchus,('Gavialis gangeticus',(('Diplocynodon ratelii',('Alligator mississippiensis','Caiman crocodilus')Alligatoridae)Alligatoroidea,('Tomistoma schlegelii',('Osteolaemus tetraspis','Crocodylus niloticus')Crocodylinae)Crocodylidae)Brevirostres)Crocodylia))Eusuchia)Mesoeucrocodylia)Crocodyliformes)Crocodylomorpha)) |
| source | The source of this phylogeny. | Citation | See above for an example Citation. |
| additionalNodeProperties* | A dictionary mapping node labels to properties that should be added to the node when converting this Phyx file into an OWL ontology. | Object | "additionalNodeProperties": { "Exodictyon incrassatum": { "representsTaxonomicUnits": [{ "@type": "http://rs.tdwg.org/dwc/terms/Occurrence", "institutionCode": "UC", "catalogNumber": "Wall 2527, Fiji" }] }} |
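Node labels are what tie a phylogeny's newick string to fields such as expectedResolution and additionalNodeProperties. The sketch below shows one way to parse a Newick string and look up a node by label. It is a simplified parser written for this illustration (quoted labels and internal node labels only, no branch lengths, comments, or whitespace between tokens) and is not the parser phyx.js uses. The Newick text is the Crocodylia subtree excerpted from the example in the table, and `parse_newick`, `find_node`, and `tip_labels` are hypothetical names.

```python
def parse_newick(text):
    """Parse Newick (no branch lengths) into (label, children) tuples."""
    pos = 0

    def label():
        nonlocal pos
        if pos < len(text) and text[pos] == "'":
            end = text.index("'", pos + 1)       # closing quote
            value = text[pos + 1:end]
            pos = end + 1
            return value
        start = pos
        while pos < len(text) and text[pos] not in "(),;":
            pos += 1
        return text[start:pos].strip()

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                             # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                         # consume ","
                children.append(node())
            pos += 1                             # consume ")"
        return (label(), children)

    return node()


def find_node(node, wanted):
    """Return the first node carrying the given label, or None."""
    if node[0] == wanted:
        return node
    for child in node[1]:
        found = find_node(child, wanted)
        if found is not None:
            return found
    return None


def tip_labels(node):
    """Return the set of tip labels at or below a node."""
    name, children = node
    if not children:
        return {name}
    return set().union(*(tip_labels(child) for child in children))


NEWICK = (
    "('Gavialis gangeticus',(('Diplocynodon ratelii',"
    "('Alligator mississippiensis','Caiman crocodilus')Alligatoridae)Alligatoroidea,"
    "('Tomistoma schlegelii',('Osteolaemus tetraspis','Crocodylus niloticus')"
    "Crocodylinae)Crocodylidae)Brevirostres)Crocodylia"
)
expected_resolution = {"#phylogeny0": {"nodeLabel": "Gavialis gangeticus"}}

root = parse_newick(NEWICK)
expected = find_node(root, expected_resolution["#phylogeny0"]["nodeLabel"])
print(expected)                                           # ('Gavialis gangeticus', [])
print(sorted(tip_labels(find_node(root, "Alligatoroidea"))))
```

An expected node label that is absent from the phylogeny makes `find_node` return `None`, which is a cheap consistency check to run before any reasoning step.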