Mariya Dimitrova, Viktor E Senderov, Teodor Georgiev, Georgi Zhelezov, Lyubomir Penev.
Abstract
BACKGROUND: OpenBiodiv is a biodiversity knowledge graph containing a synthetic linked open dataset, OpenBiodiv-LOD, which combines knowledge extracted from academic literature with the taxonomic backbone used by the Global Biodiversity Information Facility. The linked open data is modelled according to the OpenBiodiv-O ontology, which integrates semantic resource types from recognised biodiversity and publishing ontologies with OpenBiodiv-O resource types introduced to capture the semantics of resources not modelled before. NEW INFORMATION: We introduce the new release of OpenBiodiv-LOD, attained through information extraction and modelling of additional biodiversity entities. It was achieved through further development of OpenBiodiv-O, of the data storage infrastructure, and of the workflow and accompanying R software packages used to transform academic literature into Resource Description Framework (RDF). We discuss how to utilise the LOD in biodiversity informatics and give examples by providing solutions to several competency questions. We investigate performance issues that arise from the large number of inferred statements in the graph and conclude that OWL-full inference is impractical for the project and that unnecessary inference should be avoided.
Year: 2021 PMID: 34690512 PMCID: PMC8486731 DOI: 10.3897/BDJ.9.e67671
Source DB: PubMed Journal: Biodivers Data J ISSN: 1314-2828
Figure 1. Information flows in the OpenBiodiv system. Red arrows show the workflows outlined in this paper. Two projects associated with the OpenBiodiv system are also shown: the Pensoft Annotator (Dimitrova et al. 2020) and a prototype workflow for generation of biodiversity nanopublications.
Figure 2b. Illustration of the representation of hierarchical information imported from the GBIF Backbone Taxonomy for two taxonomic concepts, halii sec. [8] and sec. [8]. Each concept has an associated scientific name, denoted via the openbiodiv:hasScientificName property; however, the hierarchical information is not encoded in the names. The hierarchical relationship between halii sec. [8] and sec. [8] is encoded both via a skos:broader property and reified via the RCC-5 relationship encoded in .
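As a rough illustration of the double encoding described in this caption, the Python sketch below writes one hierarchical relationship twice: as a direct skos:broader triple and as a reified statement node that names an RCC-5 relation. The concept identifiers, the `ex:` property names and the choice of the proper-part relation are hypothetical placeholders, not terms taken from OpenBiodiv-O.

```python
# The five base relations of RCC-5: equal, proper part, inverse proper part,
# partial overlap, and disjoint.
RCC5_RELATIONS = {"EQ", "PP", "PPi", "PO", "DR"}


def encode_hierarchy(child, parent, statement_id, relation="PP"):
    """Return triples stating that `child` sits below `parent`.

    The relationship is written twice: once as a plain skos:broader link
    and once as a reified statement carrying the RCC-5 relation.
    All ex:* property names are hypothetical.
    """
    if relation not in RCC5_RELATIONS:
        raise ValueError(f"not an RCC-5 relation: {relation}")
    return [
        (child, "skos:broader", parent),               # direct link
        (statement_id, "ex:subjectConcept", child),    # reified form
        (statement_id, "ex:rcc5Relation", relation),
        (statement_id, "ex:objectConcept", parent),
    ]


# Assumption: the lower-rank concept is a proper part (PP) of the higher one.
triples = encode_hierarchy("ex:speciesConcept", "ex:genusConcept", "ex:statement1")
for triple in triples:
    print(triple)
```

The direct link is cheap to query, while the reified node leaves room for relations other than strict inclusion.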
RDF-ised biodiversity journals published by Pensoft as of 2 March 2021.
| ZooKeys | 4715 | 31966 |
| PhytoKeys | 968 | 4956 |
| Biodiversity Data Journal | 695 | 1360 |
| Journal of | 419 | 1235 |
| Comparative Cytogenetics | 338 | 41 |
| MycoKeys | 365 | 1482 |
| Zoosystematics and Evolution | 158 | 926 |
| Subterranean Biology | 152 | 187 |
| Zoologia | 149 | 78 |
| Nota Lepidopterologica | 124 | 135 |
| Neotropical Biology and Conservation | 100 | 42 |
| Italian Botanist | 81 | 15 |
| Deutsche Entomologische Zeitschrift | 80 | 609 |
| Journal of | 78 | 272 |
| Herpetozoa | 72 | 22 |
| African Invertebrates | 55 | 189 |
| Alpine Entomology | 54 | 173 |
| Arctic Environmental Research | 50 | 0 |
| Evolutionary Systematics | 41 | 171 |
| International Journal of Myriapodology | 18 | 97 |
Snippet of XML markup of a taxonomic name according to the TaxPub schema and the corresponding RDF triples.
| <tp:taxon-name> |
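Since the snippet above survives only as its opening tag, the following Python sketch suggests how a TaxPub taxon name might be turned into triples. The sample XML (using the placeholder name "Aus bus"), the namespace IRI, the subject identifier and the mapping to `dwc:` properties and `openbiodiv:ScientificName` are all assumptions made for illustration. They are not the output of the OpenBiodiv R packages.

```python
import xml.etree.ElementTree as ET

TP_NS = "http://www.plazi.org/taxpub"  # assumed TaxPub namespace IRI

# Hypothetical sample markup; not taken from any article in the dataset.
SAMPLE = f"""
<tp:taxon-name xmlns:tp="{TP_NS}">
  <tp:taxon-name-part taxon-name-part-type="genus">Aus</tp:taxon-name-part>
  <tp:taxon-name-part taxon-name-part-type="species">bus</tp:taxon-name-part>
</tp:taxon-name>
"""

# Assumed mapping from TaxPub name-part types to Darwin Core properties.
PART_TO_PROPERTY = {"genus": "dwc:genus", "species": "dwc:specificEpithet"}


def taxon_name_to_triples(xml_text, subject):
    """Convert one <tp:taxon-name> element into a list of triples."""
    root = ET.fromstring(xml_text.strip())
    triples = [(subject, "rdf:type", "openbiodiv:ScientificName")]
    parts = []
    for part in root.findall(f"{{{TP_NS}}}taxon-name-part"):
        kind = part.get("taxon-name-part-type")
        text = (part.text or "").strip()
        parts.append(text)
        if kind in PART_TO_PROPERTY:
            triples.append((subject, PART_TO_PROPERTY[kind], text))
    # The full name is the name parts joined in document order.
    triples.append((subject, "rdfs:label", " ".join(parts)))
    return triples


name_triples = taxon_name_to_triples(SAMPLE, "ex:name1")
for triple in name_triples:
    print(triple)
```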
Data types marked up in articles following TaxPub and TaxonX schemas and the corresponding RDF types of the generated RDF resources. The TaxPub and TaxonX columns contain boolean values (True or False) indicating whether the information about the data type is retrieved from XML files encoded in the corresponding schema or not. For example, Plazi's XMLs, which follow the TaxonX schema, do not contain an Introduction section, hence no resource of type deo:Introduction is created from them.
| Data type | TaxPub | TaxonX | RDF type |
| Article metadata | True | True | fabio:JournalArticle and related |
| Keyword group | True | False | openbiodiv:KeywordGroup |
| Abstract | True | True | sro:Abstract |
| Title | True | True | doco:Title |
| Author | True | True | foaf:Person |
| Introduction section | True | False | deo:Introduction |
| Discussion section | True | True | orb:Discussion |
| Treatment section | True | True | openbiodiv:Treatment |
| Nomenclature section | True | True | openbiodiv:NomenclatureSection |
| Materials examined | True | True | openbiodiv:MaterialsExamined |
| Diagnosis section | True | True | openbiodiv:DiagnosisSection |
| Distribution section | True | True | openbiodiv:DistributionSection |
| Taxonomic key | True | True | openbiodiv:TaxonomicKey |
| Figure | True | True | doco:Figure |
| Taxonomic name usage | True | True | openbiodiv:TaxonomicNameUsage |
| Bibliographic reference list | True | False | doco:BibliographicReferenceList |
| Bibliographic reference | True | True | deo:BibliographicReference |
| Institution | True | True | openbiodiv:Institution, openbiodiv:GRSciCollInstitution |
| Identification | True | True | dwc:Identification |
| Occurrence | True | True | dwc:Occurrence |
| Event | True | True | dwc:Event |
| Location | True | True | dwc:Location |
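The table above can be read as a lookup from source schema to the RDF types that can be generated from it. The Python sketch below encodes a subset of its rows and lists the types available only from TaxPub. The function name `rdf_types_for` is hypothetical and the row selection is ours.

```python
# (data type, in TaxPub, in TaxonX, RDF type): a subset of the table rows.
ROWS = [
    ("Keyword group", True, False, "openbiodiv:KeywordGroup"),
    ("Abstract", True, True, "sro:Abstract"),
    ("Introduction section", True, False, "deo:Introduction"),
    ("Treatment section", True, True, "openbiodiv:Treatment"),
    ("Bibliographic reference list", True, False, "doco:BibliographicReferenceList"),
    ("Occurrence", True, True, "dwc:Occurrence"),
]

SCHEMA_COLUMN = {"TaxPub": 1, "TaxonX": 2}


def rdf_types_for(schema):
    """Return the RDF types that can be created from XML in `schema`."""
    if schema not in SCHEMA_COLUMN:
        raise ValueError(f"unknown schema: {schema}")
    column = SCHEMA_COLUMN[schema]
    return [row[3] for row in ROWS if row[column]]


# Types that exist for TaxPub input but not for TaxonX input.
taxpub_only = sorted(set(rdf_types_for("TaxPub")) - set(rdf_types_for("TaxonX")))
print(taxpub_only)
# ['deo:Introduction', 'doco:BibliographicReferenceList', 'openbiodiv:KeywordGroup']
```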
Figure 2a. A taxonomic name usage (http://openbiodiv.net/22ABFAA4-CFBD-4F17-9669-3FBDF5897892) is linked to the scientific name it mentions, and to the part of the article (abstract) it is contained in.
Figure 3. The Extractor procedure.
XML snippet of an author with corresponding RDF
| <contrib contrib-type="author" corresp="no"> |
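Only the opening tag of the author snippet is preserved here, so the Python sketch below uses a hypothetical JATS-style `<contrib>` element to suggest how an author could become a foaf:Person resource. The nested `<name>` structure, the subject identifier and the choice of `foaf:givenName` and `foaf:familyName` are assumptions, not the properties confirmed by the source.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of JATS author markup.
SAMPLE = """
<contrib contrib-type="author" corresp="no">
  <name>
    <surname>Penev</surname>
    <given-names>Lyubomir</given-names>
  </name>
</contrib>
"""


def contrib_to_triples(xml_text, subject):
    """Convert one author <contrib> element into foaf:Person triples."""
    contrib = ET.fromstring(xml_text.strip())
    if contrib.get("contrib-type") != "author":
        return []  # editors and other contributor types are skipped here
    surname = contrib.findtext("name/surname", default="").strip()
    given = contrib.findtext("name/given-names", default="").strip()
    return [
        (subject, "rdf:type", "foaf:Person"),
        (subject, "foaf:givenName", given),
        (subject, "foaf:familyName", surname),
        (subject, "rdfs:label", f"{given} {surname}".strip()),
    ]


author_triples = contrib_to_triples(SAMPLE, "ex:author1")
for triple in author_triples:
    print(triple)
```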
Parent node.
| openbiodiv:570F0E79-5632-FF88-A155-73625E50C567 rdf:type fabio:JournalArticle ; |
Update rule for replacement name.
| PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> |
Update rule for related name.
| PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> |
A SPARQL query to retrieve 100 random related taxonomic names
| PREFIX openbiodiv: <http://openbiodiv.net/> |
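Queries such as this one are sent to a SPARQL endpoint over HTTP. The Python sketch below builds a GET request URL following the SPARQL 1.1 Protocol, which passes the query text in the `query` parameter. The endpoint URL is a placeholder, and the query is a simplified stand-in that samples taxonomic name usages (a class listed in the data types table). It does not reproduce the truncated query above.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint URL

# Simplified stand-in query; ORDER BY RAND() shuffles before LIMIT applies.
QUERY = """PREFIX openbiodiv: <http://openbiodiv.net/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?usage WHERE {
  ?usage rdf:type openbiodiv:TaxonomicNameUsage .
}
ORDER BY RAND()
LIMIT 100"""


def build_query_url(endpoint, query):
    """Return a GET URL carrying the URL-encoded query text."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_query_url(ENDPOINT, QUERY)

# Decoding the URL again recovers the original query text unchanged.
round_trip = parse_qs(urlparse(url).query)["query"][0]
print(round_trip == QUERY)
```

The resulting URL can be fetched with any HTTP client. Long queries are usually sent by POST instead, which the protocol also allows.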
Most prolific author SPARQL query.
| PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> |
Most-mentioned scientific name.
| PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> |
Most-mentioned species name.
| PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> |
Most-mentioned species name by number of articles that mention it.
| PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> |
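Prefixed names in SPARQL are expanded by plain string concatenation of the namespace IRI and the local part, so the trailing `#` of the rdf namespace matters. The Python sketch below mimics that expansion. The helper `expand` is a hypothetical illustration, not part of any SPARQL library.

```python
def expand(prefixed_name, prefixes):
    """Expand a prefixed name such as 'rdf:type' into a full IRI."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


with_hash = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
without_hash = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns"}

print(expand("rdf:type", with_hash))
# http://www.w3.org/1999/02/22-rdf-syntax-ns#type
print(expand("rdf:type", without_hash))
# http://www.w3.org/1999/02/22-rdf-syntax-nstype
```

The second IRI does not identify the rdf:type property, so a pattern written with it would match no triples that use the standard property.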
Most-mentioned scientific names in figure captions.
| PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> |
Figures from a given article.
| PREFIX fabio: <http://purl.org/spar/fabio/> |
Taxonomic discoveries in weevils.
| PREFIX openbiodiv: <http://openbiodiv.net/> |
Sample Lucene query via SPARQL. We have intentionally misspelled the person’s name.
| PREFIX inst: <http://www.ontotext.com/connectors/lucene/instance#> |
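The point of the misspelling is that a full-text index can still find the intended person through approximate matching. The Python sketch below imitates that behaviour with the standard library's `difflib`, which scores similarity with a sequence-matching ratio rather than the edit-distance matching Lucene uses, so it is a swapped-in technique. The candidate list reuses the author names of this article, and the misspelled input is our own example.

```python
import difflib

AUTHORS = [
    "Mariya Dimitrova",
    "Viktor E Senderov",
    "Teodor Georgiev",
    "Georgi Zhelezov",
    "Lyubomir Penev",
]


def fuzzy_lookup(name, candidates, cutoff=0.6):
    """Return the closest candidate to `name`, or None if none is close enough."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(fuzzy_lookup("Lyubomir Penv", AUTHORS))  # misspelled on purpose
# Lyubomir Penev
```

Raising `cutoff` makes the lookup stricter. A string with no plausible match returns `None` instead of a poor guess.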
Asks if the name given by the label has been replaced.
| PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> |
Asks if the name given by the label is considered unavailable.
| PREFIX pkm: <http://proton.semanticweb.org/protonkm#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> |
Impact of the fire in Museu Nacional de Rio de Janeiro (MNRJ) on biodiversity knowledge.
| PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> |
People who have collected specimens belonging to the insect genus .
| PREFIX : <http://openbiodiv.net/> |
Institutional impact per family.
| PREFIX po: <http://www.essepuntato.it/2008/12/pattern#> |
Linking holotype descriptions, taxonomy, genomics and institutions.
| PREFIX datacite: <http://purl.org/spar/datacite/> |
Figure 5. OpenBiodiv-O is an ontology that links the publishing domain with the biodiversity domain. Major resource types covered by each of the ontology families are given in the box below the Venn diagram. Each of them is present in the OpenBiodiv-O ontology as a class. Important resources from the publishing domain are listed in the left-most column and those from biodiversity informatics in the right-most column. The middle column covers important OpenBiodiv-O resources.