Semi-automated XML markup of biosystematic legacy literature with the GoldenGATE editor.

Guido Sautter, Klemens Böhm, Donat Agosti.

Abstract

Today, digitization of legacy literature is a big issue. This also applies to the domain of biosystematics, where this process has just started. Digitized biosystematics literature requires very precise, fine-grained markup to be useful for detailed search, data linkage, and mining. However, manual markup at the sentence level and below is cumbersome and time-consuming. In this paper, we present and evaluate the GoldenGATE editor, which is designed for the special needs of marking up OCR output with XML. It is built to support the user in this process as far as possible: its functionality ranges from easy, intuitive tagging through markup conversion to dynamic binding of configurable plug-ins provided by third parties. Our evaluation shows that marking up an OCR document using GoldenGATE is three to four times faster than with an off-the-shelf XML editor like XML-Spy. With domain-specific NLP-based plug-ins, the speed-up is even greater.

Year:  2007        PMID: 17992751

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


  9 in total

1.  LINNAEUS: a species name identification system for biomedical literature.

Authors:  Martin Gerner; Goran Nenadic; Casey M Bergman
Journal:  BMC Bioinformatics       Date:  2010-02-11       Impact factor: 3.169

2.  XML schemas and mark-up practices of taxonomic literature.

Authors:  Lyubomir Penev; Christopher H C Lyal; Anna Weitzman; David R Morse; David King; Guido Sautter; Teodor Georgiev; Robert A Morris; Terry Catapano; Donat Agosti
Journal:  Zookeys       Date:  2011-11-28       Impact factor: 1.546

3.  Towards the bibliography of life.

Authors:  David King; David R Morse; Alistair Willis; Anton Dil
Journal:  Zookeys       Date:  2011-11-28       Impact factor: 1.546

4.  Applications of natural language processing in biodiversity science.

Authors:  Anne E Thessen; Hong Cui; Dmitry Mozzherin
Journal:  Adv Bioinformatics       Date:  2012-05-22

5.  Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows.

Authors:  Xiao Fu; Riza Batista-Navarro; Rafal Rak; Sophia Ananiadou
Journal:  J Biomed Semantics       Date:  2015-03-14

6.  Piecing together the biogeographic history of Chenopodium vulvaria L. using botanical literature and collections.

Authors:  Quentin J Groom
Journal:  PeerJ       Date:  2015-01-08       Impact factor: 2.984

7.  Digitising legacy zoological taxonomic literature: Processes, products and using the output.

Authors:  Christopher H C Lyal
Journal:  Zookeys       Date:  2016-01-07       Impact factor: 1.546

8.  Utilizing descriptive statements from the biodiversity heritage library to expand the Hymenoptera Anatomy Ontology.

Authors:  Katja C Seltmann; Zsolt Pénzes; Matthew J Yoder; Matthew A Bertone; Andrew R Deans
Journal:  PLoS One       Date:  2013-02-18       Impact factor: 3.240

9.  Eupolybothrus cavernicolus Komerički & Stoev sp. n. (Chilopoda: Lithobiomorpha: Lithobiidae): the first eukaryotic species description combining transcriptomic, DNA barcoding and micro-CT imaging data.

Authors:  Pavel Stoev; Ana Komerički; Nesrine Akkari; Shanlin Liu; Xin Zhou; Alexander M Weigand; Jeroen Hostens; Christopher I Hunter; Scott C Edmunds; David Porco; Marzio Zapparoli; Teodor Georgiev; Daniel Mietchen; David Roberts; Sarah Faulwetter; Vincent Smith; Lyubomir Penev
Journal:  Biodivers Data J       Date:  2013-10-28