
Resolving "orphaned" non-specific structures using machine learning and natural language processing methods.

Dongfang Xu, Steven S Chong, Thomas Rodenhausen, Hong Cui.

Abstract

Scholarly publications of biodiversity literature contain a vast amount of information in human-readable format. The detailed morphological descriptions in these publications contain rich information that can be extracted to facilitate analysis and computational biology research. However, the idiosyncrasies of morphological descriptions still pose a number of challenges to machines. In this work, we demonstrate the use of two different approaches to resolve meronym (i.e. part-of) relations between anatomical parts and their anchor organs: a syntactic rule-based approach and an SVM-based (support vector machine) method. Both methods made use of domain ontologies. We compared the two approaches with two other baseline methods, and the evaluation results show that the syntactic method (92.1% F1 score) outperformed the SVM method (80.7% F1 score) and that the part-of ontologies were valuable knowledge sources for the task. Notably, the mistakes made by the two approaches rarely overlapped. Additional tests will be conducted on the development version of the Explorer of Taxon Concepts toolkit before we make the functionality publicly available. Meanwhile, we will investigate and leverage the complementary nature of the two approaches to drive the error rate down further, because in practical applications even a 1% error rate could lead to hundreds of errors.
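The abstract does not spell out the rules or SVM features used, so the following is only a hypothetical illustration of the general idea, not the authors' algorithm. It sketches how a part-of ontology can attach an "orphaned" structure such as "margin" to an organ mentioned earlier in a description, with a plain nearest-organ fallback. It also shows the standard F1 computation behind the reported scores and a check for errors shared by two methods. The ontology entries, the function names (`resolve_anchor`, `f1_score`, `shared_errors`), and the assumption that organ mentions have already been extracted and normalized to singular form are all illustrative.

```python
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical toy part-of ontology: part -> organs it may belong to.
# A real system would query a domain ontology; these entries are illustrative.
PART_OF: Dict[str, Set[str]] = {
    "apex": {"leaf", "petal", "sepal", "stem"},
    "margin": {"leaf", "petal", "sepal"},
    "lobe": {"leaf", "corolla"},
}


def resolve_anchor(part: str, preceding_organs: List[str]) -> Optional[str]:
    """Attach an orphaned structure to an organ mentioned earlier.

    Scans organs from nearest to farthest and returns the first one the
    ontology licenses as a whole for `part`. If none qualifies, falls back
    to the nearest organ (a simple proximity baseline); returns None when
    no organ precedes the structure.
    """
    allowed = PART_OF.get(part, set())
    for organ in reversed(preceding_organs):
        if organ in allowed:
            return organ
    return preceding_organs[-1] if preceding_organs else None


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def shared_errors(gold: Sequence[str], pred_a: Sequence[str],
                  pred_b: Sequence[str]) -> List[int]:
    """Indices where both methods are wrong; few shared errors suggest complementary methods."""
    return [i for i, g in enumerate(gold) if pred_a[i] != g and pred_b[i] != g]


if __name__ == "__main__":
    # "Leaves ovate; stems glabrous; margin serrate": which organ owns "margin"?
    print(resolve_anchor("margin", ["leaf", "stem"]))  # leaf (nearest organ, stem, is not licensed)
    print(round(f1_score(tp=8, fp=2, fn=2), 3))  # 0.8
    print(shared_errors(["leaf", "petal", "leaf"],
                        ["leaf", "leaf", "stem"],
                        ["stem", "petal", "stem"]))  # [2]
```

In this toy, the ontology changes the answer only when the nearest organ cannot own the part, which is the kind of case where part-of knowledge adds information beyond proximity.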


Keywords:  Anaphora Resolution; Biodiversity Literature; Information Extraction; Machine Learning; Morphological Descriptions; Ontology Application; Performance Evaluation

Year:  2018        PMID: 30393454      PMCID: PMC6207837          DOI: 10.3897/BDJ.6.e26659

Source DB:  PubMed          Journal:  Biodivers Data J        ISSN: 1314-2828


