Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems.

Wasila Dahdul, Prashanti Manda, Hong Cui, James P Balhoff, T Alexander Dececchi, Nizar Ibrahim, Hilmar Lapp, Todd Vision, Paula M Mabee.

Abstract

Natural language descriptions of organismal phenotypes, a principal object of study in biology, are abundant in the biological literature. Expressing these phenotypes as logical statements using ontologies would enable large-scale analysis of phenotypic information from diverse systems. However, considerable human effort is required to make these phenotype descriptions amenable to machine reasoning. Natural language processing tools have been developed to facilitate this task, and the training and evaluation of these tools depend on the availability of high-quality, manually annotated gold standard data sets. We describe the development of an expert-curated gold standard data set of annotated phenotypes for evolutionary biology. The gold standard was developed for the curation of complex comparative phenotypes for the Phenoscape project. It was created by consensus among three curators and consists of entity-quality expressions of varying complexity. We use the gold standard to evaluate annotations created by human curators and those generated by the Semantic CharaParser tool. Using four annotation accuracy metrics that can account for any level of relationship between terms from two phenotype annotations, we found that machine-human consistency, or similarity, was significantly lower than inter-curator (human-human) consistency. Surprisingly, allowing curators access to external information neither significantly increased the similarity of their annotations to the gold standard nor had a significant effect on inter-curator consistency. We found that the similarity of machine annotations to the gold standard increased after new relevant ontology terms had been added. Evaluation by the original authors of the character descriptions indicated that the gold standard annotations came closer to representing their intended meaning than did either the curator or the machine annotations.
These findings point toward ways to better design software that augments human curators, and the gold standard corpus will allow the training and assessment of new tools to improve phenotype annotation accuracy at scale.

Year: 2018; PMID: 30576485; PMCID: PMC6301375; DOI: 10.1093/database/bay110

Source DB: PubMed; Journal: Database (Oxford); ISSN: 1758-0463; Impact factor: 3.451
