Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems.

Wasila Dahdul¹, Prashanti Manda², Hong Cui³, James P Balhoff⁴, T Alexander Dececchi¹,⁵, Nizar Ibrahim⁶,⁷, Hilmar Lapp⁸, Todd Vision⁴, Paula M Mabee¹.

Abstract

Natural language descriptions of organismal phenotypes, a principal object of study in biology, are abundant in the biological literature. Expressing these phenotypes as logical statements using ontologies would enable large-scale analysis of phenotypic information from diverse systems. However, considerable human effort is required to make these phenotype descriptions amenable to machine reasoning. Natural language processing tools have been developed to facilitate this task, and the training and evaluation of these tools depend on the availability of high-quality, manually annotated gold standard data sets. We describe the development of an expert-curated gold standard data set of annotated phenotypes for evolutionary biology. The gold standard was developed for the curation of complex comparative phenotypes for the Phenoscape project. It was created by consensus among three curators and consists of entity-quality expressions of varying complexity. We use the gold standard to evaluate annotations created by human curators and those generated by the Semantic CharaParser tool. Using four annotation accuracy metrics that can account for any level of relationship between terms from two phenotype annotations, we found that machine-human consistency, or similarity, was significantly lower than inter-curator (human-human) consistency. Surprisingly, allowing curators access to external information did not significantly increase the similarity of their annotations to the gold standard or have a significant effect on inter-curator consistency. We found that the similarity of machine annotations to the gold standard increased after new relevant ontology terms had been added. Evaluation by the original authors of the character descriptions indicated that the gold standard annotations came closer to representing their intended meaning than did either the curator or machine annotations. These findings point toward ways to better design software to augment human curators, and the use of the gold standard corpus will allow training and assessment of new tools to improve phenotype annotation accuracy at scale.
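The abstract refers to metrics that "account for any level of relationship between terms" but does not name them. As an illustration only, the sketch below shows one common ontology-aware measure, Jaccard similarity over ancestor sets, applied to a made-up toy hierarchy. The term names, the `TOY_PARENTS` mapping, and both function names are hypothetical, and this is not claimed to be one of the four metrics used in the paper.

```python
from typing import Dict, Set

# Hypothetical toy ontology (an assumption for illustration):
# each term maps to its direct is_a parents.
TOY_PARENTS: Dict[str, Set[str]] = {
    "anatomical structure": set(),
    "appendage": {"anatomical structure"},
    "fin": {"appendage"},
    "pectoral fin": {"fin"},
    "pelvic fin": {"fin"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself plus every term reachable via is_a links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, set()))
    return seen


def jaccard_similarity(a: str, b: str, parents: Dict[str, Set[str]]) -> float:
    """Shared ancestors divided by all ancestors of either term."""
    anc_a = ancestors(a, parents)
    anc_b = ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


if __name__ == "__main__":
    # Sibling terms share most, but not all, of their ancestry.
    print(jaccard_similarity("pectoral fin", "pelvic fin", TOY_PARENTS))
    # A term compared with its direct parent scores higher.
    print(jaccard_similarity("pectoral fin", "fin", TOY_PARENTS))
```

A measure of this kind gives partial credit when two annotations pick related but non-identical terms, which is the general property the abstract describes; an exact-match comparison would score both pairs above as zero.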

Year: 2018; PMID: 30576485; PMCID: PMC6301375; DOI: 10.1093/database/bay110

Source DB: PubMed; Journal: Database (Oxford); ISSN: 1758-0463; Impact factor: 3.451
