Word Sense Disambiguation in biomedical ontologies with term co-occurrence analysis and document clustering.

Bill Andreopoulos, Dimitra Alexopoulou, Michael Schroeder.

Abstract

With more and more genomes being sequenced, a lot of effort is devoted to their annotation with terms from controlled vocabularies such as the GeneOntology. Manual annotation based on relevant literature is tedious, but automation of this process is difficult. One particularly challenging problem is word sense disambiguation. Terms such as 'development' can refer to developmental biology or to the more general sense. Here, we present two approaches to address this problem by using term co-occurrences and document clustering. To evaluate our method we defined a corpus of 331 documents on development and developmental biology. Term co-occurrence analysis achieves an F-measure of 77%. Additionally, applying document clustering improves precision to 82%. We applied the same approach to disambiguate 'nucleus', 'transport', and 'spindle', and we achieved consistent results. Thus, our method is a viable approach towards the automation of literature-based genome annotation.
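
The abstract names term co-occurrence as one way to separate the two senses of 'development' but does not spell out the procedure. As a rough illustration only, the idea can be sketched as counting sense-specific cue terms that co-occur with the ambiguous word in a document. The cue lists, the `SENSE_CUES` name, and the `disambiguate` function below are hypothetical and are not taken from the paper; the authors' actual method, features, and document clustering step may differ substantially.

```python
import re
from collections import Counter

# Hypothetical cue terms per sense of "development" (illustrative assumption,
# not the term lists used by the authors).
SENSE_CUES = {
    "developmental_biology": {"embryo", "embryonic", "morphogenesis",
                              "larval", "differentiation"},
    "general": {"software", "method", "drug", "tool", "policy"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(text, sense_cues=SENSE_CUES):
    """Score each sense by how many of its cue terms occur in the text.

    Returns (best_sense, scores). best_sense is None when no cue term
    occurs or when the top score is tied between senses.
    """
    counts = Counter(tokenize(text))
    scores = {
        sense: sum(counts[term] for term in cues)
        for sense, cues in sense_cues.items()
    }
    top = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == top]
    if top == 0 or len(winners) > 1:
        return None, scores
    return winners[0], scores


if __name__ == "__main__":
    sense, scores = disambiguate(
        "Embryonic development requires morphogenesis and differentiation."
    )
    print(sense, scores)
```

A sketch like this leaves documents without any cue term undecided rather than guessing.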

Year:  2008        PMID: 19024494     DOI: 10.1504/ijdmb.2008.020522

Source DB:  PubMed          Journal:  Int J Data Min Bioinform        ISSN: 1748-5673            Impact factor:   0.667


  4 in total

1.  Quantitative analysis of ontology research articles in the radiologic domain.

Authors:  Naoki Nishimoto; Ayako Yagahara; Yuki Yokooka; Shintaro Tsuji; Masahito Uesugi; Katsuhiko Ogasawara; Masaji Maezawa
Journal:  Radiol Phys Technol       Date:  2010-05-22

2.  Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance.

Authors:  Wei-Qi Wei; Pedro L Teixeira; Huan Mo; Robert M Cronin; Jeremy L Warner; Joshua C Denny
Journal:  J Am Med Inform Assoc       Date:  2015-09-02       Impact factor: 4.497

3.  Answering biological questions: querying a systems biology database for nutrigenomics.

Authors:  Chris T Evelo; Kees van Bochove; Jahn-Takeshi Saito
Journal:  Genes Nutr       Date:  2010-10-30       Impact factor: 5.523

4.  Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy.

Authors:  Dimitra Alexopoulou; Bill Andreopoulos; Heiko Dietze; Andreas Doms; Fabien Gandon; Jörg Hakenberg; Khaled Khelif; Michael Schroeder; Thomas Wächter
Journal:  BMC Bioinformatics       Date:  2009-01-21       Impact factor: 3.169
