Gene clustering by latent semantic indexing of MEDLINE abstracts.

Ramin Homayouni, Kevin Heinrich, Lai Wei, Michael W Berry.

Abstract

MOTIVATION: A major challenge in the interpretation of high-throughput genomic data is understanding the functional associations between genes. Previously, several approaches have been described to extract gene relationships from various biological databases using term-matching methods. However, more flexible automated methods are needed to identify functional relationships (both explicit and implicit) between genes from the biomedical literature. In this study, we explored the utility of Latent Semantic Indexing (LSI), a vector space model for information retrieval, to automatically identify conceptual gene relationships from titles and abstracts in MEDLINE citations.
RESULTS: We found that LSI identified gene-to-gene and keyword-to-gene relationships with high average precision. In addition, LSI identified implicit gene relationships based on word usage patterns in the gene abstract documents. Finally, we demonstrate here that pairwise distances derived from the vector angles of gene abstract documents can be effectively used to functionally group genes by hierarchical clustering. Our results provide proof-of-principle that LSI is a robust automated method to elucidate both known (explicit) and unknown (implicit) gene relationships from the biomedical literature. These features make LSI particularly useful for the analysis of novel associations discovered in genomic experiments.
AVAILABILITY: The 50-gene document collection used in this study can be interactively queried at http://shad.cs.utk.edu/sgo/sgo.html.
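
The grouping step described in the abstract (pairwise distances from vector angles, followed by hierarchical clustering) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each gene document has already been projected into a reduced LSI space, and the gene names, vector values, and the choice of average linkage are all hypothetical, chosen only to make the example runnable.

```python
import math
from itertools import combinations

# Hypothetical, already-reduced LSI vectors for four made-up gene documents.
# Names and numbers are illustrative only; they are not taken from the study.
gene_vectors = {
    "geneA": [0.9, 0.1, 0.0],
    "geneB": [0.8, 0.2, 0.1],
    "geneC": [0.1, 0.9, 0.2],
    "geneD": [0.0, 0.8, 0.3],
}


def angle_distance(u, v):
    """Angle in radians between two vectors; 0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    members have the smallest mean pairwise angle (average linkage is an
    assumption here; the abstract does not name a linkage rule)."""
    clusters = [[name] for name in vectors]

    def cluster_distance(c1, c2):
        total = sum(angle_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters by index.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c) for c in clusters]


groups = average_linkage(gene_vectors, n_clusters=2)
print(groups)
```

In this toy input, documents whose vectors point in similar directions end up in the same group regardless of vector length, which is the property that makes an angle-based distance a natural fit for comparing documents of different sizes.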

Year:  2004        PMID: 15308538     DOI: 10.1093/bioinformatics/bth464

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  56 in total

1.  A document clustering and ranking system for exploring MEDLINE citations.

Authors:  Yongjing Lin; Wenyuan Li; Keke Chen; Ying Liu
Journal:  J Am Med Inform Assoc       Date:  2007-06-28       Impact factor: 4.497

2.  Empirical distributional semantics: methods and biomedical applications. (Review)

Authors:  Trevor Cohen; Dominic Widdows
Journal:  J Biomed Inform       Date:  2009-02-14       Impact factor: 6.317

3.  Automatic summarization of mouse gene information by clustering and sentence extraction from MEDLINE abstracts.

Authors:  Jianji Yang; Aaron M Cohen; William Hersh
Journal:  AMIA Annu Symp Proc       Date:  2007-10-11

4.  Simulating expert clinical comprehension: adapting latent semantic analysis to accurately extract clinical concepts from psychiatric narrative.

Authors:  Trevor Cohen; Brett Blatter; Vimla Patel
Journal:  J Biomed Inform       Date:  2008-03-27       Impact factor: 6.317

5.  Detection, validation, and downstream analysis of allelic variation in gene expression.

Authors:  Daniel C Ciobanu; Lu Lu; Khyobeni Mozhui; Xusheng Wang; Manjunatha Jagalur; John A Morris; William L Taylor; Klaus Dietz; Perikles Simon; Robert W Williams
Journal:  Genetics       Date:  2009-11-02       Impact factor: 4.562

6.  [Attenuation regulation of amino acid and amino acyl-tRNA biosynthetic operons in bacteria: comparative genomics analysis].

Authors:  K V Lopatovskaia; A V Seliverstov; V A Liubetskiĭ
Journal:  Mol Biol (Mosk)       Date:  2010 Jan-Feb

7.  Annotating the human genome with Disease Ontology.

Authors:  John D Osborne; Jared Flatow; Michelle Holko; Simon M Lin; Warren A Kibbe; Lihua Julie Zhu; Maria I Danila; Gang Feng; Rex L Chisholm
Journal:  BMC Genomics       Date:  2009-07-07       Impact factor: 3.969

8.  Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model.

Authors:  Xin He; Moushumi Sen Sarma; Xu Ling; Brant Chee; Chengxiang Zhai; Bruce Schatz
Journal:  BMC Bioinformatics       Date:  2010-05-20       Impact factor: 3.169

9.  Gene expression in the mouse eye: an online resource for genetics using 103 strains of mice.

Authors:  Eldon E Geisert; Lu Lu; Natalie E Freeman-Anderson; Justin P Templeton; Mohamed Nassr; Xusheng Wang; Weikuan Gu; Yan Jiao; Robert W Williams
Journal:  Mol Vis       Date:  2009-08-31       Impact factor: 2.367

10.  SENT: semantic features in text.

Authors:  Miguel Vazquez; Pedro Carmona-Saez; Ruben Nogales-Cadenas; Monica Chagoyen; Francisco Tirado; Jose Maria Carazo; Alberto Pascual-Montano
Journal:  Nucleic Acids Res       Date:  2009-05-20       Impact factor: 16.971
