Ramin Homayouni (1), Kevin Heinrich, Lai Wei, Michael W Berry
(1) Department of Neurology, University of Tennessee Health Science Center, 855 Monroe Avenue, 416 Link Bldg, Memphis, TN 38163, USA. rhomayouni@utmem.edu
Abstract
MOTIVATION: A major challenge in the interpretation of high-throughput genomic data is understanding the functional associations between genes. Previously, several approaches have been described to extract gene relationships from various biological databases using term-matching methods. However, more flexible automated methods are needed to identify functional relationships (both explicit and implicit) between genes from the biomedical literature. In this study, we explored the utility of Latent Semantic Indexing (LSI), a vector space model for information retrieval, to automatically identify conceptual gene relationships from titles and abstracts in MEDLINE citations.
RESULTS: We found that LSI identified gene-to-gene and keyword-to-gene relationships with high average precision. In addition, LSI identified implicit gene relationships based on word usage patterns in the gene abstract documents. Finally, we demonstrate here that pairwise distances derived from the vector angles of gene abstract documents can be effectively used to functionally group genes by hierarchical clustering. Our results provide proof-of-principle that LSI is a robust automated method to elucidate both known (explicit) and unknown (implicit) gene relationships from the biomedical literature. These features make LSI particularly useful for the analysis of novel associations discovered in genomic experiments.
AVAILABILITY: The 50-gene document collection used in this study can be interactively queried at http://shad.cs.utk.edu/sgo/sgo.html.
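The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes a tiny invented corpus (the gene names and words below are hypothetical), raw term counts with no term weighting, a rank of k = 2 chosen only to suit the toy data, and a simple average-linkage merge written by hand. The sketch shows that two gene documents with no words in common (geneA and geneC) can still score as highly similar in the reduced LSI space because both share vocabulary with geneB, and that angles between the reduced vectors can serve as distances for hierarchical grouping.

```python
import numpy as np

# Hypothetical toy "gene documents": each gene is represented by words pooled
# from its abstracts. Gene names and words are invented for illustration.
gene_docs = {
    "geneA": ["migration", "cortex"],
    "geneB": ["cortex", "layering"],
    "geneC": ["layering", "cerebellum"],
    "geneD": ["apoptosis", "caspase"],
    "geneE": ["caspase", "mitochondria"],
}

genes = list(gene_docs)
terms = sorted({w for words in gene_docs.values() for w in words})
term_index = {t: i for i, t in enumerate(terms)}

# Term-by-document matrix of raw counts (rows: terms, columns: gene documents).
A = np.zeros((len(terms), len(genes)))
for j, g in enumerate(genes):
    for w in gene_docs[g]:
        A[term_index[w], j] += 1.0


def cosine_matrix(X):
    """Pairwise cosine similarity between the columns of X."""
    norms = np.linalg.norm(X, axis=0)
    return (X.T @ X) / np.outer(norms, norms)


# Rank-k LSI: keep only the k largest singular values and vectors of A.
k = 2  # assumption: chosen for this toy corpus only
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vectors = np.diag(s[:k]) @ Vt[:k, :]  # column j is gene j in the k-dim space

raw_sim = cosine_matrix(A)            # plain term-matching similarity
lsi_sim = cosine_matrix(doc_vectors)  # similarity in the reduced space

# Distances derived from vector angles (radians); clip guards against rounding.
angles = np.arccos(np.clip(lsi_sim, -1.0, 1.0))


def agglomerate(dist, labels, n_clusters):
    """Naive average-linkage agglomerative clustering down to n_clusters."""
    clusters = [[i] for i in range(len(labels))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = np.mean([dist[i, j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair
    return [sorted(labels[i] for i in c) for c in clusters]


groups = agglomerate(angles, genes, 2)

a, c, d = genes.index("geneA"), genes.index("geneC"), genes.index("geneD")
print("raw cosine geneA~geneC:", raw_sim[a, c])
print("LSI cosine geneA~geneC:", round(float(lsi_sim[a, c]), 3))
print("LSI cosine geneA~geneD:", round(float(lsi_sim[a, d]), 3))
print("groups:", groups)
```

On a real collection one would normally apply a term-weighting scheme, choose k empirically, and use a library clustering routine. Those choices are left out here to keep the mechanics visible.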