
Using text analysis to identify functionally coherent gene groups.

Soumya Raychaudhuri, Hinrich Schütze, Russ B Altman.

Abstract

The analysis of large-scale genomic information (such as sequence data or expression patterns) frequently involves grouping genes on the basis of common experimental features. Often, as with gene expression clustering, there are too many groups to easily identify the functionally relevant ones. One valuable source of information about gene function is the published literature. We present a method, neighbor divergence, for assessing whether the genes within a group share a common biological function based on their associated scientific literature. The method uses statistical natural language processing techniques to interpret biological text. It requires only a corpus of documents relevant to the genes being studied (e.g., all genes in an organism) and an index connecting the documents to appropriate genes. Given a group of genes, neighbor divergence assigns a numerical score indicating how "functionally coherent" the gene group is from the perspective of the published literature. We evaluate our method by testing its ability to distinguish 19 known functional gene groups from 1900 randomly assembled groups. Neighbor divergence achieves 79% sensitivity at 100% specificity, comparing favorably to other tested methods. We also apply neighbor divergence to previously published gene expression clusters to assess its ability to recognize gene groups that had been manually identified as representative of a common function.
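The evaluation figure quoted above, sensitivity at 100% specificity, can be illustrated with a short sketch. It assumes each gene group already has a coherence score where higher means more coherent; the function name and the toy scores are hypothetical and are not the paper's data or code. Requiring 100% specificity means no random group may be called coherent, so the cutoff is the highest score among the random groups, and sensitivity is the fraction of known functional groups scoring above it.

```python
from typing import Sequence


def sensitivity_at_full_specificity(
    functional_scores: Sequence[float],
    random_scores: Sequence[float],
) -> float:
    """Fraction of functional groups scoring strictly above every random group."""
    if not functional_scores or not random_scores:
        raise ValueError("both score lists must be non-empty")
    # 100% specificity: no random group may pass, so the cutoff is the
    # highest score observed among the random groups.
    cutoff = max(random_scores)
    detected = sum(1 for score in functional_scores if score > cutoff)
    return detected / len(functional_scores)


# Toy, made-up scores for illustration only (not the paper's data).
functional = [5.2, 4.8, 0.9, 3.1]
random_groups = [0.2, 1.0, 0.7, 0.4, 0.1]
print(sensitivity_at_full_specificity(functional, random_groups))
```

With 19 functional groups, as in the evaluation described above, a result of 15 detected groups would correspond to roughly 79% sensitivity under this definition.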


Year: 2002; PMID: 12368251; PMCID: PMC187532; DOI: 10.1101/gr.116402

Source DB: PubMed; Journal: Genome Res; ISSN: 1088-9051; Impact factor: 9.043


