The computational analysis of scientific literature to define and recognize gene expression clusters.

Soumya Raychaudhuri, Jeffrey T Chang, Farhad Imam, Russ B Altman.

Abstract

A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present a computational method that leverages the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in the analysis offers an opportunity to incorporate functional information about the genes when defining expression clusters. Our method associates gene expression profiles with known biological functions in two steps. First, we apply hierarchical clustering to the given gene expression data set. Second, we use text from abstracts about genes to (i) resolve hierarchical cluster boundaries to optimize the functional coherence of the clusters and (ii) recognize those clusters that are most functionally coherent. When a gene has not been investigated and therefore lacks primary literature, articles about well-studied homologous genes are added as references. We apply our method to two large gene expression data sets with different properties. The first contains measurements for a subset of well-studied Saccharomyces cerevisiae genes with multiple literature references; the second contains newly discovered genes in Drosophila melanogaster, many of which have no literature references at all. In both cases, we are able to rapidly define and identify the biologically relevant gene expression profiles without manual intervention, and we identify novel clusters that were not noted by the original investigators.
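
The two-step procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how functional coherence is scored, so the sketch assumes a simple stand-in (mean pairwise cosine similarity between word-count vectors built from each gene's abstracts) and a size-weighted rule for choosing where to cut the tree. The function names, the toy expression profiles, and the toy word counts are all hypothetical, and the homolog fallback for genes without literature is not modeled.

```python
from itertools import combinations
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_pairwise(members, vectors):
    """Mean cosine similarity over all gene pairs; 0.0 for a single gene."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 0.0
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

def build_tree(expression):
    """Step 1: average-linkage agglomerative clustering of expression profiles.
    Each node is a tuple (members, left_child, right_child)."""
    nodes = [(frozenset([i]), None, None) for i in range(len(expression))]

    def linkage(a, b):
        sims = [cosine(expression[i], expression[j]) for i in a[0] for j in b[0]]
        return sum(sims) / len(sims)

    while len(nodes) > 1:
        # Merge the two current clusters with the most similar profiles.
        a, b = max(combinations(nodes, 2), key=lambda pair: linkage(*pair))
        nodes = [n for n in nodes if n is not a and n is not b]
        nodes.append((a[0] | b[0], a, b))
    return nodes[0]

def resolve_boundaries(node, text):
    """Step 2(i): keep a node as one cluster only if its size-weighted text
    coherence beats the best split of its subtree (an assumed criterion)."""
    members, left, right = node
    whole = len(members) * mean_pairwise(members, text)
    if left is None:
        return whole, [members]
    left_score, left_clusters = resolve_boundaries(left, text)
    right_score, right_clusters = resolve_boundaries(right, text)
    if whole > left_score + right_score:
        return whole, [members]
    return left_score + right_score, left_clusters + right_clusters

# Toy data (hypothetical): six genes, three expression measurements each.
expression = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.1], [1.1, 2.0, 3.0],
              [3.0, 2.0, 1.0], [3.0, 2.1, 1.0], [3.0, 2.0, 1.2]]
# Word counts from each gene's abstracts over a four-word vocabulary.
text = [[2, 1, 0, 0], [1, 2, 0, 0], [3, 1, 0, 0],
        [0, 0, 2, 1], [0, 0, 1, 2], [0, 0, 1, 1]]

tree = build_tree(expression)
_, clusters = resolve_boundaries(tree, text)
# Step 2(ii): rank the resolved clusters by functional coherence.
ranked = sorted(clusters, key=lambda c: mean_pairwise(c, text), reverse=True)
for cluster in ranked:
    print(sorted(cluster), round(mean_pairwise(cluster, text), 3))
```

On this toy input the tree is cut into two clusters of three genes each rather than kept as a single root cluster, because the genes on either side of the root share no vocabulary. A real analysis would likely use weighted term vectors and a statistically grounded coherence score in place of raw counts and mean cosine similarity.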

Year:  2003        PMID: 12888516      PMCID: PMC169898          DOI: 10.1093/nar/gkg636

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


  17 in total

Review 1.  The impact of the NIH public access policy on literature informatics: What role can the neuroinformaticists play?

Authors:  William Bug
Journal:  Neuroinformatics       Date:  2005

2.  Incorporating Ontology-Driven Similarity Knowledge into Functional Genomics: An Exploratory Study.

Authors:  Francisco Azuaje; Olivier Bodenreider
Journal:  BIBE 2004       Date:  2004-05

3.  Identification of recurring protein structure microenvironments and discovery of novel functional sites around CYS residues.

Authors:  Shirley Wu; Tianyun Liu; Russ B Altman
Journal:  BMC Struct Biol       Date:  2010-02-02

4.  Literature mining for the discovery of hidden connections between drugs, genes and diseases.

Authors:  Raoul Frijters; Marianne van Vugt; Ruben Smeets; René van Schaik; Jacob de Vlieg; Wynand Alkema
Journal:  PLoS Comput Biol       Date:  2010-09-23       Impact factor: 4.475

5.  The Text-mining based PubChem Bioassay neighboring analysis.

Authors:  Lianyi Han; Tugba O Suzek; Yanli Wang; Steve H Bryant
Journal:  BMC Bioinformatics       Date:  2010-11-08       Impact factor: 3.169

6.  CoPub Mapper: mining MEDLINE based on search term co-publication.

Authors:  Blaise T F Alako; Antoine Veldhoven; Sjozef van Baal; Rob Jelier; Stefan Verhoeven; Ton Rullmann; Jan Polman; Guido Jenster
Journal:  BMC Bioinformatics       Date:  2005-03-11       Impact factor: 3.169

7.  Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation.

Authors:  Rob Jelier; Guido Jenster; Lambert C J Dorssers; Bas J Wouters; Peter J M Hendriksen; Barend Mons; Ruud Delwel; Jan A Kors
Journal:  BMC Bioinformatics       Date:  2007-01-18       Impact factor: 3.169

Review 8.  Complexity in cancer biology: is systems biology the answer?

Authors:  Evangelia Koutsogiannouli; Athanasios G Papavassiliou; Nikolaos A Papanikolaou
Journal:  Cancer Med       Date:  2013-02-17       Impact factor: 4.452

9.  Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data.

Authors:  Michael J Gilchrist; Mikkel B Christensen; Richard Harland; Nicolas Pollet; James C Smith; Naoto Ueno; Nancy Papalopulu
Journal:  BMC Bioinformatics       Date:  2008-10-17       Impact factor: 3.169

10.  Ensemble attribute profile clustering: discovering and characterizing groups of genes with similar patterns of biological features.

Authors:  J R Semeiks; A Rizki; M J Bissell; I S Mian
Journal:  BMC Bioinformatics       Date:  2006-03-16       Impact factor: 3.169
