
Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons.

Alvaro Mateos, Joaquín Dopazo, Ronald Jansen, Yuhai Tu, Mark Gerstein, Gustavo Stolovitzky.

Abstract

Recent advances in microarray technology have opened new ways to functionally annotate previously uncharacterized genes on a genomic scale. This has been demonstrated by unsupervised clustering of co-expressed genes and, more importantly, by supervised learning algorithms. Using prior knowledge, these algorithms can assign functional annotations based on the more complex expression signatures found in existing functional classes. Previously, support vector machines (SVMs) and other machine-learning methods have been applied to a limited number of functional classes for this purpose. Here we present, for the first time, the comprehensive application of supervised neural networks (SNNs) to functional annotation. Our study is novel in that we report systematic results for ~100 classes in the Munich Information Center for Protein Sequences (MIPS) functional catalog. We found that only ~10% of these classes are learnable (based on the rate of false negatives). A closer analysis reveals that false positives (and negatives) in a machine-learning context are not necessarily "false" in a biological sense. We show that the high degree of interconnection among functional classes confounds the signatures that ought to be learned for a unique class. We term this the "Borges effect" and introduce two new numerical indices to quantify it. Our analysis indicates that classification systems with a lower Borges effect are better suited to machine learning. Furthermore, we introduce a learning procedure that combines false positives with the original class. We show that in a few iterations this process converges to a gene set that is learnable with notably low rates of false positives and negatives and that contains genes biologically related to the original class, allowing a coarse reconstruction of the interactions between associated biological pathways. We exemplify this methodology using the well-studied tricarboxylic acid cycle.
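
The iterative procedure described in the abstract (fold false positives into the class, retrain, repeat until nothing new is absorbed) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the training details, so the classifier here is a simple nearest-centroid stand-in rather than a multilayer perceptron, and the names `centroid_classifier`, `absorb_false_positives`, `PROFILES` and the toy expression values are all hypothetical.

```python
from typing import Callable, List, Sequence, Set

Vector = Sequence[float]
# Assumed interface: (expression profiles, indices of class members)
# -> indices of genes the trained model predicts to be in the class.
Classifier = Callable[[List[Vector], Set[int]], Set[int]]


def centroid_classifier(profiles: List[Vector], positives: Set[int]) -> Set[int]:
    """Stand-in classifier (NOT a multilayer perceptron).

    A gene is predicted positive when its profile is closer to the centroid
    of the current class than to the centroid of all remaining genes.
    """
    def centroid(indices: Set[int]) -> List[float]:
        rows = [profiles[i] for i in indices]
        return [sum(column) / len(rows) for column in zip(*rows)]

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    negatives = set(range(len(profiles))) - positives
    if not positives or not negatives:
        return set(positives)
    pos_c, neg_c = centroid(positives), centroid(negatives)
    return {
        i for i, profile in enumerate(profiles)
        if sq_dist(profile, pos_c) < sq_dist(profile, neg_c)
    }


def absorb_false_positives(
    profiles: List[Vector],
    seed_class: Set[int],
    classify: Classifier = centroid_classifier,
    max_iter: int = 10,
) -> Set[int]:
    """Grow a class by repeatedly adding the classifier's false positives."""
    current = set(seed_class)
    for _ in range(max_iter):
        predicted = classify(profiles, current)
        false_positives = predicted - current
        if not false_positives:
            break  # converged: no gene outside the class is predicted positive
        current |= false_positives
    return current


# Hypothetical toy data: genes 0-3 share one expression signature, genes 4-6
# another; only genes 0 and 1 are annotated as members of the class.
PROFILES = [
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.0],
    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1],
]
result = absorb_false_positives(PROFILES, {0, 1})
print(sorted(result))
```

Passing the classifier in as a function keeps the loop independent of the model, so a perceptron-based classifier with the same interface could be substituted for the stand-in.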

Year: 2002; PMID: 12421757; PMCID: PMC187551; DOI: 10.1101/gr.192502

Source DB: PubMed; Journal: Genome Res; ISSN: 1088-9051; Impact factor: 9.043


