Validation of computational methods in genomics.

Edward R Dougherty, Jianping Hua, Michael L Bittner.

Abstract

High-throughput technologies for genomics provide tens of thousands of genetic measurements, for instance, gene-expression measurements on microarrays. The availability of these measurements has motivated the use of machine learning (inference) methods for classification, clustering, and gene networks. Generally, a design method will yield a model that satisfies some model constraints and fits the data in some manner. A scientific theory, on the other hand, consists of two parts: (1) a mathematical model to characterize relations between variables, and (2) a set of relations between model variables and observables that are used to validate the model via predictive experiments. Although machine learning algorithms are constructed in the hope of producing valid scientific models, they do not ipso facto do so. In some cases, such as classifier estimation, there is a well-developed error theory that relates to model validity according to various statistical theorems, but in others, such as clustering, there is a lack of understanding of the relationship between the learning algorithms and validation. The issue of validation is especially problematic when the sample size is small in comparison with the dimensionality (number of variables), which is commonplace in genomics, because the convergence theory of learning algorithms is typically asymptotic and the algorithms often perform in counter-intuitive ways when used with samples that are small in relation to the number of variables. For translational genomics, validation is perhaps the most critical issue, because it is imperative that we understand the performance of a diagnostic or therapeutic procedure to be used in the clinic, and this performance relates directly to the validity of the model behind the procedure. This paper treats the validation issue as it appears in two classes of inference algorithms relating to genomics: classification and clustering. It formulates the problem and reviews salient results.
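
The small-sample concern raised in the abstract can be illustrated with a minimal simulation sketch. It is not taken from the paper: the nearest-mean classifier, the pure-noise Gaussian data, the function names, and the parameter values (20 training samples, 500 variables) are all assumptions chosen for illustration. Because the labels carry no information about the features, any classifier has a true error of 0.5, yet the error measured on the training sample itself (resubstitution) would be expected to come out close to zero.

```python
import random


def nearest_mean_fit(X, y):
    """Return the per-class mean vector for labels 0 and 1."""
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def nearest_mean_predict(means, x):
    """Assign x to the class whose mean is closer in squared Euclidean distance."""
    def sqdist(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return 0 if sqdist(means[0]) <= sqdist(means[1]) else 1


def error_rate(means, X, y):
    """Fraction of points in (X, y) that the classifier mislabels."""
    wrong = sum(nearest_mean_predict(means, x) != t for x, t in zip(X, y))
    return wrong / len(y)


def simulate(n_train=20, n_test=1000, dim=500, seed=0):
    """Illustrative assumption: features are pure noise, labels are balanced."""
    rng = random.Random(seed)

    def draw(n):
        X = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
        y = [i % 2 for i in range(n)]  # labels are independent of the features
        return X, y

    X_train, y_train = draw(n_train)
    X_test, y_test = draw(n_test)
    means = nearest_mean_fit(X_train, y_train)
    resub = error_rate(means, X_train, y_train)   # error on the training sample
    holdout = error_rate(means, X_test, y_test)   # error on independent data
    return resub, holdout


resub_error, holdout_error = simulate()
print("resubstitution error:", resub_error)
print("hold-out error:", holdout_error)
```

The gap between the two numbers is the point of the sketch: with far more variables than samples, a fitted model can appear to separate the training data well while having no predictive value, which is why an error estimate on independent data (or a carefully analyzed resampling estimate) is needed before the model is treated as valid.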

Year:  2007        PMID: 18645624      PMCID: PMC2474684          DOI: 10.2174/138920207780076956

Source DB:  PubMed          Journal:  Curr Genomics        ISSN: 1389-2029            Impact factor:   2.236


