Selection bias in gene extraction on the basis of microarray gene-expression data.

Christophe Ambroise, Geoffrey J McLachlan.

Abstract

In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes.
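The distinction between internal and external cross-validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the pure-noise data, the t-like gene score, the nearest-centroid rule, and every function name (`make_noise_data`, `select_genes`, `cv_error`) are assumptions chosen only to keep the example self-contained. Because no gene carries class information, the true error rate is 0.5, so an estimate well below that reflects selection bias rather than predictive value.

```python
import random

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

def make_noise_data(n_samples=40, n_genes=1000, seed=0):
    """Pure-noise expression matrix (hypothetical data): no gene is informative."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # two balanced classes
    return X, y

def select_genes(X, y, idx, k):
    """Return the k genes with the largest t-like score, using only samples in idx."""
    a = [i for i in idx if y[i] == 0]
    b = [i for i in idx if y[i] == 1]
    scores = []
    for g in range(len(X[0])):
        xa = [X[i][g] for i in a]
        xb = [X[i][g] for i in b]
        se = (var(xa) / len(xa) + var(xb) / len(xb)) ** 0.5
        scores.append((abs(mean(xa) - mean(xb)) / se, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def count_errors(X, y, train, test, genes):
    """Nearest-centroid rule fitted on train, evaluated on test."""
    cents = {}
    for c in (0, 1):
        rows = [i for i in train if y[i] == c]
        cents[c] = [mean([X[i][g] for i in rows]) for g in genes]
    errors = 0
    for i in test:
        d = {c: sum((X[i][g] - m) ** 2 for g, m in zip(genes, cents[c]))
             for c in (0, 1)}
        pred = 0 if d[0] <= d[1] else 1
        errors += int(pred != y[i])
    return errors

def cv_error(X, y, k_genes=10, n_folds=10, external=True, seed=1):
    """10-fold CV error; external=True repeats gene selection inside every fold."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    # Biased variant: genes are chosen once, using all samples (test folds included).
    all_genes = select_genes(X, y, order, k_genes)
    errors = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        genes = select_genes(X, y, train, k_genes) if external else all_genes
        errors += count_errors(X, y, train, test, genes)
    return errors / len(X)

X, y = make_noise_data()
internal = cv_error(X, y, external=False)  # selection outside the CV loop
external = cv_error(X, y, external=True)   # selection redone in each fold
print(f"internal CV error (biased):  {internal:.2f}")
print(f"external CV error:           {external:.2f}")
```

The only difference between the two estimates is where `select_genes` is called: once on all samples, or once per fold on the training samples alone. The same pattern applies to any selection method and classifier, provided every data-dependent step is refitted within each training fold.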

Year:  2002        PMID: 11983868      PMCID: PMC124442          DOI: 10.1073/pnas.102102699

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


