Multi-factorial analysis of class prediction error: estimating optimal number of biomarkers for various classification rules.

Mizanur R Khondoker, Till T Bachmann, Muriel Mewissen, Paul Dickinson, Bartosz Dobrzelecki, Colin J Campbell, Andrew R Mount, Anthony J Walton, Jason Crain, Holger Schulze, Gerard Giraud, Alan J Ross, Ilenia Ciani, Stuart W J Ember, Chaker Tlili, Jonathan G Terry, Eilidh Grant, Nicola McDonnell, Peter Ghazal.

Abstract

Machine learning and statistical model-based classifiers are increasingly used with complex, high-dimensional biological data obtained from high-throughput technologies. Understanding how the various factors associated with large and complex microarray datasets affect the predictive performance of classifiers is computationally intensive and under-investigated, yet vital in determining the optimal number of biomarkers for classification purposes aimed at improved detection, diagnosis, and therapeutic monitoring of diseases. We investigate the impact of microarray-based data characteristics on the predictive performance of various classification rules using simulation studies. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. The optimal number of biomarkers for a classification problem should therefore be estimated taking into account the impact of all these factors. A database of average generalization errors is built for various combinations of these factors and can be used to estimate the optimal number of biomarkers for given levels of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble those from simulated data with corresponding levels of data characteristics. An R package, optBiomarker, implementing the method is freely available for academic use from the Comprehensive R Archive Network (http://www.cran.r-project.org/web/packages/optBiomarker/).
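
The idea of tabulating average generalization error against data characteristics and then reading off a marker count can be illustrated with a minimal sketch. It is not the optBiomarker implementation and does not use its API. It assumes independent Gaussian markers with a single pooled standard deviation `sigma` (so biological and technical variability, replication and correlation are not modelled separately), swaps in a simple nearest-centroid rule in place of the four classifiers named in the abstract, and defines the "optimal" count as the smallest number of markers whose error lies within a chosen `tolerance` of the lowest tabulated error. All function and parameter names are hypothetical.

```python
import random
import statistics


def simulate(n_per_class, n_markers, fold_change, sigma, rng):
    """Draw log-scale expression values for two classes.

    Assumption: markers are independent and class 1 is shifted by
    `fold_change` on every marker.
    """
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            x = [rng.gauss(label * fold_change, sigma) for _ in range(n_markers)]
            data.append((x, label))
    return data


def nearest_centroid_error(train, test):
    """Misclassification rate of a nearest-centroid rule on the test set."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    errors = 0
    for x, y in test:
        dist = {
            lab: sum((a - b) ** 2 for a, b in zip(x, c))
            for lab, c in centroids.items()
        }
        predicted = min(dist, key=dist.get)
        errors += predicted != y
    return errors / len(test)


def average_error(n_markers, n_train, fold_change, sigma, n_rep=30, seed=0):
    """Average test error over repeated simulated training and test sets."""
    rng = random.Random(seed)
    errs = []
    for _ in range(n_rep):
        train = simulate(n_train, n_markers, fold_change, sigma, rng)
        test = simulate(100, n_markers, fold_change, sigma, rng)
        errs.append(nearest_centroid_error(train, test))
    return statistics.fmean(errs)


def optimal_marker_count(error_table, tolerance):
    """Smallest marker count whose error is within `tolerance` of the minimum.

    Assumption: this is one plausible reading of "optimal number of
    biomarkers for a given level of predictive accuracy".
    """
    best = min(error_table.values())
    return min(m for m, e in error_table.items() if e <= best + tolerance)


# A tiny "database" of average errors for one combination of factors.
error_table = {
    m: average_error(m, n_train=10, fold_change=0.5, sigma=1.0)
    for m in (1, 2, 5, 10, 20, 40)
}

if __name__ == "__main__":
    for m, e in error_table.items():
        print(f"{m:3d} markers: average error {e:.3f}")
    print("marker count within 0.02 of best:", optimal_marker_count(error_table, 0.02))
```

Repeating the last step over a grid of `n_train`, `fold_change` and `sigma` values would give a larger lookup table in the spirit of the database described above, from which marker counts can be compared across settings.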

Year:  2010        PMID: 21121020     DOI: 10.1142/s0219720010005063

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122


  8 in total

1.  Identification of differentially expressed genes in microarray data in a principal component space.

Authors:  Luis Ospina; Liliana López-Kleine
Journal:  Springerplus       Date:  2013-02-19

2.  Whole blood gene expression profiling of neonates with confirmed bacterial sepsis.

Authors:  Paul Dickinson; Claire L Smith; Thorsten Forster; Marie Craigon; Alan J Ross; Mizan R Khondoker; Alasdair Ivens; David J Lynn; Judith Orme; Allan Jackson; Paul Lacaze; Katie L Flanagan; Benjamin J Stenson; Peter Ghazal
Journal:  Genom Data       Date:  2014-11-15

3.  Prediction of cardiovascular outcomes with machine learning techniques: application to the Cardiovascular Outcomes in Renal Atherosclerotic Lesions (CORAL) study.

Authors:  Tian Chen; Pamela Brewster; Katherine R Tuttle; Lance D Dworkin; William Henrich; Barbara A Greco; Michael Steffes; Sheldon Tobe; Kenneth Jamerson; Karol Pencina; Joseph M Massaro; Ralph B D'Agostino; Donald E Cutlip; Timothy P Murphy; Christopher J Cooper; Joseph I Shapiro
Journal:  Int J Nephrol Renovasc Dis       Date:  2019-03-21

4.  DECO: decompose heterogeneous population cohorts for patient stratification and discovery of sample biomarkers using omic data profiling.

Authors:  F J Campos-Laborie; A Risueño; M Ortiz-Estévez; B Rosón-Burgo; C Droste; C Fontanillo; R Loos; J M Sánchez-Santos; M W Trotter; J De Las Rivas
Journal:  Bioinformatics       Date:  2019-10-01       Impact factor: 6.937

5.  Evaluation of chemotherapy response with serum squamous cell carcinoma antigen level in cervical cancer patients: a prospective cohort study.

Authors:  Mingzhu Yin; Yan Hou; Tao Zhang; Changyi Cui; Xiaohua Zhou; Fengyu Sun; Huiyan Li; Xia Li; Jian Zheng; Xiuwei Chen; Cong Li; Xiaoming Ning; Kang Li; Ge Lou
Journal:  PLoS One       Date:  2013-01-22       Impact factor: 3.240

Review 6.  Systems virology: host-directed approaches to viral pathogenesis and drug targeting.

Authors:  G Lynn Law; Marcus J Korth; Arndt G Benecke; Michael G Katze
Journal:  Nat Rev Microbiol       Date:  2013-06-03       Impact factor: 60.633

7.  Identification of a human neonatal immune-metabolic network associated with bacterial infection.

Authors:  Claire L Smith; Paul Dickinson; Thorsten Forster; Marie Craigon; Alan Ross; Mizanur R Khondoker; Rebecca France; Alasdair Ivens; David J Lynn; Judith Orme; Allan Jackson; Paul Lacaze; Katie L Flanagan; Benjamin J Stenson; Peter Ghazal
Journal:  Nat Commun       Date:  2014-08-14       Impact factor: 14.919

8.  A comparison of machine learning methods for classification using simulation with multiple real data examples from mental health studies.

Authors:  Mizanur Khondoker; Richard Dobson; Caroline Skirrow; Andrew Simmons; Daniel Stahl
Journal:  Stat Methods Med Res       Date:  2013-09-18       Impact factor: 3.021
