
Learning curves in classification with microarray data.

Kenneth R Hess, Caimiao Wei.

Abstract

The performance of many repeated tasks improves with experience and practice. This improvement tends to be rapid initially and then slows. The term "learning curve" is often used to describe this phenomenon. In supervised machine learning, the performance of classification algorithms often increases with the number of observations used to train the algorithm. We use progressively larger samples of observations to train the algorithm and then plot performance against the number of training observations. This yields the familiar negatively accelerating learning curve. To quantify the learning curve, we fit inverse power law models to the progressively sampled data. We fit such learning curves to four large clinical cancer genomic datasets, using three classifiers (diagonal linear discriminant analysis, K-nearest-neighbor with three neighbors, and support vector machines) and four values for the number of top genes included (5, 50, 500, and 5,000). The inverse power law models fit the progressively sampled data reasonably well and showed considerable diversity when multiple classifiers were applied to the same data. Some classifiers showed rapid and continued increase in performance as the number of training samples increased, while others showed little if any improvement. Assessing classifier efficiency is particularly important in genomic studies because samples are so expensive to obtain. It is important to employ an algorithm that uses the predictive information efficiently. With a modest number of training samples (>50), learning curves can be used to assess the predictive efficiency of classification algorithms. Copyright 2010 Elsevier Inc. All rights reserved.
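The curve-fitting step described in the abstract can be illustrated with a minimal sketch. It assumes the common three-parameter form error(n) = a + b * n^(-c); the abstract does not state the exact parameterization, so this form, the function name fit_inverse_power_law, the exponent grid, and the synthetic data are all illustrative assumptions rather than the authors' method. For a fixed exponent c the model is linear in a and b, so the sketch scans a grid of c values and solves ordinary least squares at each one.

```python
def fit_inverse_power_law(sizes, errors, exponents=None):
    """Fit error(n) ~ a + b * n**(-c) by profiling over the exponent c.

    Hypothetical helper: for each candidate c, a and b come from simple
    linear regression of errors on n**(-c); the c with the smallest
    residual sum of squares is kept. Returns (a, b, c, rss).
    """
    if exponents is None:
        # Assumed search grid for the decay rate: 0.01, 0.02, ..., 3.00
        exponents = [k / 100 for k in range(1, 301)]
    m = len(sizes)
    y_mean = sum(errors) / m
    best = None
    for c in exponents:
        x = [n ** (-c) for n in sizes]
        x_mean = sum(x) / m
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        if sxx == 0:
            continue  # all training sizes identical; slope is undefined
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, errors))
        b = sxy / sxx              # learning-rate scale
        a = y_mean - b * x_mean    # asymptotic error as n grows
        rss = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, errors))
        if best is None or rss < best[3]:
            best = (a, b, c, rss)
    return best


# Synthetic, noise-free example (not data from the paper):
# error rates at progressively larger training-set sizes.
sizes = [10, 20, 40, 80, 160]
errors = [0.10 + 0.9 * n ** -0.5 for n in sizes]

a, b, c, rss = fit_inverse_power_law(sizes, errors)
print(f"asymptote a={a:.3f}, scale b={b:.3f}, decay c={c:.2f}")
```

Under this assumed form, a estimates the error rate a classifier approaches with unlimited training data and c describes how quickly it gets there, which is one way to compare the efficiency of classifiers on the same dataset. With real, noisy data a nonlinear least-squares routine would likely be preferable to a fixed grid.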

Year:  2010        PMID: 20172367      PMCID: PMC4482113          DOI: 10.1053/j.seminoncol.2009.12.002

Source DB:  PubMed          Journal:  Semin Oncol        ISSN: 0093-7754            Impact factor:   4.929


  2 in total

1.  Statistical assessment of the learning curves of health technologies. (Review)

Authors:  C R Ramsay; A M Grant; S A Wallace; P H Garthwaite; A F Monk; I T Russell
Journal:  Health Technol Assess       Date:  2001       Impact factor: 4.014

2.  Estimating dataset size requirements for classifying DNA microarray data.

Authors:  Sayan Mukherjee; Pablo Tamayo; Simon Rogers; Ryan Rifkin; Anna Engle; Colin Campbell; Todd R Golub; Jill P Mesirov
Journal:  J Comput Biol       Date:  2003       Impact factor: 1.479

  3 in total

1.  Addressing the challenge of defining valid proteomic biomarkers and classifiers.

Authors:  Mohammed Dakna; Keith Harris; Alexandros Kalousis; Sebastien Carpentier; Walter Kolch; Joost P Schanstra; Marion Haubitz; Antonia Vlahou; Harald Mischak; Mark Girolami
Journal:  BMC Bioinformatics       Date:  2010-12-10       Impact factor: 3.169

2.  Predicting sample size required for classification performance.

Authors:  Rosa L Figueroa; Qing Zeng-Treitler; Sasikiran Kandula; Long H Ngo
Journal:  BMC Med Inform Decis Mak       Date:  2012-02-15       Impact factor: 2.796

3.  Radio-pathomic Maps of Epithelium and Lumen Density Predict the Location of High-Grade Prostate Cancer.

Authors:  Sean D McGarry; Sarah L Hurrell; Kenneth A Iczkowski; William Hall; Amy L Kaczmarowski; Anjishnu Banerjee; Tucker Keuter; Kenneth Jacobsohn; John D Bukowy; Marja T Nevalainen; Mark D Hohenwalter; William A See; Peter S LaViolette
Journal:  Int J Radiat Oncol Biol Phys       Date:  2018-04-24       Impact factor: 8.013

