Machine learning-based receiver operating characteristic (ROC) curves for crisp and fuzzy classification of DNA microarrays in cancer research.

Leif E Peterson, Matthew A Coleman.

Abstract

Receiver operating characteristic (ROC) curves were generated to obtain classification area under the curve (AUC) as a function of feature standardization, fuzzification, and sample size from nine large sets of cancer-related DNA microarrays. Classifiers used included k nearest neighbor (kNN), naïve Bayes classifier (NBC), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), learning vector quantization (LVQ1), logistic regression (LOG), polytomous logistic regression (PLOG), artificial neural networks (ANN), particle swarm optimization (PSO), constricted particle swarm optimization (CPSO), kernel regression (RBF), radial basis function networks (RBFN), gradient descent support vector machines (SVMGD), and least squares support vector machines (SVMLS). For each data set, AUC was determined for a number of combinations of sample size and total sum[-log(p)] of feature t-tests, with and without feature standardization and with (fuzzy) and without (crisp) fuzzification of features. Altogether, a total of 2,123,530 classification runs were made. At the greatest level of sample size, ANN resulted in a fitted AUC of 90%, while PSO resulted in the lowest fitted AUC of 72.1%. AUC values derived from 4NN were the most dependent on sample size, while those from PSO were the least dependent. ANN depended the most on total statistical significance of features used based on sum[-log(p)], whereas PSO was the least dependent. Standardization of features increased AUC by 8.1% for PSO and reduced it by 0.2% for QDA, while fuzzification increased AUC by 9.4% for PSO and reduced AUC by 3.8% for QDA. AUC determination in planned microarray experiments without standardization and fuzzification of features will benefit the most if CPSO is used for lower levels of feature significance (i.e., sum[-log(p)] ~ 50) and ANN is used for greater levels of significance (i.e., sum[-log(p)] ~ 500).
When only standardization of features is performed, studies are likely to benefit most by using CPSO for low levels of feature statistical significance and LVQ1 for greater levels of significance. Studies involving only fuzzification of features should employ LVQ1 because of the substantial gain in AUC observed and low expense of LVQ1. Lastly, PSO resulted in significantly greater levels of AUC (89.5% average) when feature standardization and fuzzification were performed. In consideration of the data sets used and factors influencing AUC which were investigated, if low-expense computation is desired then LVQ1 is recommended. However, if computational expense is of less concern, then PSO or CPSO is recommended.
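
To make the role of feature standardization in AUC concrete, the following is a minimal sketch and not the authors' procedure. It assumes a hypothetical two-gene toy data set and a deliberately simple additive score (the sum of the two features) in place of any of the classifiers listed above. AUC is computed as the fraction of positive-negative pairs that are ranked correctly, with ties counted as one half. The names `standardize`, `auc`, `gene_a`, and `gene_b` are illustrative only.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-score one feature: subtract its mean, divide by its sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs where the positive scores higher.

    Ties contribute 0.5, which matches the area under the empirical ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: gene_a separates the classes on a small scale,
# gene_b is uninformative but measured on a much larger scale.
labels = [0, 0, 0, 1, 1, 1]
gene_a = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
gene_b = [500, 100, 300, 200, 400, 50]

# Additive score on raw features: dominated by the large-scale noise feature.
raw_scores = [a + b for a, b in zip(gene_a, gene_b)]

# Same score after z-scoring each feature, so both contribute on a comparable scale.
std_scores = [a + b for a, b in zip(standardize(gene_a), standardize(gene_b))]

auc_raw = auc(raw_scores, labels)
auc_std = auc(std_scores, labels)
print(f"AUC without standardization: {auc_raw:.3f}")
print(f"AUC with standardization:    {auc_std:.3f}")
```

In this toy case the standardized score ranks the samples better because the informative feature is no longer swamped by the noisy one. The size and even the direction of the effect depend on the classifier, as the abstract's contrast between PSO and QDA indicates.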

Year:  2008        PMID: 19079753      PMCID: PMC2600874          DOI: 10.1016/j.ijar.2007.03.006

Source DB:  PubMed          Journal:  Int J Approx Reason        ISSN: 0888-613X            Impact factor:   3.816


