Suyan Tian¹, Mayte Suárez-Fariñas. ¹Division of Clinical Epidemiology, First Hospital of the Jilin University, Changchun, Jilin, China; Center for Clinical and Translational Science, The Rockefeller University, New York, New York, United States of America.
Abstract
BACKGROUND: As microarray technology has matured and become widely used, selecting a small number of relevant genes for accurate sample classification has become an active topic in biostatistics and bioinformatics. However, most existing algorithms cannot handle multiple classes, even though multi-class problems are arguably common in practice. Here, we propose an extension of an existing regularization algorithm, Threshold Gradient Descent Regularization (TGDR), that specifically addresses multi-class classification of microarray data. When several microarray experiments address the same or similar objectives, one option is to use a meta-analysis version of TGDR (Meta-TGDR), which treats the classification task as a combination of classifiers that share the same structure/model while allowing the parameters to vary across studies. However, the original Meta-TGDR extension offered no way to make predictions on independent samples. We therefore propose an explicit method for estimating the overall coefficients of the biomarkers selected by Meta-TGDR. This extension permits broader applicability and allows the predictive performance of Meta-TGDR and TGDR to be compared on an independent test set.

RESULTS: Using real-world applications, we demonstrated that the proposed multi-TGDR framework works well and that it selects fewer genes than the sum of all individual binary TGDRs. Additionally, Meta-TGDR and TGDR applied to the batch-effect-adjusted pooled data gave approximately the same results. Adding a bagging procedure in each application ensured stability and good predictive performance.

CONCLUSIONS: Compared with Meta-TGDR, TGDR requires less computing time and does not require samples of all classes in each study. On the adjusted data, its predictive performance is approximately the same as that of Meta-TGDR. TGDR is therefore highly recommended.
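The abstract names TGDR without stating its update rule. As a rough illustration only, the sketch below shows the generic thresholded gradient update usually associated with TGDR, for a binary logistic model: at each step, only coefficients whose gradient magnitude is at least a fraction `tau` of the largest gradient magnitude are moved. It is not the multi-class or Meta-TGDR extension proposed here, and the function name `tgdr_logistic`, the parameters `tau`, `step`, and `n_iter`, and the simulated data are all assumptions made for this example.

```python
import numpy as np

def tgdr_logistic(X, y, tau=0.9, step=0.01, n_iter=200):
    """Hypothetical binary TGDR sketch (not the authors' implementation).

    Performs gradient ascent on the mean logistic log-likelihood, but at
    each iteration updates only the coefficients whose absolute gradient
    is at least tau times the largest absolute gradient.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))   # predicted P(y = 1)
        grad = X.T @ (y - prob) / n                # log-likelihood gradient
        max_abs = np.max(np.abs(grad))
        if max_abs == 0:
            break                                  # stationary point reached
        mask = np.abs(grad) >= tau * max_abs       # threshold step
        beta += step * grad * mask                 # move selected coefficients only
    return beta

# Simulated data (assumption): 100 samples, 50 "genes", two of them informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
y = (X[:, 0] - X[:, 1] + 0.5 * rng.standard_normal(100) > 0).astype(float)

beta = tgdr_logistic(X, y)
selected = np.flatnonzero(beta)  # indices of features with nonzero coefficients
```

In this sketch, `tau` close to 1 moves only the coefficients with the steepest gradients and so tends toward sparse solutions, while `tau = 0` reduces to ordinary gradient ascent on all coefficients. In practice `tau` and the number of iterations would be tuned, for example by cross-validation.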