
Identification of differential gene expression for microarray data using recursive random forest.

Xiao-yan Wu, Zhen-yu Wu, Kang Li.

Abstract

BACKGROUND: The major difficulty in analyzing DNA microarray data is that the number of genes is large relative to the small number of samples, and the data structure is complex. Random forest has received much attention recently; its primary characteristic is that it can build a classification model from high-dimensional data. However, it cannot yield optimal results for gene selection, because it is still affected by undifferentiated genes. We propose recursive random forest analysis and apply it to gene selection.
METHODS: Recursive random forest, an improvement on random forest, obtains the optimal set of differentiated genes by dropping, step by step, the genes that a certain algorithm identifies as having no effect on classification. The method retains the advantages of random forest and also provides a gene importance scale. The area under the receiver operating characteristic (ROC) curve (AUC), which combines information on sensitivity and specificity, is adopted as the key criterion for evaluating the method's performance. The focus of the paper is to validate the effectiveness of gene selection using recursive random forest through the analysis of five microarray datasets: colon, prostate, leukemia, breast and skin data.
RESULTS: Five microarray datasets were analyzed, and better classification results were attained using only a few genes after gene selection. The biological information of the genes selected from the breast and skin data was confirmed against the National Center for Biotechnology Information (NCBI). The results show that genes associated with diseases can be effectively retained by recursive random forest.
CONCLUSIONS: Recursive random forest can be effectively applied to microarray data analysis and gene selection. The genes retained in the optimal model provide important information for clinical diagnosis and for research into the biological mechanisms of disease.
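
The stepwise procedure described in METHODS can be sketched as a loop that ranks genes by importance, drops the lowest-ranked ones, and tracks the AUC of each reduced gene set. The abstract does not specify the elimination rule (it says only "a certain algorithm"), so the drop fraction, the tie-breaking rule, the toy data, and both scoring functions below are assumptions. In particular, the per-gene AUC importance and the mean-expression score are simple stand-ins used so the example runs with the standard library alone; in the method described, a random forest's gene importance and classification output would take their place.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recursive_elimination(
    genes: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    evaluate_fn: Callable[[List[str]], float],
    drop_fraction: float = 0.25,   # assumed; not given in the abstract
    min_genes: int = 1,
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Drop the least important genes step by step; keep the best subset."""
    current = list(genes)
    best_genes, best_score = list(current), evaluate_fn(current)
    history = [(len(current), best_score)]
    while len(current) > min_genes:
        importance = importance_fn(current)   # re-ranked at every step
        ranked = sorted(current, key=lambda g: importance[g], reverse=True)
        n_drop = max(1, int(len(ranked) * drop_fraction))
        current = ranked[:max(min_genes, len(ranked) - n_drop)]
        score = evaluate_fn(current)
        history.append((len(current), score))
        if score >= best_score:               # on ties, prefer fewer genes
            best_genes, best_score = list(current), score
    return best_genes, best_score, history


# Hypothetical toy data: 6 samples (3 per class), 4 genes.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g_signal_a": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],
    "g_signal_b": [0.5, 0.7, 0.6, 2.0, 2.2, 1.9],
    "g_noise_a": [5.0, 1.0, 4.0, 1.5, 4.5, 0.5],
    "g_noise_b": [2.0, 6.0, 0.2, 0.1, 3.0, 2.5],
}


def toy_importance(selected: List[str]) -> Dict[str, float]:
    # Stand-in for random forest importance: distance of per-gene AUC from 0.5.
    return {g: abs(auc(expr[g], labels) - 0.5) for g in selected}


def toy_evaluate(selected: List[str]) -> float:
    # Stand-in for a classifier: score each sample by mean expression.
    n = len(labels)
    scores = [sum(expr[g][i] for g in selected) / len(selected)
              for i in range(n)]
    return auc(scores, labels)


best_genes, best_score, history = recursive_elimination(
    list(expr), toy_importance, toy_evaluate)
print(best_genes, best_score, history)
```

Re-ranking inside the loop, rather than ranking once up front, is what makes the procedure recursive: importance is recomputed on each reduced gene set, so a gene's rank can change once uninformative genes are gone.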

Year:  2008        PMID: 19187584

Source DB:  PubMed          Journal:  Chin Med J (Engl)        ISSN: 0366-6999            Impact factor:   2.628


  3 in total

1.  Machine learning-based classification and diagnosis of clinical cardiomyopathies.

Authors:  Ahmad Alimadadi; Ishan Manandhar; Sachin Aryal; Patricia B Munroe; Bina Joe; Xi Cheng
Journal:  Physiol Genomics       Date:  2020-08-03       Impact factor: 3.107

2.  Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy.

Authors:  Zheng Rong Yang
Journal:  BMC Bioinformatics       Date:  2009-10-29       Impact factor: 3.169

3.  Random forest in clinical metabolomics for phenotypic discrimination and biomarker selection.

Authors:  Tianlu Chen; Yu Cao; Yinan Zhang; Jiajian Liu; Yuqian Bao; Congrong Wang; Weiping Jia; Aihua Zhao
Journal:  Evid Based Complement Alternat Med       Date:  2013-02-02       Impact factor: 2.629

