Juntao Li, Yanyan Wang, Xuekun Song, Huimin Xiao.
Abstract
Multi-class classification has attracted much attention in cancer diagnosis and treatment, and many machine learning methods have recently emerged to address it. However, classifying lung cancer data raises problems of class imbalance and gene selection. In this paper, an adaptive multinomial regression with a sparse overlapping group lasso penalty is proposed to perform classification and grouped gene selection for lung cancer gene expression data. An overlapped grouping strategy with biological interpretability is proposed, which highlights the importance of gene groups from the minority classes. Using conditional mutual information, the significance of each gene within its group is evaluated and data-driven weights are constructed. Based on the grouping strategy and the constructed weights, a regularized adaptive multinomial regression is presented and a solving algorithm is developed, which not only selects the important gene groups for each class while performing multi-class classification but also adaptively selects important genes within each group. Experimental results show that the proposed method significantly outperforms six other methods in classification accuracy, and that the selected genes are disease-causing genes for lung cancer.
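To make the penalty named in the abstract more concrete, the sketch below evaluates a generic weighted sparse group lasso penalty over overlapping groups for the coefficient vector of a single class. It is only an illustration under stated assumptions: the function name, the mixing parameter `alpha`, the `sqrt(group size)` scaling, and the toy values are hypothetical and are not taken from the paper, whose exact formulation and weights (built from conditional mutual information) may differ.

```python
import math


def sparse_overlapping_group_lasso_penalty(beta, groups, weights, lam=1.0, alpha=0.5):
    """Generic weighted sparse group lasso penalty (illustrative, not the paper's).

    beta    : coefficients of one class, one entry per gene
    groups  : lists of gene indices; a gene may appear in several groups
    weights : per-gene weights for the l1 term (assumed given here)
    lam     : overall regularization strength (assumption)
    alpha   : balance between the l1 term and the group term (assumption)
    """
    # Within-group sparsity: weighted l1 norm over individual genes.
    l1_term = sum(w * abs(b) for w, b in zip(weights, beta))
    # Group sparsity: l2 norm of each group, scaled by sqrt(group size).
    # Overlapping groups simply reuse the shared indices.
    group_term = sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g))
        for g in groups
    )
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


# Toy example: three genes, two groups that share gene 1.
beta = [3.0, 4.0, 0.0]
groups = [[0, 1], [1, 2]]
weights = [1.0, 1.0, 1.0]
penalty = sparse_overlapping_group_lasso_penalty(beta, groups, weights)
print(penalty)
```

In this toy case gene 1 contributes to both group norms, which is how an overlapping grouping lets one gene support more than one group.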
Keywords: Imbalanced data; Multi-class classification; Overlapping group lasso; Weighted gene co-expression networks
Year: 2018 PMID: 29957558 DOI: 10.1016/j.compbiomed.2018.06.014
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589