Lei Chen, Tao Zeng, Xiaoyong Pan, Yu-Hang Zhang, Tao Huang, Yu-Dong Cai.
Abstract
Breast cancer is regarded worldwide as a severe human disease. Various genetic variations, including hereditary and somatic mutations, contribute to the initiation and progression of this disease. The diagnostic parameters of breast cancer are not limited to conventional protein markers; they can also include newly discovered genetic variants and regulatory patterns such as DNA methylation and microRNA expression. In addition, breast cancer detection now extends to detailed stratification into subtypes, which provides subtype-specific indications for personalized treatment. One genome-wide expression-methylation quantitative trait loci analysis confirmed that different breast cancer subtypes have distinct methylation patterns. However, identifying clinically applicable methylation biomarkers is difficult because of the large number of differentially methylated genes. In this study, we attempted to re-screen a small group of functional biomarkers for identifying and distinguishing breast cancer subtypes using advanced machine learning methods. The findings may contribute to biomarker identification for different breast cancer subtypes and provide a new perspective on the differential pathogenesis of breast cancer subtypes.
Keywords: breast cancer; methylation; multi-class classification; pattern; subtype
Year: 2019 PMID: 31480430 PMCID: PMC6747348 DOI: 10.3390/ijms20174269
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Performance and optimum number of features of incremental feature selection (IFS) with support vector machine (SVM) and random forest (RF).
| Metric / subtype | Sensitivity/Specificity | SVM | RF |
|---|---|---|---|
| Number of optimum features | / | 1890 | 840 |
| Matthews correlation coefficient (MCC) | / | 0.925 | 0.860 |
| Overall accuracy | / | 0.949 | 0.906 |
| Basal | Sensitivity | 1.000 | 1.000 |
| | Specificity | 1.000 | 0.991 |
| Her2 | Sensitivity | 0.973 | 0.892 |
| | Specificity | 0.986 | 0.982 |
| LumA | Sensitivity | 0.942 | 0.917 |
| | Specificity | 0.963 | 0.918 |
| LumB | Sensitivity | 0.921 | 0.841 |
| | Specificity | 0.974 | 0.963 |
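The table reports overall accuracy, MCC and per-subtype sensitivity and specificity, all of which can be derived from a multi-class confusion matrix. The sketch below is illustrative only: it assumes rows are true subtypes and columns are predicted subtypes, computes specificity one-versus-rest, and uses Gorodkin's multi-class generalization of MCC (the form scikit-learn's `matthews_corrcoef` uses for more than two classes). The record does not state which MCC definition the authors used, so that choice is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[int]]  # cm[i][j]: true class i, predicted class j (assumed orientation)


def overall_accuracy(cm: Matrix) -> float:
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def per_class_sens_spec(cm: Matrix) -> List[Tuple[float, float]]:
    """One-versus-rest (sensitivity, specificity) for every class."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    result = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true c, predicted as another class
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, truly another class
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        result.append((sens, spec))
    return result


def multiclass_mcc(cm: Matrix) -> float:
    """Gorodkin's multi-class MCC; reduces to the usual MCC for two classes."""
    k = len(cm)
    s = sum(sum(row) for row in cm)                          # all samples
    c = sum(cm[i][i] for i in range(k))                      # correct predictions
    t = [sum(cm[i]) for i in range(k)]                       # true count per class
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    num = c * s - sum(tk * pk for tk, pk in zip(t, p))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0


# Toy 3-class counts for illustration, not data from the study.
cm = [[5, 1, 0], [0, 4, 1], [1, 0, 3]]
print(overall_accuracy(cm))  # 0.8
print(multiclass_mcc(cm))    # 103/148, about 0.696
```

MCC is often preferred over accuracy here because it uses every cell of the confusion matrix, so it stays informative when subtype sizes are unbalanced.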
Figure 1. The confusion matrices yielded by the best support vector machine (SVM) and random forest (RF) classifiers. (A) Confusion matrix of the best SVM classifier; (B) confusion matrix of the best RF classifier.
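A confusion matrix like those in Figure 1 counts how often samples of each true subtype are assigned to each predicted subtype. The following is a minimal sketch, assuming rows are true labels and columns are predicted labels; the label order in `SUBTYPES` and the helper names are hypothetical, since the figure's actual orientation and ordering are not given here.

```python
from typing import Dict, List, Sequence

# Hypothetical label order; the figure's actual row/column order is not stated here.
SUBTYPES = ("Basal", "Her2", "LumA", "LumB")


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str] = SUBTYPES) -> List[List[int]]:
    """Count samples per (true subtype, predicted subtype) pair; rows are true labels."""
    index: Dict[str, int] = {lab: i for i, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def row_normalize(cm: List[List[int]]) -> List[List[float]]:
    """Divide each row by its total so each row shows where that subtype's samples went."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


# Toy labels for illustration only.
y_true = ["Basal", "Her2", "LumA", "LumB", "LumB"]
y_pred = ["Basal", "Her2", "LumA", "LumA", "LumB"]
cm = confusion_matrix(y_true, y_pred)
print(cm)                    # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 1]]
print(row_normalize(cm)[3])  # [0.0, 0.0, 0.5, 0.5]
```

The diagonal of a row-normalized matrix equals the per-subtype sensitivity, which is how the matrix connects to the sensitivity rows of the table.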
Figure 2. Performance of the support vector machine (SVM) and random forest (RF) classifiers as the number of features changes. (A) Performance of SVM; (B) performance of RF.
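Curves like those in Figure 2 come from incremental feature selection (IFS): starting from a ranked feature list, the top n features are fed to a classifier for increasing n, and the n with the best score is kept. The sketch below shows only that loop. The `evaluate` callback stands in for training SVM or RF with cross-validation and returning a score such as MCC; the step size of 10, the function name and the toy scoring function are assumptions, not the authors' settings.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
    step: int = 10,
) -> Tuple[int, float, List[Tuple[int, float]]]:
    """Score the top-n features for n = step, 2*step, ... and return the best n.

    Returns (best_n, best_score, curve), where curve holds every (n, score) pair
    and can be plotted as a performance-versus-feature-count curve.
    """
    curve: List[Tuple[int, float]] = []
    best_n, best_score = 0, float("-inf")
    for n in range(step, len(ranked_features) + 1, step):
        score = evaluate(ranked_features[:n])
        curve.append((n, score))
        if score > best_score:  # strict '>' keeps the smallest n when scores tie
            best_n, best_score = n, score
    return best_n, best_score, curve


def toy_score(features: Sequence[str]) -> float:
    """Hypothetical stand-in for a cross-validated MCC; peaks at 40 features."""
    return -abs(len(features) - 40)


ranked = [f"f{i}" for i in range(100)]  # stand-in for a ranked feature list
best_n, best_score, curve = incremental_feature_selection(ranked, toy_score, step=10)
print(best_n, best_score)  # 40 0
```

With a step greater than 1, the full feature set is only evaluated when its size is a multiple of the step; a finer step trades more classifier runs for a more precise optimum.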
Figure 3. Performance of the support vector machine (SVM) and random forest (RF) classifiers as the number of features changes, on the dataset without LumB samples. (A) Performance of SVM; (B) performance of RF.
Performance and optimum number of features of IFS with SVM and RF on the dataset without LumB samples.
| Metric / subtype | Sensitivity/Specificity | SVM | RF |
|---|---|---|---|
| Number of optimum features | / | 480 | 40 |
| MCC | / | 0.961 | 0.951 |
| Overall accuracy | / | 0.979 | 0.974 |
| Basal | Sensitivity | 1.000 | 0.971 |
| | Specificity | 1.000 | 0.994 |
| Her2 | Sensitivity | 0.973 | 0.946 |
| | Specificity | 0.981 | 0.987 |
| LumA | Sensitivity | 0.975 | 0.983 |
| | Specificity | 0.986 | 0.972 |
Figure 4. The confusion matrix yielded by the SVM with the top 40 features.
Figure 5. Boxplots illustrating the distributions of the top ten features across the four breast cancer subtypes.
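Each box in a plot like Figure 5 summarizes one feature's values within one subtype. As a hedged sketch, the function below computes those summary statistics per subtype; it assumes linearly interpolated quartiles (`method="inclusive"`) and Tukey whiskers at 1.5 times the interquartile range, which is a common plotting default but not necessarily what the authors used. The function name and toy values are hypothetical, and the values could be, for example, methylation levels of one probe.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def boxplot_summary(values: Sequence[float], groups: Sequence[str]) -> Dict[str, dict]:
    """Quartiles, Tukey whiskers and outliers of one feature, per group (subtype)."""
    by_group: Dict[str, List[float]] = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    summary = {}
    for g, vs in by_group.items():
        q1, med, q3 = statistics.quantiles(vs, n=4, method="inclusive")
        iqr = q3 - q1
        lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        inside = [v for v in vs if lo_fence <= v <= hi_fence]  # points the whiskers can reach
        summary[g] = {
            "q1": q1, "median": med, "q3": q3,
            "whisker_low": min(inside), "whisker_high": max(inside),
            "outliers": [v for v in vs if v < lo_fence or v > hi_fence],
        }
    return summary


# Toy values for illustration only.
s = boxplot_summary([1, 2, 3, 4, 5, 1, 2, 3, 4, 100], ["Basal"] * 5 + ["LumA"] * 5)
print(s["Basal"]["median"], s["LumA"]["outliers"])  # 3.0 [100]
```

Features whose boxes barely overlap between subtypes are the ones most likely to separate those subtypes in a classifier.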
Figure 6. Flowchart for classifying samples from the four breast cancer subtypes.