Xingheng Yu1, Xinqi Gong2, Hao Jiang3.
Abstract
BACKGROUND: Breast cancer is one of the most common cancers among women, ranking second among all cancers in incidence, after lung cancer. Studying methods for detecting breast cancer is therefore essential. Recent research has focused on using gene expression data to predict outcomes, and kernel methods have received considerable attention for cancer outcome evaluation. However, selecting appropriate kernels and their parameters still needs further investigation.
Keywords: Breast Cancer; HMKL; Hadamard kernel; MKL; PSO
Year: 2020 PMID: 32326887 PMCID: PMC7181520 DOI: 10.1186/s12859-020-3483-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The general schema of HMKL. The HMKL framework consists of two parts. The first part selects the optimal kernel function parameters by PSO; the second part is an HMKL framework composed of three heterogeneous kernels (Hadamard, RBF and linear kernels).
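The kernel combination step described in Fig. 1 can be illustrated with a small, dependency-free sketch. It builds a linear and an RBF Gram matrix and mixes them with non-negative weights that sum to one. The RBF form exp(−‖x − y‖² / (2σ²)), the weights, the toy data, and all function names are assumptions made for illustration. The Hadamard kernel is left out because its formula is not given here, so this is not the authors' HMKL implementation.

```python
import math

def linear_kernel(x, y):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, sigma):
    """Gaussian kernel; the 2 * sigma**2 scaling is an assumed convention."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram(samples, kernel):
    """Gram matrix K[i][j] = kernel(samples[i], samples[j])."""
    return [[kernel(xi, xj) for xj in samples] for xi in samples]

def combine(grams, weights):
    """Weighted sum of Gram matrices with non-negative weights summing to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, grams))
             for j in range(n)] for i in range(n)]

# Toy expression profiles (hypothetical values, 3 samples x 2 genes).
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
K_lin = gram(X, linear_kernel)
K_rbf = gram(X, lambda a, b: rbf_kernel(a, b, sigma=10.0))
K_mix = combine([K_lin, K_rbf], weights=[0.5, 0.5])
```

A combined matrix of this kind could then be passed to an SVM that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.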
Fig. 2 The search concept of particle swarm optimization. The figure illustrates the actual particle selection process when PSO is applied to the GSE32394 dataset. Each group contains three particles. In each cycle, the best particle of each group is identified (Particle Best Solution), along with the best particle over all previous cycles (Global Best Solution).
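The particle-best and global-best bookkeeping described in Fig. 2 can be sketched with a textbook PSO update. The inertia weight `w`, the acceleration coefficients `c1` and `c2`, the search bounds, and the toy objective are all assumptions. In the setting of the paper the objective would instead score a candidate kernel parameter, for instance by a cross-validated error, which is not reproduced here.

```python
import random

def pso_minimize(objective, bounds, n_particles=3, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Particle best: the best position each particle has visited so far.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # Global best: the best position any particle has visited so far.
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp the new position to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_objective(p):
    """Hypothetical stand-in for a validation error; minimum at p[0] = 2."""
    return (p[0] - 2.0) ** 2

best, best_val = pso_minimize(toy_objective, bounds=[(-3.0, 3.0)])
```

Three particles are used to mirror the group size mentioned in the caption. Because the global best is only replaced by a strictly better value, the returned objective value never gets worse as the number of cycles grows.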
Information about the gene expression datasets
| Dataset | Number of genes | Number of samples | Number of classes |
|---|---|---|---|
| GSE32394 | 1259 | 19 | 2 |
| GSE59993 | 1205 | 78 | 2 |
| GSE1872 | 15,923 | 35 | 2 |
| GSE76260 | 1145 | 64 | 2 |
| GSE59246 | 62,976 | 102 | 2 |
| BRCA1 | 17,204 | 107 | 2 |
| BRCA2 | 17,190 | 138 | 2 |
| BRCA3 | 17,193 | 223 | 2 |
Fig. 3 The HMKL flow chart
Averaged AUC values for determining the optimal σ in the RBF kernel
| Datasets | σ = 0.01 | σ = 0.1 | σ = 1 | σ = 10 | σ = 100 | σ = 1000 |
|---|---|---|---|---|---|---|
| GSE32394 | 0.1589 ± 0.1189 | 0.1511 ± 0.1511 | 0.1956 ± 0.1400 | 0.6667 ± 0.1667 | 0.9344 ± 0.0456 | |
| GSE59993 | 0.3455 ± 0.0637 | 0.3606 ± 0.1239 | 0.4286 ± 0.0433 | 0.6891 ± 0.0412 | 0.6988 ± 0.0413 | |
| GSE1872 | 0.2697 ± 0.0917 | 0.2042 ± 0.0686 | 0.2068 ± 0.0659 | 0.2432 ± 0.1053 | 0.2424 ± 0.1061 | |
| GSE76260 | 0.3823 ± 0.0796 | 0.4224 ± 0.0464 | 0.3837 ± 0.0937 | 0.8270 ± 0.0168 | 0.8337 ± 0.0485 | |
| GSE59246 | 0.4550 ± 0.0543 | 0.4442 ± 0.0785 | 0.7543 ± 0.0462 | 0.7539 ± 0.0334 | 0.7553 ± 0.0111 | |
| BRCA1 | 0.2565 ± 0.0776 | 0.2336 ± 0.1205 | 0.4720 ± 0.1095 | 0.9659 ± 0.0303 | 0.9407 ± 0.0951 | |
| BRCA2 | 0.2316 ± 0.0497 | 0.2377 ± 0.1074 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | ||
| BRCA3 | 0.3410 ± 0.0424 | 0.3351 ± 0.0335 | 0.7377 ± 0.1495 |
Averaged AUC values for determining the optimal β of Hadamard SVM
| Datasets | β = −1 | β = −0.1 | β = −0.01 | β = 0.01 | β = 0.1 | β = 1 |
|---|---|---|---|---|---|---|
| GSE32394 | 0.9500 ± 0.0278 | 0.9544 ± 0.0433 | 0.9444 ± 0.0444 | 0.9356 ± 0.0356 | 0.9611 ± 0.0389 | |
| GSE59993 | 0.8063 ± 0.0467 | 0.6904 ± 0.0809 | 0.7055 ± 0.0555 | 0.7137 ± 0.0510 | 0.7113 ± 0.0294 | |
| GSE1872 | ||||||
| GSE76260 | 0.8550 ± 0.0220 | 0.8346 ± 0.0533 | 0.8313 ± 0.0389 | 0.8226 ± 0.0706 | 0.7673 ± 0.0310 | |
| GSE59246 | 0.8994 ± 0.0143 | 0.8666 ± 0.0168 | 0.8564 ± 0.0179 | 0.8888 ± 0.0227 | 0.8969 ± 0.0250 | |
| BRCA1 | 0.9726 ± 0.0134 | 0.9758 ± 0.0089 | 0.9949 ± 0.0051 | 0.9750 ± 0.0174 | 0.9782 ± 0.0161 | |
| BRCA2 | ||||||
| BRCA3 |
Averaged AUC values of linear SVM
| Datasets | Averaged AUC |
|---|---|
| GSE32394 | 0.9644 ± 0.0422 |
| GSE59993 | 0.8371 ± 0.0331 |
| GSE1872 | 0.3977 ± 0.2008 |
| GSE76260 | 0.7857 ± 0.0629 |
| GSE59246 | 0.8896 ± 0.0375 |
| BRCA1 | 0.9598 ± 0.0317 |
| BRCA2 | 1.0000 ± 0.0000 |
| BRCA3 | 0.9997 ± 0.0026 |
Averaged AUC values of random forest
| Datasets | Averaged AUC |
|---|---|
| GSE32394 | 0.8000 ± 0.2449 |
| GSE59993 | 0.7484 ± 0.1438 |
| GSE1872 | 0.9951 ± 0.0178 |
| GSE76260 | 0.7889 ± 0.0441 |
| GSE59246 | 0.8486 ± 0.0349 |
| BRCA1 | 0.9727 ± 0.4166 |
| BRCA2 | 1.0000 ± 0.0000 |
| BRCA3 | 1.0000 ± 0.0000 |
Averaged AUC values of decision tree
| Datasets | Averaged AUC |
|---|---|
| GSE32394 | 0.7589 ± 0.2256 |
| GSE59993 | 0.8099 ± 0.0740 |
| GSE1872 | 1.0000 ± 0.0000 |
| GSE76260 | 0.8313 ± 0.0813 |
| GSE59246 | 0.8372 ± 0.0497 |
| BRCA1 | 0.9925 ± 0.0115 |
| BRCA2 | 0.9997 ± 0.0026 |
| BRCA3 | 1.0000 ± 0.0000 |
Averaged AUC values of GA with Rotation Forest
| Datasets | Averaged AUC |
|---|---|
| GSE32394 | 0.7000 ± 0.3317 |
| GSE59993 | 0.8663 ± 0.0983 |
| GSE1872 | 0.9667 ± 0.1000 |
| GSE76260 | 0.8583 ± 0.0500 |
| GSE59246 | 0.8474 ± 0.1026 |
| BRCA1 | 0.9818 ± 0.3636 |
| BRCA2 | 1.0000 ± 0.0000 |
| BRCA3 | 1.0000 ± 0.0000 |
Averaged AUC values of BFA + RF
| Datasets | Averaged AUC |
|---|---|
| GSE32394 | 0.8000 ± 0.2449 |
| GSE59993 | 0.8474 ± 0.1381 |
| GSE1872 | 1.0000 ± 0.0000 |
| GSE76260 | 0.8167 ± 0.1856 |
| GSE59246 | 0.7646 ± 0.1304 |
| BRCA1 | 0.9909 ± 0.2727 |
| BRCA2 | 1.0000 ± 0.0000 |
| BRCA3 | 1.0000 ± 0.0000 |
Averaged AUC values for different methods
| Datasets | Decision Tree | Random Forest | GA with Rotation Forest | BFA + RF | SVM (linear kernel) | SVM (RBF kernel) |
|---|---|---|---|---|---|---|
| GSE32394 | 0.7589 ± 0.2256 | 0.8000 ± 0.2449 | 0.7000 ± 0.3317 | 0.8000 ± 0.2449 | 0.9644 ± 0.0422 | 0.9344 ± 0.0456 |
| GSE59993 | 0.8099 ± 0.0740 | 0.7484 ± 0.1438 | 0.8663 ± 0.0983 | 0.8474 ± 0.1381 | 0.8371 ± 0.0331 | 0.8287 ± 0.0247 |
| GSE1872 | 1.0000 ± 0.0000 | 0.9951 ± 0.0178 | 0.9667 ± 0.1000 | 1.0000 ± 0.0000 | 0.3977 ± 0.2008 | 0.2042 ± 0.0686 |
| GSE76260 | 0.8313 ± 0.0813 | 0.7889 ± 0.0441 | 0.8583 ± 0.0500 | 0.8167 ± 0.1856 | 0.7857 ± 0.0629 | 0.8357 ± 0.0213 |
| GSE59246 | 0.6455 ± 0.0795 | 0.8486 ± 0.0349 | 0.8474 ± 0.1026 | 0.7646 ± 0.1304 | 0.8896 ± 0.0375 | 0.7629 ± 0.0094 |
| BRCA1 | 0.9925 ± 0.0115 | 0.9727 ± 0.4166 | 0.9818 ± 0.3636 | 0.9909 ± 0.2727 | 0.9598 ± 0.0317 | 0.9918 ± 0.0060 |
| BRCA2 | 0.9997 ± 0.0026 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 |
| BRCA3 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 0.9997 ± 0.0026 | 1.0000 ± 0.0000 |

| Datasets | SVM (Hadamard kernel) | MKL (d = 3, RBF kernel) | MKL (d = 3, Hadamard kernel) | MKL (d = 3, mixed kernels) | MKL (d = 21, mixed kernels) | HMKL |
|---|---|---|---|---|---|---|
| GSE32394 | 0.9778 ± 0.0222 | 0.9422 ± 0.0422 | 0.9844 ± 0.0511 | 0.9867 ± 0.6333 | 0.9899 ± 0.0333 | |
| GSE59993 | 0.8661 ± 0.0510 | 0.7073 ± 0.0532 | 0.8973 ± 0.0445 | 0.8990 ± 0.0336 | 0.9018 ± 0.0175 | |
| GSE1872 | 1.0000 ± 0.0000 | 0.2667 ± 0.0894 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | |
| GSE76260 | 0.8595 ± 0.0126 | 0.8302 ± 0.0419 | 0.8467 ± 0.0313 | 0.8604 ± 0.0416 | 0.8633 ± 0.0313 | |
| GSE59246 | 0.8996 ± 0.0250 | 0.8939 ± 0.0317 | 0.8991 ± 0.0179 | 0.9006 ± 0.0292 | 0.9008 ± 0.0282 | |
| BRCA1 | 0.9953 ± 0.0047 | 0.9921 ± 0.0061 | 0.9953 ± 0.0045 | 0.9957 ± 0.0032 | 0.9960 ± 0.0026 | |
| BRCA2 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | |
| BRCA3 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 | 1.0000 ± 0.0000 |
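
The "mean ± spread" entries in the AUC tables can be reproduced in form with a short sketch. It computes a rank-based AUC for each evaluation run and formats the runs as a mean and a sample standard deviation. Whether the reported spread is a standard deviation is an assumption, as are the function names and the toy labels and scores. The numbers below are illustrative only and are not taken from the paper.

```python
import statistics

def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked
    correctly, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarize(aucs):
    """Format repeated AUC values as 'mean ± std' (sample standard deviation)."""
    return f"{statistics.mean(aucs):.4f} ± {statistics.stdev(aucs):.4f}"

# Hypothetical labels and decision scores from two evaluation runs.
runs = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]),
]
aucs = [auc(y, s) for y, s in runs]
summary = summarize(aucs)  # "0.8750 ± 0.1768"
```

In the first run every positive sample outscores every negative one, giving an AUC of 1.0. In the second run one of the four positive-negative pairs is misordered, giving 0.75.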