Leila Mirsadeghi, Reza Haji Hosseini, Ali Mohammad Banaei-Moghaddam, Kaveh Kavousi.
Abstract
BACKGROUND: Many markers are now available for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggression is limited.
Keywords: Ensemble classifier; Metastasis breast tumor; Mutation data; Plausible drivers; Targeted clinical panel sequencing
Year: 2021 PMID: 33962648 PMCID: PMC8105935 DOI: 10.1186/s12920-021-00974-3
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
22 ensemble learning methods concerned with the detection of breast cancer
| No. | Method name | Publication year |
|---|---|---|
| 1 | Bayesian networks-based model integration [ | 2006 and 2019 |
| 2 | RSS-SCS method [ | 2016 |
| 3 | Collective approach (correlation, color palette, color proportion, and SVM) [ | 2016 |
| 4 | Kernel-based Data Fusion Method for Gene Prioritization [ | 2015 |
| 5 | DECORATE methodᵃ [ | 2015 |
| 6 | HyDRA methodᵃ [ | 2015 |
| 7 | GenEnsemble methodᵃ (NBS-IB3-SVM-C4.5 DT) [ | 2014 |
| 8 | NB (Naïve Bayes) combiner method [ | 2014 |
| 9 | Evolutionary Ensemble Model [ | 2014 |
| 10 | smoothed t-statistic SVM (stSVM) [ | 2013 |
| 11 | SVM Classifiers Fusion (three SVM) [ | 2013 |
| 12 | COMBINER (Core Module Biomarker Identification)ᵃ [ | 2012 |
| 13 | Ensembles of BioHEL Rule Set [ | 2012 |
| 14 | Stacking IB3-NBS-RF-SVM method [ | 2012 |
| 15 | REIS-based ensemble method [ | 2011 |
| 16 | MRS method [ | 2010 |
| 17 | Boosting-TWSVM method [ | 2009 |
| 18 | Bagging and boosting-based TWSVM [ | 2009 |
| 19 | Feature Subsets Method [ | 2008 |
| 20 | BNCE method [ | 2007 |
| 21 | Bayesian Network Classifier [ | 2006 |
| 22 | enSVM (200 SVM) [ | 2006 |
ᵃ Some methods that are proposed to discover genomic markers related to breast cancer
Fig. 1 The proposed fusion system workflow for prediction of driver genes in cancers
Fig. 2 The workflow for software tools and machine learning methods. a Feature extraction and feature vector construction, b feature integration, c decision integration
Fig. 3 The workflows for the selection of training data and unseen data for BRCA and MBCA. a Positive and negative training genes, b genome-wide screening
Fig. 4 Outputs for BRCA and MBCA. a The existence of diversity among features extracted from software tools after setting p value ≤ 0.05, b frequency of predicted driver and passenger genes using four learning methods, c the comparison of driver genes predicted by four methods, d the comparison of F1 scores as an evaluation metric of methods
The 16 common genes predicted by all machines in the top 100
| Symbol | NSCGMCH | NSCGMBH | PKGECC | PKGEBC |
|---|---|---|---|---|
| OXCT1ᵃ | #N/A | #N/A | #N/A | #N/A |
| KDRᵇ | 7 | 2 | ✓ | #N/A |
| APEX1ᵃ | #N/A | #N/A | #N/A | #N/A |
| GCM2ᵃ | #N/A | #N/A | #N/A | #N/A |
| UNC13Dᵃ | #N/A | #N/A | #N/A | #N/A |
| NCOR1 | #N/A | #N/A | ✓ | ✓ |
| KRAS | 20 | #N/A | ✓ | ✓ |
| THAP3ᵃ | #N/A | #N/A | #N/A | #N/A |
| SERPINE2 | 1 | #N/A | #N/A | #N/A |
| BATFᵃ | #N/A | #N/A | #N/A | #N/A |
| C8orf44ᵃ | #N/A | #N/A | #N/A | #N/A |
| C12orf29ᵃ | #N/A | #N/A | #N/A | #N/A |
| ZNF546ᵃ | #N/A | #N/A | #N/A | #N/A |
| KDM6B | 1 | #N/A | #N/A | #N/A |
| GCNT4ᵃ | #N/A | #N/A | #N/A | #N/A |
| FOXA1 | #N/A | #N/A | ✓ | ✓ |
Genes confirmed as known genes related to different primary cancers or primary breast tumors in the OMIM, CGC, and NCG databases are marked in the last two columns
NSCGMCH, number of studies that have cited genes related to different metastatic cancers in the HCMDB; NSCGMBH, number of studies that have cited these genes related to metastatic breast cancer in the HCMDB; PKGECC, predicted known genes by EC associated with different cancers that are confirmed in OMIM, CGC, and NCG; PKGEBC, predicted known genes by EC associated with breast cancer that are confirmed in OMIM, CGC, and NCG
ᵃ Ten new genes that have not already been introduced in the databases
ᵇ KDR is confirmed in HCMDB as related to metastatic breast cancer in two studies
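The table lists genes that every learning method places in its top 100. A minimal sketch of that kind of consensus selection is given below, assuming each method produces a gene-to-score mapping. The function names `top_k` and `common_top_k`, the toy scores, and the placeholder genes `GENE_X` and `GENE_Y` are illustrative assumptions and are not taken from the authors' pipeline.

```python
def top_k(scores, k):
    """Return the set of the k highest-scoring genes (ties broken by gene name)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gene for gene, _ in ranked[:k]}


def common_top_k(scores_by_method, k):
    """Return, sorted by name, the genes that appear in every method's top k."""
    sets = [top_k(scores, k) for scores in scores_by_method.values()]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical toy scores; real rankings would cover the whole screened genome.
toy_scores = {
    "RF": {"KRAS": 0.90, "FOXA1": 0.80, "GENE_X": 0.70, "NCOR1": 0.60},
    "ANN": {"KRAS": 0.85, "NCOR1": 0.80, "FOXA1": 0.75, "GENE_Y": 0.20},
    "NLSVM": {"FOXA1": 0.95, "KRAS": 0.90, "GENE_Y": 0.50, "NCOR1": 0.40},
}

common = common_top_k(toy_scores, k=3)
print(common)
```

With `k=100` and one score mapping per method, the same intersection would yield a list analogous to the 16 genes in the table.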
The enrichment rate of driver genes predicted by EARN. (a) MBCA, (b) BRCA
| All different cancers (HCMDB) | | | Metastatic breast cancer (HCMDB) | | |
|---|---|---|---|---|---|
| PGEMCH (#) | RGMHP | PGEMCH (%) | PGEMBCH (#) | RGMBHP | PGEMBCH (%) |
| (a) MBCA | | | | | |
| 292 | 2203 | 13.25 | 73ᵃ | 585 | 12.48 |
PGEMCH, predicted genes by EC associated with different metastatic cancers that are confirmed in HCMDB; RGMHP, remained genes related to different metastatic cancers in the HCMDB after excluding positive training set; PGEMBCH, predicted genes by EC associated with metastatic breast cancer that are confirmed in HCMDB; RGMBHP, remained genes related to metastatic breast cancer in the HCMDB after excluding positive training set; PKGECC, predicted known genes by EC associated with different cancers that are confirmed in OMIM, CGC, and NCG; RKGCPP, remained known genes related to different cancers in the public databases after excluding positive training set; PKGEBC, predicted known genes by EC associated with breast cancer that are confirmed in OMIM, CGC, and NCG; RKGBPP, remained known genes related to breast cancer in the public databases after excluding positive training set
ᵃ These 73 genes have also been cited in 108 studies of HCMDB [see Additional file 9: S42]
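The rates in the MBCA row are consistent with dividing the number of predicted genes confirmed in HCMDB by the number of HCMDB genes remaining after the positive training set is excluded. A short worked check follows; the function name `enrichment_rate` is an assumption introduced for illustration, and the formula is inferred from the table values rather than quoted from the paper.

```python
def enrichment_rate(confirmed, remaining):
    """Percentage of the remaining database genes recovered by the predictions."""
    if remaining <= 0:
        raise ValueError("remaining must be a positive count")
    return 100.0 * confirmed / remaining


# Counts taken from the (a) MBCA row of the table.
all_metastatic = round(enrichment_rate(292, 2203), 2)
metastatic_breast = round(enrichment_rate(73, 585), 2)

print(all_metastatic, metastatic_breast)
```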
12 driver genes predicted by EARN50 which are confirmed for metastatic cancers in the HCMDB
| Symbol | Prediction score | Rank | PSMM (%) [ | PSMM (%) [ | NSCGMCH | NSCGMBH | MCMGM | PKGECC | PKGEBC |
|---|---|---|---|---|---|---|---|---|---|
| APEX1 | 0.900511991 | 5 | 0.50 | 1.70 | 1 | #N/A | 5 | #N/A | #N/A |
| ARID1A | 0.895213526 | 11 | 2.40 | 5.10 | 2 | #N/A | 24 | ✓ | ✓ |
| KDM6B | 0.894029187 | 13 | 1.40 | 4.60 | 1 | #N/A | 16 | #N/A | #N/A |
| TBX3 | 0.893837209 | 14 | 2.80 | 5.10 | 1 | #N/A | 21 | ✓ | ✓ |
| KDRᵃ | 0.890079401 | 17 | 0.90 | 1.70 | 7 | 2 | 9 | ✓ | #N/A |
| SERPINE2 | 0.889205475 | 19 | 0.90 | 0.80 | 1 | #N/A | 4 | #N/A | #N/A |
| TBL1XR1 | 0.871240171 | 27 | 0.90 | 0.80 | 2 | #N/A | 4 | ✓ | ✓ |
| KRAS | 0.868267682 | 30 | 1.40 | 1.70 | 20 | #N/A | 7 | ✓ | ✓ |
| NOS3 | 0.861560093 | 31 | 2.40 | 2.10 | 1 | #N/A | 12 | #N/A | #N/A |
| RAPGEF3 | 0.851947423 | 42 | #N/A | 2.50 | 2 | #N/A | 6 | #N/A | #N/A |
| SELEᵃ | 0.847865292 | 49 | 0.90 | 1.30 | 12 | 1 | 5 | #N/A | #N/A |
| MMEᵃ | 0.847698297 | 50 | 0.90 | 2.50 | 9 | 1 | 9 | #N/A | #N/A |
The rank, prediction score, and mutation count for these genes are also provided in the table. Genes confirmed as known genes related to any primary cancers or primary breast tumors in the OMIM, CGC, and NCG databases are marked in the last two columns
PSMM, percentage of samples with one or more mutations based on initial mutation file; NSCGMCH, number of studies that have cited genes related to different metastatic cancers in the HCMDB; NSCGMBH, number of studies that have cited genes related to metastatic breast cancer in HCMDB; MCMGM, mutation counts for mutated genes across 450 metastasis tumor samples based on the initial mutation file; PKGECC, predicted known genes by EC associated with different cancers that are confirmed in OMIM, CGC, and NCG; PKGEBC, predicted known genes by EC associated with breast cancer that are confirmed in OMIM, CGC, and NCG
ᵃ These genes have been specifically introduced concerning metastatic breast cancer
Validation of four learning methods by some evaluation metrics. (a) MBCA, (b) BRCA
| Method name | F1 score | False Positive Rate | Maximum Precision | Average-Precision | Recall | ROC-AUCᵃ |
|---|---|---|---|---|---|---|
| (a) MBCA | | | | | | |
| EARN | 0.7961 | 0 | 1 | 0.8266 | 0.6701 | 0.9924 |
| | SDᵇ: 0.0264 | SD: 0.0 | SD: 0.0 | SD: 0.0162 | SD: 0.0338 | SD: 0.0008 |
| RF | 0.756 | 0.0008 | 0.9069 | 0.7873 | 0.6603 | 0.9418 |
| ANN | 0.799 | 0 | 1 | 0.8074 | 0.6733 | 0.968 |
| NLSVM | 0.3972 | 0.0154 | 0.3092 | 0.5852 | 0.5885 | 0.977 |
| (b) BRCA | | | | | | |
| EARN | 0.9313 | 0 | 1 | 0.9585 | 0.8749 | 0.9979 |
| | SD: 0.0117 | SD: 0.0 | SD: 0.0 | SD: 0.0079 | SD: 0.0193 | SD: 0.0005 |
| RF | 0.8864 | 0.0019 | 0.9061 | 0.9171 | 0.8774 | 0.9719 |
| ANN | 0.8996 | 0 | 1 | 0.9417 | 0.8225 | 0.9873 |
| NLSVM | 0.5441 | 0.0279 | 0.446 | 0.859 | 0.8422 | 0.9926 |
ᵃ Receiver Operating Characteristic-Area under Curve
ᵇ Standard Deviation
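For readers comparing the columns, the sketch below shows the standard definitions of recall, false positive rate, precision, and F1 score computed from confusion-matrix counts. The function name `classification_metrics` and the counts in the example are hypothetical; the values reported in the table come from the authors' own evaluation and cannot be reproduced from this snippet.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": false_positive_rate,
        "f1": f1,
    }


# Hypothetical counts: 67 of 100 true drivers found, no passenger gene mislabeled.
metrics = classification_metrics(tp=67, fp=0, tn=1000, fn=33)
print(metrics)
```

A classifier with no false positives has a false positive rate of 0 and a precision of 1, so its F1 score is limited only by recall, which matches the pattern of the EARN and ANN rows.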
Fig. 5 PEA for BRCA and MBCA in top 100 of EARN. a The common enriched pathways and the comparison of frequency of top 100 genes predicted by EARN in these pathways. The pathways [1–14] are listed in the guideline box, b the common/specific enriched main pathways
The common and specific main pathways for BRCA and MBCA
| Number | Pathways | BRCA: number of genes | BRCA: name of genes | MBCA: number of genes | MBCA: name of genes |
|---|---|---|---|---|---|
| The specific main pathways for BRCA | |||||
| 1 | Extracellular matrix organization | 8 | DCN, FN1, ICAM1, ITGA4, ITGAM, ITGAV, ITGB3, ITGB5 | 0 | None |
| 2 | Immune System | 15 | FN1, GAB2, ICAM1, IL1RAPL1, IL1RN, IL2RB, ITGAM, ITGAV, ITGB5, JAK1, MSN, POU2F1, PTPN11, SMARCA4, SYK | 0 | None |
| 3 | Hemostasis | 12 | EGF, FN1, GRB7, ITGA4, ITGAM, ITGAV, ITGB3, PIK3CG, PRKCZ, PTPN11, SERPINA1, SYK | 0 | None |
| 4 | Developmental Biology | 9 | ACVR1B, GAB1, GAB2, GRB7, PTPN11, RELN, SMAD2, SMAD4, VLDLR | 0 | None |
| 5 | Metabolism of RNA | 6 | CPSF1, CPSF3, PCF11, PRPF40A, SF3A1, SF3B1 | 0 | None |
| The specific main pathways for MBCA | |||||
| 6 | Chromatin organization | 0 | None | 7 | TBL1XR1, NCOR1, HDAC3, GPS2, ACTB, KDM6B, PRMT1 |
| 7 | Circadian Clock | 0 | None | 2 | NCOR1, HDAC3 |
| 8 | Organelle biogenesis and maintenance | 0 | None | 4 | TBL1XR1, SIRT4, NCOR1, HDAC3 |
| 9 | Neuronal System | 0 | None | 7 | ABAT, KPNA2, PRKCG, CACNA1E, PLCB1, GRIN1, KRAS |
| 10 | Metabolism | 0 | None | 5 | TBL1XR1, SIN3A, NCOR1, HDAC3, GPS2 |
| The common main pathways for BRCA and MBCA | |||||
| 11 | Signal Transduction | 25 | ACVR1B, EGF, ERBB3, FLT1, FN1, GAB1, GAB2, GRB7, ITGAV, ITGB3, JAK1, NOTCH4, NR4A1, PARD3, PPARG, PRKCZ, PTEN, PTPN11, RUNX1, SMAD2, SMAD4, SMURF1, SYK, TFDP1, TGFBR2 | 25 | ACTB, AR, BDNF, BUB1B, CBFB, COL4A3, FOXA1, KDR, KPNA2, KRAS, NCOR1, NOS3, PDGFD, PIK3R1, PKN2, PLCB1, PRKCD, PRKCG, PRMT1, PTPRJ, RUNX1, STAG1, STAT1, WAS, YWHAE |
| 12 | Gene expression (Transcription) | 19 | ABL1, CBFB, CPSF1, CPSF3, MED23, NBN, NOTCH4, NR4A1, PCF11, POU2F1, PPARG, PTEN, PTPN11, RUNX1, SMAD2, SMAD4, SMARCA4, SMURF1, TFDP1 | 14 | AR, BDNF, CBFB, GPS2, HDAC3, KLF4, KRAS, NCOR1, PRMT1, RUNX1, SIN3A, STAT1, TBL1XR1, YWHAE |
The plausible driver genes involved in the proposed main pathways related to MBCA
| PPDMB | KGCC | KGBC | CGMC | CGMB | Specific main pathways | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | | | | Chromatin organization | Circadian Clock | Organelle biogenesis and maintenance | Neuronal System | Metabolism |
| NCOR1 | 1 | 1 | #N/A | #N/A | ✓ | ✓ | ✓ | | ✓ |
| HDAC3ᵃ | #N/A | #N/A | #N/A | #N/A | ✓ | ✓ | ✓ | | ✓ |
| TBL1XR1 | 1 | 1 | 2 | #N/A | ✓ | | ✓ | | ✓ |
| SIRT4 | 1 | #N/A | #N/A | #N/A | | | ✓ | | |
| ABATᵃ | #N/A | #N/A | #N/A | #N/A | | | | ✓ | |
| KRAS | 1 | 1 | 20 | #N/A | | | | ✓ | |
| GRIN1ᵃ | #N/A | #N/A | #N/A | #N/A | | | | ✓ | |
| PLCB1ᵃ | #N/A | #N/A | #N/A | #N/A | | | | ✓ | |
| CACNA1E | 1 | #N/A | #N/A | #N/A | | | | ✓ | |
| PRKCG | 1 | #N/A | 1 | #N/A | | | | ✓ | |
| KPNA2ᵃ | #N/A | #N/A | #N/A | #N/A | | | | ✓ | |
| GPS2 | 1 | 1 | #N/A | #N/A | ✓ | | | | ✓ |
| SIN3A | 1 | #N/A | #N/A | #N/A | | | | | ✓ |
| ACTB | 1 | #N/A | 1 | #N/A | ✓ | | | | |
| KDM6B | #N/A | #N/A | 1 | #N/A | ✓ | | | | |
| PRMT1 | #N/A | #N/A | 2 | #N/A | ✓ | | | | |
PPDMB, proposed plausible drivers related to metastatic breast cancer; KGCC, known genes related to cancers that are confirmed in OMIM, CGC, and NCG; KGBC, known genes related to breast cancer that are confirmed in OMIM, CGC, and NCG; CGMC, confirmed genes related to different metastatic cancers in HCMDB; CGMB, confirmed genes related to metastatic breast cancer in HCMDB
ᵃ Five new genes that have not already been introduced in the public databases
Fig. 6 Analysis plot of genomic alterations in 16 proposed genes for MBCA using cBioPortal
Fig. 7 Mutations mapping on a linear protein and its domains using MutationMapper in cBioPortal. SMF (%) and the type of these somatic mutations for four genes are specified in the guideline box