Weiyong Sheng1, Shouli Xia2, Yaru Wang2, Lizhao Yan3, Songqing Ke4, Evelyn Mellisa5, Fen Gong2, Yun Zheng6, Tiansheng Tang1.
Abstract
Background: Most studies of molecular subtype prediction in breast cancer have been based mainly on two-dimensional MRI images; the predictive value of three-dimensional volumetric features from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) for predicting breast cancer molecular subtypes has not been thoroughly investigated. This study aimed to investigate the role of features derived from DCE-MRI and how they could be combined with clinical data to predict invasive ductal breast cancer molecular subtypes.
Keywords: MRI; breast cancer; machine learning; molecular subtypes; radiomics; three-dimension
Year: 2022 PMID: 36172153 PMCID: PMC9510620 DOI: 10.3389/fonc.2022.964605
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 5.738
Figure 1 Flowchart of the study population with exclusion criteria.
Figure 2 Multi-level, step-by-step sketching of tumor lesions on contrast-enhanced T1-weighted MRI in a 49-year-old woman with invasive ductal cancer of the left breast. (A) Sagittal image shows an irregularly shaped, irregularly margined, heterogeneously enhancing mass (arrow). (B) Transverse image shows the same mass (arrow). (C) Coronal image shows the same mass (arrow). (D) The VOI fused step-by-step into the outlined images. (E) Pathological microscopic picture of the invasive ductal cancer of the left breast.
Figure 3 Flow chart of model establishment in this study. The 190 patients were grouped by molecular subtype into three binary categories: luminal vs. non-luminal, HER2-overexpressed vs. non-HER2-overexpressed, and triple-negative vs. non-triple-negative. The data were then divided into a training dataset and a testing dataset. In the training dataset, feature variables were screened by LASSO regression, and five machine learning models were constructed. Model performance was evaluated on the testing dataset to determine the optimal model.
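The workflow in Figure 3 can be sketched in scikit-learn. This is a hypothetical reconstruction, not the authors' code: the feature matrix, labels, split ratio, and all hyperparameters are synthetic assumptions, and scikit-learn's `GradientBoostingClassifier` stands in for the XGBoost model used in the study.

```python
# Hypothetical sketch of the Figure 3 workflow: train/test split,
# LASSO feature screening, then five classifiers. Data are synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(190, 50))          # 190 patients, 50 candidate features
y = (X[:, 0] + X[:, 1] + rng.normal(size=190) > 0).astype(int)  # e.g. luminal vs. non-luminal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Step 1: LASSO with 5-fold cross-validation screens the feature variables.
scaler = StandardScaler().fit(X_train)
lasso = LassoCV(cv=5, random_state=0).fit(scaler.transform(X_train), y_train)
selected = np.flatnonzero(lasso.coef_)   # features with non-zero coefficients

# Step 2: five classifiers are trained on the selected features and
# evaluated on the held-out testing dataset by AUC.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(random_state=0),
    "NB": GaussianNB(),
    "SVM": SVC(probability=True, random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost
}
aucs = {}
for name, model in models.items():
    model.fit(scaler.transform(X_train)[:, selected], y_train)
    prob = model.predict_proba(scaler.transform(X_test)[:, selected])[:, 1]
    aucs[name] = roc_auc_score(y_test, prob)
```

The same loop would be repeated for each of the three binary category labels.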
Tumor characteristics.
| Characteristics | No. of breast cancers (n = 190) |
|---|---|
| Age (year) (mean ± sd) | 48.67 ± 10.03 |
| Menstrual status | |
| menopause | 77 (40.53) |
| no menopause | 113 (59.47) |
| Tumor size (mm) (mean ± sd) | 35.29 ± 24.23 |
| Tumor histological grade | |
| I | 51 (26.84) |
| II | 95 (50.00) |
| III | 44 (23.16) |
| TIC type | |
| I | 2 (1.05) |
| II | 68 (35.79) |
| III | 120 (63.16) |
| Axillary lymph node metastases | |
| yes | 113 (59.47) |
| no | 77 (40.53) |
| Molecular subtype | |
| Category 1 | |
| Luminal (A+B) | 99 (52.11) |
| Non-luminal | 91 (47.89) |
| Category 2 | |
| HER2-overexpressed | 59 (31.05) |
| Non-HER2-overexpressed | 131 (68.95) |
| Category 3 | |
| Triple-negative type | 32 (16.84) |
| Non-triple-negative type | 158 (83.16) |
TIC, time intensity curve. Values are n (%) unless otherwise indicated.
Figure 4 Feature selection using the least absolute shrinkage and selection operator (LASSO) binary logistic regression model in the Luminal vs. Non-luminal group. (A) Tuning parameter selection in the LASSO model using 5-fold cross-validation via minimum criteria; (B) LASSO coefficient profiles of the baseline features.
Figure 5 Feature selection using the least absolute shrinkage and selection operator (LASSO) binary logistic regression model in the HER2-overexpressed vs. Non-HER2-overexpressed group. (A) Tuning parameter selection in the LASSO model using 5-fold cross-validation via minimum criteria; (B) LASSO coefficient profiles of the baseline features.
Figure 6 Feature selection using the least absolute shrinkage and selection operator (LASSO) binary logistic regression model in the Triple-negative vs. Non-triple-negative group. (A) Tuning parameter selection in the LASSO model using 5-fold cross-validation via minimum criteria; (B) LASSO coefficient profiles of the baseline features.
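The "minimum criteria" tuning shown in Figures 4-6 picks the penalty strength that minimizes cross-validated error. A minimal sketch of L1-penalized (LASSO) logistic regression with 5-fold cross-validation, on synthetic data with an assumed grid of 20 penalty values (the paper's actual grid and data are not reproduced here):

```python
# Hypothetical sketch of LASSO binary logistic regression feature
# selection with 5-fold CV; data and hyperparameters are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(1)
X = rng.normal(size=(133, 30))           # ~training-set size, 30 features
y = (X[:, 0] - X[:, 2] + rng.normal(scale=0.5, size=133) > 0).astype(int)

# 5-fold CV over a grid of penalty strengths (sklearn's C = 1/lambda);
# the minimum criterion keeps the value minimizing cross-validated deviance.
clf = LogisticRegressionCV(Cs=20, cv=5, penalty="l1", solver="liblinear",
                           scoring="neg_log_loss", random_state=1).fit(X, y)
selected = np.flatnonzero(clf.coef_)     # non-zero coefficient profile
```

Features whose coefficients shrink exactly to zero at the chosen penalty (panel B's coefficient profiles) are discarded; the rest enter the downstream classifiers.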
Evaluation indicators of the predictive performance of the five models in the Luminal vs. Non-luminal groups.
| Classifier | Dataset | SEN | SPE | PRE | GM | FPR | F1 | ACC | AUC |
|---|---|---|---|---|---|---|---|---|---|
| LR | Training | 0.7524 | 0.7052 | 0.7293 | 0.7311 | 0.2948 | 0.7383 | 0.7282 | 0.7926 |
| LR | Testing | 0.5556 | 0.7241 | 0.6522 | 0.6343 | 0.2759 | 0.6343 | 0.6429 | 0.7126 |
| RF | Training | 0.8808 | 0.5895 | 0.6953 | 0.7163 | 0.4105 | 0.7759 | 0.7384 | 0.8523 |
| RF | Testing | 0.7931 | 0.4815 | 0.6216 | 0.6180 | 0.5185 | 0.6180 | 0.6429 | 0.7708 |
| NB | Training | 0.8907 | 0.4421 | 0.6277 | 0.6294 | 0.5579 | 0.7359 | 0.6717 | 0.7884 |
| NB | Testing | 0.3333 | 0.8276 | 0.6429 | 0.5252 | 0.1724 | 0.5252 | 0.5893 | 0.7139 |
| SVM | Training | 0.8427 | 0.6842 | 0.7422 | 0.7571 | 0.3158 | 0.7863 | 0.7641 | 0.8626 |
| SVM | Testing | 0.5926 | 0.7931 | 0.7273 | 0.6856 | 0.2069 | 0.6856 | 0.6964 | 0.7420 |
| XGBoost | Training | 0.9571 | 0.5895 | 0.7139 | 0.7520 | 0.4105 | 0.8223 | 0.7846 | 0.9242 |
| XGBoost | Testing | 0.7524 | 0.6542 | 0.8524 | 0.7016 | 0.3458 | 0.6086 | 0.6964 | 0.8282 |
LR, Logistic Regression; RF, Random Forest; NB, Naïve Bayes; SVM, Support Vector Machine; XGBoost, eXtreme Gradient Boosting; SEN, sensitivity; SPE, specificity; PRE, precision; GM, geometric mean; FPR, false positive rate; ACC, accuracy; AUC, area under the ROC curve.
Evaluation indicators of the predictive performance of the five models in the HER2-overexpressed vs. Non-HER2-overexpressed groups.
| Classifier | Dataset | SEN | SPE | PRE | GM | FPR | F1 | ACC | AUC |
|---|---|---|---|---|---|---|---|---|---|
| LR | Training | 0.5000 | 0.7926 | 0.5232 | 0.5744 | 0.2074 | 0.5067 | 0.7026 | 0.7068 |
| LR | Testing | 0.8462 | 0.3529 | 0.7500 | 0.5465 | 0.6471 | 0.7952 | 0.6964 | 0.7029 |
| RF | Training | 0.3667 | 0.9704 | 0.9449 | 0.5782 | 0.0296 | 0.5105 | 0.7862 | 0.8065 |
| RF | Testing | 0.2941 | 0.9744 | 0.8333 | 0.5353 | 0.0256 | 0.4348 | 0.7679 | 0.8054 |
| NB | Training | 0.3833 | 0.8296 | 0.5958 | 0.5338 | 0.1704 | 0.4308 | 0.6923 | 0.6932 |
| NB | Testing | 0.8718 | 0.3529 | 0.7556 | 0.5547 | 0.6471 | 0.8095 | 0.7143 | 0.7164 |
| SVM | Training | 0.3667 | 0.8963 | 0.8269 | 0.5580 | 0.1037 | 0.4563 | 0.7333 | 0.7883 |
| SVM | Testing | 0.8974 | 0.4118 | 0.7778 | 0.6079 | 0.5882 | 0.8333 | 0.7500 | 0.7617 |
| XGBoost | Training | 0.5167 | 0.9185 | 0.7972 | 0.6341 | 0.0815 | 0.6049 | 0.7949 | 0.7988 |
| XGBoost | Testing | 0.7949 | 0.6471 | 0.8378 | 0.7172 | 0.3529 | 0.8158 | 0.7500 | 0.7459 |
LR, Logistic Regression; RF, Random Forest; NB, Naïve Bayes; SVM, Support Vector Machine; XGBoost, eXtreme Gradient Boosting; SEN, sensitivity; SPE, specificity; PRE, precision; GM, geometric mean; FPR, false positive rate; ACC, accuracy; AUC, area under the ROC curve.
Evaluation indicators of the predictive performance of the five models in the Triple-negative vs. Non-triple-negative groups.
| Classifier | Dataset | SEN | SPE | PRE | GM | FPR | F1 | ACC | AUC |
|---|---|---|---|---|---|---|---|---|---|
| LR | Training | 0.3043 | 0.9459 | 0.5344 | 0.7143 | 0.0541 | 0.3867 | 0.8358 | 0.7773 |
| LR | Testing | 0.9149 | 0.3333 | 0.8776 | 0.5522 | 0.6667 | 0.8958 | 0.8214 | 0.7069 |
| RF | Training | 0.1217 | 1.0000 | 1.0000 | 0.5958 | 0 | 0.2156 | 0.8493 | 0.8722 |
| RF | Testing | 0.1111 | 1.0000 | 1.0000 | 0.3333 | 0 | 0.2000 | 0.8571 | 0.7979 |
| NB | Training | 0.3217 | 0.9027 | 0.4068 | 0.7057 | 0.0973 | 0.3576 | 0.8030 | 0.7451 |
| NB | Testing | 0.8723 | 0.3333 | 0.8723 | 0.5392 | 0.6667 | 0.8723 | 0.7857 | 0.6809 |
| SVM | Training | 0.1565 | 1.0000 | 1.0000 | 0.6069 | 0 | 0.2691 | 0.8552 | 0.8743 |
| SVM | Testing | 1.0000 | 0.2222 | 0.8704 | 0.4714 | 0.7778 | 0.9307 | 0.8750 | 0.7778 |
| XGBoost | Training | 0.4000 | 0.9730 | 0.7555 | 0.7669 | 0.0270 | 0.5210 | 0.8746 | 0.9260 |
| XGBoost | Testing | 0.9362 | 0.4444 | 0.8980 | 0.6450 | 0.5556 | 0.9167 | 0.8571 | 0.9031 |
LR, Logistic Regression; RF, Random Forest; NB, Naïve Bayes; SVM, Support Vector Machine; XGBoost, eXtreme Gradient Boosting; SEN, sensitivity; SPE, specificity; PRE, precision; GM, geometric mean; FPR, false positive rate; ACC, accuracy; AUC, area under the ROC curve.
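The threshold-based indicators in the tables above all derive from the binary confusion matrix. A minimal sketch, assuming the standard definitions (GM taken as the geometric mean of sensitivity and specificity, which matches FPR = 1 - SPE in the tables); the counts below are illustrative, not from the study:

```python
# Hedged sketch of the evaluation indicators (SEN, SPE, PRE, GM, FPR,
# F1, ACC) computed from illustrative confusion-matrix counts.
import math

def indicators(tp, fn, tn, fp):
    sen = tp / (tp + fn)                   # sensitivity (recall)
    spe = tn / (tn + fp)                   # specificity
    pre = tp / (tp + fp)                   # precision
    gm = math.sqrt(sen * spe)              # geometric mean of SEN and SPE
    fpr = fp / (fp + tn)                   # false positive rate = 1 - SPE
    f1 = 2 * pre * sen / (pre + sen)       # F1 score
    acc = (tp + tn) / (tp + fn + tn + fp)  # accuracy
    return dict(SEN=sen, SPE=spe, PRE=pre, GM=gm, FPR=fpr, F1=f1, ACC=acc)

m = indicators(tp=80, fn=20, tn=70, fp=30)
```

AUC is the only indicator not computable from a single confusion matrix; it integrates the ROC curve over all thresholds.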