Denis Krajnc, Laszlo Papp, Thomas S Nakuz, Heinrich F Magometschnigg, Marko Grahovac, Clemens P Spielvogel, Boglarka Ecsedi, Zsuzsanna Bago-Horvath, Alexander Haug, Georgios Karanikas, Thomas Beyer, Marcus Hacker, Thomas H Helbich, Katja Pinker.
Abstract
Background: This study investigated the performance of ensemble learning holomic models for the detection of breast cancer, receptor status, proliferation rate, and molecular subtypes from [18F]FDG-PET/CT images with and without incorporating data pre-processing algorithms. Additionally, machine learning (ML) models were compared with conventional data analysis using standard uptake value lesion classification.
Keywords: PET/CT; breast cancer; data pre-processing; machine learning; radiomics; triple negative
Year: 2021 PMID: 33809057 PMCID: PMC8000810 DOI: 10.3390/cancers13061249
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.639
Figure 1. The analysis workflow of the collected dataset. A prospective study, conducted between 2009 and 2014 and approved by the institutional review board, provided data records for 170 patients. [18F]FDG-PET/CT of the breast was performed with a dedicated breast imaging protocol using a combined whole-body PET/CT system. In total, 173 lesions were delineated and extracted following the Imaging Biomarker Standardization Initiative (IBSI) guidelines combined with optimized feature extraction principles. Feature redundancy reduction was performed, resulting in 77 features. Monte Carlo (MC) cross-validation was used to generate 100 training vs. validation folds. Pre-processing steps were performed over the training data. An ensemble learning scheme was used to establish predictive models. All machine learning models underwent confusion matrix analytics, sham data analysis, and area under the receiver operator characteristics curve (AUC) analysis across MC folds and the conventional PET SUV analysis. VOI = Volume of Interest; BMI = Body Mass Index; ER = Estrogen; PR = Progesterone; HER2 = Human Epidermal Growth Receptor 2; PET = Positron Emission Tomography; CT = Computed Tomography.
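The fold generation and training-only pre-processing described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the validation fraction of 0.2, the function names, and the use of z-score standardization as a stand-in for the pre-processing steps are all assumptions, and the feature values are random placeholders. The point it shows is that pre-processing parameters are estimated from the training subset of each fold and then applied unchanged to the validation subset.

```python
import random
import statistics


def monte_carlo_splits(n_samples, n_folds=100, validation_fraction=0.2, seed=0):
    """Yield (train_idx, val_idx) pairs by repeated random subsampling.

    validation_fraction is an assumed value; the caption does not state the ratio.
    """
    rng = random.Random(seed)
    n_val = max(1, round(n_samples * validation_fraction))
    indices = list(range(n_samples))
    for _ in range(n_folds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # The first n_val shuffled indices form the validation set, the rest train.
        yield sorted(shuffled[n_val:]), sorted(shuffled[:n_val])


def fit_standardizer(train_rows):
    """Estimate per-feature mean and standard deviation from training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Constant features get a divisor of 1.0 to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_standardizer(rows, means, stds):
    """Apply previously fitted parameters to any set of rows."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


if __name__ == "__main__":
    # Placeholder data: 173 lesions x 77 features of random numbers.
    rng = random.Random(42)
    features = [[rng.gauss(0.0, 1.0) for _ in range(77)] for _ in range(173)]
    for train_idx, val_idx in monte_carlo_splits(len(features)):
        train = [features[i] for i in train_idx]
        val = [features[i] for i in val_idx]
        means, stds = fit_standardizer(train)
        train_z = apply_standardizer(train, means, stds)
        val_z = apply_standardizer(val, means, stds)
        # A model would be trained on train_z and evaluated on val_z here.
```

Unlike k-fold cross-validation, the validation subsets of different Monte Carlo folds may overlap, because each fold is an independent random draw.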
Figure 2. 18F-fluorodeoxyglucose positron emission tomography/computed tomography ([18F]FDG-PET/CT) view of a breast cancer patient with a semi-automatically delineated volume of interest (VOI) in the PET image. Windowing: hot iron palette with SUV body weight (SUVbw) of 6.5 for PET and a range of −100 to 200 Hounsfield units (HU) for CT. The patient underwent the imaging procedure in the prone position, and the view is shown following the radiological convention.
Patient cohort characteristics for malignancy, estrogen (ER), progesterone (PR), human epidermal growth receptor 2 (HER2), Ki-67 protein expression, triple negative, and luminal A/B status. NA = Not Available.
| Patient Characteristics (n = 170) | Value |
|---|---|
| Age (years), median (IQR) | 57.6 (18–86) |
| Lesion volume (cm³), median (IQR) | 12.8 (6.2–26.9) |
| Malignancy | n (%) |
| Malignant | 132 (78) |
| Benign | 38 (22) |
| Estrogen (ER) | n (%) |
| − | 17 (10) |
| + | 88 (52) |
| NA | 65 (38) |
| Progesterone (PR) | n (%) |
| − | 27 (16) |
| + | 78 (46) |
| NA | 65 (38) |
| Ki-67 | n (%) |
| − | 26 (15) |
| + | 73 (43) |
| NA | 71 (42) |
| HER2 | n (%) |
| − | 84 (49) |
| + | 22 (13) |
| NA | 64 (38) |
| Triple negative | n (%) |
| Yes | 11 (6) |
| No | 95 (56) |
| NA | 64 (38) |
| Luminal A/B | n (%) |
| A | 14 (8) |
| B | 81 (48) |
| NA | 75 (44) |
Figure 3. Performance comparison of breast cancer detection machine learning (ML) predictive models, with and without data pre-processing. ACC = Accuracy; SENS = Sensitivity; SPEC = Specificity; NPV = Negative Predictive Value; PPV = Positive Predictive Value. Performance is expressed in percentages (%).
Figure 4. Performance comparison of triple negative subtype machine learning (ML) predictive models, with and without data pre-processing. ACC = Accuracy; SENS = Sensitivity; SPEC = Specificity; NPV = Negative Predictive Value; PPV = Positive Predictive Value. Performance is expressed in percentages (%).
Monte Carlo cross-validation performance of all ensemble predictive models with and without data pre-processing. Confusion matrix values are expressed in percentages (%). AUC is expressed as a ratio.
| Model | Data Pre-processing | SENS | SPEC | NPV | PPV | ACC | AUC |
|---|---|---|---|---|---|---|---|
| | No | 83 | 40 | 70 | 58 | 62 | 0.63 |
| | Yes | 82 | 56 | 78 | 65 | 69 | 0.68 |
| | No | 74 | 36 | 58 | 54 | 55 | 0.56 |
| | Yes | 78 | 35 | 61 | 54 | 56 | 0.55 |
| | No | 68 | 39 | 55 | 53 | 53 | 0.63 |
| | Yes | 65 | 45 | 56 | 54 | 55 | 0.65 |
| | No | 17 | 84 | 50 | 51 | 50 | 0.46 |
| | Yes | 17 | 84 | 50 | 51 | 50 | 0.46 |
| | No | 17 | 87 | 51 | 57 | 52 | 0.62 |
| | Yes | 16 | 89 | 51 | 59 | 53 | 0.52 |
| | No | 57 | 94 | 68 | 90 | 75 | 0.76 |
| | Yes | 85 | 78 | 84 | 79 | 82 | 0.82 |
| | No | 80 | 59 | 75 | 66 | 69 | 0.71 |
| | Yes | 80 | 78 | 79 | 78 | 80 | 0.81 |
ACC = Accuracy, AUC = Area under the receiver operator characteristic curve, SENS = Sensitivity, SPEC = Specificity, NPV = Negative Predictive Value, PPV = Positive Predictive Value, ER = Estrogen, HER2 = Human Epidermal Growth Receptor 2, PR = Progesterone. Sign ↑ indicates performance increase in pre-processed training datasets compared to original datasets.
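For readers less familiar with the abbreviations in the table, the confusion matrix values follow the standard definitions, which can be sketched as below. The function name and the example counts are hypothetical and are not taken from the study; the table reports the values as percentages, whereas this sketch returns fractions.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return SENS, SPEC, PPV, NPV and ACC as fractions from confusion matrix counts."""

    def ratio(numerator, denominator):
        # An undefined ratio (empty denominator) is reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "SENS": ratio(tp, tp + fn),  # true positives among all actual positives
        "SPEC": ratio(tn, tn + fp),  # true negatives among all actual negatives
        "PPV": ratio(tp, tp + fp),   # true positives among all predicted positives
        "NPV": ratio(tn, tn + fn),   # true negatives among all predicted negatives
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    metrics = confusion_metrics(tp=80, fp=22, tn=78, fn=20)
    for name, value in metrics.items():
        print(f"{name}: {100 * value:.0f}%")
```

Note that PPV and NPV depend on the class balance of the evaluated set, whereas sensitivity and specificity do not.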
Figure 5. Occurrence of high-ranking features across the 100 Monte Carlo folds in the cancer detection predictive model. NGTDM = neighborhood grey tone difference matrix; GLSZM = gray level size zone matrix; GLCM = gray level co-occurrence matrix; SUVmax = maximum standard uptake value; SUVmean = mean standard uptake value; SUVmin = minimal standard uptake value; skew = skewness; z.perc = zone percentage; entr = entropy; info.corr.1 = information correlation 1; joint.max = joint maximum; lze = large zone emphasis; kurt = kurtosis; corr = correlation; joint.entr = joint entropy; inv.diff = inverse difference.
Figure 6. Occurrence of high-ranking features across the 100 Monte Carlo folds in the triple negative predictive model. NGTDM = neighborhood grey tone difference matrix; GLCM = gray level co-occurrence matrix; GLSZM = gray level size zone matrix; SUVmax = maximum standard uptake value; SUVmean = mean standard uptake value; kurt = kurtosis; sum.avg = sum average; diff.entr = difference entropy; clust.shade = cluster shade; sum.entr = sum entropy; lzhge = large zone high grey level emphasis; clust.prom = cluster prominence; skew = skewness; info.corr.1 = information correlation 1.
Figure 7. Comparison of area under the receiver operator characteristics curve (AUC) performance of maximum standard uptake value (SUVmax) and holomics-based ensemble models with and without data pre-processing for (a) breast cancer detection and (b) triple negative subtype.
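The AUC of a single continuous marker such as SUVmax can be computed directly from pairwise comparisons, without fitting a model: it equals the probability that a randomly chosen positive case has a higher value than a randomly chosen negative case, with ties counted as one half. The sketch below illustrates this equivalence; the function name and the example values are hypothetical and are not data from the study.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 1 for positive and 0 for negative; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical SUVmax values and lesion labels (1 = malignant, 0 = benign).
    suv_max = [2.1, 3.4, 3.0, 7.8]
    malignant = [0, 0, 1, 1]
    print(auc_from_scores(suv_max, malignant))
```

The same function applies to the predicted scores of an ML model, which allows both approaches to be compared on the same validation cases.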