Manfred Musigmann1, Burak Han Akkurt1, Hermann Krähling1, Nabila Gala Nacul1, Luca Remonda2,3, Thomas Sartoretti4, Dylan Henssen5, Benjamin Brokinkel6, Walter Stummer6, Walter Heindel1, Manoj Mannil7.
Abstract
To investigate the applicability and performance of automated machine learning (AutoML) for potential applications in diagnostic neuroradiology. In the medical sector, there is a rapidly growing demand for machine learning methods, but only a limited number of corresponding experts. The comparatively simple handling of AutoML should enable even non-experts to develop adequate machine learning models with manageable effort. We aim to investigate the feasibility as well as the advantages and disadvantages of developing AutoML models compared to developing conventional machine learning models. We discuss the results in relation to a concrete example of a medical prediction application. In this retrospective IRB-approved study, a cohort of 107 patients who underwent gross total meningioma resection and a second cohort of 31 patients who underwent subtotal resection were included. Image segmentation of the contrast-enhancing parts of the tumor was performed semi-automatically using the open-source software platform 3D Slicer. A total of 107 radiomic features were extracted from hand-delineated regions of interest in the pre-treatment MRI images of each patient. Within the AutoML approach, 20 different machine learning algorithms were trained and tested simultaneously. For comparison, a neural network and different conventional machine learning algorithms were trained and tested. With respect to the exemplary medical prediction application used in this study to evaluate the performance of AutoML, namely the pre-treatment prediction of the achievable resection status of meningioma, AutoML achieved remarkable performance nearly equivalent to that of a feed-forward neural network with a single hidden layer. However, in the clinical case study considered here, logistic regression outperformed the AutoML algorithm.
Using independent test data, we observed the following classification results (AutoML/neural network/logistic regression): mean area under the curve = 0.849/0.879/0.900, mean accuracy = 0.821/0.839/0.881, mean kappa = 0.465/0.491/0.644, mean sensitivity = 0.578/0.577/0.692 and mean specificity = 0.891/0.914/0.936. The results obtained with AutoML are therefore very promising. However, the AutoML models in our study did not yet match the performance of the best models obtained with conventional machine learning methods. While AutoML may facilitate and simplify the task of training and testing machine learning algorithms as applied in the field of neuroradiology and medical imaging, a considerable amount of expert knowledge may still be needed to develop models with the highest possible discriminatory power for diagnostic neuroradiology.
Year: 2022 PMID: 35953588 PMCID: PMC9366823 DOI: 10.1038/s41598-022-18028-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Convexity meningioma of the left hemisphere (above); semi-automatic segmentation with 3D Slicer (below).
Clinical and demographic data.
| | Training data | Independent test data |
|---|---|---|
| Number of patients | 111 | 27 |
| GTR (%) | 77.48 | 77.78 |
| STR (%) | 22.52 | 22.22 |
| Mean age (years) | 58.80 | 59.12 |
| Male (%) | 27.93 | 25.93 |
| Female (%) | 72.07 | 74.07 |
Figure 2. Development and test of a model with 100 repetitions (c = 100 cycles), a fixed number of features and a fixed machine learning algorithm used for feature preselection and for the subsequent model estimation.
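The repetition scheme in Figure 2 can be sketched as a loop over random stratified splits. This is a minimal illustration, not the authors' pipeline: the helper names (`stratified_split`, `repeated_evaluation`, `fit_threshold`, `accuracy`), the toy one-feature threshold model, and the synthetic data are all assumptions. The 20% test fraction is also an assumption, chosen only because it reproduces the 111/27 split sizes in the table above. In a real pipeline, feature preselection would happen inside `fit`, using the training portion only.

```python
import random
from statistics import mean


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx), keeping class proportions similar in both parts."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def repeated_evaluation(features, labels, fit, score,
                        n_cycles=100, test_fraction=0.2, seed=0):
    """Fit on a fresh training split in every cycle and average the test-set score."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_cycles):
        train_idx, test_idx = stratified_split(labels, test_fraction, rng)
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        scores.append(score(model,
                            [features[i] for i in test_idx],
                            [labels[i] for i in test_idx]))
    return mean(scores), scores


def fit_threshold(x_train, y_train):
    """Toy stand-in for a model: midpoint between the two class means."""
    pos = [x for x, y in zip(x_train, y_train) if y == 1]
    neg = [x for x, y in zip(x_train, y_train) if y == 0]
    return (mean(pos) + mean(neg)) / 2


def accuracy(threshold, x_test, y_test):
    """Fraction of test cases classified correctly by the threshold."""
    preds = [1 if x > threshold else 0 for x in x_test]
    return sum(p == y for p, y in zip(preds, y_test)) / len(y_test)


# Synthetic data only: 107 cases labelled 0 and 31 labelled 1, one feature each.
data_rng = random.Random(42)
labels = [0] * 107 + [1] * 31
features = [data_rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

mean_accuracy, per_cycle = repeated_evaluation(features, labels,
                                               fit_threshold, accuracy)
print(f"mean test accuracy over {len(per_cycle)} cycles: {mean_accuracy:.3f}")
```

Averaging over many random splits reduces the dependence of the reported metric on any single partition, which matters with a test set of only 27 cases.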
Univariate results.
| Variable | GTR (n = 107) | STR (n = 31) | p-value |
|---|---|---|---|
| tumor location = skull base¹ | 38 (35.5%) | 26 (83.9%) | 0.00001* |
| tumor location = convexity¹ | 46 (43.0%) | 0 (0.0%) | 0.00002* |
| tumor shape = irregular¹ | 31 (29.0%) | 21 (67.7%) | 0.00021* |
| fd_vs_re = rezidiv (recurrence)¹ | 6 (5.6%) | 9 (29.0%) | 0.00077* |
| tumor location = falx¹ | 17 (15.9%) | 0 (0.0%) | 0.03942* |
| orig.glszm.Smallareaemphasis² | 0.578 (0.388, 0.750) | 0.727 (0.529, 0.813) | 0.04842* |
| KPI (50/60/70/80/90/100)¹ | 3/0/5/30/50/19 | 1/2/2/13/11/2 | 0.04638* |
| orig.shape.Elongation² | 1.000 (1.000, 1.000) | 1.000 (0.791, 1.000) | 0.05904 |
| orig.shape.MajorAxisLength² | 3.068 (2.000, 3.266) | 3.266 (2.530, 3.266) | 0.06798 |
| orig.glszm.ZoneEntropy² | 1.585 (1.500, 2.322) | 2.000 (1.585, 2.322) | 0.09144 |
¹ Binary and categorical features: number n (in %). ² Continuous variables: median (interquartile range).
* p-value < 0.05 (assumed to be statistically significant).
Figure 3. Performance results for models built with Automated Machine Learning (AutoML). Left figure: Area Under the Curve (AUC) and accuracy. Right figure: specificity and sensitivity. All values calculated as means of 100 repetitions (100 cycles) using independent test data. Sort metric used to specify the best model in each cycle: Log Loss (red and black lines) and AUC (blue and green lines).
Figure 4. Selected final model class. GBM: Gradient Boosting Machine; GLM: Generalized Linear Model; DeepLearning: fully-connected multi-layer artificial neural network; XRT: Extremely Randomized Trees; DRF: Distributed Random Forest.
Figure 5. Area Under the Curve (AUC) for AutoML (red lines), neural network (pink lines) and conventional machine learning models (other lines). Left figure: training data. Right figure: independent test data. All values calculated as means of 100 repetitions (100 cycles).
Figure 6. Accuracy for AutoML (red lines), neural network (pink lines) and conventional machine learning models (other lines). Left figure: training data. Right figure: independent test data. All values calculated as means of 100 repetitions (100 cycles).
Figure 7. Relative Loss of Performance (RLP) for AutoML (red lines), neural network (pink lines) and conventional machine learning models (other lines). Left figure: AUC. Right figure: accuracy. All values calculated as means of 100 repetitions (100 cycles).
Classification results for AutoML (using 4 and 7 features), neural network (using 6 features) and logistic regression (using 3 features).
| | AutoML (4 features) | AutoML (7 features) | Neural network | Logistic regression |
|---|---|---|---|---|
| AUC | 0.847 [0.642, 0.975] | 0.849 [0.675, 0.978] | 0.879 [0.716, 0.984] | 0.900 [0.786, 0.976] |
| Accuracy | 0.814 [0.667, 0.926] | 0.821 [0.704, 0.926] | 0.839 [0.704, 0.944] | 0.881 [0.778, 0.963] |
| Kappa | 0.450 [−0.013, 0.786] | 0.465 [0.087, 0.757] | 0.491 [0.000, 0.847] | 0.644 [0.348, 0.899] |
| Sensitivity | 0.565 [0.000, 1.000] | 0.578 [0.167, 1.000] | 0.577 [0.000, 1.000] | 0.692 [0.333, 1.000] |
| Specificity | 0.885 [0.619, 1.000] | 0.891 [0.737, 1.000] | 0.914 [0.786, 1.000] | 0.936 [0.857, 1.000] |
All values calculated as means of 100 repetitions (100 cycles) using independent test data. Values in brackets: 95% confidence interval.
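Apart from AUC, the metrics in this table can all be derived from a 2×2 confusion matrix. The sketch below is illustrative only and is not the authors' code. It assumes that STR is treated as the positive class, which the source does not state, and the counts passed in are hypothetical rather than taken from the study. The function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and Cohen's kappa from a 2x2 confusion matrix.

    tp/fn: positive cases (assumed here: STR) predicted correctly/incorrectly.
    tn/fp: negative cases (assumed here: GTR) predicted correctly/incorrectly.
    Degenerate inputs (an empty class, or chance agreement of 1) are not handled.
    """
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the row and column totals.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts for a 27-case test set (6 positive, 21 negative).
metrics = classification_metrics(tp=4, fn=2, tn=20, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because roughly 78% of the cases are GTR, a classifier can reach high accuracy and specificity while sensitivity and kappa remain moderate, which is why reporting all of these metrics together is informative.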