Corrado Pancotti1, Giovanni Birolo2, Cesare Rollo1, Tiziana Sanavia1, Barbara Di Camillo3, Umberto Manera4, Adriano Chiò4, Piero Fariselli1.
Abstract
Amyotrophic lateral sclerosis (ALS) is a highly complex and heterogeneous neurodegenerative disease that affects motor neurons. Because life expectancy is relatively short, it is essential to understand the course of the disease promptly in order to better target the patient's treatment. Predictive models of disease progression are thus of great interest. One of the most extensive and well-studied open-access data resources for ALS is the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) repository. In 2015, the DREAM-Phil Bowen ALS Prediction Prize4Life Challenge was held on PRO-ACT data; competitors were asked to develop machine learning algorithms to predict disease progression, measured as the slope of the ALSFRS score between 3 and 12 months. Although deep learning has already been applied successfully in several studies on ALS patients, to the best of our knowledge it remains unexplored for ALSFRS slope prediction in the PRO-ACT cohort. Here, we investigate how deep learning models perform in predicting ALS progression using the PRO-ACT data. We developed three models based on different architectures that showed comparable or better performance than the state-of-the-art models, thus representing a valid alternative for predicting ALS disease progression.
Year: 2022 PMID: 35962027 PMCID: PMC9374680 DOI: 10.1038/s41598-022-17805-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. (a) Convolutional neural network architecture. The dynamic features of the questionnaire dataset flow into the convolutional module, which consists of two layers; after concatenation with the static features, the information is fed into a feed-forward neural network to predict the two outcomes. (b) Recurrent neural network architecture. The structure is the same as the convolutional architecture, except that a recurrent module processes the dynamic features.
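The data flow in panel (a) can be illustrated with a dependency-free forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the layer sizes, kernel width, ReLU activations, global average pooling, random weights, and the names `conv1d`, `dense`, `build_params`, and `forward` are all hypothetical, and a real model would be built and trained with a deep learning framework.

```python
import random

def conv1d(seq, kernels, bias):
    """Valid 1D convolution over time, followed by ReLU.
    seq: list of T time steps, each a list of C_in values.
    kernels: list of C_out kernels, each shaped [K][C_in]."""
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        step = []
        for kern, b in zip(kernels, bias):
            s = b
            for dt in range(k):
                for c, w in enumerate(kern[dt]):
                    s += w * seq[t + dt][c]
            step.append(max(0.0, s))
        out.append(step)
    return out

def dense(x, weights, bias, relu=True):
    """Fully connected layer; weights has one row per output unit."""
    out = []
    for row, b in zip(weights, bias):
        s = b + sum(w * v for w, v in zip(row, x))
        out.append(max(0.0, s) if relu else s)
    return out

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def build_params(rng, n_dyn, n_static, n_filters=4, kernel=2, hidden=8):
    # All sizes here are illustrative assumptions, not values from the paper.
    return {
        "conv1": [rand_matrix(rng, kernel, n_dyn) for _ in range(n_filters)],
        "conv2": [rand_matrix(rng, kernel, n_filters) for _ in range(n_filters)],
        "hidden": rand_matrix(rng, hidden, n_filters + n_static),
        "out": rand_matrix(rng, 2, hidden),
        "n_filters": n_filters,
        "hidden_size": hidden,
    }

def forward(params, dynamic, static):
    zeros = [0.0] * params["n_filters"]
    # Convolutional module: two layers over the dynamic (per-visit) features.
    h = conv1d(dynamic, params["conv1"], zeros)
    h = conv1d(h, params["conv2"], zeros)
    # Collapse the time axis (global average pooling is an assumption).
    pooled = [sum(step[c] for step in h) / len(h) for c in range(len(h[0]))]
    # Concatenate with the static features, then feed-forward to two outputs.
    merged = pooled + static
    hid = dense(merged, params["hidden"], [0.0] * params["hidden_size"])
    return dense(hid, params["out"], [0.0, 0.0], relu=False)

rng = random.Random(0)
# Hypothetical input: 4 visits x 10 questionnaire items, plus 3 static features.
dynamic = [[rng.uniform(0, 4) for _ in range(10)] for _ in range(4)]
static = [0.55, 1.0, 0.3]
params = build_params(rng, n_dyn=10, n_static=3)
outputs = forward(params, dynamic, static)
```

The recurrent variant in panel (b) would replace only the two `conv1d` calls with a recurrent module; the concatenation and feed-forward head stay the same.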
ALSFRS/ALSFRS-R cohort descriptive statistics. For each variable, the number and percentage of non-missing values in the dataset before imputation are reported, along with the percentage or median value. Total ALSFRS and weight refer to the first available observation.
| Data | Count (observed rate) | Percentage/median (IQR) |
|---|---|---|
| Age | 2581 (88.4%) | 55 (46–63) years |
| Sex | 2921 (100%) | 63.4% males |
| Height | 2577 (88.2%) | 171 (164–178) cm |
| Caucasian | 2870 (98.2%) | 95.5% |
| Weight (first) | 2715 (92.9%) | 76.0 (65.0–86.4) |
| Time of onset | 2864 (98%) | 558 (367–819) days |
| Spinal; Bulbar; Both Spinal and Bulbar; Others | 2534 (86.7%) | 67.8%; 20.8%; 0.7%; 10.7% |
| Total ALSFRS (first) | 2921 (100%) | 32 (28–35) |
| Riluzole use | 2505 (85.7%) | 73.7% yes |
Figure 2. Distribution of the 3–12 month ALSFRS slope, showing fast versus medium-slow progressors.
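As a rough illustration of the quantity plotted here, the slope can be computed as the change in total ALSFRS divided by the elapsed time between two visits. The sketch below assumes the slope is taken between the first visit at or after month 3 and the first visit at or after month 12; that visit-selection rule, the function names, the example visits, and the threshold value passed to `progression_group` are assumptions for illustration, not definitions taken from the text.

```python
def alsfrs_slope(visits, start_month=3.0, end_month=12.0):
    """visits: list of (month, total_alsfrs) pairs.
    Returns the slope in points per month, or None if it cannot be computed."""
    ordered = sorted(visits)
    # Assumed rule: first visit at or after each time point.
    first = next((v for v in ordered if v[0] >= start_month), None)
    last = next((v for v in ordered if v[0] >= end_month), None)
    if first is None or last is None or last[0] == first[0]:
        return None
    return (last[1] - first[1]) / (last[0] - first[0])

def progression_group(slope, fast_threshold):
    """Label a patient given a caller-supplied threshold (points per month)."""
    return "fast" if slope < fast_threshold else "medium-slow"

# Hypothetical patient: score drops from 36 (month 3.1) to 29 (month 12.2).
visits = [(0.0, 38), (3.1, 36), (6.0, 34), (12.2, 29)]
slope = alsfrs_slope(visits)
# The threshold below is a placeholder, not a value from the source.
group = progression_group(slope, fast_threshold=-1.0)
```

A more negative slope means faster functional decline, so fast progressors sit in the left tail of the distribution.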
Figure 3. Feature importance. Top 20 most important features ranked by Random Forest via cross-validation on the training set, using the Gini criterion.
ALSFRS slope prediction performance. RMSD and PCC are shown for all methods. FFNN, CNN, and RNN performance was obtained on the external test set (n = 731) using 10,000 bootstrap resamples. For these predictors we report the mean and the 95% confidence interval (CI). FFNN+CNN represents the ensemble prediction of the two neural networks. The best values for each metric are highlighted in bold. *Random Forest (RF) and Bayesian Additive Regression Trees (BART) are taken from the literature[31], where they are reported without CI.
| Methods | RMSD | PCC |
|---|---|---|
| FFNN | 0.528 (0.502–0.555) | 0.451 (0.404–0.495) |
| CNN | 0.527 (0.499–0.556) | 0.439 (0.388–0.487) |
| RNN | 0.529 (0.501–0.558) | 0.429 (0.379–0.476) |
| FFNN+CNN | | 0.462 (0.415–0.508) |
| RF* | 0.563 | 0.446 |
| BART* | 0.554 | |
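The mean and 95% CI columns above come from bootstrap resampling of the test set. A minimal sketch of a percentile bootstrap for RMSD and PCC follows; the percentile method, the function names, and the synthetic data are assumptions, since the text does not say how the interval was derived from the resamples.

```python
import math
import random

def rmsd(y_true, y_pred):
    """Root mean square deviation between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pcc(y_true, y_pred):
    """Pearson correlation coefficient."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    vt = sum((t - mt) ** 2 for t in y_true)
    vp = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(vt * vp)

def bootstrap_ci(y_true, y_pred, metric, n_boot=10_000, alpha=0.05, seed=0):
    """Resample patients with replacement; return (mean, lower, upper)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return sum(stats) / n_boot, lower, upper

# Synthetic slopes and predictions, for illustration only.
rng = random.Random(1)
y_true = [rng.gauss(-0.7, 0.6) for _ in range(200)]
y_pred = [0.5 * t + rng.gauss(-0.35, 0.3) for t in y_true]

# Fewer resamples than the 10,000 used in the table, to keep the demo fast.
rmsd_mean, rmsd_lo, rmsd_hi = bootstrap_ci(y_true, y_pred, rmsd, n_boot=1000)
pcc_mean, pcc_lo, pcc_hi = bootstrap_ci(y_true, y_pred, pcc, n_boot=1000)
```

Resampling whole patients keeps each true value paired with its prediction, which is what both metrics require.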
Figure 4. Shapley values for the FFNN architecture; x-axis: the impact on the model output, y-axis: the top 20 most predictive features. The colormap represents the feature values.
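To make "impact on the model output" concrete, the sketch below computes exact Shapley values for a toy three-feature function by enumerating all feature subsets. It is an illustration of the definition only: the toy model, the baseline, and the feature meanings are hypothetical, and practical tools approximate these values for a trained network rather than enumerating subsets.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).
    Features outside a coalition are replaced by their baseline value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    # Hypothetical slope predictor with one interaction term.
    return -0.05 * z[0] + 0.02 * z[1] + 0.01 * z[0] * z[2]

x = [4.0, 30.0, 1.0]          # hypothetical feature values for one patient
baseline = [2.0, 32.0, 0.0]   # hypothetical reference values
phi = shapley_values(toy_model, x, baseline)
```

The values sum to the difference between the prediction for `x` and the prediction for the baseline, which is why each one can be read as a signed contribution along the x-axis of the plot.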
ALSFRS slope performance with the most important features. RMSD and PCC are shown for all methods. Performance was obtained on the external test set (n = 731) using 10,000 bootstrap resamples. FFNN was trained using the top 5 features; CNN and RNN were trained using ALSFRS questionnaire data and the top 4 features. The best values for each metric are in bold.
| Methods | RMSD | PCC |
|---|---|---|
| FFNN | 0.534 (0.506–0.564) | 0.414 (0.367–0.458) |
| CNN | | |
| RNN | 0.544 (0.514–0.574) | 0.375 (0.328–0.419) |
Impact of missing-value imputation on ALSFRS slope performance with the top 5 most important features. RMSD and PCC are presented for the two FFNN models: one was trained on n = 2338 patients with imputed values, the other on n = 1748 patients without imputation. Both models were then tested on n = 583 non-imputed examples.
| Methods | RMSD | PCC |
|---|---|---|
| FFNN | 0.547 (0.512–0.582) | 0.416 (0.359–0.472) |
| FFNN | 0.546 (0.511–0.581) | 0.415 (0.354–0.475) |
Figure 5. Slope and survival. Left: Scatterplot of the experimental slope between months 3 and 12 against the time-to-death of 458 patients. Right: Kaplan–Meier curves for fast and medium-slow progressing patients in the test set. There were 21 fast and 105 medium-slow progressors. Times start from month 12 after the first visit.
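For readers unfamiliar with the curves in the right panel, the Kaplan–Meier estimator multiplies, at each observed death time, the fraction of at-risk patients who survive it. The sketch below is a plain implementation on made-up follow-up data; the function name and the example times are hypothetical and do not reproduce the figure.

```python
def kaplan_meier(times, events):
    """times: follow-up durations (here, measured from month 12).
    events: 1 if death was observed, 0 if the patient was censored.
    Returns a list of (time, survival probability) at each death time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        leaving = sum(1 for tt, _ in data if tt == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= leaving
        i += leaving
    return curve

# Hypothetical follow-up in months; the patients at 3 (second entry) and 8 are censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Running the function once per group (fast and medium-slow) yields the two step curves that such a plot compares.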