Ottavia Spiga¹, Vittoria Cicaloni²,³, Cosimo Fiorini⁴, Alfonso Trezza², Anna Visibelli²,⁵, Lia Millucci², Giulia Bernardini², Andrea Bernini², Barbara Marzocchi²,⁶, Daniela Braconi², Filippo Prischi⁷, Annalisa Santucci².
Abstract
BACKGROUND: Alkaptonuria (AKU) is an ultra-rare autosomal recessive disease caused by a mutation in the homogentisate 1,2-dioxygenase (HGD) gene. One of the main obstacles in studying AKU, and other ultra-rare diseases, is the lack of a standardized methodology to assess disease severity or response to treatment. Quality of Life (QoL) scores are a reliable way to monitor patients' clinical condition and health status. Because they take into account patients' symptoms, general health status and care satisfaction, QoL scores make it possible to monitor disease progression and assess the suitability of treatments. However, more comprehensive tools are needed to study a complex, multi-systemic disease like AKU. In this study, a Machine Learning (ML) approach was implemented to predict QoL scores from clinical data deposited in ApreciseKUre, an AKU-dedicated database.
Keywords: Alkaptonuria; Machine learning; Precision medicine; QoL scores; Rare disease
Year: 2020 PMID: 32050984 PMCID: PMC7017449 DOI: 10.1186/s13023-020-1305-0
Source DB: PubMed Journal: Orphanet J Rare Dis ISSN: 1750-1172 Impact factor: 4.123
Fig. 1 Machine learning framework. A four-step workflow of the machine learning-based classification model.
Fig. 2 Correlation matrix of health survey questionnaires. The matrix shows the pairwise correlations among all QoL scores. Black: statistically significant inverse correlations; light pink: statistically significant direct correlations; red or purple: correlations that are not statistically significant.
Fig. 3 XGBoost variable importance for each QoL score. The matrix reports the most representative indicators (x axis) against the QoL scores (y axis), with the variable importance of each indicator for score prediction. The color scale goes from the lowest value (black) to the highest value (light pink).
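Fig. 3 ranks indicators by XGBoost variable importance. XGBoost computes its own built-in importance scores (such as gain or weight), which are not reproduced here. As a model-agnostic stand-in that conveys the same idea, the sketch below computes permutation importance: how much a model's mean absolute error grows when a single feature column is shuffled. This is a swapped-in technique, not the method used in the study; the function names, the toy data and `model` are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], float],
    X: List[Row],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MAE when each feature column is shuffled (higher = more important)."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link with the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increase += mean_absolute_error(y, [predict(r) for r in X_perm]) - baseline
        importances.append(increase / n_repeats)
    return importances


# Hypothetical toy data: the target depends on feature 0 only.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(50)]
y = [2.0 * row[0] for row in X]


def model(row: Row) -> float:
    """Stands in for a trained regressor (hypothetical)."""
    return 2.0 * row[0]


print(permutation_importance(model, X, y))  # feature 1 scores 0.0
```

Because the toy model ignores feature 1, shuffling that column leaves every prediction unchanged, so its importance is exactly zero, while shuffling feature 0 raises the error.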
ML algorithm performance comparison
| Model | RAE | R2 |
|---|---|---|
| Linear Regression | 0.34 | 0.87 |
| Neural networks | 0.28 | 0.91 |
| k-NN | 0.25 | 0.94 |
Comparison of different ML models based on RAE and R2. k-NN achieved the lowest RAE and thus the best performance.
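A minimal sketch of k-NN regression and of the two metrics in the table, assuming the standard definitions RAE = Σ|yᵢ − ŷᵢ| / Σ|yᵢ − ȳ| and R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)². The excerpt does not state the exact formulas, distance metric or value of k used in the study, so Euclidean distance, the helper names and the toy data are assumptions.

```python
import math
from typing import List, Sequence


def knn_predict(X_train: List[List[float]], y_train: Sequence[float],
                x: List[float], k: int = 3) -> float:
    """Predict the target of x as the mean target of its k nearest training rows."""
    # Euclidean distance to every training row, paired with that row's target.
    neighbours = sorted(
        (math.dist(row, x), target) for row, target in zip(X_train, y_train)
    )[:k]
    return sum(target for _, target in neighbours) / len(neighbours)


def relative_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Total absolute error relative to always predicting the mean of y_true."""
    mean = sum(y_true) / len(y_true)
    num = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    den = sum(abs(t - mean) for t in y_true)
    return num / den


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical one-feature training data.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 10.0, 20.0, 30.0]
print(knn_predict(X_train, y_train, [1.1], k=2))  # 15.0 (mean of 10 and 20)

# Predicting the mean for every sample is the reference point for both metrics.
y_true = [1.0, 2.0, 3.0, 4.0]
print(relative_absolute_error(y_true, [2.5] * 4))  # 1.0
print(r2_score(y_true, [2.5] * 4))  # 0.0
```

An RAE of 1.0 means a model is no better than always predicting the mean target, which is why lower RAE values indicate better performance.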
Fig. 4 Performance for each QoL score. Model performance (RAE) for each QoL score; the color scale goes from the lowest value (light green) to the highest value (blue).
Correlation matrix of original and surrogate datasets

Original dataset: Pearson correlation coefficient

| Variables | SAA | CHIT1 | AOPP | RSSP | Age | BMI |
|---|---|---|---|---|---|---|
| SAA | 1.00 | −0.01 | −0.01 | 0.15 | 0.02 | 0.23 |
| CHIT1 | −0.01 | 1.00 | 0.00 | 0.28 | 0.40* | −0.01 |
| AOPP | −0.01 | 0.00 | 1.00 | 0.06 | 0.09 | 0.17 |
| RSSP | 0.15 | 0.28 | 0.06 | 1.00 | 0.38* | 0.09 |
| Age | 0.02 | 0.40* | 0.09 | 0.38* | 1.00 | 0.14 |
| BMI | 0.23 | −0.01 | 0.17 | 0.09 | 0.14 | 1.00 |

Original dataset: p-values

| Variables | SAA | CHIT1 | AOPP | RSSP | Age | BMI |
|---|---|---|---|---|---|---|
| SAA | 0.00 | 0.56 | 1.00 | 0.11 | 0.57 | 0.01 |
| CHIT1 | 0.56 | 0.00 | 0.87 | 0.00 | 0.00 | 0.86 |
| AOPP | 1.00 | 0.87 | 0.00 | 0.69 | 0.45 | 0.10 |
| RSSP | 0.11 | 0.00 | 0.69 | 0.00 | 0.00 | 0.59 |
| Age | 0.57 | 0.00 | 0.45 | 0.00 | 0.00 | 0.28 |
| BMI | 0.01 | 0.86 | 0.10 | 0.59 | 0.28 | 0.00 |

Surrogate dataset: Pearson correlation coefficient

| Variables | SAA | CHIT1 | AOPP | RSSP | Age | BMI |
|---|---|---|---|---|---|---|
| SAA | 1.00 | −0.16 | 0.02 | 0.22 | −0.02 | −0.16 |
| CHIT1 | −0.16 | 1.00 | −0.03 | −0.06 | −0.08 | 0.06 |
| AOPP | 0.02 | −0.03 | 1.00 | −0.12 | 0.06 | −0.01 |
| RSSP | 0.22 | −0.06 | −0.12 | 1.00 | −0.18 | 0.09 |
| Age | −0.02 | −0.08 | 0.06 | −0.18 | 1.00 | −0.10 |
| BMI | −0.16 | 0.06 | −0.01 | 0.09 | −0.10 | 1.00 |

Surrogate dataset: p-values

| Variables | SAA | CHIT1 | AOPP | RSSP | Age | BMI |
|---|---|---|---|---|---|---|
| SAA | 0.00 | 0.72 | 1.00 | 0.23 | 0.57 | 0.10 |
| CHIT1 | 0.72 | 0.00 | 0.88 | 0.02 | 0.00 | 1.00 |
| AOPP | 1.00 | 0.88 | 0.00 | 0.66 | 0.61 | 0.20 |
| RSSP | 0.23 | 0.02 | 0.66 | 0.00 | 0.02 | 0.58 |
| Age | 0.57 | 0.00 | 0.61 | 0.02 | 0.00 | 0.28 |
| BMI | 0.10 | 1.00 | 0.20 | 0.58 | 0.28 | 0.00 |
The upper block shows the Pearson correlation coefficients and p-values for our original dataset; the lower block shows the same statistics for the surrogate dataset.
* Statistically significant values.
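The tables pair each Pearson coefficient r with a p-value. The sketch below shows how such a pair can be computed for every combination of variables. The study's p-values were most likely obtained from the exact t-distribution test; here, as a plainly labeled approximation that needs only the standard library, the p-value comes from the Fisher z-transform, z = atanh(r)·√(n − 3), compared with a standard normal. The 0.05 significance threshold, the helper names and the data are assumptions.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return max(-1.0, min(1.0, r))  # clamp floating-point overshoot


def approx_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0 via the Fisher z-transform (normal approximation, n > 3)."""
    if abs(r) >= 1.0:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def correlation_table(data: Dict[str, Sequence[float]], alpha: float = 0.05
                      ) -> Dict[Tuple[str, str], Tuple[float, float, bool]]:
    """(r, p, significant) for every ordered pair of variables."""
    n = len(next(iter(data.values())))
    table = {}
    for a in data:
        for b in data:
            r = pearson_r(data[a], data[b])
            p = approx_p_value(r, n)
            table[(a, b)] = (r, p, p < alpha)
    return table


# Hypothetical measurements for two of the variables in the tables.
data = {
    "age": [34.0, 51.0, 47.0, 62.0, 29.0, 55.0, 41.0, 58.0],
    "CHIT1": [12.0, 20.5, 18.0, 25.1, 10.2, 21.7, 15.3, 19.9],
}
for (a, b), (r, p, sig) in correlation_table(data).items():
    print(f"{a:>5} vs {b:<5} r = {r:5.2f}  p = {p:.2f}{' *' if sig else ''}")
```

For small samples the normal approximation can differ noticeably from the exact t-test, so the printed p-values should not be expected to match the tables digit for digit.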
Fig. 5 Surrogate test analysis. Performance comparison, based on RAE, between k-NN models trained on surrogate data (red) and on the original dataset (blue).
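The excerpt does not say how the surrogate dataset was generated. One common way to build such a null dataset, assumed here purely for illustration, is to permute each variable independently. Every column keeps its own values, but the relationships between variables (and between predictors and the QoL target) are randomized, broadly in line with the small correlations reported for the surrogate dataset. If a model is learning genuine structure, the same k-NN trained on such data would be expected to reach a clearly worse RAE than one trained on the original data. `make_surrogate` and the data below are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def make_surrogate(data: Dict[str, Sequence[float]], seed: int = 0) -> Dict[str, List[float]]:
    """Shuffle each variable independently: every column keeps its values
    (same marginal distribution) but its pairing with other columns is randomized."""
    rng = random.Random(seed)
    surrogate = {}
    for name, values in data.items():
        column = list(values)  # copy so the original data is left untouched
        rng.shuffle(column)
        surrogate[name] = column
    return surrogate


# Hypothetical patient-level data; "QoL" stands for any target score.
original = {
    "age": [34.0, 51.0, 47.0, 62.0, 29.0, 55.0],
    "RSSP": [1.2, 2.8, 2.1, 3.5, 0.9, 3.0],
    "QoL": [80.0, 55.0, 60.0, 40.0, 85.0, 50.0],
}
surrogate = make_surrogate(original, seed=1)
print(surrogate)
```

Shuffling columns independently, rather than sampling new values, keeps each variable's range and distribution identical, so any drop in performance can be attributed to the lost relationships rather than to different value ranges.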