Danlei Ru, Jinchen Li, Ouyi Xie, Linliu Peng, Hong Jiang, Rong Qiu.
Abstract
Existing treatments can only delay the progression of spinocerebellar ataxia type 3/Machado-Joseph disease (SCA3/MJD) after onset, so predicting the age at onset (AAO) can facilitate early intervention and follow-up to improve treatment efficacy. The objective of this study was to develop an explainable artificial intelligence (XAI) model based on feature optimization to provide interpretable and more accurate AAO prediction. A total of 1,008 affected SCA3/MJD subjects from mainland China were analyzed. The expanded cytosine-adenine-guanine (CAG) trinucleotide repeats of 10 polyQ-related genes were genotyped and included in the models as potential AAO modifiers. The performance of 4 feature optimization methods and 10 machine learning (ML) algorithms was compared, followed by building the XAI based on SHapley Additive exPlanations (SHAP). The model constructed with an artificial neural network (ANN) and Crossing-Correlation-StepSVM feature optimization performed best, achieving a coefficient of determination (R²) of 0.653 and a mean absolute error (MAE), root mean square error (RMSE), and median absolute error (MedianAE) of 4.544, 6.090, and 3.236 years, respectively. The XAI explained the predicted results, which suggest that the factors affecting the AAO are complex and associated with gene interactions. An XAI based on feature optimization can improve the accuracy of AAO prediction and provide interpretable and personalized prediction.
Keywords: CAG repeats; age at onset; explainable artificial intelligence (XAI); feature optimization; machine learning; spinocerebellar ataxia type 3
Year: 2022 PMID: 36110986 PMCID: PMC9468717 DOI: 10.3389/fninf.2022.978630
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 3.739
Figure 1. Flowchart of the StepSVM.
Figure 2. Flowchart of the study.
Results of the descriptive statistical analysis.
| Feature | Mean ± SD | Min/Max | Mean ± SD | Min/Max | Mean ± SD | Min/Max |
|---|---|---|---|---|---|---|
| ATXN3-A1 | 19.65 ± 6.24 | 11/36 | 19.74 ± 6.27 | 11/36 | 19.31 ± 6.08 | 11/35 |
| ATXN3-A2 | 71.80 ± 3.32 | 61/80 | 71.76 ± 3.30 | 61/80 | 71.97 ± 3.39 | 61/80 |
| ATXN3-D | 52.15 ± 7.07 | 31/68 | 51.99 ± 7.13 | 31/68 | 52.78 ± 6.81 | 32/66 |
| ATXN3-M | 45.73 ± 3.53 | 37.5/55.5 | 45.75 ± 3.54 | 37.5/55.5 | 45.63 ± 3.51 | 37.5/54 |
| ATXN1-A1 | 27.08 ± 1.87 | 15/31 | 27.11 ± 1.77 | 15/31 | 26.94 ± 2.20 | 15/31 |
| ATXN1-A2 | 29.31 ± 1.59 | 26/39 | 29.35 ± 1.62 | 26/39 | 29.17 ± 1.46 | 26/33 |
| ATXN1-D | 2.24 ± 1.95 | 0/16 | 2.24 ± 1.95 | 0/16 | 2.23 ± 1.96 | 0/11 |
| ATXN1-M | 28.20 ± 1.43 | 20.5/33.5 | 28.23 ± 1.39 | 23/33.5 | 28.05 ± 1.59 | 20.5/31 |
| ATXN2-A1 | 21.51 ± 1.07 | 11/22 | 21.52 ± 1.09 | 11/22 | 21.50 ± 0.97 | 17/22 |
| ATXN2-A2 | 21.93 ± 1.40 | 20/33 | 21.95 ± 1.41 | 20/31 | 21.85 ± 1.35 | 20/33 |
| ATXN2-D | 0.42 ± 1.51 | 0/12 | 0.44 ± 1.56 | 0/12 | 0.31 ± 1.25 | 0/11 |
| ATXN2-M | 21.72 ± 0.97 | 16/26.5 | 21.74 ± 1.00 | 16/26.5 | 21.63 ± 0.84 | 19.5/26 |
| CACNA1A-A1 | 11.38 ± 2.22 | 4/16 | 11.36 ± 2.22 | 4/16 | 11.46 ± 2.20 | 5/15 |
| CACNA1A-A2 | 13.24 ± 1.17 | 5/17 | 13.24 ± 1.16 | 5/17 | 13.26 ± 1.21 | 7/17 |
| CACNA1A-D | 1.87 ± 2.02 | 0/11 | 1.88 ± 2.01 | 0/11 | 1.84 ± 2.06 | 0/9 |
| CACNA1A-M | 12.31 ± 1.46 | 5/16 | 12.31 ± 1.46 | 5/16 | 12.33 ± 1.47 | 7/15.5 |
| ATXN7-A1 | 10.29 ± 1.18 | 6/18 | 10.30 ± 1.16 | 6/18 | 10.23 ± 1.24 | 6/13 |
| ATXN7-A2 | 10.97 ± 1.25 | 7/21 | 10.95 ± 1.17 | 7/18 | 11.04 ± 1.52 | 8/21 |
| ATXN7-D | 0.68 ± 1.14 | −2/11 | 0.65 ± 1.06 | −2/6 | 0.81 ± 1.40 | 0/11 |
| ATXN7-M | 10.63 ± 1.07 | 7/18 | 10.62 ± 1.03 | 7/18 | 10.66 ± 1.22 | 7/15.5 |
| TBP-A1 | 27.68 ± 1.35 | 20/31 | 27.69 ± 1.38 | 20/31 | 27.63 ± 1.23 | 21/30 |
| TBP-A2 | 29.00 ± 1.22 | 25/35 | 29.03 ± 1.23 | 26/35 | 28.89 ± 1.17 | 25/34 |
| TBP-D | 1.32 ± 1.49 | −1/9 | 1.34 ± 1.49 | −1/9 | 1.23 ± 1.46 | 0/8 |
| TBP-M | 28.34 ± 1.05 | 23/32.5 | 28.37 ± 1.08 | 23/32.5 | 28.23 ± 0.92 | 25/30.5 |
| HTT-A1 | 18.81 ± 1.59 | 12/25 | 18.79 ± 1.61 | 12/25 | 18.88 ± 1.52 | 12/23 |
| HTT-A2 | 20.89 ± 2.21 | 16/30 | 20.91 ± 2.25 | 16/30 | 20.79 ± 2.00 | 18/29 |
| HTT-D | 2.06 ± 2.14 | 0/11 | 2.12 ± 2.16 | 0/11 | 1.85 ± 2.05 | 0/10 |
| HTT-M | 19.82 ± 1.57 | 15/25.5 | 19.83 ± 1.59 | 15/25.5 | 19.80 ± 1.49 | 16/24.5 |
| ATN1-A1 | 17.16 ± 2.77 | 9/24 | 17.14 ± 2.75 | 10/24 | 17.22 ± 2.83 | 9/22 |
| ATN1-A2 | 20.46 ± 2.59 | 12/46 | 20.39 ± 2.31 | 12/35 | 20.72 ± 3.47 | 14/46 |
| ATN1-D | 3.30 ± 2.89 | 0/28 | 3.23 ± 2.65 | 0/15 | 3.56 ± 3.69 | 0/28 |
| ATN1-M | 18.81 ± 2.26 | 12/33 | 18.77 ± 2.17 | 12/27.5 | 18.97 ± 2.58 | 14/33 |
| KCNN3-A1 | 18.11 ± 1.65 | 10/21 | 18.13 ± 1.60 | 10/21 | 18.05 ± 1.85 | 10/20 |
| KCNN3-A2 | 19.60 ± 1.32 | 13/30 | 19.57 ± 1.23 | 13/30 | 19.72 ± 1.61 | 14/30 |
| KCNN3-D | 1.51 ± 1.80 | 0/16 | 1.47 ± 1.70 | 0/12 | 1.66 ± 2.14 | 0/16 |
| KCNN3-M | 19.06 ± 1.22 | 13/25 | 19.06 ± 1.17 | 13/25 | 19.06 ± 1.40 | 13/25 |
| RAI1-A1 | 11.50 ± 0.77 | 7/13 | 11.49 ± 0.77 | 7/13 | 11.56 ± 0.75 | 9/13 |
| RAI1-A2 | 12.11 ± 0.39 | 10/14 | 12.10 ± 0.40 | 10/14 | 12.13 ± 0.39 | 10/14 |
| RAI1-D | 0.62 ± 0.78 | 0/5 | 0.63 ± 0.79 | 0/5 | 0.59 ± 0.75 | 0/3 |
| RAI1-M | 11.93 ± 0.52 | 9/13 | 11.93 ± 0.53 | 9/13 | 11.95 ± 0.52 | 10/13 |
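
The suffixes in the table above are not defined in this record, but the reported means are consistent with D being the difference between the two alleles (A2 − A1) and M being their average. For example, for ATXN3, 71.80 − 19.65 = 52.15 and (19.65 + 71.80) / 2 ≈ 45.73. The following is a minimal sketch under that assumption; the function name and the example repeat counts are hypothetical and are not taken from the study.

```python
def derive_allele_features(a1, a2):
    """Return (D, M) for one gene from the CAG repeat counts of its two alleles.

    Assumption inferred from the table means, not stated in the source:
    D = A2 - A1 (allele difference), M = (A1 + A2) / 2 (allele mean).
    """
    d = a2 - a1
    m = (a1 + a2) / 2
    return d, m


# Hypothetical subject: 14 repeats on one allele, 72 on the other.
d, m = derive_allele_features(14, 72)
print(f"D = {d}, M = {m}")

# Consistency check against the reported ATXN3 column means.
mean_a1, mean_a2 = 19.65, 71.80
print(round(mean_a2 - mean_a1, 2), round((mean_a1 + mean_a2) / 2, 3))
```

Because D is a signed difference under this reading, negative minima such as −2 for ATXN7-D remain possible when A1 exceeds A2.
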
Result of the Pearson correlation coefficient.
| Feature | r | P | Feature | r | P |
|---|---|---|---|---|---|
| Sex | −0.0121 | 0.7340 | | | |
| ATXN3-A1 | 0.0223 | 0.5289 | TBP-A2 | −0.0452 | 0.2027 |
| | | | TBP-D | 0.0540 | 0.1275 |
| | | | | | |
| | | | HTT-A1 | −0.0055 | 0.8758 |
| ATXN1-A1 | 0.0042 | 0.9061 | HTT-A2 | 0.0180 | 0.6122 |
| | | | HTT-D | 0.0232 | 0.5130 |
| ATXN1-D | 0.0452 | 0.2027 | HTT-M | 0.0092 | 0.7960 |
| ATXN1-M | 0.0370 | 0.2963 | ATN1-A1 | 0.0403 | 0.2560 |
| | | | ATN1-A2 | −0.0168 | 0.6366 |
| ATXN2-A2 | −0.0088 | 0.8051 | ATN1-D | −0.0578 | 0.1029 |
| ATXN2-D | 0.0475 | 0.1803 | ATN1-M | 0.0167 | 0.6385 |
| ATXN2-M | −0.0420 | 0.2367 | KCNN3-A1 | 0.0479 | 0.1763 |
| CACNA1A-A1 | 0.0164 | 0.6441 | KCNN3-A2 | −0.0244 | 0.4910 |
| CACNA1A-A2 | 0.0205 | 0.5628 | | | |
| CACNA1A-D | −0.0130 | 0.7134 | KCNN3-M | 0.0064 | 0.8575 |
| CACNA1A-M | 0.0227 | 0.5220 | RAI1-A1 | 0.0185 | 0.6029 |
| ATXN7-A1 | 0.0001 | 0.9968 | RAI1-A2 | −0.0285 | 0.4219 |
| ATXN7-A2 | −0.0417 | 0.2396 | RAI1-D | −0.0273 | 0.4411 |
| ATXN7-D | −0.0453 | 0.2013 | RAI1-M | 0.0010 | 0.9783 |
| ATXN7-M | −0.0174 | 0.6229 | | | |
Bold indicates features optimized through correlation (feature set 1).
Result of feature optimization by cross-correlation (feature set 2).
| Feature | r | P | Feature | r | P |
|---|---|---|---|---|---|
| ATXN3-A2 | −0.7190 | <0.001 | ATXN3-A2*CACNA1A-A2 | −0.3152 | <0.001 |
| ATXN3-D*ATXN3-M | −0.6405 | <0.001 | ATXN3-D*ATXN1-A2 | −0.3099 | <0.001 |
| ATXN3-A2*RAI1-A2 | −0.6003 | <0.001 | ATXN3-M*TBP-A1 | −0.3081 | <0.001 |
| ATXN3-A2*TBP-M | −0.5905 | <0.001 | ATXN3-M*ATXN2-M | −0.3081 | <0.001 |
| ATXN3-A2*TBP-A2 | −0.5578 | <0.001 | ATXN3-D*HTT-A1 | −0.3048 | <0.001 |
| ATXN3-A2*ATXN3-M | −0.5439 | <0.001 | ATXN3-M*ATXN2-A1 | −0.3039 | <0.001 |
| ATXN3-A2*ATXN2-M | −0.5392 | <0.001 | ATXN3-D*HTT-M | −0.3020 | <0.001 |
| ATXN3-A2*TBP-A1 | −0.5362 | <0.001 | ATXN3-M*TBP-A2 | −0.3011 | <0.001 |
| ATXN3-A2*ATXN2-A1 | −0.5250 | <0.001 | ATXN3-M*RAI1-A2 | −0.2995 | <0.001 |
| ATXN3-A2*RAI1-M | −0.5224 | <0.001 | ATXN3-A2*KCNN3-A1 | −0.2931 | <0.001 |
| ATXN3-A2*ATXN3-D | −0.5006 | <0.001 | ATXN3-D*CACNA1A-A2 | −0.2919 | <0.001 |
| ATXN3-A2*ATXN1-M | −0.4763 | <0.001 | ATXN3-D*ATXN7-A2 | −0.2911 | <0.001 |
| ATXN3-A2*ATXN2-A2 | −0.4454 | <0.001 | ATXN3-D*ATXN7-M | −0.2882 | <0.001 |
| ATXN3-A2*KCNN3-A2 | −0.4378 | <0.001 | ATXN3-A2*ATN1-A2 | −0.2849 | <0.001 |
| ATXN3-A2*ATXN1-A2 | −0.4303 | <0.001 | ATXN3-D*ATN1-A2 | −0.2772 | <0.001 |
| ATXN3-A2*KCNN3-M | −0.4238 | <0.001 | ATXN3-M*RAI1-M | −0.2759 | <0.001 |
| ATXN3-A2*ATXN1-A1 | −0.4144 | <0.001 | ATXN3-A2*ATXN7-A1 | −0.2711 | <0.001 |
| ATXN3-A2*RAI1-A1 | −0.4016 | <0.001 | ATXN3-D*KCNN3-A1 | −0.2708 | <0.001 |
| ATXN3-D*TBP-M | −0.3632 | <0.001 | ATXN3-M*ATXN2-A2 | −0.2682 | <0.001 |
| ATXN3-D*TBP-A1 | −0.3629 | <0.001 | ATXN3-D*ATXN7-A1 | −0.2657 | <0.001 |
| ATXN3-D | −0.3554 | <0.001 | ATXN3-D*HTT-A2 | −0.2654 | <0.001 |
| ATXN3-D*RAI1-A2 | −0.3537 | <0.001 | ATXN3-A2*HTT-A2 | −0.2626 | <0.001 |
| ATXN3-D*TBP-A2 | −0.3536 | <0.001 | ATXN3-D*ATN1-M | −0.2565 | <0.001 |
| ATXN3-D*ATXN2-A1 | −0.3524 | <0.001 | ATXN3-D*CACNA1A-M | −0.2547 | <0.001 |
| ATXN3-A2*HTT-M | −0.3475 | <0.001 | ATXN3-M*KCNN3-A2 | −0.2531 | <0.001 |
| ATXN3-A2*HTT-A1 | −0.3447 | <0.001 | ATXN3-A2*ATN1-M | −0.2521 | <0.001 |
| ATXN3-D*ATXN2-M | −0.3432 | <0.001 | ATXN3-M*ATXN1-M | −0.2488 | <0.001 |
| ATXN3-D*RAI1-M | −0.3380 | <0.001 | ATXN3-M*KCNN3-M | −0.2450 | <0.001 |
| ATXN3-D*KCNN3-A2 | −0.3379 | <0.001 | ATXN3-A2*CACNA1A-M | −0.2397 | <0.001 |
| ATXN3-D*ATXN1-M | −0.3266 | <0.001 | ATXN3-M*ATXN1-A1 | −0.2367 | <0.001 |
| ATXN3-D*ATXN1-A1 | −0.3247 | <0.001 | ATXN3-M*RAI1-A1 | −0.2298 | <0.001 |
| ATXN3-A2*ATXN7-M | −0.3213 | <0.001 | ATXN3-M*ATXN1-A2 | −0.2293 | <0.001 |
| ATXN3-D*KCNN3-M | −0.3211 | <0.001 | ATXN3-M*ATXN7-A2 | −0.2262 | <0.001 |
| ATXN3-D*ATXN2-A2 | −0.3182 | <0.001 | ATXN3-M*ATXN7-M | −0.2187 | <0.001 |
| ATXN3-A2*ATXN7-A2 | −0.3181 | <0.001 | ATXN3-A1*ATXN3-D | −0.2133 | <0.001 |
| ATXN3-M | −0.3164 | <0.001 | ATXN3-M*HTT-A1 | −0.2133 | <0.001 |
| ATXN3-D*RAI1-A1 | −0.3162 | <0.001 | ATXN3-M*HTT-M | −0.2093 | <0.001 |
| ATXN3-M*TBP-M | −0.3161 | <0.001 |
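
The feature names in the table above (for example `ATXN3-A2*TBP-M`) suggest that crossed features are element-wise products of two single features, ranked by their Pearson correlation with the AAO. The sketch below illustrates that idea under this assumption. The helper names and the toy values are hypothetical, and the study's actual selection thresholds are not reproduced here.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_cross_features(features, target):
    """Score single features and all pairwise products by correlation with target.

    `features` maps a feature name to its list of values (one per subject).
    Returns (name, r) pairs sorted by decreasing |r|.
    """
    candidates = dict(features)
    for a, b in combinations(features, 2):
        # Assumed definition of a crossed feature: element-wise product.
        candidates[f"{a}*{b}"] = [u * v for u, v in zip(features[a], features[b])]
    scored = {name: pearson_r(values, target) for name, values in candidates.items()}
    return sorted(scored.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data invented for illustration only (not from the study).
features = {
    "ATXN3-A2": [68, 70, 72, 74, 76, 78],
    "TBP-M": [28.0, 29.5, 27.5, 28.5, 29.0, 28.0],
}
aao = [52, 45, 41, 36, 30, 27]

for name, r in rank_cross_features(features, aao):
    print(f"{name}: r = {r:.4f}")
```

A constant feature would make the denominator zero, so a real pipeline would need to drop or guard against zero-variance columns before scoring.
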
Figure 3. Feature selection using the SVM-RFE.
Figure 4. Feature selection using the StepSVM.
Performances of models constructed with different feature sets and ML methods.
| ML method | Model | R² | MAE | RMSE | MedianAE | | |
|---|---|---|---|---|---|---|---|
| LR | 1 | 0.599 | 4.936 | 6.546 | 4.011 | 126[63.00] | 27[13.50] |
| LR | 2 | 0.519 | 5.243 | 7.166 | 3.718 | 124[62.00] | 30[15.00] |
| LR | 3 | 0.619 | 4.814 | 6.380 | 3.505 | 129[64.50] | |
| LR | 4 | | | | | | |
| RR | 1 | 0.597 | 4.972 | 6.564 | 3.985 | 123[61.50] | 27[13.50] |
| RR | 2 | 0.594 | 4.955 | 6.588 | 3.683 | 125[62.50] | 28[14.00] |
| RR | 3 | 0.604 | 4.908 | 6.509 | 3.818 | 128[64.00] | |
| RR | 4 | | | | | | |
| Lasso | 1 | 0.599 | 4.936 | 6.545 | 4.014 | 126[63.00] | 27[13.50] |
| Lasso | 2 | 0.606 | 4.902 | 6.491 | 3.738 | 125[62.50] | 28[14.00] |
| Lasso | 3 | 0.620 | 4.801 | 6.371 | 3.651 | 127[63.50] | |
| Lasso | 4 | | | | | | 25[12.50] |
| EN | 1 | 0.599 | 4.936 | 6.546 | 4.012 | 126[63.00] | 27[13.50] |
| EN | 2 | 0.597 | 4.935 | 6.565 | 3.591 | 126[63.00] | 28[14.00] |
| EN | 3 | 0.608 | 4.867 | 6.471 | | 125[62.50] | 25[12.50] |
| EN | 4 | | | | 3.639 | | |
| HR | 1 | 0.609 | 4.817 | 6.467 | 3.540 | 127[63.50] | 26[13.00] |
| HR | 2 | 0.605 | 4.826 | 6.497 | 3.548 | 129[64.50] | 29[14.50] |
| HR | 3 | 0.620 | 4.738 | 6.370 | | 133[66.50] | |
| HR | 4 | | | | 3.454 | | |
| KNN | 1 | 0.548 | 5.474 | 6.949 | 4.422 | 112[56.00] | 31[15.50] |
| KNN | 2 | 0.493 | 5.743 | 7.364 | 4.860 | 102[51.00] | 38[19.00] |
| KNN | 3 | 0.574 | 5.191 | 6.749 | 4.403 | 116[58.00] | |
| KNN | 4 | | | | | | 28[14.00] |
| SVM | 1 | 0.620 | | 6.375 | | 128[64.00] | 24[12.00] |
| SVM | 2 | 0.594 | 4.975 | 6.585 | 3.655 | 127[63.50] | 25[12.50] |
| SVM | 3 | 0.616 | 4.824 | 6.404 | 3.490 | 131[65.50] | |
| SVM | 4 | | 4.766 | | 3.698 | | |
| RF | 1 | 0.622 | 4.862 | 6.356 | 3.746 | 124[62.00] | 23[11.50] |
| RF | 2 | 0.618 | 4.883 | 6.393 | 3.863 | | 25[12.50] |
| RF | 3 | 0.626 | 4.863 | 6.321 | 3.925 | 121[60.50] | |
| RF | 4 | | | | | 125[62.50] | 22[11.00] |
| XGBoost | 1 | 0.598 | 4.984 | 6.552 | 3.807 | 124[62.00] | |
| XGBoost | 2 | 0.606 | 4.881 | 6.490 | 3.517 | 130[65.00] | 28[14.00] |
| XGBoost | 3 | 0.608 | 4.963 | 6.472 | 3.736 | 126[63.00] | 27[13.50] |
| XGBoost | 4 | | | | | | 27[13.50] |
| ANN | 1 | 0.614 | 4.919 | 6.421 | 3.790 | 126[63.00] | 25[12.50] |
| ANN | 2 | 0.624 | 4.729 | 6.338 | 3.582 | 132[66.00] | |
| ANN | 3 | 0.639 | 4.706 | 6.214 | 3.551 | 133[66.50] | 26[13.00] |
| ANN | 4 | | | | | | 22[11.00] |

Models 1, 2, 3, and 4 are constructed with feature sets 1, 2, 3, and 4, respectively.
Bold indicates the best result of the models based on the same ML methods.
Underline indicates the best result of all models.
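
The four error metrics reported for each model (R², MAE, RMSE, and MedianAE) follow standard definitions. As a minimal sketch of how they are computed from observed and predicted AAO values, the snippet below uses only the Python standard library; the function name and the five toy values are hypothetical and unrelated to the study's data.

```python
import math
from statistics import median


def regression_metrics(y_true, y_pred):
    """Compute R2, MAE, RMSE, and MedianAE for paired observed/predicted values."""
    n = len(y_true)
    errors = [pred - obs for obs, pred in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((obs - mean_true) ** 2 for obs in y_true)  # total sum of squares
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": sum(abs_errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "MedianAE": median(abs_errors),
    }


# Toy AAO values in years, invented for illustration only.
observed = [30, 35, 40, 45, 50]
predicted = [32, 34, 43, 45, 46]

for name, value in regression_metrics(observed, predicted).items():
    print(f"{name}: {value:.3f}")
```

MedianAE is less sensitive than MAE or RMSE to a few badly predicted subjects, which is one reason the three error measures can rank models differently.
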
Performance on different testing subsets.
| Testing subset | ML method | Model | MAE | RMSE | MedianAE | | |
|---|---|---|---|---|---|---|---|
| Testing | LR | L | 6.67 | 7.83 | 6.60 | 38 | 28 |
| | LR | 4 | | | | | |
| | RR | L | - | - | - | - | - |
| | RR | 4 | | | | | |
| | Lasso | L | 6.64 | 7.70 | 6.53 | 35 | 24 |
| | Lasso | 4 | | | | | |
| | EN | L | 6.63 | 7.70 | 6.47 | 35 | 24 |
| | EN | 4 | | | | | |
| | HR | L | - | - | - | - | - |
| | HR | 4 | | | | | |
| | KNN | L | 6.41 | 7.40 | 6.00 | | |
| | KNN | 4 | | | | | |
| | SVM | L | 7.30 | 8.57 | 6.01 | 34 | 28 |
| | SVM | 4 | | | | | |
| | RF | L | 6.74 | 7.67 | 6.05 | | |
| | RF | 4 | | | | | 21 |
| | XGBoost | L | | | | | |
| | XGBoost | 4 | 6.30 | 7.59 | 5.16 | 50 | 25 |
| | ANN | L | - | - | - | - | - |
| | ANN | 4 | | | | | |
| Testing | LR | L | 4.86 | 6.37 | 3.75 | 60 | 12 |
| | LR | 4 | | | | | |
| | RR | L | - | - | - | - | - |
| | RR | 4 | | | | | |
| | Lasso | L | 4.82 | 6.33 | 3.59 | 62 | 12 |
| | Lasso | 4 | | | | | |
| | EN | L | 4.80 | 6.31 | | 63 | 24 |
| | EN | 4 | | | 3.64 | | |
| | HR | L | - | - | - | - | - |
| | HR | 4 | | | | | |
| | KNN | L | 5.45 | 6.91 | 4.70 | 54 | 15 |
| | KNN | 4 | | | | | |
| | SVM | L | 4.96 | 6.47 | 3.70 | 60 | 12 |
| | SVM | 4 | | | | | |
| | RF | L | 4.79 | 6.34 | 3.67 | | 17 |
| | RF | 4 | | | | | |
| | XGBoost | L | 4.78 | 6.31 | 3.59 | 65 | |
| | XGBoost | 4 | | | | | 12 |
| | ANN | L | - | - | - | - | - |
| | ANN | 4 | | | | | |

Model 4 is constructed with feature set 4.
The results of Model L are those published in the literature.
Bold indicates the best result of models based on the same ML methods.
Underline indicates the best result of all models on the same testing subsets.
Figure 5. Importance ranking according to SHAP value.
Figure 6. Impact on the model output of all features.
Figure 7. Relationship between the SHAP value and ATXN3 CAGexp.
Figure 8. Impact on model output of all features in the two parts of the data: (A) ATXN3 CAGexp ≤ 68, (B) ATXN3 CAGexp > 68.
Figure 9. Examples of personalized prediction: (A) sample with the largest predicted AAO, (B) sample with the smallest predicted AAO. f(x) is the predicted AAO and E[f(x)] is the expectation of the predicted AAO of all samples.
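
The quantities f(x) and E[f(x)] in the personalized prediction figure are linked by the additivity property of Shapley values: the per-feature contributions sum to f(x) − E[f(x)]. The sketch below demonstrates this with an exact, brute-force Shapley computation on a toy model. It is not the SHAP library's implementation (which uses faster approximations), and the model, its coefficients, and the data are all hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, background):
    """Exact Shapley values of `model` at `x` by enumerating feature orderings.

    Features outside the current coalition take their values from each
    background row in turn, and the model output is averaged over those rows.
    Returns (phi, base_value), where base_value is E[f(x)] over the background.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        present = set()
        previous = value(present)
        for i in order:
            present.add(i)
            current = value(present)
            phi[i] += current - previous  # marginal contribution of feature i
            previous = current
    return [p / len(orderings) for p in phi], value(set())


def toy_aao_model(v):
    """Hypothetical AAO model with an interaction term (illustration only)."""
    return 150.0 - 1.5 * v[0] - 0.02 * v[0] * v[1] + 0.3 * v[2]


# Invented background subjects and one subject to explain.
background = [[70, 28, 12], [72, 29, 12], [74, 28, 13], [68, 27, 12]]
x = [76, 29, 12]

phi, base_value = shapley_values(toy_aao_model, x, background)
print("E[f(x)] =", round(base_value, 3))
print("contributions =", [round(p, 3) for p in phi])
print("f(x) =", round(toy_aao_model(x), 3), "=", round(base_value + sum(phi), 3))
```

Enumerating all orderings costs n! model evaluations per background row, so this approach is only practical for a handful of features.
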