Rikke L Nielsen, Benjamin O Wolthers, Marianne Helenius, Birgitte K Albertsen, Line Clemmensen, Kasper Nielsen, Jukka Kanerva, Riitta Niinimäki, Thomas L Frandsen, Andishe Attarbaschi, Shlomit Barzilai, Antonella Colombini, Gabriele Escherich, Derya Aytan-Aktug, Hsi-Che Liu, Anja Möricke, Sujith Samarasinghe, Inge M van der Sluis, Martin Stanulla, Morten Tulstrup, Rachita Yadav, Ester Zapotocka, Kjeld Schmiegelow, Ramneek Gupta.
Abstract
Asparaginase-associated pancreatitis (AAP) frequently affects children treated for acute lymphoblastic leukemia (ALL), causing severe acute and persisting complications. Known risk factors such as asparaginase dosing, older age, and single nucleotide polymorphisms (SNPs) have insufficient odds ratios to allow personalized asparaginase therapy. In this study, we explored machine learning strategies for prediction of individual AAP risk. We integrated information on age, sex, and SNPs based on Illumina Omni2.5exome-8 arrays of patients with childhood ALL (N=1564, 244 with AAP, aged 1.0 to 17.9 y) from 10 international ALL consortia into machine learning models including regression, random forest, AdaBoost, and artificial neural networks. A model with only age and sex had an area under the receiver operating characteristic curve (ROC-AUC) of 0.62. Inclusion of 6 pancreatitis candidate gene SNPs or 4 validated pancreatitis SNPs raised ROC-AUC modestly (to 0.67), while 30 SNPs identified through our AAP genome-wide association study cohort raised it to 0.80. The most predictive features included rs10273639 (PRSS1-PRSS2), rs10436957 (CTRC), rs13228878 (PRSS1/PRSS2), rs1505495 (GALNTL6), rs4655107 (EPHB2), and age (1 to 7 y). Second AAP following asparaginase re-exposure was predicted with a ROC-AUC of 0.65. The machine learning models assist individual-level risk assessment of AAP for future prevention trials, and may legitimize asparaginase re-exposure when AAP risk is predicted to be low.
Year: 2022 PMID: 35226426 PMCID: PMC8946594 DOI: 10.1097/MPH.0000000000002292
Source DB: PubMed Journal: J Pediatr Hematol Oncol ISSN: 1077-4114 Impact factor: 1.289
FIGURE 1. Overview of the feature selection and machine learning strategies used in the study. *A future model would benefit from inclusion of the cumulative dosage of pegylated asparaginase (PegAsp). In this study, it was only available on a subset of patients and was thus not fully explored. Age and sex were always included in modeling. AAP indicates asparaginase-associated pancreatitis; SNP, single nucleotide polymorphism.
FIGURE 2. Leave-one-out area under the receiver operating characteristic curve (ROC-AUC) feature importance for asparaginase-associated pancreatitis risk models. Models were trained on N=1290 patients using artificial neural networks with 1 hidden layer. A, Using age, sex, and 6 candidate single nucleotide polymorphisms (SNPs) as features. B, Using age, sex, and 4 previously validated SNPs as features. C, Using age, sex, and the top 30 SNPs associated with asparaginase-associated pancreatitis in the genome-wide association study by Wolthers et al. (ref. 10) as features.
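One plausible reading of the leave-one-out ROC-AUC feature importance in Figure 2 is sketched below: each feature is removed in turn, the model is refit, and the resulting change in ROC-AUC is recorded. This is only an illustration, not the authors' implementation. The `fit_predict` callable, the toy model that sums feature values, and the feature names `snp_a` and `snp_b` are all assumptions introduced here; the study used artificial neural networks.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random case outscores a random
    control (ties count 0.5); equivalent to the Mann-Whitney U statistic."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_out_importance(X, y, feature_names, fit_predict):
    """Change in ROC-AUC when each feature is left out in turn.

    fit_predict(X_subset, y) -> scores is a hypothetical callable that
    trains a model on the given columns and returns one score per row.
    """
    full = roc_auc(y, fit_predict(X, y))
    importance = {}
    for i, name in enumerate(feature_names):
        reduced = [row[:i] + row[i + 1:] for row in X]  # drop column i
        importance[name] = full - roc_auc(y, fit_predict(reduced, y))
    return importance


def toy_fit_predict(X, y):
    """Toy stand-in model (assumption): score is the sum of feature values."""
    return [sum(row) for row in X]


# Hypothetical toy data: two binary features, two cases, two controls.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
imp = leave_one_out_importance(X, y, ["snp_a", "snp_b"], toy_fit_predict)
```

A positive value means that ROC-AUC falls when the feature is removed; a negative value means the feature was hurting the toy model. In practice the scores would come from held-out data rather than from the rows used for fitting.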
ROC-AUC Performances Reported as Mean±SD for the Training Data Set (N=1290, 155 AAP Cases), Hold-out Validation Data Set With N=100 Patients With European Ancestry (50 AAP Cases and Controls), Hold-out Validation With N=174 Patients With Non-European Ancestry (39 AAP Cases) and a Subset of the 37 Patients With European Ancestry That Were Re-exposed to Asparaginase (13 AAP Cases) Across 100 Model Initializations in 5-fold Cross-validation
| Data Type | Model | ROC-AUC (N=1290) | ROC-AUC CEU Validation (N=100) | ROC-AUC Non-CEU Validation (N=174) | ROC-AUC Validation 2nd AAP (N=37) |
|---|---|---|---|---|---|
| Six candidate SNPs | ANN (1 hidden layer), binary allele encoding | 0.67±0.01 | 0.64±0.01 | 0.62±0.01 | 0.60±0.01 |
| Validated pancreatitis SNPs | ANN (1 hidden layer), binary allele encoding | 0.67±0.01 | 0.64±0.01 | 0.63±0.01 | 0.57±0.01 |
| Top 30 | ANN (1 hidden layer), binary allele encoding | 0.80±0.01 | 0.84±0.01 | 0.72±0.01 | 0.55±0.04 |
All models are trained with down-sampling on the control group within the cross-validation folds.
AAP indicates asparaginase-associated pancreatitis; ANN, artificial neural network; CEU, Utah Residents (CEPH) with Northern and Western European ancestry; ROC-AUC, area under the receiver operating characteristic curve; SNPs, single nucleotide polymorphisms.
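The note that controls are down-sampled within the cross-validation folds can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the 1:1 case-to-control ratio, the unstratified fold assignment, and the function name `downsampled_cv_splits` are all assumptions. Only the counts (1290 training patients, 155 AAP cases) and the 5 folds come from the table caption.

```python
import random


def downsampled_cv_splits(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Controls (label 0) are down-sampled to the number of cases (label 1)
    inside each training split only; validation folds are left untouched.
    ASSUMPTION: 1:1 ratio and simple shuffled (unstratified) folds.
    """
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        cases = [j for j in train if labels[j] == 1]
        controls = [j for j in train if labels[j] == 0]
        kept = rng.sample(controls, min(len(cases), len(controls)))
        yield sorted(cases + kept), sorted(val_idx)


# Training-set composition from the table caption: 155 cases of 1290 patients.
labels = [1] * 155 + [0] * 1135
splits = list(downsampled_cv_splits(labels))
```

Down-sampling after the split, rather than before it, keeps each validation fold at the original class balance, so the reported ROC-AUC is not computed on an artificially balanced sample.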
Overview of the Most Predictive AAP Single Nucleotide Polymorphisms
| SNP | Chromosome | Position | Minor Allele Frequency in Training (N=1290), CEU Validation (N=100), Non-CEU Validation (N=174), Second AAP Validation (N=37) | Model | Odds Ratio | P | Gene |
|---|---|---|---|---|---|---|---|
| rs13228878 | 7 | 142473466 | 0.40, 0.42, 0.44, 0.39 | 6 candidate SNPs | 0.6261 | 1.275e−05 | PRSS1/PRSS2 |
| | | | | Previously validated SNPs | NA | 0.03 | |
| rs10436957 | 1 | 15768304 | 0.23, 0.22, 0.19, 0.16 | 6 candidate SNPs | 0.6643 | 0.00199 | CTRC |
| rs10273639 | 7 | 142456928 | 0.41, 0.40, 0.45, 0.36 | Previously validated SNPs | 1.4 | 2.0e−14 | PRSS1-PRSS2 |
| rs1505495 | 4 | 172973580 | 0.16, 0.13, 0.17, 0.11 | Top 30 PTWG | 0.4974 | 1.856e−05 | GALNTL6 |
| rs4655107 | 1 | 23094454 | 0.24, 0.22, 0.11, 0.18 | Top 30 PTWG | 0.5573 | 3.972e−05 | EPHB2 |
Odds ratio and P-value are obtained from the PTWG AAP GWAS by Wolthers et al. (ref. 10).
Odds ratio and P-value are reported for the validated variant from the PTWG AAP GWAS 2019 by Wolthers et al. (ref. 10).
Odds ratio and P-value are reported as in Table S.3 (Supplemental Digital Content 1, http://links.lww.com/JPHO/A474), Rosendahl et al. (ref. 19).
AAP indicates asparaginase-associated pancreatitis; CEU, Utah Residents (CEPH) with Northern and Western European ancestry; GWAS, genome-wide association study; NA, not applicable; PTWG, Ponte di Legno toxicity working group; SNPs, single nucleotide polymorphisms.
FIGURE 3. Personalized artificial intelligence ensemble models based on mean of scores, majority voting, and mean of confident scores (t=0.7). A, ROC curve for the ensemble when predicting on the training data set (N=1290). B and C, Plot of prediction scores versus true class and table of performance metrics for different score thresholds when scoring the predictions on the training data set (N=1290) with the model ensemble using the mean of confident scores (score threshold of ≤0.30 or ≥0.70). D, ROC curve for the ensemble when predicting on the European hold-out data set (N=100). E and F, Plot of prediction scores versus true class and table of performance metrics for different score thresholds when scoring the predictions on the European hold-out data set (N=100) with the model ensemble using the mean of confident scores (score threshold of ≤0.30 or ≥0.70). G, ROC curve for the ensemble when predicting on the non-European hold-out data set (N=174). H and I, Plot of prediction scores versus true class and table of performance metrics for different score thresholds when scoring the predictions on the non-European hold-out data set (N=174) with the model ensemble using the mean of confident scores (score threshold of ≤0.30 or ≥0.70). J, ROC curve for the ensemble when predicting second AAP cases. K and L, Plot of prediction scores versus true class and table of performance metrics for different score thresholds when scoring the predictions on the second AAP phenotype (N=37) with the model ensemble using the mean of confident scores (score threshold of ≤0.30 or ≥0.70). AAP indicates asparaginase-associated pancreatitis; AUC, area under the curve; NPV, negative predictive value; PPV, positive predictive value; ROC, receiver operating characteristic; Score, applied prediction score threshold for classification (≥score).
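The three ensemble rules named in the Figure 3 caption can be sketched as below. The caption gives only the rule names and the confidence thresholds (≤0.30 or ≥0.70), so the details here are assumptions: the 0.5 voting cutoff, the fallback to the plain mean when no model is confident, and the example scores are all hypothetical.

```python
from statistics import mean


def mean_of_scores(scores):
    """Plain average of the per-model prediction scores."""
    return mean(scores)


def majority_vote(scores, cutoff=0.5):
    """Fraction of models voting 'case'; a value above 0.5 is a majority.
    ASSUMPTION: a model votes 'case' when its score is >= cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)


def mean_of_confident_scores(scores, low=0.30, high=0.70):
    """Average only the confident scores (<= low or >= high).
    ASSUMPTION: fall back to the plain mean if no model is confident."""
    confident = [s for s in scores if s <= low or s >= high]
    return mean(confident) if confident else mean(scores)


# Hypothetical scores from five models for one patient.
scores = [0.9, 0.8, 0.55, 0.45, 0.2]
plain = mean_of_scores(scores)                # averages all five scores
vote = majority_vote(scores)                  # three of five vote 'case'
confident = mean_of_confident_scores(scores)  # averages 0.9, 0.8 and 0.2
```

Restricting the average to confident scores discards models that sit near 0.5 for a given patient, which can sharpen the ensemble score at the cost of using fewer models per prediction.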
FIGURE 4. Leave-one-out area under the receiver operating characteristic curve (ROC-AUC) feature importance for the asparaginase-associated pancreatitis re-exposure model, a logistic regression trained to predict second cases of asparaginase-associated pancreatitis upon re-exposure to asparaginase (N=37, 13 cases). The model used age, sex, and previously validated single nucleotide polymorphisms as features and was trained with leave-one-out cross-validation.