J Ritari, K Hyvärinen, S Koskela, M Itälä-Remes, R Niittyvuopio, A Nihtinen, U Salmenniemi, M Putkonen, L Volin, T Kwan, T Pastinen, J Partanen.
Abstract
Allogeneic haematopoietic stem cell transplantation currently represents the primary potentially curative treatment for cancers of the blood and bone marrow. While relapse occurs in approximately 30% of patients, few risk-modifying genetic variants have been identified. The present study evaluates the predictive potential of patient genetics on relapse risk in a genome-wide manner. We studied 151 graft recipients with HLA-matched sibling donors by sequencing the whole-exome, active immunoregulatory regions, and the full MHC region. To assess the predictive capability and contributions of SNPs and INDELs, we employed machine learning and a feature selection approach in a cross-validation framework to discover the most informative variants while controlling against overfitting. Our results show that germline genetic polymorphisms in patients entail a significant contribution to relapse risk, as judged by the predictive performance of the model (AUC = 0.72 [95% CI: 0.63-0.81]). Furthermore, the top contributing variants were predictive in two independent replication cohorts (n = 258 and n = 125) from the same population. The results can help elucidate relapse mechanisms and suggest novel therapeutic targets. A computational genomic model could provide a step toward individualized prognostic risk assessment, particularly when accompanied by other data modalities.
Year: 2018 PMID: 30089915 PMCID: PMC6326954 DOI: 10.1038/s41375-018-0229-3
Source DB: PubMed Journal: Leukemia ISSN: 0887-6924 Impact factor: 11.528
General characteristics of the discovery patient cohort
| Clinical parameter | Category | Value |
|---|---|---|
| Recipient age in years, median (range) | | 51 (3–70) |
| Donor age in years, median (range) | | 49 (7–72) |
| Donor-recipient gender, n (%) | Male-male | 47 (29) |
| | Male-female | 45 (28) |
| | Female-female | 34 (21) |
| | Female-male | 35 (22) |
| Diagnosis, n (%) | Acute myeloid leukemia | 55 (34) |
| | Acute lymphoblastic leukemia | 23 (14) |
| | Acute leukemia | 3 (1) |
| | Chronic lymphocytic leukemia | 8 (4) |
| | Chronic myelomonocytic leukemia | 3 (1) |
| | Chronic myeloid leukemia | 3 (1) |
| | Plasma cell leukemia | 1 (1) |
| | T-cell prolymphocytic leukemia | 1 (1) |
| | Non-Hodgkin’s lymphoma | 9 (6) |
| | Hodgkin’s lymphoma | 5 (3) |
| | Follicular lymphoma | 1 (1) |
| | Mantle cell lymphoma | 1 (1) |
| | Diffuse large B-cell lymphoma | 1 (1) |
| | Multiple myeloma | 12 (7) |
| | Myeloma | 10 (6) |
| | Myelodysplastic syndrome | 10 (6) |
| | Myelofibrosis | 4 (2) |
| | Mastocytosis | 1 (1) |
| | Chronic granulomatous disease | 1 (1) |
| | Aplastic anemia^a | 9 (6) |
| Stem cell source, n (%) | Bone marrow | 38 (24) |
| | Peripheral blood | 121 (76) |
| Conditioning regimen, n (%) | Myeloablative | 104 (65) |
| | Reduced intensity conditioning | 57 (35) |
| CMV positive, n (%) | | 113 (78) |
| aGvHD grades III–IV, n (%) | | 16 (10) |
| cGvHD, extensive, n (%) | | 52 (34) |
| Relapse, n (%) | | 49 (31) |
Abbreviations: aGvHD, acute GvHD; cGvHD, chronic GvHD; CMV, cytomegalovirus; GvHD, graft-versus-host disease
^a Anemia diagnoses were omitted from analysis
Fig. 1. Schematic representation of the study setup. (a) Leave-one-out cross-validation (LOOCV) for feature selection and classification model fitting. Each sample is left out in turn, one per fold. Prediction error estimates are based on the left-out samples (blue). (b) Within each LOOCV fold, the analysis first performs feature selection with a logistic regression association test and then fits a Random forest classification model on the variants that fall below an initial association p-value threshold
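The fold structure in Fig. 1 can be sketched roughly as follows. The important property is that feature selection is repeated inside every fold, so the held-out sample never influences which variants are chosen. This is only an illustrative, dependency-free sketch: the function names are hypothetical, the genotype matrix is made-up toy data (ALT allele dosages 0/1/2), and the logistic regression association test and the Random forest are replaced by simple stand-ins (a difference in mean dosage and a nearest-centroid score), which are not the study's methods.

```python
from statistics import mean


def loocv_predict(X, y, select_features, fit_model):
    """Leave-one-out CV with feature selection repeated inside every fold."""
    predictions = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        # Selection only ever sees the training samples of this fold.
        keep = select_features(train_X, train_y)
        model = fit_model([[row[j] for j in keep] for row in train_X], train_y)
        # Score the single held-out sample on the selected variants.
        predictions.append(model([X[i][j] for j in keep]))
    return predictions


def top_k_by_mean_difference(k):
    """Stand-in for the association test: rank variants by the absolute
    difference in mean dosage between relapsed (1) and non-relapsed (0)."""
    def select(X, y):
        def score(j):
            cases = [row[j] for row, label in zip(X, y) if label == 1]
            controls = [row[j] for row, label in zip(X, y) if label == 0]
            return abs(mean(cases) - mean(controls))
        return sorted(range(len(X[0])), key=score, reverse=True)[:k]
    return select


def fit_nearest_centroid(X, y):
    """Stand-in for the classifier: higher output = closer to the relapse centroid."""
    case_centroid = [mean(col) for col in zip(*[r for r, l in zip(X, y) if l == 1])]
    control_centroid = [mean(col) for col in zip(*[r for r, l in zip(X, y) if l == 0])]

    def predict(row):
        d_case = sum((a - b) ** 2 for a, b in zip(row, case_centroid))
        d_control = sum((a - b) ** 2 for a, b in zip(row, control_centroid))
        return d_control - d_case
    return predict


# Toy data: 6 samples x 3 variants; only the first variant tracks the outcome.
X = [[2, 0, 1], [2, 1, 1], [2, 2, 0], [0, 1, 1], [0, 0, 1], [0, 2, 0]]
y = [1, 1, 1, 0, 0, 0]
preds = loocv_predict(X, y, top_k_by_mean_difference(1), fit_nearest_centroid)
```

Selecting variants once on the full dataset and only then cross-validating the classifier would leak outcome information into the held-out predictions and inflate the estimated performance, which is the overfitting risk the nested design is meant to control.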
The top predictive variants and their associated genes
| Chromosome | Position^a | SNP ID | REF | ALT | ALT frequency | ENSEMBL gene ID | Gene symbol |
|---|---|---|---|---|---|---|---|
| 1 | 228929158 | rs4140409 | C | T | 0.675496689 | NA | NA |
| 1 | 228940615 | rs241304 | A | G | 0.619205298 | NA | NA |
| 1 | 230244458 | rs910500 | A | G | 0.440397351 | ENSG00000143641 | |
| 1 | 230245900 | rs11585739 | T | C | 0.470198675 | ENSG00000143641 | |
| 1 | 230294715 | rs4846913 | C | A | 0.506622517 | ENSG00000143641 | |
| 2 | 61070652 | rs1432297 | G | A | 0.516556291 | ENSG00000228414 | |
| 2 | 61072183 | rs35194171 | T | A | 0.539735099 | ENSG00000228414 | |
| 2 | 61072567 | rs35741374 | C | T | 0.543046358 | ENSG00000228414 | |
| 2 | 61075111 | rs1177205 | A | T | 0.456953642 | ENSG00000228414 | |
| 2 | 61075189 | rs1177206 | C | T | 0.460264901 | ENSG00000228414 | |
| 2 | 61075209 | rs1177207 | G | A | 0.456953642 | ENSG00000228414 | |
| 2 | 61075765 | rs750026 | T | C | 0.463576159 | ENSG00000228414 | |
| 2 | 61075987 | rs750027 | C | G | 0.456953642 | ENSG00000228414 | |
| 2 | 61080482 | rs842625 | G | A | 0.456953642 | ENSG00000228414 | |
| 2 | 61085723 | rs842631 | C | T | 0.460264901 | ENSG00000228414 | |
| 2 | 240674948 | rs11678404 | C | T | 0.271523179 | NA | NA |
| 4 | 68311813 | rs373609666 | T | TACCGCCACCGCC | 0.205298013 | ENSG00000250075 | |
| 6 | 3424481 | rs9405201 | C | T | 0.32781457 | ENSG00000137266 | |
| 6 | 3433318 | rs17309827 | T | G | 0.400662252 | ENSG00000137266 | |
| 6 | 3433713 | rs9392492 | G | GA | 0.301324503 | ENSG00000137266 | |
| 6 | 37789321 | rs10456096 | G | A | 0.347682119 | ENSG00000156639 | |
| 8 | 22865320 | rs2430815 | T | G | 0.781456954 | ENSG00000008853 | |
| 8 | 81278885 | rs12543811 | G | A | 0.586092715 | NA | NA |
| 10 | 64379326 | rs2393904 | C | T | 0.387417219 | ENSG00000138311 | |
| 11 | 7720426 | rs4367936 | C | A | 0.42384106 | ENSG00000183378 | |
| 11 | 30438948 | rs492604 | C | T | 0.463576159 | ENSG00000066382 | |
| 13 | 77589725 | rs599115 | A | C | 0.582781457 | ENSG00000005812 | |
| 16 | 56368689 | rs1065375 | C | T | 0.5 | ENSG00000087258 | |
| 19 | 20735272 | rs7251976 | T | C | 0.440397351 | ENSG00000237440 | |
| 20 | 61342535 | rs35927656 | T | C | 0.374172185 | ENSG00000101188 | |
| 22 | 26168558 | rs3848858 | A | G | 0.298013245 | ENSG00000133454 | |
^a Chromosome position refers to GRCh37
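The ALT frequencies in the table can be read as allele counts divided by the number of chromosomes sampled. As a small worked check, and assuming (this is an inference, not something the source states) that the frequencies were computed over the 151 discovery recipients with no missing genotypes, each value corresponds to a whole number of ALT alleles out of 2 × 151 = 302. The helper names below are hypothetical.

```python
N_PATIENTS = 151  # cohort size from the abstract; assumed to be the denominator


def alt_allele_count(frequency, n_patients=N_PATIENTS):
    """Recover the ALT allele count implied by a frequency in a diploid cohort."""
    return round(frequency * 2 * n_patients)


def alt_frequency(count, n_patients=N_PATIENTS):
    """ALT allele frequency = ALT allele count / (2 * number of patients)."""
    return count / (2 * n_patients)


# First row of the table (rs4140409) and the rs1065375 row.
count_rs4140409 = alt_allele_count(0.675496689)
count_rs1065375 = alt_allele_count(0.5)
```

Under that assumption, rs4140409 corresponds to 204 ALT alleles out of 302 and rs1065375 to 151 out of 302.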
Fig. 2. Estimated predictive performance of the model. Results are shown for (a) the discovery dataset and (b, c) the replication datasets. The left-hand panels show the distributions of prediction values over the LOOCV folds for the actual relapsed and non-relapsed groups, as produced by the Random forest classification model. The middle panels show the prediction ROC curves and AUC values. In (a), the solid black ROC curve indicates the genetic model, the dashed gray curve indicates the model with principal components plus clinical and genetic variables, and the dotted purple curve shows the result using principal components and clinical data only. In (b), the dashed green curve and the dotted blue curve show the results when allowing variants with <11 and <81 missing values, respectively. In (c), the black curve and the dotted green curve show the results for higher (<0.3) and lower (<0.2) imputed genotype quality filtering stringencies, respectively. The right-hand panels in (a)–(c) show the odds ratio for a correct prediction (y-axis) along the prediction model output values (x-axis). The p-values are calculated with a one-sided Mann–Whitney test. The statistical power of the AUC is calculated at alpha level 0.01
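The AUC and the Mann–Whitney test reported in the caption are closely related: the AUC equals the Mann–Whitney U statistic divided by the number of relapsed/non-relapsed pairs, that is, the probability that a randomly chosen relapsed patient receives a higher prediction value than a randomly chosen non-relapsed patient. A minimal sketch of that relationship, using made-up prediction values and a hypothetical function name rather than anything from the study:

```python
def auc_from_scores(relapsed, non_relapsed):
    """AUC computed as the Mann-Whitney U statistic divided by n1 * n0.

    Each relapsed/non-relapsed pair contributes 1 if the relapsed patient
    has the higher score, 0.5 for a tie, and 0 otherwise.
    """
    u = sum(
        1.0 if r > n else 0.5 if r == n else 0.0
        for r in relapsed
        for n in non_relapsed
    )
    return u / (len(relapsed) * len(non_relapsed))


# Made-up prediction values for illustration only.
relapsed_scores = [0.9, 0.7, 0.6, 0.4]
non_relapsed_scores = [0.8, 0.5, 0.3, 0.2]
auc = auc_from_scores(relapsed_scores, non_relapsed_scores)
```

Here 12 of the 16 pairs are ordered correctly, giving an AUC of 0.75. An AUC of 0.5 would mean the prediction values do not separate the two groups at all.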