S R Marino, S Lin, M Maiers, M Haagenson, S Spellman, J P Klein, T A Binkowski, S J Lee, K van Besien.
Abstract
The identification of important amino acid substitutions associated with low survival in hematopoietic cell transplantation (HCT) is hampered by the large number of observed substitutions compared with the small number of patients available for analysis. Random forest analysis is designed to address these limitations. We studied 2107 HCT recipients with good or intermediate risk hematological malignancies to identify HLA class I amino acid substitutions associated with reduced survival at day 100 post transplant. Random forest analysis and traditional univariate and multivariate analyses were used. Random forest analysis identified amino acid substitutions in 33 positions that were associated with reduced 100-day survival, including HLA-A 9, 43, 62, 63, 76, 77, 95, 97, 114, 116, 152, 156, 166 and 167; HLA-B 97, 109, 116 and 156; and HLA-C 6, 9, 11, 14, 21, 66, 77, 80, 95, 97, 99, 116, 156, 163 and 173. In all, 13 had been previously reported by other investigators using classical biostatistical approaches. Using the same data set, traditional multivariate logistic regression identified only five amino acid substitutions associated with lower day 100 survival. Random forest analysis is a novel statistical methodology for analysis of HLA mismatching and outcome studies, capable of identifying important amino acid substitutions missed by other methods.
Year: 2011 PMID: 21441965 PMCID: PMC3128239 DOI: 10.1038/bmt.2011.56
Source DB: PubMed Journal: Bone Marrow Transplant ISSN: 0268-3369 Impact factor: 5.483
Patient characteristics by HLA matching status
| Characteristic | 1 HLA Class I Mismatch, DRB1 Matched | A, B, C, DRB1 Matched | P-value |
|---|---|---|---|
| Age at Transplant | |||
| | 29.7 (15.2) | 32.6 (14.2) | <0.001 |
| Sex Donor/Recipient | 0.36 | ||
| Male/Male | 207 (34.5) | 572 (38.0) | |
| Female/Male | 119 (19.8) | 276 (18.3) | |
| Female/Female | 129 (21.5) | 288 (19.1) | |
| Male/Female | 145 (24.2) | 371 (24.6) | |
| Disease | 0.03 | ||
| ALL | 155 (25.8) | 352 (23.4) | |
| AML | 172 (28.7) | 370 (24.6) | |
| CML | 256 (42.7) | 717 (47.6) | |
| MDS | 17 (2.8) | 68 (4.5) | |
| Stage of Disease at Transplant | 0.001 | ||
| Early | 282 (47.0) | 834 (55.3) | |
| Intermediate | 318 (53.0) | 673 (44.7) | |
| Conditioning Regimen | 0.03 | ||
| Myeloablative | 591 (98.5) | 1499 (99.5) | |
| Non-myeloablative | 9 (1.5) | 8 (0.5) | |
| GvHD Prophylaxis | 0.01 | ||
| Tacrolimus ± Other | 121 (20.2) | 298 (19.8) | |
| Cyclosporine A + Methotrexate ± Other | 324 (54.0) | 890 (59.1) | |
| Cyclosporine A ± Other | 13 (2.2) | 57 (3.8) | |
| Methotrexate ± Other | 5 (0.8) | 7 (0.5) | |
| T-Cell Depletion | 137 (22.8) | 254 (16.9) | |
| Other | 0 (0.0) | 1 (0.1) | |
| Stem Cell Source | 0.91 | ||
| Bone Marrow | 559 (93.2) | 1402 (93.0) | |
| PBSC | 41 (6.8) | 105 (7.0) | |
| Year of Transplant | 0.25 | ||
| 1988 – 1992 | 65 (10.8) | 212 (14.1) | |
| 1993 – 1996 | 174 (29.0) | 410 (27.2) | |
| 1997 – 2000 | 241 (40.2) | 597 (39.6) | |
| 2001 – 2004 | 120 (20.0) | 288 (19.1) |
Donor/Recipients with one mismatch at HLA-A: n=179 (29.8%), with one mismatch at HLA-B: n=88 (14.7%), with one mismatch at HLA-C: n=333 (55.5%);
n (%);
no methotrexate;
no cyclosporine A;
PBSC = peripheral blood stem cells
Distribution of amino acid substitution positions and types
| | HLA-A | HLA-B | HLA-C | Total |
|---|---|---|---|---|
| Number of amino acid positions affected by substitutions | 50 | 44 | 33 | 127 |
| Number of amino acid substitution types | 170 | 104 | 115 | 389 |
Most amino acid substitution positions have multiple substitution types
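
The two rows of the table above are distinct tallies: a position is counted once however many residue pairs occur there, while each different recipient/donor residue pair at a position counts as its own substitution type. A minimal sketch of that bookkeeping follows. It assumes mismatches are already available as (locus, position, recipient residue, donor residue) tuples and that an ordered residue pair defines one type; the function name and the toy records are illustrative, not study data.

```python
from collections import defaultdict

def summarize_substitutions(records):
    """Count affected positions and substitution types per locus.

    records: iterable of (locus, position, recipient_aa, donor_aa) tuples.
    Assumption: an ordered (recipient, donor) residue pair at a position
    defines one substitution type.
    """
    positions = defaultdict(set)
    types = defaultdict(set)
    for locus, pos, rec_aa, don_aa in records:
        if rec_aa == don_aa:
            continue  # identical residues are not a substitution
        positions[locus].add(pos)
        types[locus].add((pos, rec_aa, don_aa))
    return {locus: (len(positions[locus]), len(types[locus]))
            for locus in positions}

# Toy records for illustration only (hypothetical input, not the study data set)
toy = [
    ("C", 116, "F", "S"),
    ("C", 116, "Y", "S"),
    ("C", 116, "F", "S"),  # repeat of a type already seen
    ("C", 156, "R", "W"),
    ("A", 156, "L", "W"),
    ("A", 9, "F", "F"),    # matched residue, ignored
]
summary = summarize_substitutions(toy)
print(summary)  # maps locus -> (positions affected, substitution types)
```

Using sets keeps repeated donor-recipient pairs from inflating either count, which is why one position can carry several types.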
Amino-acid substitutions and other predictors of day 100 survival obtained by random forest analysis listed in order of importance
| Variable | HLA Molecule Alpha Domain | Importance Score | Other References Reporting Amino Acid Substitutions Associated with HCT Outcomes |
|---|---|---|---|
| Age | - | 100 | |
| Disease stage | - | 50 | |
| HLA-C position 156 | 2 | 36 | |
| HLA-C position 116 | 2 | 35 | |
| HLA-A position 152 | 2 | 31 | |
| HLA-C position 99 | 2 | 24 | |
| HLA-A position 9 | 1 | 21 | |
| HLA-C position 9 | 1 | 20 | |
| HLA-B position 116 | 2 | 20 | |
| Disease type | - | 20 | |
| Gender match | - | 19 | |
| HLA-A position 156 | 2 | 17 | |
| HLA-C position 97 | 2 | 13 | |
| HLA-A position 114 | 2 | 13 | |
| HLA-A position 62 | 1 | 13 | |
| HLA-C position 163 | 2 | 12 | |
| HLA-A position 95 | 2 | 9 | |
| HLA-C position 11 | 1 | 9 | |
| HLA-A position 97 | 2 | 7 | |
| HLA-B position 97 | 2 | 6 | |
| HLA-C position 80 | 1 | 6 | |
| HLA-A position 76 | 1 | 6 | |
| HLA-A position 63 | 1 | 5 | |
| HLA-C position 77 | 1 | 5 | |
| HLA-A position 77 | 1 | 5 | |
| HLA-C position 21 | 1 | 4 | |
| HLA-C position 95 | 2 | 4 | |
| HLA-A position 116 | 2 | 4 | |
| HLA-C position 14 | 1 | 4 | |
| HLA-A position 167 | 2 | 4 | |
| HLA-A position 43 | 1 | 4 | |
| HLA-C position 6 | 1 | 4 | |
| HLA-B position 109 | 2 | 3 | |
| HLA-C position 173 | 2 | 3 | |
| HLA-C position 66 | 1 | 3 | |
| HLA-A position 166 | 2 | 3 | |
| HLA-B position 156 | 2 | 3 |
The positions with higher importance scores are more critically related to death by day 100 post-HCT and should receive higher priority to be matched.
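
The scores in the table are relative: the top predictor (age) sits at 100 and every other variable is read against it. One common way to get such a scale is to divide each raw importance by the largest one. Whether the authors used exactly this normalization is an assumption, and the raw values and function name below are hypothetical.

```python
def scale_importance(raw):
    """Rescale raw importances so the largest becomes 100, sorted descending."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("at least one importance must be positive")
    scaled = {name: 100.0 * value / top for name, value in raw.items()}
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)

# Hypothetical raw importances, chosen only to illustrate the rescaling
raw_scores = {
    "Disease stage": 0.021,
    "Age": 0.042,
    "HLA-C position 156": 0.0152,
}
ranked = scale_importance(raw_scores)
for name, score in ranked:
    print(f"{name}: {score:.0f}")
```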
Figure 1. Representative HLA molecules with non-permissive amino acid substitutions identified using random forest analysis
The residues are colored by mismatch groupings. (A) HLA-A, B, and C positions 97, 116, and 156. (B) HLA-A and C positions 9, 77, and 95. (C) HLA-A 43, 62, 63, 76, 114, 152, 166, and 167. (D) HLA-B position 109. (E) HLA-C positions 6, 11, 14, 21, 66, 80, 99, 163, and 173. The mismatches are found on the alpha 1 and alpha 2 domains, with the majority occurring in the peptide binding groove.
Most frequent HLA class I mismatches accounting for amino acid substitutions exhibiting the highest importance scores
| Amino Acid Substitution | Importance Score | HLA Mismatch | Frequency | Percent | Cumulative Percent |
|---|---|---|---|---|---|
| C156 | 36.21 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 05:01/07:04 | 16 | 4.80 | 18.02 |
| | | 14:02/15:02 | 16 | 4.80 | 22.82 |
| | | 03:03/04:01 | 14 | 4.20 | 27.03 |
| | | 07:01/12:03 | 11 | 3.30 | 30.33 |
| | | 06:02/07:01 | 10 | 3.00 | 33.33 |
| C116 | 34.75 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 14:02/15:02 | 16 | 4.80 | 18.02 |
| | | 03:03/04:01 | 14 | 4.20 | 22.22 |
| A152 | 31.19 | 03:01/03:02 | 12 | 6.70 | 6.70 |
| C99 | 23.59 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 14:02/15:02 | 16 | 4.80 | 18.02 |
| | | 03:03/04:01 | 14 | 4.20 | 22.22 |
| A9 | 21.29 | 02:01/02:05 | 14 | 7.82 | 7.82 |
| | | 02:01/02:06 | 12 | 6.70 | 14.53 |
| C9 | 20.39 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 05:01/07:04 | 16 | 4.80 | 18.02 |
| | | 14:02/15:02 | 16 | 4.80 | 22.82 |
| | | 03:03/04:01 | 14 | 4.20 | 27.03 |
| | | 07:01/12:03 | 11 | 3.30 | 30.33 |
| B116 | 20.38 | 35:01/35:03 | 17 | 19.32 | 19.32 |
| A156 | 17.44 | 02:01/02:05 | 14 | 7.82 | 7.82 |
| | | 03:01/03:02 | 12 | 6.70 | 14.53 |
| C97 | 13.49 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 14:02/15:02 | 16 | 4.80 | 18.02 |
| | | 07:01/12:03 | 11 | 3.30 | 21.32 |
| | | 06:02/07:01 | 10 | 3.00 | 24.32 |
| A114 | 13.07 | 02:01/68:01 | 7 | 3.91 | 3.91 |
| A62 | 13.00 | 02:01/68:01 | 7 | 3.91 | 3.91 |
| C163 | 12.18 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 03:03/04:01 | 14 | 4.20 | 11.71 |
| A95 | 9.20 | 02:01/02:05 | 14 | 7.82 | 7.82 |
| C11 | 8.99 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 14:02/15:02 | 16 | 4.80 | 18.02 |
| | | 03:03/04:01 | 14 | 4.20 | 22.22 |
| A97 | 6.90 | 02:01/68:01 | 7 | 3.91 | 3.91 |
| B97 | 6.24 | 39:01/39:06 | 4 | 17.39 | 17.39 |
| C80 | 6.07 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 05:01/07:04 | 16 | 4.80 | 18.02 |
| | | 14:02/15:02 | 16 | 4.80 | 22.82 |
| | | 03:03/04:01 | 14 | 4.20 | 27.03 |
| | | 06:02/07:01 | 10 | 3.00 | 30.33 |
| A76 | 5.88 | 01:01/11:01 | 7 | 3.91 | 3.91 |
| A63 | 5.09 | 02:01/68:01 | 7 | 3.91 | 3.91 |
| C77 | 4.85 | 01:02/02:02 | 25 | 13.16 | 13.16 |
| | | 04:01/16:01 | 19 | 10.00 | 23.16 |
| | | 05:01/07:04 | 16 | 8.42 | 31.58 |
| | | 14:02/15:02 | 16 | 8.42 | 40.00 |
| | | 03:03/04:01 | 14 | 7.37 | 47.37 |
| | | 06:02/07:01 | 10 | 5.26 | 52.63 |
| A77 | 4.66 | 01:01/11:01 | 7 | 3.91 | 3.91 |
| C21 | 4.33 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 14:02/15:02 | 16 | 4.80 | 12.31 |
| | | 03:03/04:01 | 14 | 4.20 | 16.52 |
| C95 | 4.06 | 05:01/07:04 | 16 | 4.80 | 4.80 |
| | | 14:02/15:02 | 16 | 4.80 | 9.61 |
| | | 03:03/04:01 | 14 | 4.20 | 13.81 |
| A116 | 3.99 | 02:01/68:01 | 7 | 3.91 | 3.91 |
| C14 | 3.89 | 04:01/16:01 | 19 | 31.15 | 31.15 |
| | | 03:03/04:01 | 14 | 22.95 | 54.10 |
| A167 | 3.78 | 01:01/11:01 | 7 | 3.91 | 3.91 |
| | | 24:02/24:03 | 7 | 3.91 | 7.82 |
| A43 | 3.70 | 02:01/02:05 | 14 | 7.82 | 7.82 |
| C6 | 3.58 | 01:02/02:02 | 25 | 48.08 | 48.08 |
| B109 | 3.47 | 35:01/35:02 | 3 | 37.50 | 37.50 |
| | | 35:02/35:03 | 3 | 37.50 | 75.00 |
| C173 | 3.42 | 03:03/04:01 | 14 | 20.90 | 20.90 |
| C66 | 3.40 | 14:02/15:02 | 16 | 18.18 | 18.18 |
| | | 07:01/12:03 | 11 | 12.50 | 30.68 |
| | | 06:02/07:01 | 10 | 11.36 | 42.05 |
| A166 | 3.05 | 01:01/11:01 | 7 | 3.91 | 3.91 |
| | | 24:02/24:03 | 7 | 3.91 | 7.82 |
| B156 | 2.87 | 35:01/35:08 | 7 | 7.95 | 7.95 |
Most common HLA class I mismatches for each locus in relation with the amino acid substitutions with the highest importance scores
| HLA Locus | HLA Mismatch | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|
| HLA-A | 02:01/02:05 | 14 | 7.82 |
| | 02:01/02:06 | 26 | 14.53 |
| | 03:01/03:02 | 38 | 21.23 |
| | 01:01/11:01 | 45 | 25.14 |
| | 02:01/68:01 | 52 | 29.05 |
| | 24:02/24:03 | 59 | 32.96 |
| HLA-B | 35:01/35:03 | 17 | 19.32 |
| | 35:01/35:08 | 24 | 27.27 |
| HLA-C | 01:02/02:02 | 25 | 7.51 |
| | 04:01/16:01 | 44 | 13.21 |
| | 05:01/07:04 | 60 | 18.02 |
| | 14:02/15:02 | 76 | 22.82 |
| | 03:03/04:01 | 90 | 27.03 |
| | 07:01/12:03 | 101 | 30.33 |
| | 06:02/07:01 | 111 | 33.33 |
| | 01:02/03:03 | 119 | 35.74 |
| | 01:02/15:02 | 127 | 38.14 |
| | 03:04/07:02 | 135 | 40.54 |
| | 02:02/15:02 | 142 | 42.64 |
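
The cumulative columns above are running totals over the listed mismatches, with percentages taken against all single mismatches at that locus. A minimal sketch reproducing the HLA-A rows follows. It takes the per-mismatch frequencies from the preceding table and the HLA-A total (n=179) from the patient-characteristics footnote; the function name is illustrative.

```python
from itertools import accumulate

def cumulative_table(counts, locus_total):
    """Return (mismatch, cumulative frequency, cumulative percent) rows."""
    labels = [label for label, _ in counts]
    running = accumulate(freq for _, freq in counts)
    return [(label, total, round(100.0 * total / locus_total, 2))
            for label, total in zip(labels, running)]

# HLA-A mismatch frequencies as listed in the importance-score table
hla_a = [
    ("02:01/02:05", 14),
    ("02:01/02:06", 12),
    ("03:01/03:02", 12),
    ("01:01/11:01", 7),
    ("02:01/68:01", 7),
    ("24:02/24:03", 7),
]
rows = cumulative_table(hla_a, locus_total=179)
for label, cum_freq, cum_pct in rows:
    print(f"{label}  {cum_freq:>3}  {cum_pct:.2f}")
```

The same function applies to HLA-B and HLA-C once their locus totals (88 and 333 in the footnote) are supplied.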
Effect of HLA-A, B or C mismatched amino acid substitution type by position on day 100 survival adjusted for patient characteristics using multiple logistic regression
| HLA Locus | Alpha Domain | Position | Amino Acid Type (R/D) | n | Death by Day 100 (% Death) | P-value | Odds Ratio (95% CI) |
|---|---|---|---|---|---|---|---|
| A | 2 | 156 | LW | 12 | 58 | 0.001 | 6.01 (1.80-20.07) |
| C | 1 | 9 | FY | 27 | 48 | 0.002 | 3.34 (1.51-7.37) |
| C | 1 | 11 | SA | 69 | 43 | < 0.001 | 2.98 (1.80-4.95) |
| C | 1 | 14 | WR | 37 | 40 | 0.002 | 2.88 (1.45-5.73) |
| C | 1 | 21 | RH | 68 | 38 | 0.001 | 2.33 (1.39-3.91) |
| C | 1 | 49 | EA | 37 | 40 | 0.002 | 2.88 (1.45-5.73) |
| C | 1 | 77 | SN | 86 | 37 | 0.001 | 2.16 (1.36-3.44) |
| C | 1 | 80 | NK | 86 | 37 | 0.001 | 2.16 (1.36-3.44) |
| C | 2 | 97 | WR | 69 | 41 | < 0.001 | 2.56 (1.54-4.26) |
| C | 2 | 99 | CY | 27 | 48 | 0.002 | 3.34 (1.51-7.37) |
| C | 2 | 116 | FS | 36 | 42 | 0.004 | 2.67 (1.34-5.33) |
| C | 2 | 116 | YS | 24 | 46 | 0.004 | 3.14 (1.37-7.20) |
| C | 2 | 156 | RW | 22 | 55 | < 0.001 | 4.26 (1.79-10.11) |
Results are compared to death rate at 100 days post-transplant (21% death) in A, B, C, and DRB1 matched donor-recipient pairs (n=1,507).
R/D= Recipient/donor
Based on score test.
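
Confidence intervals for odds ratios from logistic regression are usually symmetric on the log scale, so a reported interval can be used to recover the point estimate (the geometric mean of the bounds) and the standard error of the log odds ratio. A minimal sketch follows, applied to the HLA-A position 156 row above. It assumes a Wald-type 95% interval with z = 1.96; the source only states that the P-values come from a score test, so the interval construction is an assumption here, and the function name is illustrative.

```python
import math

def summarize_ci(lower, upper, z=1.96):
    """Recover the OR point estimate and SE(log OR) from a log-symmetric CI."""
    if not 0 < lower < upper:
        raise ValueError("bounds must satisfy 0 < lower < upper")
    log_or = (math.log(lower) + math.log(upper)) / 2.0
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return math.exp(log_or), se

# Reported interval for HLA-A position 156 (LW): 1.80-20.07
odds_ratio, se_log_or = summarize_ci(1.80, 20.07)
print(f"OR ~ {odds_ratio:.2f}, SE(log OR) ~ {se_log_or:.3f}")
```

A recovered estimate that lands close to the published odds ratio is a quick consistency check on a transcribed row; the wide interval here reflects the small group (n=12).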
Amino acid substitutions as predictors of death by day 100 identified by multivariate logistic regression analysis
| Variable | Number | Odds Ratio | 95% CI | P-value |
|---|---|---|---|---|
| A17 | ||||
| Matched | 2095 | 1.00 | ||
| Mismatched | 12 | 3.796 | 1.148-12.548 | 0.0288 |
| A73 | ||||
| Matched | 2088 | 1.00 | ||
| Mismatched | 19 | 2.617 | 1.013-6.760 | 0.0470 |
| A166 | ||||
| Matched | 2074 | 1.00 | ||
| Mismatched | 33 | 2.201 | 1.044-4.653 | 0.0381 |
| B116 | ||||
| Matched | 2067 | 1.00 | ||
| Mismatched | 40 | 2.545 | 1.308-4.949 | 0.0059 |
| C116 | ||||
| Matched | 1918 | 1.00 | ||
| Mismatched | 189 | 2.066 | 1.495-2.853 | <.0001 |
| Age | <.0001 | |||
| >50 | 199 | 1.00 | ||
| 40-49 | 529 | 0.947 | 0.658-1.363 | 0.7703 |
| 30-39 | 497 | 0.668 | 0.458-0.976 | 0.0368 |
| 20-29 | 390 | 0.553 | 0.356-0.798 | 0.0022 |
| 10-19 | 277 | 0.553 | 0.359-0.853 | 0.0073 |
| 0-9 | 215 | 0.232 | 0.136-0.397 | <.0001 |
| Disease | 0.0404 | |||
| AML | 542 | 1.00 | ||
| ALL | 507 | 1.279 | 0.947-1.728 | 0.1079 |
| CML | 973 | 0.842 | 0.642-1.105 | 0.2160 |
| MDS | 85 | 1.199 | 0.681-2.112 | 0.5287 |
| Disease Status | <.0001 | |||
| Early | 1116 | 1.00 | ||
| Intermediate | 991 | 1.619 | 1.281-2.047 | <.0001 |
| Sex Match (Donor/Recipient) | 0.0492 | |||
| Male/Male | 779 | 1.00 | ||
| Female/Male | 516 | 1.209 | 0.926-1.578 | 0.1627 |
| Male/Female | 395 | 0.907 | 0.669-1.229 | 0.5298 |
| Female/Female | 417 | 1.364 | 1.030-1.808 | 0.0305 |