Identification by random forest method of HLA class I amino acid substitutions associated with lower survival at day 100 in unrelated donor hematopoietic cell transplantation.

S R Marino, S Lin, M Maiers, M Haagenson, S Spellman, J P Klein, T A Binkowski, S J Lee, K van Besien.

Abstract

The identification of important amino acid substitutions associated with low survival in hematopoietic cell transplantation (HCT) is hampered by the large number of observed substitutions compared with the small number of patients available for analysis. Random forest analysis is designed to address these limitations. We studied 2107 HCT recipients with good or intermediate risk hematological malignancies to identify HLA class I amino acid substitutions associated with reduced survival at day 100 post-transplant. Random forest analysis and traditional univariate and multivariate analyses were used. Random forest analysis identified amino acid substitutions at 33 positions that were associated with reduced day 100 survival, including HLA-A 9, 43, 62, 63, 76, 77, 95, 97, 114, 116, 152, 156, 166 and 167; HLA-B 97, 109, 116 and 156; and HLA-C 6, 9, 11, 14, 21, 66, 77, 80, 95, 97, 99, 116, 156, 163 and 173. In all, 13 of these had previously been reported by other investigators using classical biostatistical approaches. Using the same data set, traditional multivariate logistic regression identified only five amino acid substitutions associated with lower day 100 survival. Random forest analysis is a novel statistical methodology for the analysis of HLA mismatching and outcome studies, capable of identifying important amino acid substitutions missed by other methods.

Year: 2011; PMID: 21441965; PMCID: PMC3128239; DOI: 10.1038/bmt.2011.56

Source DB: PubMed; Journal: Bone Marrow Transplant; ISSN: 0268-3369; Impact factor: 5.483


Introduction

Unrelated donor hematopoietic cell transplantation (HCT) is an established treatment option for patients with hematological malignancies who lack a human leukocyte antigen (HLA)-identical sibling. Approximately 70% of the unrelated donor transplants facilitated by the U.S. National Marrow Donor Program (NMDP) in 2009 used donors who were HLA-matched with the recipient; the other 30% had at least one HLA mismatch. HLA mismatches are a major barrier to successful long-term outcomes in HCT. Even a single antigen or allele mismatch has a significant effect on graft survival and, in particular, on the incidence and severity of graft-versus-host disease (GvHD) [1-5]. The molecular basis of allorecognition in GvHD and cellular graft rejection is not completely understood [6,7]. However, isolated reports have shown that a single amino acid substitution between mismatched HLA alleles at a critical location can play an important role in acute GvHD [8] and graft rejection [9]. Long-term survival after HCT is nevertheless likely to be influenced not by a single mismatch but by multiple interacting mismatches, as well as by patient and donor clinical characteristics and other biological factors. Mismatched antigens and alleles differ in the number, type and location of the mismatched amino acids on the structure of the HLA molecule. Some substitutions may alter the peptide-binding capability of the HLA molecule, while others may be irrelevant. Substitutions that alter peptide-binding capacity, and thereby affect T-cell allorecognition, are likely to underlie the varying clinical severity of GvHD and the transplant outcomes associated with HLA-mismatched transplantation. Studies focused on identifying amino acid substitutions associated with adverse outcomes are scarce [10,11], and their findings conflict with those of functional studies [12,13].
Furthermore, these studies used traditional statistical techniques, which have limited ability to analyze simultaneously the effects of a large number of unordered categorical risk factors, the side-chain variability at each amino acid position, and their potential interactions. The purpose of this study was to identify HLA amino acid substitutions associated with lower survival at day 100 post-transplant (D100S) using a novel statistical methodology, random forest analysis [14,15]. Random forest analysis is a computationally intensive method that uses a recursive partitioning algorithm to build individual prediction trees from randomly sampled subsets of the data. It automatically accounts for interactions among a large number of potential predictors of HCT outcome [16]. Random forest analysis has not previously been used to analyze HLA data in unrelated donor transplantation. It has, however, been shown to be powerful and robust for datasets with a “large p and small n”, that is, datasets in which the number of predictor variables (p) is large but the number of cases (n) is relatively small. In comparative analyses of discrimination methods for gene expression array data, it has consistently performed as well as or better than other methods [17-19].

Methods

Patients

The study was based on a data set of 3,855 patient-donor pairs whose transplants were facilitated by the NMDP between 1988 and 2003. All surviving recipients included in this data set were retrospectively contacted and provided informed consent for participation in the NMDP research program. Approximately 4% of surviving patients would not provide consent for research. Excluding these non-consenting survivors could introduce bias. To adjust for it, a sampling process randomly excluded approximately the same percentage of deceased patients. This used a biased-coin randomization in which exclusion probabilities were based on the characteristics associated with not providing consent among survivors [2]. The final study population consisted of 2,107 patients with good or intermediate risk hematologic malignancies who underwent allogeneic transplantation from HLA-matched or single HLA class I allele- or antigen-mismatched unrelated donors. Good risk was defined as acute myeloid leukemia (AML) or acute lymphoblastic leukemia (ALL) in first complete remission, chronic myeloid leukemia (CML) in first chronic phase, or myelodysplastic syndrome (MDS) subtype refractory anemia. Intermediate risk was defined as AML or ALL in second or subsequent complete remission or in first relapse, or CML in accelerated phase or second chronic phase. Patients with high-risk disease were excluded from the analysis to better examine the relationship between amino acid substitutions and survival. High-resolution HLA typing was performed for HLA-A, B, C, DRB1, DQA1, DQB1, DPA1, and DPB1 on all donor-recipient pairs as previously described [2]. However, based on the results of the study by Lee et al [2], only HLA-A, B, C, and DRB1 were considered in the definition of HLA matching in this study.
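
The biased-coin exclusion step can be sketched as follows. This is a minimal illustration, not the NMDP procedure. The function `biased_coin_exclusion`, the hypothetical `p_nonconsent` model and the age-based rates in the example are assumptions made for demonstration. The text states only that exclusion probabilities were derived from the characteristics associated with non-consent among survivors.

```python
import random


def biased_coin_exclusion(deceased, p_nonconsent, seed=0):
    """Return the deceased patients kept after a biased-coin exclusion step.

    Each patient is dropped with probability p_nonconsent(patient), i.e. the
    (hypothetical) estimated chance that a comparable survivor declined consent.
    """
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    return [pt for pt in deceased if rng.random() >= p_nonconsent(pt)]


def p_nonconsent(patient):
    """Hypothetical non-consent model: older patients decline more often."""
    return 0.06 if patient["age"] >= 40 else 0.02


# Hypothetical cohort: ages cycle 0..79, so half the patients are 40 or older.
deceased = [{"id": i, "age": i % 80} for i in range(10_000)]
kept = biased_coin_exclusion(deceased, p_nonconsent, seed=42)
excluded_fraction = 1 - len(kept) / len(deceased)
print(f"excluded {excluded_fraction:.1%} of deceased patients")
```

With these made-up rates, the expected exclusion fraction is 0.5 × 6% + 0.5 × 2% = 4%, which mirrors the approximate non-consent rate among survivors. In practice the probabilities would come from a model fitted to the survivors' consent data.
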
To avoid the confounding effects of HLA mismatches in the graft-versus-host and host-versus-graft directions, pairs in which the donor or the recipient was homozygous at an HLA class I locus (n=91) were excluded from the analysis. Donor-recipient pairs with more than one mismatch at HLA-A, B, C and DRB1, or with a mismatch at HLA-DRB1, were also excluded. There were 1,507 donor-recipient pairs matched at HLA-A, B, C, and DRB1 (referred to as the matched group). A further 600 donor-recipient pairs had only one allele or antigen mismatch at HLA-A, B or C (referred to as the mismatched group). Of the 600 mismatched pairs, 179 (29.8%) were mismatched at HLA-A, 88 (14.7%) at HLA-B, and 333 (55.5%) at HLA-C.
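
The eligibility rules above can be expressed as a small classification function. The sketch below is illustrative only. The input layout (a mapping from locus to recipient and donor allele pairs), the allele names, and counting mismatches in the graft-versus-host direction are assumptions. It also ignores the antigen-versus-allele distinction and other NMDP typing details.

```python
from collections import Counter

CLASS_I = ("A", "B", "C")
LOCI = CLASS_I + ("DRB1",)


def mismatch_count(recipient, donor):
    """Recipient alleles not found in the donor at one locus (GvH direction).

    Once homozygous pairs are excluded, both individuals carry two distinct
    alleles, so the host-versus-graft count is the same.
    """
    return sum((Counter(recipient) - Counter(donor)).values())


def classify_pair(typing):
    """typing maps locus -> (recipient_alleles, donor_alleles), two alleles each."""
    for locus in CLASS_I:
        recipient, donor = typing[locus]
        if len(set(recipient)) == 1 or len(set(donor)) == 1:
            return f"excluded: homozygous at HLA-{locus}"
    counts = {locus: mismatch_count(*typing[locus]) for locus in LOCI}
    if counts["DRB1"] > 0:
        return "excluded: DRB1 mismatch"
    total = sum(counts.values())
    if total > 1:
        return "excluded: more than one mismatch"
    if total == 0:
        return "matched"
    locus = next(name for name in CLASS_I if counts[name] == 1)
    return f"mismatched at HLA-{locus}"


# Hypothetical allele-level typings.
matched_pair = {
    "A": (("01:01", "02:01"), ("01:01", "02:01")),
    "B": (("07:02", "08:01"), ("07:02", "08:01")),
    "C": (("07:01", "07:02"), ("07:01", "07:02")),
    "DRB1": (("03:01", "15:01"), ("03:01", "15:01")),
}
c_mismatch = dict(matched_pair, C=(("07:01", "07:02"), ("07:01", "16:01")))
print(classify_pair(matched_pair))
print(classify_pair(c_mismatch))
```

The function checks for homozygous pairs first. Because those pairs are already excluded, counting a single direction is sufficient, which follows the text's rationale for that exclusion.
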

Data sources

The Center for International Blood and Marrow Transplant Research (CIBMTR) is a research affiliation of the International Bone Marrow Transplant Registry (IBMTR), the Autologous Blood and Marrow Transplant Registry (ABMTR) and the NMDP, established in 2004. It comprises a voluntary working group of more than 450 transplantation centers worldwide. These centers contribute detailed data on consecutive allogeneic and autologous HCTs to a Statistical Center at the Medical College of Wisconsin in Milwaukee and to the NMDP Coordinating Center in Minneapolis. Participating centers are required to report all transplants consecutively, and compliance is monitored by on-site audits. Patients are followed longitudinally, with yearly follow-up. Data quality is ensured by computerized checks for discrepancies, physicians' review of submitted data, and on-site audits of participating centers. Observational studies conducted by the CIBMTR are performed, as a Public Health Authority, in compliance with the HIPAA Privacy Rule. They also comply with all applicable federal regulations for the protection of human research participants, as determined by continuous review by the Institutional Review Boards of the NMDP and the Medical College of Wisconsin since 1985.

Amino acid substitution assignment

Amino acid substitutions were assigned by comparing the amino acid sequences of the mismatched alleles carried by the donor and the recipient. Sequences were taken from the International Immunogenetics Project IMGT/HLA database (http://www.ebi.ac.uk/imgt/hla, accessed in July 2007). Polymorphic amino acid positions were identified by position number and amino acid type. Each observed patient-donor mismatch was recorded by position number and the two differing amino acids. The majority (∼80%) of HLA alleles in the IMGT/HLA database are defined by partial sequences, in which a portion of the exonic nucleotides is not described. We therefore restricted the analysis to exons 2-3 for class I alleles and exon 2 for class II alleles, where the majority of alleles are fully characterized. In the few instances where the reference sequence was incomplete within these exons, we used a simple imputation method. The missing sequence was filled in with that of the most similar fully characterized allele. Similarity was measured by Hamming distance, i.e., the number of nucleotide positions at which two sequences differ, and the allele with the minimum distance was used.
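
The two steps described above can be sketched together. The first records each substitution as (position, recipient residue, donor residue). The second fills an incomplete reference sequence from the nearest fully characterized allele. The toy sequences, the `*` placeholder for an uncharacterized position, and the assumption that sequences are pre-aligned and numbered from 1 are all illustrative. Real IMGT/HLA alignments need careful handling of numbering and indels. Following the text, distance is counted over nucleotide positions. Positions that are unknown in the incomplete sequence are skipped.

```python
def assign_substitutions(recipient_seq, donor_seq, start=1):
    """List (position, recipient_residue, donor_residue) for every differing residue.

    Assumes both sequences are aligned and numbered consecutively from `start`.
    """
    return [
        (pos, r, d)
        for pos, (r, d) in enumerate(zip(recipient_seq, donor_seq), start=start)
        if r != d
    ]


def hamming(a, b, unknown="*"):
    """Count differing positions, skipping positions unknown in either sequence."""
    return sum(1 for x, y in zip(a, b) if x != y and unknown not in (x, y))


def impute(partial, references, unknown="*"):
    """Fill unknown positions from the closest fully characterized reference.

    Returns the completed sequence and the name of the reference used.
    """
    nearest = min(references, key=lambda name: hamming(partial, references[name], unknown))
    ref = references[nearest]
    filled = "".join(r if p == unknown else p for p, r in zip(partial, ref))
    return filled, nearest


# Toy amino acid sequences (hypothetical), numbered from position 1.
print(assign_substitutions("ASHSLRYF", "ASHSMRYY"))
# Toy nucleotide sequences (hypothetical): one incomplete, two complete references.
print(impute("AC*GT", {"allele_x": "ACAGT", "allele_y": "TTTGT"}))
```
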

Statistical analyses

Random Forest analysis

Random forest analysis was used to identify amino acid substitutions associated with the primary endpoint of survival to day 100. The analysis accounted for clinical and transplant characteristics and for other amino acid substitutions present at the same time. Because random forest analysis has not been used before in HCT studies, we provide a brief description of the method and its functional properties. Random forest is a tree-based classification method, developed by Leo Breiman [14], that uses an ensemble of classification (decision) trees. Each tree is built with a recursive partitioning algorithm from a bootstrap sample of the training data, so some records are included more than once and others not at all. On average, about two thirds of the records are included in each bootstrap sample, and one third are left out. The left-out (out-of-bag) records provide an ongoing assessment of model performance, similar to repeated cross-validation. In addition, at each node of each tree, only a random subset of the available predictor variables is considered when determining the best partition of the data. This doubly random process produces a collection of substantially different trees. Together, these trees form the forest, which is the final ensemble model: each tree casts a vote, and the majority wins. In contrast to traditional multivariate modeling, random forest analysis can account for inter-relationships among all potential predictors when building a tree-based predictive model. This includes unordered categorical covariates with many levels. Unlike traditional univariate and multivariate logistic regression, random forest analysis can also handle training datasets with hundreds or even thousands of input variables.
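
The doubly random procedure described above has three parts: a bootstrap sample per tree, a fresh random subset of candidate predictors at every node, and a majority vote. It can be illustrated with a small from-scratch sketch using only Python's standard library. This is not the Salford Systems implementation used in the study. The Gini splitting criterion, the binary `row[f] == value` splits on categorical predictors, the depth limit (`max_depth`) and the synthetic data are simplifying assumptions. The sketch also computes the expected in-bag fraction 1 - (1 - 1/n)^n, which is where the "about two thirds" figure comes from (≈0.632 for large n).

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Best 'row[f] == value' split among the candidate features (lowest weighted Gini)."""
    best = None
    for f in features:
        for value in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] == value]
            right = [y for row, y in zip(rows, labels) if row[f] != value]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
                if best is None or score < best[0]:
                    best = (score, f, value)
    return best


def grow(rows, labels, mtry, rng, depth, max_depth):
    """Grow one tree. A leaf is a class label; an inner node is (f, value, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    candidates = rng.sample(range(len(rows[0])), mtry)  # fresh random subset per node
    split = best_split(rows, labels, candidates)
    if split is None:
        return majority
    _, f, value = split
    branches = []
    for keep in (True, False):
        idx = [i for i, row in enumerate(rows) if (row[f] == value) == keep]
        branches.append(grow([rows[i] for i in idx], [labels[i] for i in idx],
                             mtry, rng, depth + 1, max_depth))
    return (f, value, branches[0], branches[1])


def predict_tree(node, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, value, left, right = node
        node = left if row[f] == value else right
    return node


def grow_forest(rows, labels, n_trees, mtry, max_depth, seed=0):
    """Each tree sees a bootstrap sample; its out-of-bag indices are kept for scoring."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        out_of_bag = set(range(n)) - set(boot)
        tree = grow([rows[i] for i in boot], [labels[i] for i in boot], mtry, rng, 0, max_depth)
        forest.append((tree, out_of_bag))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree, _ in forest).most_common(1)[0][0]


def oob_error(forest, rows, labels):
    """Misclassification rate when each record is voted on only by trees that never saw it."""
    wrong = scored = 0
    for i, (row, y) in enumerate(zip(rows, labels)):
        votes = Counter(predict_tree(tree, row) for tree, oob in forest if i in oob)
        if votes:
            scored += 1
            wrong += votes.most_common(1)[0][0] != y
    return wrong / scored


def expected_inbag_fraction(n):
    """Expected share of distinct records in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


# Synthetic data: five two-level predictors; the outcome depends on predictor 0 only.
data_rng = random.Random(1)
rows = [tuple(data_rng.choice("AB") for _ in range(5)) for _ in range(200)]
labels = [1 if row[0] == "A" else 0 for row in rows]
forest = grow_forest(rows, labels, n_trees=51, mtry=2, max_depth=4, seed=2)
print(predict(forest, ("A", "B", "B", "A", "B")), oob_error(forest, rows, labels))
print(round(expected_inbag_fraction(2107), 3))
```

The out-of-bag error plays the role of the ongoing performance assessment described in the text, because each record is scored only by trees whose bootstrap sample excluded it.
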
The two-part randomness of the random forest method (a random subset of patients and a random subset of variables) delivers considerable robustness to noise, outliers, and over-fitting compared with a single tree classifier. Random forest analysis was carried out using the random forest software, version 1.0 (Salford Systems, San Diego, CA). The eligible predictors comprised 131 variables. Four were patient-donor clinical characteristics (age, disease type, disease status, and donor-recipient gender match) identified as associated with day 100 survival in preliminary analyses. The other 127 were amino acid substitution position variables at HLA-A, B, or C. We built a random forest of 500 classification trees, each grown from a bootstrap sample of the original 2,107 donor-patient pairs. At each non-terminal node of each tree, 15 predictors were randomly selected from the 131 to determine the best split of the node. Results for each variable are expressed as an importance score on a 0-100 scale, with higher scores indicating greater predictive ability. In contrast to traditional univariate and multivariate modeling, confidence intervals and p values are not available.
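
The settings reported above translate into a few simple quantities. The sketch below assumes that raw importances are rescaled so the top predictor scores 100, which is consistent with age scoring 100 in Table 3. How the software computes the raw importance values is not described here, so the raw numbers in the example are hypothetical. The sketch also shows the chance that a given predictor is even considered at a node. With 15 of 131 predictors drawn at random, that chance is about 11.5%. For comparison, the common default for classification forests, roughly the square root of the number of predictors, would be 11 here.

```python
import math

N_TREES = 500            # classification trees in the forest (from the text)
N_PREDICTORS = 4 + 127   # clinical variables + substitution position variables
MTRY = 15                # predictors drawn at random at each non-terminal node


def candidate_probability(mtry=MTRY, n_predictors=N_PREDICTORS):
    """Chance that one specific predictor is among the mtry drawn at a node."""
    return mtry / n_predictors


def rescale_importance(raw):
    """Rescale raw importances to a 0-100 scale with the top predictor at 100."""
    top = max(raw.values())
    return {name: 100.0 * value / top for name, value in raw.items()}


print(f"P(predictor considered at a node) = {candidate_probability():.3f}")
print("square-root default would be", math.isqrt(N_PREDICTORS))
# Hypothetical raw importances, only to show the rescaling.
print(rescale_importance({"age": 0.042, "disease stage": 0.021, "HLA-C 156": 0.0152}))
```

In scikit-learn, the analogous forest settings would be `n_estimators=500` and `max_features=15`. Its built-in `feature_importances_` is, however, an impurity-based measure and may not match the scores produced by the software used here.
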

Traditional Univariate and Multivariate analysis

Traditional univariate and multivariate analyses were performed to compare the results of the random forest analysis with those of a more common statistical approach applied to the same data set. For the univariate approach, each mismatch type-by-position subgroup was compared with the HLA-matched group. The comparison used a binary indicator variable in a multiple logistic regression model adjusted for patient risk factors. Because of multiple testing, only indicator variables with a p value of 0.005 or less were considered statistically significant. A significant indicator means that the day 100 death rate of that mismatch type-by-position subgroup differed from that of the matched group. In the traditional multivariate logistic regression model, potential differential effects of substitution type were ignored. The model tested the effect of any amino acid substitution at each position (mismatch versus match, regardless of type). An initial screen tested each amino acid substitution position separately at the 5% significance level. Each test used a logistic regression model adjusted for the significant patient risk factors (age, disease type, disease stage, and donor-recipient gender match). A final model was then built from the positions that were significant in the initial screen. It used a forward stepwise regression procedure with a 5% significance level as the criterion for entry and removal. This final model allowed the identification of simultaneous effects among multiple amino acid substitution positions. It could not evaluate substitution types or their interactions, however. Coding all possible substitution types and their interactions across combinations of positions would require more indicator variables than the model can accommodate.
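
One univariate screening step can be sketched with a small logistic regression fitted by Newton-Raphson in standard-library Python. Two aspects are assumptions. First, the Table 6 footnote reports a score test, whereas this sketch uses a likelihood-ratio test for the added indicator, a closely related but not identical test. Second, the counts and the single `stage` adjustment covariate are hypothetical. The 0.005 threshold is the one stated in the text.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(x, y, iters=25):
    """Fit a logistic model by Newton-Raphson and return its log-likelihood.

    Each row of x starts with 1.0 for the intercept.
    """
    k = len(x[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    loglik = 0.0
    for xi, yi in zip(x, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return loglik


def lr_test_p(x_base, x_full, y):
    """Likelihood-ratio p value for one added column (chi-square, 1 df)."""
    stat = 2.0 * (fit_logistic(x_full, y) - fit_logistic(x_base, y))
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def build(groups):
    """groups: (n, deaths, indicator) tuples -> base design, full design, outcome."""
    x_base, x_full, y = [], [], []
    for n, deaths, indicator in groups:
        for i in range(n):
            stage = float(i % 2)  # hypothetical adjustment covariate
            x_base.append([1.0, stage])
            x_full.append([1.0, stage, float(indicator)])
            y.append(1 if i < deaths else 0)
    return x_base, x_full, y


ALPHA = 0.005  # threshold used in the text because of multiple testing
# Hypothetical: 160 matched pairs (32 deaths) vs 40 pairs with one substitution type.
p_signal = lr_test_p(*build([(160, 32, 0), (40, 20, 1)]))
p_null = lr_test_p(*build([(160, 32, 0), (40, 8, 1)]))
print(p_signal <= ALPHA, p_null <= ALPHA)
```

The same likelihood-ratio comparison could drive a forward stepwise procedure. At each round, the candidate position with the smallest p value below 0.05 would be added. Any included position whose p value rose above 0.05 would then be dropped.
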

Results

Patient characteristics

Patient characteristics of the HLA-mismatched and HLA-matched groups are summarized in Table 1. At the 5% significance level, the groups differed significantly in age, disease type, disease stage, conditioning regimen, and GvHD prophylaxis. However, after Bonferroni adjustment for multiple comparisons to reduce the risk of false-positive results, only age and disease stage remained significant at the 5% level. Day 100 survival was 79% in the HLA-matched group and 69% in the HLA-mismatched group (p<0.001).
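
The Bonferroni step can be checked directly against the p values reported in Table 1. The sketch assumes that the adjustment covered the eight characteristics listed in Table 1, giving a per-test threshold of 0.05/8 = 0.00625. It enters "<0.001" as 0.001. Under these assumptions only age and disease stage remain significant, as stated in the text.

```python
# p values from Table 1; "<0.001" is entered as 0.001 (an upper bound).
TABLE1_P = {
    "Age": 0.001,
    "Sex donor/recipient": 0.36,
    "Disease": 0.03,
    "Disease stage": 0.001,
    "Conditioning regimen": 0.03,
    "GvHD prophylaxis": 0.01,
    "Stem cell source": 0.91,
    "Year of transplant": 0.25,
}


def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and the tests that survive it."""
    threshold = alpha / len(p_values)
    return threshold, [name for name, p in p_values.items() if p <= threshold]


threshold, significant = bonferroni(TABLE1_P)
print(f"threshold = {threshold:.5f}; significant: {significant}")
```
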
Table 1

Patient characteristics by HLA matching status

| Characteristic | 1 HLA class I mismatch, DRB1 matched (n=600)¹ | A, B, C, DRB1 matched (n=1,507) | p value |
|---|---|---|---|
| Age at transplant, mean (SD) | 29.7 (15.2) | 32.6 (14.2) | <0.001 |
| Sex, donor/recipient, n (%) | | | 0.36 |
| Male/Male | 207 (34.5) | 572 (38.0) | |
| Female/Male | 119 (19.8) | 276 (18.3) | |
| Female/Female | 129 (21.5) | 288 (19.1) | |
| Male/Female | 145 (24.2) | 371 (24.6) | |
| Disease, n (%) | | | 0.03 |
| ALL | 155 (25.8) | 352 (23.4) | |
| AML | 172 (28.7) | 370 (24.6) | |
| CML | 256 (42.7) | 717 (47.6) | |
| MDS | 17 (2.8) | 68 (4.5) | |
| Stage of disease at transplant, n (%) | | | 0.001 |
| Early | 282 (47.0) | 834 (55.3) | |
| Intermediate | 318 (53.0) | 673 (44.7) | |
| Conditioning regimen, n (%) | | | 0.03 |
| Myeloablative | 591 (98.5) | 1499 (99.5) | |
| Non-myeloablative | 9 (1.5) | 8 (0.5) | |
| GvHD prophylaxis, n (%) | | | 0.01 |
| Tacrolimus ± other | 121 (20.2) | 298 (19.8) | |
| Cyclosporine A + methotrexate ± other | 324 (54.0) | 890 (59.1) | |
| Cyclosporine A ± other (no methotrexate) | 13 (2.2) | 57 (3.8) | |
| Methotrexate ± other (no cyclosporine A) | 5 (0.8) | 7 (0.5) | |
| T-cell depletion | 137 (22.8) | 254 (16.9) | |
| Other | 0 (0.0) | 1 (0.1) | |
| Stem cell source, n (%) | | | 0.91 |
| Bone marrow | 559 (93.2) | 1402 (93.0) | |
| PBSC (peripheral blood stem cells) | 41 (6.8) | 105 (7.0) | |
| Year of transplant, n (%) | | | 0.25 |
| 1988-1992 | 65 (10.8) | 212 (14.1) | |
| 1993-1996 | 174 (29.0) | 410 (27.2) | |
| 1997-2000 | 241 (40.2) | 597 (39.6) | |
| 2001-2004 | 120 (20.0) | 288 (19.1) | |

¹ Donor/recipient pairs with one mismatch at HLA-A: n=179 (29.8%); at HLA-B: n=88 (14.7%); at HLA-C: n=333 (55.5%).

Distribution of amino acid substitutions positions and types

Of the 600 donor-recipient pairs that had a single HLA-A, B, or C mismatch and were DRB1 matched, 371 had antigen mismatches and 229 had allele mismatches, as defined by the NMDP [2]. The HLA-A, B, and C sequences analyzed each comprised up to 181 amino acids. Amino acid substitutions were identified at 50 positions in HLA-A, 44 in HLA-B, and 33 in HLA-C, for a total of 127 mismatched amino acid positions. Most mismatched positions have multiple mismatch types, so a total of 389 amino acid substitutions were identified at the 127 positions, an average of 3.1 types per position (Table 2).
Table 2

Distribution of amino acid substitution positions and types

| | HLA-A | HLA-B | HLA-C | Total |
|---|---|---|---|---|
| Number of amino acid positions affected by substitutions | 50 | 44 | 33 | 127 |
| Number of amino acid substitution types¹ | 170 | 104 | 115 | 389 |

¹ Most amino acid substitution positions have multiple substitution types.
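
The average quoted in the text follows directly from Table 2. The per-locus breakdown below is simple arithmetic on the same counts.

```python
positions = {"HLA-A": 50, "HLA-B": 44, "HLA-C": 33}
substitution_types = {"HLA-A": 170, "HLA-B": 104, "HLA-C": 115}

# Average number of substitution types per mismatched position, by locus and overall.
per_locus = {locus: substitution_types[locus] / positions[locus] for locus in positions}
overall = sum(substitution_types.values()) / sum(positions.values())

for locus, avg in per_locus.items():
    print(f"{locus}: {avg:.2f} types per position")
print(f"overall: {overall:.1f} types per position")
```

On average, HLA-B positions carry fewer substitution types (about 2.4) than HLA-A or HLA-C positions (about 3.4 to 3.5).
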

Amino-acid substitutions identified by the random forest analysis

Random forest analysis assigned an importance score of 2.9 or higher (on a scale of 0 to 100) to four patient variables (age, disease stage, disease type, and gender match). It also assigned such scores to 33 of the 127 amino acid substitution positions. These variables were identified as predictors of death by day 100 post-transplant (Table 3). The cut-off of 2.9 was chosen to include the most important overlapping amino acid substitutions across the HLA class I loci. The selection criterion was to include all 13 previously reported amino acid substitutions. Any new position (n=20) with an importance score higher than that of a previously reported position was also included. The positions selected under this definition were HLA-A 9, 43, 62, 63, 76, 77, 95, 97, 114, 116, 152, 156, 166, and 167; HLA-B 97, 109, 116 and 156; and HLA-C 6, 9, 11, 14, 21, 66, 77, 80, 95, 97, 99, 116, 156, 163, and 173 (Figure 1). Table 3 ranks these amino acid substitutions by their random forest importance score and summarizes previous reports in the literature.
Table 3

Amino-acid substitutions and other predictors of day 100 survival obtained by random forest analysis listed in order of importance

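| Variable | HLA molecule alpha domain | Importance score | Other references reporting amino acid substitutions associated with HCT outcomes |
|---|---|---|---|
| Age | - | 100 | |
| Disease stage | - | 50 | |
| HLA-C position 156 | 2 | 36 | Kawase, 2009 [25]; Kawase, 2007 [11] |
| HLA-C position 116 | 2 | 35 | Kawase, 2007 [11]; Ferrara, 2001 [10] |
| HLA-A position 152 | 2 | 31 | Ferrara, 2001 [10] |
| HLA-C position 99 | 2 | 24 | Kawase, 2009 [25]; Kawase, 2007 [11] |
| HLA-A position 9 | 1 | 21 | Kawase, 2009 [25]; Kawase, 2007 [11] |
| HLA-C position 9 | 1 | 20 | Kawase, 2007 [11] |
| HLA-B position 116 | 2 | 20 | Ferrara, 2001 [10] |
| Disease type | - | 20 | |
| Gender match | - | 19 | |
| HLA-A position 156 | 2 | 17 | Ferrara, 2001 [10] |
| HLA-C position 97 | 2 | 13 | |
| HLA-A position 114 | 2 | 13 | Ferrara, 2001 [10] |
| HLA-A position 62 | 1 | 13 | |
| HLA-C position 163 | 2 | 12 | |
| HLA-A position 95 | 2 | 9 | |
| HLA-C position 11 | 1 | 9 | |
| HLA-A position 97 | 2 | 7 | |
| HLA-B position 97 | 2 | 6 | |
| HLA-C position 80 | 1 | 6 | Kawase, 2007 [11] |
| HLA-A position 76 | 1 | 6 | |
| HLA-A position 63 | 1 | 5 | |
| HLA-C position 77 | 1 | 5 | Kawase, 2007 [11] |
| HLA-A position 77 | 1 | 5 | |
| HLA-C position 21 | 1 | 4 | |
| HLA-C position 95 | 2 | 4 | |
| HLA-A position 116 | 2 | 4 | Kawase, 2007 [11]; Ferrara, 2001 [10] |
| HLA-C position 14 | 1 | 4 | |
| HLA-A position 167 | 2 | 4 | |
| HLA-A position 43 | 1 | 4 | |
| HLA-C position 6 | 1 | 4 | |
| HLA-B position 109 | 2 | 3 | |
| HLA-C position 173 | 2 | 3 | |
| HLA-C position 66 | 1 | 3 | |
| HLA-A position 166 | 2 | 3 | |
| HLA-B position 156 | 2 | 3 | Ferrara, 2001 [10]; Burrows, 1994 [24]; Keever, 1994 [8]; Fleischhauer, 1990 [9] |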

Positions with higher importance scores are more strongly related to death by day 100 post-HCT and should be given higher priority for matching.

Figure 1

Representative HLA molecules with non-permissive amino acid substitutions identified using random forest analysis

The residues are colored by mismatch groupings. (A) HLA-A, B, and C positions 97, 116, and 156. (B) HLA-A and C positions 9, 77, and 95. (C) HLA-A 43, 62, 63, 76, 114, 152, 166, and 167. (D) HLA-B position 109. (E) HLA-C positions 6, 11, 14, 21, 66, 80, 99, 163, and 173. The mismatches are found on the alpha 1 and alpha 2 domains, with the majority occurring in the peptide binding groove.

Most frequent HLA class I mismatches accounting for amino acid substitutions exhibiting the highest importance scores

Table 4 lists the 33 amino acid substitutions with high random forest importance scores. For each one, it gives the most frequent HLA class I mismatches and their frequencies. Table 5 shows, for each locus, the most common HLA class I mismatches that correspond to these high-importance substitutions. The most common such mismatches are HLA-A*02:01/02:05, HLA-B*35:01/35:03, and HLA-C*01:02/02:02 (Table 5). Percentages were calculated with all mismatches at the relevant locus as the denominator. Only HLA mismatches with a frequency of 10 or higher were included. However, if no HLA mismatch reached a frequency of 10, the one with the highest available frequency was included.
Table 4

Most frequent HLA class I mismatches accounting for amino acid substitutions exhibiting the highest importance scores

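| Amino acid substitution | Importance score | HLA mismatch | Frequency | Percent | Cumulative percent |
|---|---|---|---|---|---|
| C156 | 36.21 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 05:01/07:04 | 16 | 4.80 | 18.02 |
| | | 14:02/15:02 | 16 | 4.80 | 22.82 |
| | | 03:03/04:01 | 14 | 4.20 | 27.03 |
| | | 07:01/12:03 | 11 | 3.30 | 30.33 |
| | | 06:02/07:01 | 10 | 3.00 | 33.33 |
| C116 | 34.75 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 14:02/15:02 | 16 | 4.80 | 18.02 |
| | | 03:03/04:01 | 14 | 4.20 | 22.22 |
| A152 | 31.19 | 03:01/03:02 | 12 | 6.70 | 6.70 |
| C99 | 23.59 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 14:02/15:02 | 16 | 4.80 | 18.02 |
| | | 03:03/04:01 | 14 | 4.20 | 22.22 |
| A9 | 21.29 | 02:01/02:05 | 14 | 7.82 | 7.82 |
| | | 02:01/02:06 | 12 | 6.70 | 14.53 |
| C9 | 20.39 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 05:01/07:04 | 16 | 4.80 | 18.02 |
| | | 14:02/15:02 | 16 | 4.80 | 22.82 |
| | | 03:03/04:01 | 14 | 4.20 | 27.03 |
| | | 07:01/12:03 | 11 | 3.30 | 30.33 |
| B116 | 20.38 | 35:01/35:03 | 17 | 19.32 | 19.32 |
| A156 | 17.44 | 02:01/02:05 | 14 | 7.82 | 7.82 |
| | | 03:01/03:02 | 12 | 6.70 | 14.53 |
| C97 | 13.49 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 14:02/15:02 | 16 | 4.80 | 18.02 |
| | | 07:01/12:03 | 11 | 3.30 | 21.32 |
| | | 06:02/07:01 | 10 | 3.00 | 24.32 |
| A114 | 13.07 | 02:01/68:01 | 7 | 3.91 | 3.91 |
| A62 | 13.00 | 02:01/68:01 | 7 | 3.91 | 3.91 |
| C163 | 12.18 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 03:03/04:01 | 14 | 4.20 | 11.71 |
| A95 | 9.20 | 02:01/02:05 | 14 | 7.82 | 7.82 |
| C11 | 8.99 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 14:02/15:02 | 16 | 4.80 | 18.02 |
| | | 03:03/04:01 | 14 | 4.20 | 22.22 |
| A97 | 6.90 | 02:01/68:01 | 7 | 3.91 | 3.91 |
| B97 | 6.24 | 39:01/39:06 | 4 | 17.39 | 17.39 |
| C80 | 6.07 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 04:01/16:01 | 19 | 5.71 | 13.21 |
| | | 05:01/07:04 | 16 | 4.80 | 18.02 |
| | | 14:02/15:02 | 16 | 4.80 | 22.82 |
| | | 03:03/04:01 | 14 | 4.20 | 27.03 |
| | | 06:02/07:01 | 10 | 3.00 | 30.33 |
| A76 | 5.88 | 01:01/11:01 | 7 | 3.91 | 3.91 |
| A63 | 5.09 | 02:01/68:01 | 7 | 3.91 | 3.91 |
| C77 | 4.85 | 01:02/02:02 | 25 | 13.16 | 13.16 |
| | | 04:01/16:01 | 19 | 10.00 | 23.16 |
| | | 05:01/07:04 | 16 | 8.42 | 31.58 |
| | | 14:02/15:02 | 16 | 8.42 | 40.00 |
| | | 03:03/04:01 | 14 | 7.37 | 47.37 |
| | | 06:02/07:01 | 10 | 5.26 | 52.63 |
| A77 | 4.66 | 01:01/11:01 | 7 | 3.91 | 3.91 |
| C21 | 4.33 | 01:02/02:02 | 25 | 7.51 | 7.51 |
| | | 14:02/15:02 | 16 | 4.80 | 12.31 |
| | | 03:03/04:01 | 14 | 4.20 | 16.52 |
| C95 | 4.06 | 05:01/07:04 | 16 | 4.80 | 4.80 |
| | | 14:02/15:02 | 16 | 4.80 | 9.61 |
| | | 03:03/04:01 | 14 | 4.20 | 13.81 |
| A116 | 3.99 | 02:01/68:01 | 7 | 3.91 | 3.91 |
| C14 | 3.89 | 04:01/16:01 | 19 | 31.15 | 31.15 |
| | | 03:03/04:01 | 14 | 22.95 | 22.95 |
| A167 | 3.78 | 01:01/11:01 | 7 | 3.91 | 3.91 |
| | | 24:02/24:03 | 7 | 3.91 | 7.82 |
| A43 | 3.70 | 02:01/02:05 | 14 | 7.82 | 7.82 |
| C6 | 3.58 | 01:02/02:02 | 25 | 48.08 | 48.08 |
| B109 | 3.47 | 35:01/35:02 | 3 | 37.50 | 37.50 |
| | | 35:02/35:03 | 3 | 37.50 | 75.00 |
| C173 | 3.42 | 03:03/04:01 | 14 | 20.90 | 20.90 |
| C66 | 3.40 | 14:02/15:02 | 16 | 18.18 | 18.18 |
| | | 07:01/12:03 | 11 | 12.50 | 30.68 |
| | | 06:02/07:01 | 10 | 11.36 | 42.05 |
| A166 | 3.05 | 01:01/11:01 | 7 | 3.91 | 3.91 |
| | | 24:02/24:03 | 7 | 3.91 | 7.82 |
| B156 | 2.87 | 35:01/35:08 | 7 | 7.95 | 7.95 |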
Table 5

Most common HLA class I mismatches for each locus in relation with the amino acid substitutions with the highest importance scores

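| HLA locus | HLA mismatch | Cumulative frequency | Cumulative percent |
|---|---|---|---|
| HLA-A | 02:01/02:05 | 14 | 7.82 |
| | 02:01/02:06 | 26 | 14.53 |
| | 03:01/03:02 | 38 | 21.23 |
| | 01:01/11:01 | 45 | 25.14 |
| | 02:01/68:01 | 52 | 29.05 |
| | 24:02/24:03 | 59 | 32.96 |
| HLA-B | 35:01/35:03 | 17 | 19.32 |
| | 35:01/35:08 | 24 | 27.27 |
| HLA-C | 01:02/02:02 | 25 | 7.51 |
| | 04:01/16:01 | 44 | 13.21 |
| | 05:01/07:04 | 60 | 18.02 |
| | 14:02/15:02 | 76 | 22.82 |
| | 03:03/04:01 | 90 | 27.03 |
| | 07:01/12:03 | 101 | 30.33 |
| | 06:02/07:01 | 111 | 33.33 |
| | 01:02/03:03 | 119 | 35.74 |
| | 01:02/15:02 | 127 | 38.14 |
| | 03:04/07:02 | 135 | 40.54 |
| | 02:02/15:02 | 142 | 42.64 |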
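
The HLA-C rows of Table 5 can be reproduced from the individual mismatch frequencies, using the 333 HLA-C-mismatched pairs as the denominator. The first seven frequencies appear in Table 4. The last four (8, 8, 8 and 7) are taken as the differences between consecutive cumulative frequencies in Table 5. The same function with denominators of 179 and 88 applies to the HLA-A and HLA-B rows.

```python
def cumulative_rows(counts, locus_total):
    """Running frequency and running percentage of all mismatches at the locus."""
    rows, running = [], 0
    for mismatch, n in counts:
        running += n
        rows.append((mismatch, running, f"{100.0 * running / locus_total:.2f}"))
    return rows


# HLA-C mismatch frequencies (Tables 4 and 5); 333 HLA-C-mismatched pairs in total.
HLA_C_COUNTS = [
    ("01:02/02:02", 25), ("04:01/16:01", 19), ("05:01/07:04", 16),
    ("14:02/15:02", 16), ("03:03/04:01", 14), ("07:01/12:03", 11),
    ("06:02/07:01", 10), ("01:02/03:03", 8), ("01:02/15:02", 8),
    ("03:04/07:02", 8), ("02:02/15:02", 7),
]
table = cumulative_rows(HLA_C_COUNTS, locus_total=333)
for mismatch, cumulative, percent in table:
    print(f"HLA-C*{mismatch}: {cumulative} ({percent}%)")
```
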

Traditional univariate analysis of amino acid substitutions adjusting for clinical variables

Table 6 lists all 13 amino acid substitution subgroups that met two conditions in univariate analysis adjusted for clinical variables. Each included more than 10 patients. Each also had a significantly higher death rate by day 100 (p<0.005, two-sided test) than the HLA-matched group (1,507 donor-recipient pairs). In the HLA-A mismatched group, only one amino acid substitution position and type was identified: 156 L:W (recipient:donor). No amino acid substitutions associated with worse outcome were identified in the HLA-B mismatched group. This may be due in part to the fact that only 88 (14.7%) of the HLA-mismatched donor-recipient pairs had HLA-B mismatches. Twelve amino acid substitutions were identified in the HLA-C mismatched group. Seven are located in the alpha 1 domain, at 7 different positions, and 5 in the alpha 2 domain, at 4 different positions.
Table 6

Effect of HLA-A, B or C mismatched amino acid substitution type by position on day 100 survival adjusted for patient characteristics using multiple logistic regression

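| HLA locus | Alpha domain | Position | Amino acid type (R/D)¹ | n | Death by day 100 (% death) | p value² | Odds ratio (95% CI) |
|---|---|---|---|---|---|---|---|
| A | 2 | 156 | L/W | 12 | 58 | 0.001 | 6.01 (1.80-20.07) |
| C | 1 | 9 | F/Y | 27 | 48 | 0.002 | 3.34 (1.51-7.37) |
| C | 1 | 11 | S/A | 69 | 43 | <0.001 | 2.98 (1.80-4.95) |
| C | 1 | 14 | W/R | 37 | 40 | 0.002 | 2.88 (1.45-5.73) |
| C | 1 | 21 | R/H | 68 | 38 | 0.001 | 2.33 (1.39-3.91) |
| C | 1 | 49 | E/A | 37 | 40 | 0.002 | 2.88 (1.45-5.73) |
| C | 1 | 77 | S/N | 86 | 37 | 0.001 | 2.16 (1.36-3.44) |
| C | 1 | 80 | N/K | 86 | 37 | 0.001 | 2.16 (1.36-3.44) |
| C | 2 | 97 | W/R | 69 | 41 | <0.001 | 2.56 (1.54-4.26) |
| C | 2 | 99 | C/Y | 27 | 48 | 0.002 | 3.34 (1.51-7.37) |
| C | 2 | 116 | F/S | 36 | 42 | 0.004 | 2.67 (1.34-5.33) |
| C | 2 | 116 | Y/S | 24 | 46 | 0.004 | 3.14 (1.37-7.20) |
| C | 2 | 156 | R/W | 22 | 55 | <0.001 | 4.26 (1.79-10.11) |

Results are compared with the death rate at 100 days post-transplant (21% death) in A, B, C, and DRB1 matched donor-recipient pairs (n=1,507).

¹ R/D = recipient/donor.

² Based on score test.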

Traditional multivariate analysis of amino-acid substitution positions adjusting for clinical variables

We first tested whether a single amino acid substitution position, regardless of substitution type, was associated with death by day 100 after adjustment for important patient risk factors. At the 5% significance level, we identified the following substitution positions: HLA-A 9 and 17; HLA-B 109 and 116; and HLA-C 6, 9, 11, 14, 16, 21, 24, 49, 77, 80, 97, 99, 114, 116, 156, and 163. At a more stringent 0.5% significance level, only 10 HLA-C positions were identified: 9, 11, 21, 77, 80, 97, 99, 116, 156, and 163. Of these 10 HLA-C positions, 9 (all except 163) had already been identified by the univariate analysis that tested the effect of substitution type at each substitution position (Table 6). It can be seen that multivariate analysis identified 4 additional substitution positions at the 0.5% significance level. Testing the differential effect of substitution type at each position therefore yields more informative substitution-type effects. This indicates that it is also a more powerful approach for identifying substitution positions. Holding the patient risk factors in the model, we then used a forward stepwise procedure to select the most important amino acid substitution positions from those initially identified. The procedure used a 5% significance level for entry into and removal from the model. HLA-A positions 17, 73, and 166, HLA-B position 116, and HLA-C position 116 were the only amino acid substitution positions simultaneously associated with outcome (Table 7).
Table 7

Amino acid substitutions as predictors of death by day 100 identified by multivariate logistic regression analysis

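| Variable | Number | Odds ratio | 95% CI | p value |
|---|---|---|---|---|
| HLA-A position 17 | | | | |
| Matched | 2095 | 1.00 | | |
| Mismatched | 12 | 3.796 | 1.148-12.548 | 0.0288 |
| HLA-A position 73 | | | | |
| Matched | 2088 | 1.00 | | |
| Mismatched | 19 | 2.617 | 1.013-6.760 | 0.0470 |
| HLA-A position 166 | | | | |
| Matched | 2074 | 1.00 | | |
| Mismatched | 33 | 2.201 | 1.044-4.653 | 0.0381 |
| HLA-B position 116 | | | | |
| Matched | 2067 | 1.00 | | |
| Mismatched | 40 | 2.545 | 1.308-4.949 | 0.0059 |
| HLA-C position 116 | | | | |
| Matched | 1918 | 1.00 | | |
| Mismatched | 189 | 2.066 | 1.495-2.853 | <.0001 |
| Age | | | | <.0001 |
| >50 | 199 | 1.00 | | |
| 40-49 | 529 | 0.947 | 0.658-1.363 | 0.7703 |
| 30-39 | 497 | 0.668 | 0.458-0.976 | 0.0368 |
| 20-29 | 390 | 0.553 | 0.356-0.798 | 0.0022 |
| 10-19 | 277 | 0.553 | 0.359-0.853 | 0.0073 |
| 0-9 | 215 | 0.232 | 0.136-0.397 | <.0001 |
| Disease | | | | 0.0404 |
| AML | 542 | 1.00 | | |
| ALL | 507 | 1.279 | 0.947-1.728 | 0.1079 |
| CML | 973 | 0.842 | 0.642-1.105 | 0.2160 |
| MDS | 85 | 1.199 | 0.681-2.112 | 0.5287 |
| Disease status | | | | <.0001 |
| Early | 1116 | 1.00 | | |
| Intermediate | 991 | 1.619 | 1.281-2.047 | <.0001 |
| Sex match, donor/recipient | | | | 0.0492 |
| Male/Male | 779 | 1.00 | | |
| Female/Male | 516 | 1.209 | 0.926-1.578 | 0.1627 |
| Male/Female | 395 | 0.907 | 0.669-1.229 | 0.5298 |
| Female/Female | 417 | 1.364 | 1.030-1.808 | 0.0305 |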
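
The p values in Table 7 are consistent with Wald tests on the log odds ratio. Assume each 95% CI was computed as exp(log OR ± 1.96 × SE). The standard error and p value can then be recovered from the reported odds ratio and interval. For the HLA-A position 17 and HLA-B position 116 rows, this reproduces the tabulated p values (0.0288 and 0.0059) to the reported precision. This is a consistency check under that assumption, not a description of how the table was computed.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_from_ci(odds_ratio, lower, upper, z_crit=Z_975):
    """Recover SE(log OR), the z statistic and a two-sided p value from a 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return se, z, p


for label, row in {"HLA-A 17": (3.796, 1.148, 12.548),
                   "HLA-B 116": (2.545, 1.308, 4.949),
                   "HLA-C 116": (2.066, 1.495, 2.853)}.items():
    se, z, p = wald_from_ci(*row)
    print(f"{label}: SE={se:.3f} z={z:.2f} p={p:.4f}")
```
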

HLA-DQ and HLA-DP matching status were also analyzed. DQ matching status was not associated with survival at day 100 (p=0.33), but DP matching status was (p=0.005). These results indicate that there is no linkage effect of the class I mismatches with DQA1 or DQB1 disparities. Survival did not differ between patient-donor pairs with a single HLA class I antigen mismatch and those with a single allele mismatch (p=0.66).

Discussion

Several large studies using standard multivariable modeling have established the importance of molecular matching at HLA-A, B, C, and DRB1 for the outcome of HCT [1-5]. It is estimated that, on average, each additional mismatch is associated with a 10% decrement in survival after adult unrelated donor transplantation for good-risk patients [2]. It is equally clear, however, that many patients, particularly from minority groups, lack matched unrelated donors [20]. Suitable mismatched donors therefore need to be identified to offer transplantation to these patients. The effect of HLA mismatching on GvHD, relapse, and transplant-related mortality (TRM) is mediated by amino acid substitutions, several of which are found in most mismatched alleles. In this study we identified 33 amino acid substitution positions associated with survival at day 100 post-transplant. Three of these positions, 97, 116 and 156, were identified at all three HLA class I loci. Positions 9, 77, and 95 were identified in both HLA-A and HLA-C mismatched antigens or alleles. The remaining positions were identified in mismatched antigens or alleles of a single locus: HLA-A 43, 62, 63, 76, 114, 152, 166, and 167; HLA-B 109; and HLA-C 6, 11, 14, 21, 66, 80, 99, 163, and 173. The majority of these substitutions are located in the alpha 1 or alpha 2 domains of the peptide-binding site, encoded by exons 2 and 3, respectively. They are predicted to directly affect T-cell allorecognition [21-23]. The most common HLA mismatches associated with these amino acid positions are HLA-A*02:01/02:05, 02:01/02:06, 03:01/03:02, 01:01/11:01, 02:01/68:01, and 24:02/24:03; HLA-B*35:01/35:03 and 35:01/35:08; and HLA-C*01:02/02:02, 04:01/16:01, 05:01/07:04, 14:02/15:02, 03:03/04:01, 07:01/12:03, 06:02/07:01, 01:02/03:03, 01:02/15:02, 03:04/07:02, and 02:02/15:02.
The identification of amino acid substitutions that are associated with a higher-than-average risk of failure in HCT, the so-called non-permissive amino acid substitutions, represents a first step towards the ultimate goal of identifying acceptable mismatches that could be used in the clinical setting to select suitable mismatched unrelated donors for patients lacking HLA-identical donors. However, additional studies using different datasets, as well as functional studies, are necessary to confirm these findings before these results are implemented clinically. Initial insights into the importance of specific amino acid substitutions were based on the identification of individual patients and the isolation of cytotoxic T-cell clones directed against HLA subtypes absent in the donor [8,9,24]. Using a large dataset, Ferrara and collaborators [10] reported in 2001 that substitutions at position 116 of class I molecules increase the risk of acute GvHD and TRM. However, they did not attempt to distinguish the effects of substitutions in HLA-A, HLA-B or HLA-C [10]. Recently, Kawase and collaborators [11] reported non-permissive HLA mismatches associated with acute GvHD in HCT patients from the Japan Marrow Donor Program (JMDP). In contrast to our study, Kawase's study population comprised recipients with heterogeneous diagnoses and disease stages, and donor-recipient pairs with mismatches at multiple HLA loci [11]. They conducted a traditional multivariate analysis to evaluate the effect of a single-locus HLA allele mismatch on acute GvHD while adjusting for clinical factors (disease, treatment and patient-related predictors) as well as mismatch status at other loci [11]. They found 4 non-permissive mismatches in HLA-A, 1 in HLA-B, 7 in HLA-C, 1 in DRB1, 1 mismatch associated with DRB1-DQB1, and 2 in HLA-DPB1 [11]. A similar model was used to analyze the impact of each amino acid substitution type at each position separately.
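A model that examines one amino acid position at a time treats each substitution in isolation. The sketch below, built entirely on hypothetical mismatch types and substitution sets (`mismatch_X`, `mismatch_Y`, `mismatch_Z` and their positions are placeholders, not real allele comparisons), shows one reason this can be problematic. When a given mismatch always carries the same group of substitutions, their presence/absence columns are identical across the cohort, so a per-position analysis returns the same association for each of them and cannot attribute the effect to any single position.

```python
from itertools import combinations

# Hypothetical mismatch types and the substitution positions each carries.
# Illustrative placeholders only, not real allele comparisons.
MISMATCH_POSITIONS = {
    "mismatch_X": {9, 116, 156},
    "mismatch_Y": {9, 116},
    "mismatch_Z": {77, 80},
}


def indicator_columns(pairs, mismatch_positions):
    """One 0/1 column per position: 1 if that pair's mismatch carries it."""
    positions = sorted(set().union(*mismatch_positions.values()))
    return {
        pos: tuple(int(pos in mismatch_positions[m]) for m in pairs)
        for pos in positions
    }


def confounded_positions(columns):
    """Pairs of positions whose indicator columns are identical in this cohort."""
    return [
        (p, q)
        for p, q in combinations(sorted(columns), 2)
        if columns[p] == columns[q]
    ]


# Hypothetical cohort: the mismatch type observed in each donor-recipient pair.
cohort = ["mismatch_X", "mismatch_Y", "mismatch_Z", "mismatch_Y", "mismatch_X"]
cols = indicator_columns(cohort, MISMATCH_POSITIONS)
print(confounded_positions(cols))  # [(9, 116), (77, 80)]
```

Here positions 9 and 116 (and 77 and 80) can never be separated by analyzing one position at a time, whereas position 156 varies independently because it is carried by only one of the two mismatch types.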
However, they did not adjust for the multiple amino acid substitutions that commonly occur within a single HLA mismatch [11]. They found 2 non-permissive amino acid substitutions at HLA-A, positions 9 and 116, and 6 non-permissive amino acid substitutions at HLA-C, positions 9, 77, 80, 99, 116, and 156 [11]. More recently, the same group has published an analysis of HLA mismatches that predict relapse; these overlap minimally with the mismatches associated with acute GvHD [25]. Functional studies have also been reported [12,13]; however, their results conflict with the reports of Ferrara [10] and Kawase [11], and they include only a small number of cases. Our analysis differed from Kawase's [11] in several ways. First, we used a different endpoint, namely death by day 100, and restricted our analysis to patients with good or intermediate risk leukemia. By focusing the analysis on a more restricted and hence more homogeneous study population, we hypothesized that we would reduce variability due to disease variables and increase the power to detect variables that predict GvHD. Second, we used a new statistical method, random forest analysis, which had not previously been applied in HCT but which has several advantages over more conventional analysis methods, as demonstrated by our results. Using random forest analysis, we confirmed all non-permissive amino acid substitutions identified by Kawase et al [11] as well as the few amino acid substitutions reported by other investigators [8-10,24]. Although random forest analysis does not validate the interpretation of substitutions as permissive versus non-permissive and does not provide a p-value, the fact that in our dataset we identified these previously reported non-permissive amino acid substitutions by random forest but not by traditional multivariate analysis supports the observation in other fields that random forests provide greater data analytic power.
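To make the contrast with one-position-at-a-time regression concrete, the following is a simplified, standard-library-only sketch of how a random forest can rank predictors. It is not the implementation used in this study. Each tree is grown on a bootstrap sample and considers only `mtry` randomly chosen features at each split, and a feature's importance is the average drop in out-of-bag accuracy when that feature's values are permuted. The feature names `position_0` to `position_4`, the synthetic cohort, and all tuning values (`n_trees`, `max_depth`, `mtry`) are hypothetical. The simulated outcome depends only on the joint presence of substitutions at positions 0 and 1, an interaction that a per-position model does not represent explicitly.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label in a node."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, idx, depth, max_depth, mtry, rng):
    """Grow a tree on rows idx, trying only mtry random features per split."""
    labels = [y[i] for i in idx]
    if depth == max_depth or len(set(labels)) == 1:
        return majority(labels)  # leaf: predicted outcome
    best = None
    for j in rng.sample(range(len(X[0])), mtry):
        left = [i for i in idx if X[i][j] == 0]
        right = [i for i in idx if X[i][j] == 1]
        if not left or not right:
            continue  # feature does not vary in this node
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(idx)
        if best is None or score < best[0]:
            best = (score, j, left, right)
    if best is None:
        return majority(labels)
    _, j, left, right = best
    return (j,
            build_tree(X, y, left, depth + 1, max_depth, mtry, rng),
            build_tree(X, y, right, depth + 1, max_depth, mtry, rng))


def predict(tree, row):
    """Follow (feature, subtree if 0, subtree if 1) nodes down to a leaf."""
    while isinstance(tree, tuple):
        j, if_absent, if_present = tree
        tree = if_present if row[j] == 1 else if_absent
    return tree


def fit_forest(X, y, n_trees=100, max_depth=3, mtry=2, seed=0):
    """Bootstrap-aggregated trees, each remembering its out-of-bag rows."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(boot))
        forest.append((build_tree(X, y, boot, 0, max_depth, mtry, rng), oob))
    return forest


def oob_permutation_importance(forest, X, y, seed=0):
    """Mean drop in out-of-bag accuracy after permuting each feature."""
    rng = random.Random(seed)
    n_features = len(X[0])
    importance = [0.0] * n_features
    for tree, oob in forest:
        if not oob:
            continue
        base = sum(predict(tree, X[i]) == y[i] for i in oob) / len(oob)
        for j in range(n_features):
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)
            hits = 0
            for i, v in zip(oob, shuffled):
                row = list(X[i])
                row[j] = v
                hits += predict(tree, row) == y[i]
            importance[j] += base - hits / len(oob)
    return [imp / len(forest) for imp in importance]


def simulate(n=400, n_features=5, seed=1):
    """Hypothetical cohort of 0/1 substitution indicators; the outcome
    depends only on substitutions at positions 0 AND 1."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [int(row[0] == 1 and row[1] == 1) for row in X]
    return X, y


X, y = simulate()
forest = fit_forest(X, y)
scores = oob_permutation_importance(forest, X, y)
for j, s in sorted(enumerate(scores), key=lambda t: -t[1]):
    print(f"position_{j}: {s:.3f}")
```

Permuting a position that a tree never splits on leaves that tree's predictions unchanged, so the tree contributes zero to that position's score. Averaged over the forest, the two simulated causal positions should rank above the three noise positions. Like the random forest analysis discussed above, this ranking yields importance scores rather than p-values.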
Furthermore, in addition to the 8 amino acid substitutions identified by Kawase et al [11], we identified another 25 that had similar or higher importance scores in the random forest analysis. Future studies in different patient populations are required to confirm the importance of these amino acid substitutions in HCT. However, for the patient who needs an HCT today from an HLA-mismatched donor, the evolving literature suggests that using a donor who is mismatched with the recipient at position 116 or 156 at any of the HLA class I loci, at position 9 at HLA-A or HLA-C, or at position 99 at HLA-C may increase the risk of early death and other adverse outcomes. A number of limitations of this study should also be mentioned. Although there were some notable commonalities, the three separate analytic techniques we applied to the same data set identified different sets of clinical variables and amino acid substitutions associated with survival at day 100, highlighting the need for independent validation in multiple datasets using multiple approaches. Also, we chose survival at day 100 as our primary endpoint because it is objective and likely most closely associated with acute GvHD. However, further studies should investigate amino acid substitutions that have their strongest association with other outcomes and should identify permissive amino acid substitutions. Our analysis identified associations between amino acid substitutions and survival at day 100, but we cannot confirm their biologic importance. Only well-designed functional studies will show whether the specific amino acid substitutions identified affect T-cell allorecognition or function, or whether they are markers for other critical factors causing increased mortality. Other biological factors that affect HLA amino acid mismatches and T-cell allorecognition in HCT, such as the shape of the T-cell receptor repertoire, were not investigated in this study.
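The counts in this paragraph can be cross-checked with set arithmetic. In the minimal sketch below (the set names are illustrative), the study's 33 locus-position combinations are encoded as (locus, position) pairs. The 8 substitutions reported by Kawase et al [11] are checked to be a subset, leaving 25 additional combinations. The 9 combinations highlighted above as possibly increasing early-death risk are likewise confirmed to lie within the study's list.

```python
# Locus-position combinations associated with lower day-100 survival here.
THIS_STUDY = (
    {("HLA-A", p) for p in (9, 43, 62, 63, 76, 77, 95, 97, 114, 116,
                            152, 156, 166, 167)}
    | {("HLA-B", p) for p in (97, 109, 116, 156)}
    | {("HLA-C", p) for p in (6, 9, 11, 14, 21, 66, 77, 80, 95, 97, 99,
                              116, 156, 163, 173)}
)

# Non-permissive substitutions reported by Kawase et al. [11].
KAWASE = {("HLA-A", 9), ("HLA-A", 116)} | {
    ("HLA-C", p) for p in (9, 77, 80, 99, 116, 156)
}

# Positions 116/156 at any class I locus, 9 at HLA-A or HLA-C, 99 at HLA-C.
HIGHLIGHTED = (
    {(locus, p) for locus in ("HLA-A", "HLA-B", "HLA-C") for p in (116, 156)}
    | {("HLA-A", 9), ("HLA-C", 9), ("HLA-C", 99)}
)

assert KAWASE <= THIS_STUDY        # all eight are recovered
assert HIGHLIGHTED <= THIS_STUDY   # all highlighted positions are in the list
additional = THIS_STUDY - KAWASE
print(len(THIS_STUDY), len(KAWASE), len(additional))  # 33 8 25
```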
Finally, although several of these amino acid locations have been identified in other studies, we acknowledge that some of these amino acid substitution locations may only be markers of a specific allele mismatch rather than truly important locations that affect survival. In conclusion, using random forest analysis of the largest currently available dataset of HCTs, we were able to confirm 13 previously identified class I amino acid substitutions and to identify 20 additional novel class I amino acid substitutions that are predictors of survival at day 100. Random forest analysis is a robust statistical methodology for analysis of HLA mismatching and outcome studies, capable of identifying important amino acid substitutions missed by other methods. Based on these results, random forest analysis may prove an equally valuable tool for evaluating other transplant outcomes of interest.
References (21 in total)

1.  Bone marrow-allograft rejection by T lymphocytes recognizing a single amino acid difference in HLA-B44.

Authors:  K Fleischhauer; N A Kernan; R J O'Reilly; B Dupont; S Y Yang
Journal:  N Engl J Med       Date:  1990-12-27       Impact factor: 91.245

2.  The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens.

Authors:  P J Bjorkman; M A Saper; B Samraoui; W S Bennett; J L Strominger; D C Wiley
Journal:  Nature       Date:  1987 Oct 8-14       Impact factor: 49.962

3.  Crystallization and X-ray diffraction studies on the histocompatibility antigens HLA-A2 and HLA-A28 from human cell membranes.

Authors:  P J Bjorkman; J L Strominger; D C Wiley
Journal:  J Mol Biol       Date:  1985-11-05       Impact factor: 5.469

4.  Structure of the human class I histocompatibility antigen, HLA-A2.

Authors:  P J Bjorkman; M A Saper; B Samraoui; W S Bennett; J L Strominger; D C Wiley
Journal:  Nature       Date:  1987 Oct 8-14       Impact factor: 49.962

5.  Bone marrow transplantation from unrelated donors: the impact of mismatches with substitutions at position 116 of the human leukocyte antigen class I heavy chain.

Authors:  G B Ferrara; A Bacigalupo; T Lamparelli; E Lanino; L Delfino; A Morabito; A M Parodi; C Pera; S Pozzi; M P Sormani; P Bruzzi; D Bordo; M Bolognesi; G Bandini; A Bontadini; M Barbanti; G Frumento
Journal:  Blood       Date:  2001-11-15       Impact factor: 22.113

6.  HLA mismatch combinations associated with decreased risk of relapse: implications for the molecular mechanism.

Authors:  Takakazu Kawase; Keitaro Matsuo; Koichi Kashiwase; Hidetoshi Inoko; Hiroh Saji; Seishi Ogawa; Shunichi Kato; Takehiko Sasazuki; Yoshihisa Kodera; Yasuo Morishima
Journal:  Blood       Date:  2008-11-07       Impact factor: 22.113

7.  Impact of HLA class I and class II high-resolution matching on outcomes of unrelated donor bone marrow transplantation: HLA-C mismatching is associated with a strong adverse effect on transplantation outcome.

Authors:  Neal Flomenberg; Lee Ann Baxter-Lowe; Dennis Confer; Marcelo Fernandez-Vina; Alexandra Filipovich; Mary Horowitz; Carolyn Hurley; Craig Kollman; Claudio Anasetti; Harriet Noreen; Ann Begovich; William Hildebrand; Effie Petersdorf; Barbara Schmeckpeper; Michelle Setterholm; Elizabeth Trachtenberg; Thomas Williams; Edmond Yunis; Daniel Weisdorf
Journal:  Blood       Date:  2004-06-10       Impact factor: 22.113

8.  HLA-B44-directed cytotoxic T cells associated with acute graft-versus-host disease following unrelated bone marrow transplantation.

Authors:  C A Keever; N Leong; I Cunningham; E A Copelan; B R Avalos; J Klein; N Kapoor; P W Adams; C G Orosz; P J Tutschka
Journal:  Bone Marrow Transplant       Date:  1994-07       Impact factor: 5.483

9.  Screening large-scale association study data: exploiting interactions using random forests.

Authors:  Kathryn L Lunetta; L Brooke Hayward; Jonathan Segal; Paul Van Eerdewegh
Journal:  BMC Genet       Date:  2004-12-10       Impact factor: 2.797

10.  An alloresponse in humans is dominated by cytotoxic T lymphocytes (CTL) cross-reactive with a single Epstein-Barr virus CTL epitope: implications for graft-versus-host disease.

Authors:  S R Burrows; R Khanna; J M Burrows; D J Moss
Journal:  J Exp Med       Date:  1994-04-01       Impact factor: 14.307
