Should results of HLA haplotype frequency estimations be normalized?

Susanne Seitz¹, Vinzenz Lange², Paul J Norman³, Jürgen Sauter¹, Alexander H Schmidt¹,².

Year: 2021 PMID: 34553837 PMCID: PMC9292793 DOI: 10.1111/iji.12556

Source DB: PubMed Journal: Int J Immunogenet ISSN: 1744-3121 Impact factor: 2.385

Dear Editor,

Regarding the comment by Nunes (Nunes, 2021) on our publication 'Estimating HLA haplotype frequencies from homozygous individuals' (Seitz et al., 2021): The only difference between the approach preferred by Nunes and our analysis is that we normalized the estimated haplotype frequencies (HF), that is, we multiplied each frequency by a constant factor chosen so that the frequencies sum to 1. The question is therefore whether it is appropriate to normalize an HF set obtained from such an estimation procedure. We think there may be no universal answer to this question; rather, it depends on what the frequencies are intended to be used for.

As we mentioned in the introduction of our original paper, we are particularly interested in questions arising in the context of stem cell donor registries, such as what proportion of patients of a given ethnicity will find an HLA-matched donor in a registry of defined size and ethnic composition. This question is usually answered via a two-step procedure (Beatty et al., 1995; Müller et al., 2003; Schmidt et al., 2014): First, one estimates population-specific HF from appropriate samples of HLA-genotyped individuals. Then, the HF obtained are used as input for the determination of matching probabilities (MP) by registry size. In the simplest scenario (all donors and patients are from the same population), this is done using the formula (Müller et al., 2003)

$$p(n) = \sum_i f_i \left[1 - (1 - f_i)^n\right].$$

Here, $p(n)$ is the MP, $n$ is the registry size, and the $f_i$ are the genotype frequencies (GF) of the population under consideration, which are derived from the HF determined in step 1 under the assumption of Hardy-Weinberg equilibrium (HWE).

We will now analyze the implications of using non-normalized HF sets for MP estimation with the help of the frequency sets from our original paper. For the sums of the estimated HF without normalization, we obtain […], […], and […] for the 4-, 5-, and 6-locus scenarios, respectively.
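The two-step procedure can be illustrated with a minimal Python sketch. It assumes that the MP formula has the form p(n) = Σ_i f_i [1 − (1 − f_i)^n] and that GF follow from HF under HWE (squared HF for homozygous genotypes, twice the product for heterozygous ones). The function names and the toy HF values are illustrative assumptions and are not taken from the original paper or from any published software.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(hf):
    """Derive genotype frequencies (GF) from haplotype frequencies (HF) under HWE.

    Homozygous genotypes get h_j**2, heterozygous genotypes 2 * h_j * h_k.
    """
    gf = []
    for j, k in combinations_with_replacement(range(len(hf)), 2):
        gf.append(hf[j] ** 2 if j == k else 2 * hf[j] * hf[k])
    return gf


def matching_probability(gf, n):
    """Assumed MP formula: p(n) = sum_i f_i * (1 - (1 - f_i)**n)."""
    return sum(f * (1.0 - (1.0 - f) ** n) for f in gf)


def normalize(hf):
    """Rescale an HF set by a constant factor so that it sums to 1."""
    total = sum(hf)
    return [h / total for h in hf]


# Hypothetical toy HF set whose sum (0.9) deviates from 1; not real data.
toy_hf = [0.4, 0.3, 0.2]
raw_gf = genotype_frequencies(toy_hf)
norm_gf = genotype_frequencies(normalize(toy_hf))

# Compare MP by registry size for the raw and the normalized HF set.
for n in (10, 1_000, 100_000):
    print(n,
          round(matching_probability(raw_gf, n), 4),
          round(matching_probability(norm_gf, n), 4))
```

With this toy set, the non-normalized MP levels off near 0.81, the square of the HF sum of 0.9, whereas the MP from the normalized set approaches 1 as the registry grows.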
(The HF sums can be easily calculated from data given in the Supplementary Information of our original paper.) It is straightforward to deduce that, as the registry size $n$ grows without bound, the MP $p(n)$ converges to the sum of the GF $f_i$, which under HWE equals the squared sum of the HF $h_j$:

$$\lim_{n \to \infty} p(n) = \sum_i f_i = \Bigl(\sum_j h_j\Bigr)^2.$$

In our three scenarios, these limits are […], […], and 0.857 for the 4-, 5-, and 6-locus scenarios, respectively. This means that even in a setting with identical donor and patient populations and arbitrary registry growth, one can never achieve an MP greater than 0.857 in the 6-locus scenario. In the other two scenarios, by contrast, one achieves MP well above 1. These unreasonable results provide, in our view, a strong argument that normalized HF sets are the appropriate outcome of HF estimation for our purposes. As stated above, one may reach different conclusions in other contexts, although it might generally be difficult to interpret a frequency from a non-normalized HF set whose frequency sum deviates considerably from 1.

It should be noted that the question of HF set normalization arises generally, not only in HF estimation based on homozygous individuals. When we analyzed the original data set ([…]) with the expectation-maximization (EM) algorithm (Excoffier & Slatkin, 1995) using our Hapl-o-Mat software (Sauter et al., 2018; Schäfer et al., 2017), the sum of all HF at or above the value corresponding to a unique occurrence in the sample ranged from 0.993 (6-locus scenario) to 0.997 (4-locus scenario). The question of whether to normalize such an HF set is obviously less pressing than for the substantial deviations of the HF sums from 1 that we obtained without normalization when estimating HF from homozygous individuals. This is another piece of evidence for the general superiority of the EM algorithm over HF estimation from homozygous donors, which we had already clearly stated in our original paper.

For much smaller (and probably more common) sample sizes, however, the question of whether estimated HF sets should be normalized also becomes more relevant for the EM algorithm. To demonstrate this, we determined HF from a random subsample ([…]) of the original sample using the Hapl-o-Mat software.
The sum of all frequencies corresponding to at least one occurrence in the sample ranged from 0.772 (6-locus scenario) to 0.905 (4-locus scenario). Thus, if one wants to use such an HF set as input for MP estimation and to avoid unreasonable results like those above, one can (a) include in the calculation frequencies whose underlying haplotypes are presumably not present in the sample at all; (b) normalize the estimated HF; or (c) combine these two approaches. Indeed, the last option is what we have done in the past (Schmidt et al., 2020): Starting with the largest HF, we included all estimated frequencies, including those of haplotypes presumably not present in the sample, up to a cumulative frequency of 0.995 and then normalized this HF set to 1. However, this is a merely pragmatic approach. To our knowledge, there is no standard way to generate input for the MP calculation from the output of the EM algorithm, let alone a mathematically proven optimal approach. We think it would be a worthwhile, though probably non-trivial, scientific effort to define one.
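The pragmatic combination of truncation and normalization described in this letter might be sketched as follows. This is only an illustration under stated assumptions: the function name `truncate_and_normalize` is hypothetical, the toy frequencies are invented, and the sketch assumes that the frequency which first carries the cumulative sum to or past the cutoff is still included, a detail the letter does not specify.

```python
def truncate_and_normalize(hf, cutoff=0.995):
    """Keep the largest HF up to a cumulative frequency of `cutoff`,
    then rescale the kept frequencies so that they sum to 1.

    Assumption: the frequency that first reaches or crosses the cutoff
    is still kept. If the whole set sums to less than the cutoff,
    every frequency is kept.
    """
    kept = []
    cumulative = 0.0
    for h in sorted(hf, reverse=True):  # start with the largest HF
        if cumulative >= cutoff:
            break
        kept.append(h)
        cumulative += h
    total = sum(kept)
    return [h / total for h in kept]


# Hypothetical toy EM output (unsorted); not real data.
toy_em_output = [0.001, 0.5, 0.04, 0.3, 0.003, 0.15, 0.006]
hf_for_mp = truncate_and_normalize(toy_em_output)
print(len(hf_for_mp), sum(hf_for_mp))
```

The resulting list sums to 1 (up to floating-point rounding) and could then serve as input to an MP calculation. Whether 0.995 or any other cutoff is a good choice is exactly the open question raised in the letter.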

CONFLICT OF INTEREST

The authors declare no conflict of interest.

9 in total

1. HLA Haplotype Frequency Estimation from Real-Life Data with the Hapl-o-Mat Software.

Authors: Jürgen Sauter; Christian Schäfer; Alexander H Schmidt
Journal: Methods Mol Biol Date: 2018

2. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population.

Authors: L Excoffier; M Slatkin
Journal: Mol Biol Evol Date: 1995-09 Impact factor: 16.240

3. Hapl-o-Mat: open-source software for HLA haplotype frequency estimation from ambiguous and heterogeneous data.

Authors: Christian Schäfer; Alexander H Schmidt; Jürgen Sauter
Journal: BMC Bioinformatics Date: 2017-05-30 Impact factor: 3.169

4. Gene and haplotype frequencies for the loci HLA-A, HLA-B, and HLA-DR based on over 13,000 German blood donors.

Authors: Carlheinz R Müller; Gerhard Ehninger; Shraga F Goldmann
Journal: Hum Immunol Date: 2003-01 Impact factor: 2.850

5. Impact of racial genetic polymorphism on the probability of finding an HLA-matched donor.

Authors: P G Beatty; M Mori; E Milford
Journal: Transplantation Date: 1995-10-27 Impact factor: 4.939

6. A comment on estimating HLA haplotype frequencies from homozygous individuals.

Authors: José Manuel Nunes
Journal: Int J Immunogenet Date: 2021-09-28 Impact factor: 1.466

7. Estimating HLA haplotype frequencies from homozygous individuals - A Technical Report.

Authors: Susanne Seitz; Vinzenz Lange; Paul J Norman; Jürgen Sauter; Alexander H Schmidt
Journal: Int J Immunogenet Date: 2021-09-27 Impact factor: 2.385

8. Toward an optimal global stem cell donor recruitment strategy.

Authors: Alexander H Schmidt; Jürgen Sauter; Julia Pingel; Gerhard Ehninger
Journal: PLoS One Date: 2014-01-30 Impact factor: 3.240

9. Immunogenetics in stem cell donor registry work: The DKMS example (Part 1).

Authors: Alexander H Schmidt; Jürgen Sauter; Daniel M Baier; Jessica Daiss; Andreas Keller; Anja Klussmeier; Thilo Mengling; Gabi Rall; Tobias Riethmüller; Gerhard Schöfl; Ute V Solloch; Tigran Torosian; David Means; Helen Kelly; Latha Jagannathan; Patrick Paul; Anette S Giani; Sabine Hildebrand; Stephan Schumacher; Jan Markert; Monika Füssel; Jan A Hofmann; Thomas Schäfer; Julia Pingel; Vinzenz Lange; Johannes Schetelig
Journal: Int J Immunogenet Date: 2020-01-06 Impact factor: 1.466
