Ines Wagner1, Daniel Schefzyk2, Jens Pruschke2, Gerhard Schöfl1, Bianca Schöne1, Nicole Gruber1, Kathrin Lang1, Jan Hofmann2, Christine Gnahm2, Bianca Heyn1, Wesley M Marin3, Ravi Dandekar3, Jill A Hollenbach3, Johannes Schetelig2,4, Julia Pingel2, Paul J Norman5, Jürgen Sauter2, Alexander H Schmidt1,2, Vinzenz Lange1.
Abstract
The killer-cell immunoglobulin-like receptor (KIR) genes regulate natural killer cell activity, influencing predisposition to immune-mediated disease and affecting hematopoietic stem cell transplantation (HSCT) outcome. Owing to the complexity of the KIR locus, with extensive gene copy number variation (CNV) and allelic diversity, high-resolution characterization of KIR has so far been applied only to relatively small cohorts. Here, we present a comprehensive high-throughput KIR genotyping approach based on next generation sequencing. Through PCR amplification of specific exons, our approach delivers both copy numbers of the individual genes and allelic information for every KIR gene. Ten-fold replicate analysis of a set of 190 samples revealed a precision of 99.9%. Genotyping of an independent set of 360 samples resulted in an accuracy of more than 99%, taking into account consistent copy number prediction. We applied the workflow to genotype 1.8 million stem cell donor registry samples. We report on the observed KIR allele diversity and relative abundance of alleles based on a subset of more than 300,000 samples. Furthermore, we identified more than 2,000 previously unreported KIR variants repeatedly in independent samples, underscoring the large diversity of the KIR region that awaits discovery. This cost-efficient high-resolution KIR genotyping approach is now applied to samples of volunteers registering as potential donors for HSCT. This will facilitate the utilization of KIR as an additional selection criterion to improve unrelated donor stem cell transplantation outcome. In addition, the approach may serve studies requiring high-resolution KIR genotyping, such as population genetics and disease association studies.
Keywords: KIR; KIR genotyping; NGS; allele-level resolution; high-throughput; killer cell immunoglobulin-like receptors; next generation sequencing
Year: 2018 PMID: 30564239 PMCID: PMC6288436 DOI: 10.3389/fimmu.2018.02843
Source DB: PubMed Journal: Front Immunol ISSN: 1664-3224 Impact factor: 7.561
Figure 1. Exon Sequence Copy Number (ECN) calculation based on relative read coverage. Distribution of the calculated ECNs of all exon sequences of 1,000 randomly selected samples (A) and 93 reference samples with known KIR genotypes (B). The ECNs were calculated for all detected exon sequences and plotted as events in histograms with a bin size of 0.02. For (B) the events were separated and color coded based on the known ECNs according to the reference genotypes with ECN = 0 (green, inlay), ECN = 1 (red), ECN = 2 (blue), ECN = 3 (black, inlay), and ECN = 4 (pink, inlay).
Classification of detected sequence features (EAGs).
| Actual result | Bona fide result not otherwise classified | Yes |
| Potential new allele | The majority of matched reads deviates from the reference sequence in at least one position | Yes |
| Statistical noise | Using a binomial distribution, an EAG with few associated reads can be identified as noise of an EAG with more reads | No |
| Potential crossover artifact | The EAG may have arisen as a crossover artifact of two EAGs with more reads | Yes |
| Crossover artifact | Negligible crossover artifact with low number of matched reads | No |
| Artifact | EAG was identified as a known artifact | No |
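The "Statistical noise" row can be illustrated with a minimal sketch. It assumes that a low-read EAG is tested against a binomial model in which each read of the combined pool has a fixed chance of being an erroneous copy of the better-supported EAG. The function names, the per-read error rate of 0.01, and the cutoff of 0.05 are hypothetical and are not taken from the publication.

```python
def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    pmf = (1.0 - p) ** n          # P(X = 0)
    lower = 0.0                   # accumulates P(X < k)
    for i in range(k):
        lower += pmf
        # Recurrence: P(X = i + 1) = P(X = i) * (n - i) / (i + 1) * p / (1 - p)
        pmf *= (n - i) / (i + 1) * p / (1.0 - p)
    return max(0.0, 1.0 - lower)


def is_statistical_noise(candidate_reads, parent_reads,
                         error_rate=0.01, alpha=0.05):
    """Flag a low-read EAG as noise of a better-supported EAG.

    error_rate and alpha are illustrative assumptions, not published values.
    The candidate counts as noise if seeing at least this many erroneous
    reads is plausible (tail probability above alpha) under the error model.
    """
    total = candidate_reads + parent_reads
    return binomial_upper_tail(candidate_reads, total, error_rate) > alpha


print(is_statistical_noise(3, 500))    # few reads next to a strong EAG
print(is_statistical_noise(60, 500))   # too many reads to be explained by noise
```

Under these assumed parameters, three reads next to 500 are compatible with noise, whereas 60 reads are not.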
Figure 2. Evaluation of result sets based on the calculation of the result set score S. The result set score S is calculated to evaluate potential results and identify the most likely one. Each result set is composed of different combinations of detected exon sequences (EAGs). This example depicts three exons of two genes that share the same sequence in exon 3 (bridging EAG). (A) EAGs identified in each exon with associated Exon Sequence Copy Numbers (ECNcs). The ECNcs are derived from the number of detected reads by applying several normalization and correction factors. The gene copy number (GCNc) is estimated for each gene by first summing the ECNcs of all EAGs in each exon and then averaging the sums over the exons (ignoring exons with bridging EAGs). (B) Based on the detected EAGs, neXtype calculates result sets (EAGC sets) consisting of EAG combinations (EAGCs) which correspond to named alleles. In this example, EAGC sets with one or two copies per gene are evaluated because the calculated gene copy number (GCNc) is close to 1 for both genes. The EAGC set with one copy of each gene yields the lowest score of 0.02 and is therefore selected as the final result.
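The gene copy number estimate described in panel (A) of Figure 2 can be sketched as follows. This is a minimal illustration, assuming the corrected ECN values are already available per exon; the function name, the data layout, and the example values are hypothetical, and the normalization and correction factors as well as the score S are not reproduced here.

```python
from statistics import mean


def estimate_gene_copy_number(ecn_by_exon, bridging_exons=frozenset()):
    """Estimate the copy number of one gene from per-exon ECN values.

    ecn_by_exon: dict mapping exon label -> dict of {EAG id: ECN value}.
    bridging_exons: exons containing a bridging EAG (a sequence shared with
    another gene); these are skipped, as described in the caption.
    """
    exon_sums = [
        sum(eag_ecns.values())                 # sum the ECNs of all EAGs in the exon
        for exon, eag_ecns in ecn_by_exon.items()
        if exon not in bridging_exons
    ]
    if not exon_sums:
        raise ValueError("no non-bridging exons available for this gene")
    return mean(exon_sums)                     # average the sums over the exons


# Hypothetical gene with three exons; exon 3 carries a bridging EAG.
gene_a = {
    "exon3": {"eag_shared": 1.95},
    "exon4": {"eag_a4": 0.98},
    "exon5": {"eag_a5": 1.04},
}
print(round(estimate_gene_copy_number(gene_a, bridging_exons={"exon3"}), 2))
```

With these made-up values the estimate is close to 1, which in the figure's example would lead to result sets with one or two copies of the gene being evaluated.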
Figure 3. Assessment of precision and accuracy. (A–C) Precision set: 10-fold analysis of 190 samples. (D–F) Accuracy set: analysis of 360 samples with independent KIR genotyping data. (A,D) Gene copy numbers (GCN) based on the established consensus genotypes (A) and the corrected results (D). In very few cases the GCN could not be determined unambiguously (gray), partly due to the presence of hybrid genes. (B,E) Genotyping errors based on discrepancies between individual replicates and the manually curated consensus genotypes (B) or the ground truth reference genotypes (E). Discrepancies are labeled as “questionable” (E) if the reference genotypes are not plausible based on manual inspection of neXtype data. (C,F) Rate of allele-level results and novel alleles: to avoid spurious results the algorithm reports only the presence of individual KIR genes instead of the exact alleles if certain quality criteria are not met (positive). For the accuracy set (D–F), presence calls were largely improved to allele-level genotypes after manual inspection of the data.
Figure 4. Allele frequencies of inhibiting KIR genes. Allele frequencies (at 3-digit resolution) of inhibiting KIR genes based on a dataset of 337,387 samples and KIR-IPD database release 2.7.1. Alleles not resolvable based on the targeted exons are separated by “/”. Due to silent mutations, two 5-digit alleles may separate into distinct 3-digit allele groups with different levels of ambiguities (e.g., KIR2DL3*001 and KIR2DL3*001/003). Alleles with frequencies below 0.005 are plotted in an inlay.
Figure 5. Allele frequencies of activating KIR genes. Allele frequencies (at 3-digit resolution) of activating KIR genes based on a dataset of 337,387 samples and KIR-IPD database release 2.7.1. Alleles not resolvable based on the targeted exons are separated by “/”. Due to silent mutations, two 5-digit alleles may separate into distinct 3-digit allele groups with different levels of ambiguities (e.g., KIR2DS2*001 and KIR2DS2*001/002). Alleles with frequencies below 0.005 are plotted in an inlay.
Figure 6. Coverage of IPD-KIR allotypes. KIR alleles (IPD-KIR 2.7.1) were classified at the allotype level (first three digits): unambiguously detected alleles (resolved); alleles detected at a frequency of below 0.00001 (rare); alleles detected as part of an allele group (ambiguous); not detected alleles (not detected).
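The four categories in the Figure 6 caption can be expressed as a small decision rule. The sketch below is only an illustration: the 0.00001 threshold comes from the caption, while the function name, its inputs, and the order in which the categories are checked are assumptions.

```python
RARE_THRESHOLD = 0.00001  # frequency cutoff for "rare", as stated in the caption


def classify_allotype(unambiguous_frequency, seen_in_allele_group):
    """Assign one of the four coverage categories to an allotype.

    unambiguous_frequency: frequency at which the allotype was detected on
        its own (0.0 if never detected unambiguously).
    seen_in_allele_group: True if the allotype appeared only as part of an
        unresolved group such as KIR2DL3*001/003.
    The precedence of the checks is an assumption made for this sketch.
    """
    if unambiguous_frequency >= RARE_THRESHOLD:
        return "resolved"
    if unambiguous_frequency > 0.0:
        return "rare"
    if seen_in_allele_group:
        return "ambiguous"
    return "not detected"


print(classify_allotype(0.12, False))
print(classify_allotype(0.000003, False))
print(classify_allotype(0.0, True))
print(classify_allotype(0.0, False))
```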
Figure 7. Novel alleles observed. (A) Number of alleles reported in IPD-KIR Release 2.6 (blue) vs. the distinct additional sequences identified in a cohort of 185,170 samples (green). The color code indicates the number of independent observations per novel sequence. (B) As in (A), but only alleles/sequences giving rise to distinct protein sequences were counted. (C) Number of samples where any of the novel sequences of (A) was observed; grouped by the frequencies of the novel sequences. KIR genes are sorted by the number of reported alleles.